In this case the hardware seems to copy the bits that actually fit
in the destination instead of clamping to the maximum value allowed
by the bit size of the destination components like Vulkan expects.
Fix this by adding code to clamp the color results to the bit size
of the destination components.
It should be noted that this is a general issue with the hardware,
and while we can fix it here for blits done by the driver, user
shaders writing outside the range of the destination bit size will
have the same issue and we probably don't want to add code to clamp
every single render target write in every shader with integer format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The pipeline stages have a reference to the NIR code produced from
the SPIR-V shader modules, but they never destroy it.
It should also be noted that our coordinate shader stage was sharing
the NIR with the vertex shader stage, which is kind of tricky to handle
and probably very error prone. Just make sure each pipeline stage has
owns it NIR shader and that we always free it when the stage is
destroyed.
Also, for the case of NIR modules created by the driver internally,
we always need to clone them, since the driver will destroy the NIR
as soon as it is done creating pipelines with it. We could also not
clone it and let the pipeline stage take ownership of the NIR code for
NIR modules, but that would be inconsistent with how ownership works for
SPIR-V modules.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This needs to be updated everytime we bind a new pipeline, but we can
bind a pipeline and not have an actual job yet, so we want to postpone
this until we actually need to emit CFG_BITS, during the pre-draw
setup.
Also, rename the update helper to be about the job rather the command
buffer, since it is updating state that we track per job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were asserting that we had a valid subpass index, but we can have
meta operations that run outside a render pass, such as for blitting.
If we allow this, then we also need to account for the fact that
pipelines can be bound outside a render pass too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is similar to the scenario where we have a submit without
any command buffers, even if we don't have any actual GPU work to do
we still might need to signal fences/semaphored and possibly wait on
previous jobs to finish, so we need to submit something to the kernel
to get all that done right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Most of our state doesn't carry over across jobs, so it needs to be re-emitted.
For example, if we have two render passes running back to back using the
same pipeline, the application could decide to only bind the vertex buffer
or/and the pipeline just once, but as soon as we record the second render
pass and create a new job for it we will need to re-emit the shader state
record for it just because it is a new job.
We could probably only do this for jobs inside a render pass, since those
are the only ones that actually draw something and need to care about
dirty state, however, there is no harm in doing this for all jobs, for the
same reason.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were enabling VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT for
any format valid for texturing, but for example, right now we don't
support linear filtering on any depth format.
This is needed to get some hundreds of tests like this:
dEQP-VK.pipeline.sampler.view_type.1d.format.r32g32_sfloat.mag_filter.linear
properly skipped (those were all Crashes with the simulator, and
almost all Fails with the real device).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When testing if we could merge the new subpass into the previous one
We were taking the subpass index from the state (which isn't updated
to the new subpass until a bit later when the job for the new subpass
has been settled). This means that we were doing the merge checks with
the previous subpass, not the current one.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some texture operations (like mipmap query levels) that
doesn't require a sampler. In fact, you should ignore it. So we need
to take it into account when combining the
indexes. nir_tex_instr_src_index is returning a negative value to
identify that case, but as we are using a uint32_t to pack both values
(for convenience, easy to pack/unpack the hash table key), we just use
a uint value big enough to be a wrong sampler id.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This requires that we emit a specific draw command and that we emit
the base instance if not zero right before the instanced draw call.
Notice that we were already doing this for instanced indexed draw
calls, so here we are only adding this for non-indexed draw calls.
We also need to flag whether the vertex shader reads the base instance
in the shader record (which it will if it reads uses gl_InstanceIndex,
as that is lowered in Vulkan to base_instance + instance_id).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
OpenGL doesn't have the concept of individual texture and sampler, so
texture and sampler indexes have the same value. v3d compiler uses
this assumption, so for example, the texture info at the v3d key
include values that you need to use the texture format and the sampler
to fill (like the return_size).
One option would be to adapt the v3d compiler to handle both, but then
we would need to adapt to the lowerings it uses, like nir_lower_tex,
that also take the same assumption.
We deal with this on the Vulkan driver, by reassigning the texture and
sampler index to a combined one. We add a hash table to map the
combined texture idx and sampler idx to this combined idx, and a
simple array to the opposite map. On the driver we work with the
separate indices to fill up the data, while the v3d compiler works
with the combined one.
As mentioned, this is needed to properly fill up the texture return
size, so as we are here, we fix that. This gets tests like the
following working:
dEQP-VK.glsl.texture_gather.basic.2d.depth32f.base_level.level_2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead.
Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.{40,48}
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead, which we were already computing
but that we were not actually using for indexing by mistake.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We must update our check for whether the render area is tile-aligned for
each subpass, since the hardware will update tile sizes for each RCL.
Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.8
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While this was already being achieved by the scissort rect set on the
pipeline, we still want to limit the render area to we reduce the tile
coverage of the pass as much as possible and avoid unnecessar
tile load and store operations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is the same as the subpass start version, only that it won't
emit subpass clears. This is necessary when resuming a subpass
from a partial clear to make sure we don't try to clear subpass
attachments again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since a meta partial clear starts a new render pass, we need to store
all state that can be changed with vkCmdBeginRenderPass.
Also, since the meta clear pipeline sets dynamic state, we also
have to restore that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
They are bound at the set layout, and cannot be changed. From
VkDescriptorSetLayoutBinding spec:
"pImmutableSamplers affects initialization of samplers. If
descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
pImmutableSamplers can be used to initialize a set of immutable
samplers. Immutable samplers are permanently bound into the set
layout and must not be changed; updating a
VK_DESCRIPTOR_TYPE_SAMPLER descriptor with immutable samplers is
not allowed and updates to a
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor with immutable
samplers does not modify the samplers (the image views are updated,
but the sampler updates are ignored)"
We stored them as part of the set layout. It also means that when we
need the sampler (like for texture operations) we can't just ask for a
descriptor, as it would not have the sampler. A new method is created.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The problem with this is that TLB clears always clear and store full
tiles, so if our render area is not perfectly aligned to tile boundaries
we end up clearing all pixels in tiles that are only partially covered.
In this scenario we have to avoid using TLB clears and instead fallback
to clearing by rendering a scissored quad in the clear color, like we do
for partial clears in vkCmdClearAttachments.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some scenario where this won't happen and don't imply a bug.
For example, if we find a pipeline barrier, we will finish the current
job automatically and won't start a new one. There may be other
scenarios where we may want to do the same.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
And flag dirty scissor state if the render area is constraining the
current clip window, so that we emit a new clip window with the next
draw call.
Also, remove the early emission of a clip window for the render area
if we didn't have any scissor state. TLB clears ignore the clip
window, so this was doing nothing for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been caching the first pipeline we produced and always
reusing that, which is obviously incorrect.
This change implements a proper cache and also takes care of releasing
the cached resources when the device is destroyed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is achieved by rendering a quad in the clear color for each layer
of each attachment being cleared. Right now we emit each clear in a
separate job with a single attachment framebuffer, but in the future
we may be able to extend the solution to using multiple render targets
and clear multiple attachments with a single job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
According to the Vulkan 1.0 spec:
"attachmentCount is the number of VkPipelineColorBlendAttachmentState
elements in pAttachments. This value must equal the colorAttachmentCount
for the subpass in which this pipeline is used."
so let's assert exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For arrays we are adding one entry on the map per array element. This
makes getting back the descriptor for each array element easier, as
for example, for ubo arrays, each array element can be bound to a
different descriptor buffer.
For samplers arrays this would also make sense.
Fixes crashes on tests like:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.combined_image_sampler_mutable.fragment.descriptor_array.2d
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Right now shader variant update on the cmd_buffer is based on populate
a new key using the descriptor bounds, assuming that we would get one
final descriptor for any usage on the shader. But if the descriptors
are being bound with more that one call to CmdBindDescriptorSet, that
would not be true, as the first calls would not bind all the
descriptors. Right now this was raising an assert.
Now we allow that as possible, and for the case of checking variants,
we just stop it, and we don't clean up the SHADER_VARIANT flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specially after CmdBindDescriptorSets, it is likely that we would need
a new shader variant, like for example if sampler descriptor sets are
bound.
At that moment a new v3d key is populated, using as base the one used
at pipeline creation, so only cmd_buffer depending values are changed.
Then a new variant is requested. Note that internally it is handled
with a cache, so no new compilation will be done if not needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, we were doing the compilation to qpu when the pipeline was
created (as part of vkCreateGraphicsPipeline).
But this would not be correct when some specific descriptors are
involved, like textures. For that case some nir lowerings depend on
the texture format, and that info is not available until the specific
descriptors are bound to the command buffer. In the same way, the same
command buffer with a given pipeline could get their descriptor bound
again.
So it would be needed to support compilation variants of the same
shader. So finally, the v3d_key would work as keys, as the variants
would be tracked with a hash table.
This commit introduces the new structures for that. What we were
building as the final qpu shader would become the initial default
variant for the pipeline. We are also saving the keys used at that
point, to avoid needing to fully regenerate them when a new variant is
created. Not just for performance, but also to avoid needing to track
the graphics pipeline create info structure.
The code to handle updating the current variant would be done on
following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This include SAMPLER, COMBINED_IMAGE_SAMPLER and SAMPLED_IMAGE
descriptors.
In order to support them we do the pre-packing of TEXTURE_SHADER_STATE
and SAMPLER_STATE when Images and Samplers (respectively) are
created. Those packets doesn't need to be tweaked later, so we upload
them to an bo.
A possible improvement of this would be that the descriptor pool
manages a bo for all descriptors, that suballocate for each descriptor
allocated. This is what other drivers do (and as far as I understand,
one of the reasons of having a descriptor pool).
Immutable samplers are not supported, will be handled on a following
patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
v3dv_descriptor is going to be expanded with more data, so it doesn't
make sense anymore to handle a fake descriptor for the push
constants. Introducing a new struct, that is just a pair
bo/offset. Initially named v3dv_resource, as it could be the base to
reuse bos for different resources (like assembly bo)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were configuring the TLB to use ABGR1555, but that doesn't really
give us what we want. There were two issues:
* We were using the wrong Texture Data Format and Output Image
Format. In fact those we need to use were not included on the
packet file.
* Even using the correct one, we need to do a RB swap to match the
semantics of the Vulkan format.
This patch fixes both issues. As we are here we keep the formats we
were already used, that would provide support for r5g5b5a1.
So this patch makes tests like the following going from skip to pass:
dEQP-VK.texture.filtering.2d.formats.r5g5b5a1_unorm.nearest
And the following test from fail to pass:
dEQP-VK.texture.filtering.2d.formats.a1r5g5b5_unorm.nearest
Note that the R5G5B5A1_UNORM_PACK16 is not mandatory, but as we
already made the effort to understand them and get them working let's
just keep it on the list
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This fixes multi-layer vkCmdClearAttachments CTS tests. The underlying
problem here is that even though this command runs inside a render pass,
it is implemented as a separate job that emits its own RCL to program
render target color clears, so we should not emit the subpass RCL for it.
Fixes 250+ CTS tests (all but a1r5g5b5) in:
dEQP-VK.api.image_clearing.core.clear_color_attachment.cube_layers.*
dEQP-VK.api.image_clearing.core.clear_color_attachment.multiple_layers.*
dEQP-VK.api.image_clearing.core.clear_depth_stencil_attachment.multiple_layers.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had changed the interface for job starts so they take the subpass index
rather than a boolean indicating whether the job starts a new subpas, but we
forgot to update this accordingly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not considering that the depth of the image is minified according
to its miplevel. For some reason this only seemed to show up for tiled
images.
Fixes (except a1r5g5b5 format):
dEQP-VK.api.image_clearing.core.clear_color_image.3d.optimal.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a format is not supported by the TLB, we can still use the TLB path
if we setup the render target using a compatible format. The only caveat
is that for clears we need to pack the clear value using the original
format of the underlying image, not the compatible format.
With this change we get to use the TLB path successfully for all supported
image formats (except a1r5g5b5, at least for now) so long as the region starts
at (0,0), and we only need to consider fallback paths for partial copies
and clears, not because of the format.
This gets us to pass a few extra hundreds of tests in:
dEQP-VK.api.image_clearing.core.clear_color_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is required to pass line rasterization tests in CTS while exposing
at least 4 bits of subpixel precision, which is the minimum required
by the spec. We are currently exposing 6 bits, however, if we select
diamond exit instead of perp end caps rasterization, then even if we
lower subpixel precision bits to 4 bits, we'd still fail one of the tests.
Fixes:
dEQP-VK.rasterization.flatshading.line_strip
dEQP-VK.rasterization.flatshading.lines
dEQP-VK.rasterization.interpolation.basic.line_strip
dEQP-VK.rasterization.interpolation.basic.lines
dEQP-VK.rasterization.interpolation.projected.line_strip
dEQP-VK.rasterization.interpolation.projected.lines
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The name suggests that this method emits the full graphics pipeline,
but that is not the case (ie: scissor is emitted at a different
point).
Right now that method is mostly emitting the gl_shader state plus some
other packets. So we just renamed it to emit_gl_shader_state, and move
the other packet emission to new emission methods.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
First this makes it so we only clear dirty stencil state if we actually
emit the stencil packets. Second, now we check if we need to emit stencil
whenever a new pipeline is bound, since a new pipeline may not change the
dynamic stencil state but might still be changing other aspects of stencil,
which means that even if the dynamic stencil state is not dirty, we might
still need to emit new stencil packets.
This fixes a regression in VkRunner test depth-buffer.vk_shader_test after
we dropped the redundant emission of stencil state, since that redundant
emission was happening unconditionally whenever we had a new pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The current implementation assumed that we would clear all dirty state
after we have emitted a pipeline, but that is not always true. In
particular, we don't emit blend constants unless we need them, so we
can't clear its dirty bit until we have bound a pipeline that actually
requires them.
The change implemented here has individual emit functions clear pipeline
states they hadle as they emit the corresponding state and we clear
the dirty pipeline bit at the end.
This fixes some CTS pipeline blend tests where we have multiple draws
with blending and only some of them require blend constants. In this case,
the original behavior would clear the blend constants dirty bit on draw
calls that don't actually emit blend constants (because they don't need
them), making the later draw calls that do need them fail because they
don't emit them either (since the previous draw calls cleared the dirty
bit).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Not quite sure why this is required though. Conversion from/to
sRGB happens on tile loads and stores, with the tile buffer
being always linear, so there should be no difference.
Fixes all test failures in:
dEQP-VK.pipeline.blend.format.r8g8b8a8_srgb.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this mode, which can be activated with V3D_DEBUG=always_flush like
in the GL driver, we flush every draw call separately. For now this
is useful for debugging, but we can also set the flag internally on
specific jobs when we identify scenarios where we need the same behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is valid to submit with an empty list ofcommand buffers, however,
we still need to wait on the pWaitSemaphores provided and only signal
the pSignalSemaphores and fence once we have finished waiting on them
to honor the semantics of the submission.
Because waiting and signaling happens in the kernel, the easiest way
to do this is to submit a trivial no-op job to the GPU. To do this,
we need to refactor some of our code so that code that might have been
operating on a command buffer starts operating on a job instead, so we
can resuse most of our infrastructure to create the no-op job.
Additionally, because no-op jobs are created internally by the driver,
we are responsible for destroying them too. For this, we bind a fence
to each no-op job we submit and we test for completion of in-flight
no-op jobs (and destory them if completed) every time vkQueueSubmit
is called.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are a handful of tests that simulate 'out of memory' situations
during swapchain image creation, and these can lead to failed job
allocations when the driver is running on the prime blit path, as that
involves creating a command buffer. The tests expect us to handle this
scenario gracefully and return an appropriate OOM error as a result.
This make sure we don't try to dereference a job if we failed to allocate
it so we don't crash and can return the OOM error gracefully in the
process.
Fixes:
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Vulkan has Z NDC range in [0, 1], we where using OpenGL's [-1, 1].
Fixes:
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_deltasmall
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_deltaone
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this scenario we can end up generating a clip window where
the max coordinates are smaller than the min coordinates and the simulator
asserts.
Fixes:
dEQP-VK.draw.scissor.dynamic_scissor_outside_viewport
dEQP-VK.draw.scissor.static_scissor_outside_viewport
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this right for images created with VK_IMAGE_TILING_LINEAR.
Also, only assign a DRM modifier if the image has been created for WSI.
This fixes a bunch of CTS tests that use copies to linear images to verify
the result of rendering.
Fixes multiple failures in:
dEQP-VK.draw.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This triggers when dumping CLIF because the dump process involves
internally mapping all the BOs. We could unmap them there after we
are done, but there is really no reason why we need to assert on this,
so let's just keep things simple and unmap. If the user is really
double mapping, that should be caught by the validation layers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is possible to specify a depth/stencil clear but then have no
actual depth/stencil attachment used in the subpass. In that case
we are already skipping the store.
Fixes:
dEQP-VK.renderpass.suballocation.unused_clear_attachments.*depthonly_unused
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason our backend compiler doesn't have an implementation for
usubborrow, only for uaddcarry. We could add it, however, the existing
uaddcarry implementation also seems to fail some of the CTS tests,
which pass if we lower.
Fixes:
dEQP-VK.glsl.builtin.function.integer.uaddcarry.*
dEQP-VK.glsl.builtin.function.integer.usubborrow.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
They still need some review to get some real final values, but what we
had before were somewhat too low. Increasing them a little. This
allows to get some CTS tests from skip to pass, which afais they are
using reasonable values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This gets us to split matrix vertex inputs with direct access to
vectors, which the backend compiler expects. The GL driver seems to
rely on this too, which is called by the Mesa state tracker.
Cases with indirect cases seem to be handled at link time via
nir_lower_io_arrays_to_elements, which we are already calling.
Fixes crashes in:
dEQP-VK.glsl.conversions.matrix_to_matrix.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Push constant tests were still working without taking this into
account because we can't really fine graine how much we allocate for
them.
For now we only use them to know if we should allocate/fill the ubo
for push constants or not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Also, Vulkan uses the same provoking vertex rules are Direct3D, which
changes from OpenGL's default. Make sure we honor that.
Fixes:
dEQP-VK.spirv_assembly.instruction.graphics.cross_stage.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had 4 really small functions for the io lowering (which main
purpose is getting locations assigned).
This commit merge them to 2 slightly bigger functions. It also fix a
typo were a vs lowering was calling nir_assign_io_var_locations using
MESA_SHADER_FRAGMENT.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The main reason is getting nir_lower_io_to_temporaries and
nir_opt_peephole_select to get some switch/ifelse with store outputs
simplified out, as some tests were failing because the v3d compiler
was not able to handle them.
As this needed some extra lowerings to get that working, we are also
revamping the full nir processing.
Heavily based on intel preprocess/optimize nir passes.
As mentioned on some other v3dv commits, some of this work could be
done by adding those passes to the v3d compiler, allowing to avoid
some duplication. But at this point we prefer to keep the v3d compiler
untouched as much as possible. This could be revisited on the future.
We also remove some passes that are unnedeed or we already know are
called by v3d_compiler. Although we try to not add too many passes,
we are already adding passes in advance that we think that would be
useful in the near-term.
Among others, gets the following tests working:
dEQP-VK.binding_model.descriptor_copy.graphics.storage*
dEQP-VK.binding_model.descriptor_copy.graphics.uniform*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This was intended to check that we only got
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
on secondary command buffers, but the spec states that this flag
should be ignored for primary command buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were clearing memory to 0 on create and reset, including the
loader data, which is not correct on reset since it would cleat
the loader dispatch table for the command buffer. We should only
clear driver data.
Also, don't use vk_zalloc for the command buffer allocation, since
we are already clearing on reset and we always reset when we begin
recording.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were getting driver location 0 on all output variables, which meant
that our color writes were always being redirected to render target 0.
Fixes dEQP-VK.pipeline.framebuffer_attachment.diff_* which uses multiple
render targets.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This puts all the information required to setup frame tiling into
v3dv_frame_tiling so we no longer need a framebuffer to start a
frame. This makes the code simpler, since frame tiling calculations
happen automatically when we start a new frame and simplifies
the implementation of copy and clear operations that used to
requiere that we setup a fake framebuffer with no actual attachments,
which was a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with computing frame tiling information for
the framebuffer object, but that is not correct, since different subpasses
may access different subsets of the framebuffer, with each requiring a
different configuration because the number of render targets and the maximum
bpp can change for each subpass.
This adds a v3dv_frame_tiling struct to keep the frame tiling information and
rewrites the code to compute this for every new job we start.
Fixes a bunch of tests in dEQP-VK.pipeline.render_to_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When we create a new job for a new subpass, we might have to finish
the current job for the previous subpass, so it is important that we
we don't get ahead of ourselves and increment the current subpass index
in the command buffer state until we are really done finishing the
previous job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were storing the format of the base miplevel in the image view and
we were typically using that instead of the taking the format from
the appropriate image slice. This was a problem when loading or storing
a miplevel other than the base which happened to have a different format.
This also removed the tiling field from the image view to avoid repeating
the same mistake in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This corresponds to PIPE_FORMAT_B5G6R5_UNORM, which is the format that
is natively supported. Also, we can't swap R/B on 3-channel images!
Also, we should rely on the v3dv format table for this rather than
pipe format descriptions since we specify the expected correct swizzles
there for all supported formats. This, for example, gets us correct
beahvior for things like VK_FORMAT_B4G4R4A4_UNORM_PACK16 without
needing to special case it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is actually mandatory for any implementation so there is no
point in not supporting it.
This probably doesn't work yet and we might need to patch the
compiler to emit bounds testing code for TMU accesses.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't support 4-bit multisample yet, but we will at some point.
Also, remove point size granularity/range since we were not meeting the
minimum requires, we might want to review that in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
I am not quite certain that this is the way to go though. Here, we are
expecting that the GPU can set/reset the event inside a command buffer
as a 1x1 pixel clear for example, however, there is still the question
of how we get to implement the command buffer wait on an event, since
reading the docs I haven't found any such functionality to be available.
We could think of implementing this by splitting the command buffer
into multiple jobs at the wait command, and then using a separate
thread for job submissions that would poll the event UBO before sending
it to the kernel, but that looks like a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this and that could lead to the kernel refusing the
job if this happened to be gargabe. Make it zero, meaning that we don't
want to keep our bin jobs waiting for anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
I accidentally tried to clear D+S of a depth-only image which was not caught
by the validation layers in my environment. This made the simulator crash, but
tracking down the crash to the actual error was not trivial. This should make
it immediately obvious.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
By default they are trivially lowered to load_uniform.
We still need to allocate an UBO for push constants, used for those
that are accessed using a non-const index. This is automatically
handled by the compiler, as it cames back as asking a
QUNIFORM_UBO_ADDR. This is what already does for gallium.
Note that if needing the UBO, we are uploading the full push constant
data. An improvement would be to try to upload only the data that
needs to rely on the UBO (so non-const accesses to uniforms).
Also, the code is not handling getting out of space from the UBO
bo. This would be tackled at a different commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>