If the meta operation did not change descriptor state then we should keep it,
not reset it.
Fixes:
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We are treating them as a special case of texture, so the commit is
mostly about integrating them with the existing
SAMPLER/SAMPLER_IMAGE/COMBINED_IMAGE_SAMPLER infrastructure.
This commit doesn't use in any special way the render pass
information, including the dependencies, so it is possible that we
would need to do something else. But this commit gets several CTS
tests, and two Sascha Willem Vulkan demos, so let's start with this
commit and handle any other use case for following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were using nir->data.num_textures to fill the default values
for the textures used on the shader, and set the value for the number
of textures used.
But nir->data.num_textures doesn't take into account input
attachments, even after nir_lower_input_attachments. Although that
could make sense from a general pov, in our case we are treating input
attachments mostly as textures.
This commit count the number of textures interating through the
pipeline combined index map, as it includes both. This also makes the
populate of the shader key for default values more similar to the one
done at cmd_buffer with real values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were not asking the driver for the sampler state if we could
just use the default P1 values. But even if we need to fill P1 (for
example to fill up the output type of the format), if the texture
operation doesn't need a sampler, we can let that field as NULL (so
default values) and avoid calling back the driver for a sampler.
This is not mandatory for OpenGL (as we always have a sampler object),
although still a good to have. For Vulkan this is needed, as we don't
have a sampler object in that case.
v2: reword comment (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When rasterization is disabled there are a number of CreateInfo
structs that should be ignored. We were managing this correctly
for some cases, but not all of them. Specifically, viewport state
must be ignored and we weren't doing that.
Fixes:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Only free the underlying BO when the exported memory object is freed
to avoid multiple frees of the same memory.
The only exception is winsys BOs where we import a BO created in the
display device into the render device. In this case, we only have one
memory object referencing the BO and we want to destroy it with that
memory object.
Fixes:
dEQP-VK.api.external.memory.dma_buf.*
dEQP-VK.api.external.memory.opaque_fd.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do this, so we need to record a CPU job that will
map the indirect buffer at queue submission time, read the dispatch
parameters and then submit a regular dispatch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
New jobs need to re-emit all state. Typically, this is achieved
by resetting all dirty state flags when we start a new job, but
for index buffers we were not using a dirty bit because we always
emit them immediately. This patch adds the bit and only tries
to skip index buffer state if the bit is not dirty, which will
ensure that we will always emit it for new jobs.
This fixes a regression in the shadowmapping demo from Sascha Willems
introduced with "v3dv: try harder to skip emission of redundant state".
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Focused specially on the cache, how many BOs and how much bo_size is
store on the cache, freed bos, bos moved to cache etc.
Initially not configured with V3D_DEBUG (like v3d) to avoid a runtime
check on most of v3dv_bo functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
V3DV_MAX_BO_CACHE_SIZE can be used to configure it.
So one way to disable the bo cache is setting V3DV_MAX_BO_CACHE_SIZE
to zero. This would still run all the bo_cache size checks, but having
another envvar just to ensure that anything related to the cache is
used seemed like an overkill.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on the already existing for the v3d OpenGL driver, but
without references, and with some extra OOM checks (Vulkan CTS has
several OOM tests).
With this commit v3dv_bo_alloc and v3dv_bo_free became frontends to
the bo_cache. The former tries to get a BO from the cache if possible,
and the latter stores the BO on the cache if possible. The former also
adds a new parameter to point if the BO to allocate is private.
As v3d we are only caching private BOs, those created by the driver
for internal use (like CLs, tile_alloc, etc). They are the ones with
the highest change of being reused (for example, CL BOs are always
4KB, so they can always be reused). User-created BOs can have any
size, including some very large ones for buffers and images, which
makes them far less likely to be reused and would add a lot of memory
pressure if we decided to cache them.
In any case, in practice, we found that we could get a performance
improvement by caching also user-created BOs, but that would need more
care and an analysis to decide which ones makes sense. Would also
require to change how the cached BOs are stored by size. Right now
there are an array of list_head, that doesn't work well with big
BOs. If done, that would be handled on a separate commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.
This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).
We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had done all the plumbing for this but EZ can be disabled in 3 places
and we were never setting the enable bit in the configuration bits packet.
Also, configuration bits must not enable EZ if this has been disabled in
the RCL for the whole frame, which we do if we don't have a depth
attachment at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Otherwise cloned BO lists point to the original list objects and not
the cloned ones, and that will confuse anything that tries to iterate over
them, such as list_length(), leading to infinite loops.
Fixes (in debug mode):
dEQP-VK.api.command_buffers.render_pass_continue
In that test we clone a full CL job from a secondary, and without this,
the BO lists in its CL lists will point to the bo_list field in the
original job, leading to an infinite loop as we assert the expected size
of these lists at queue submit time in handle_cl_job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.
Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows us to remove some individual bos for the image and
sampler, used to store the SAMPLER_STATE and TEXTURE_SHADER_STATE. Now
they are prepacked on static memory as part of the vulkan object
struct.
This commit introduces small descriptor structs, used to define what
the bo subregion would contain. It is used mostly to compute offsets
to that specific data, and define the size needed. Having said so, it
would be possible to replace them with some kind of flag (like anv) or
just compute the offset based on the context.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were saving all the descriptor info on the host memory. With
this commit we do the equivalent that other mesa vulkan drivers (Anvil
and Turnip) and create a bo on the descriptor pool that would be
suballocated for each descriptor.
This would allow to clean up individual bos from some vulkan objects,
reducing device memory fragmentation, and allowing to avoid to alloc
bos for that info. After all, pre-allocating needed memory is one of
the purposes of the descriptor pool.
This commit introduces all the infrastructure, but doesn't use it for
any descriptor yet, as if no descriptor needed data uploaded to a bo.
The idea to decide which info goes to the descriptor pool bo is info
that we would need to upload to a bo in any case, as it is referenced
as an address by any packet.
We could be more aggressive with that general rule, but that would be
enough for now. If in the future we support
VK_EXT_descriptor_indexing, we probably would need to store more info,
as under that extension, descriptors can be updated after being bound.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were assuming that if the command buffer state doesn't have any
attachments (as per the attachment count) the attachment state array
should not be allocated, however, during meta operations it is
possible that the attachment state grows (since meta operations can
emit render passes of their own). In that case, we would grow the
state for the meta operation but then pop the previous attachment
count and we would leak the state.
An example of that is a secondary command buffer which has no
attachment state by default since it doesn't execute a render pass
begin, but that executes one in a meta operation (for
vkCmdClearAttachments for example).
Fix this by making the attachment count an allocation count instead
and not popping it once we finish a meta operation. Also, always free
the state so long as there is a valid pointer, and assert that the
allocated count is not zero in that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The main change we are introducing here is that now we allow secondary
command buffers that execute in a render pass to have a job list with
more than one job.
The main issue with vkCmdClearAttachments is that we currently need
this to spawn multiple jobs to clear multilayered framebuffers, as we
need to setup a different 2D framebuffer for each layer to clear and
therefore emit a different RCL for each. We could avoid this
completely by used layered rendering with the "clear rect" path to
redirect the clear rects to appropriate layers of the primary
framebuffer, however, our hardware only supports layered rendering
with geometry shaders, which we don't support at present.
Because vkCmdClearAttachments relies on having framebuffer state
available (something we would not need if we used the geometry shader
implementation), if this is not available in the secondary we need to
postpone emission of the command until the secondary is executed
inside a primary. We do this by using a new CPU job
V3DV_JOB_TYPE_CPU_CLEAR_ATTACHMENTS that is processed during
vkCmdExecuteCommands by calling vkCmdClearAttachments directly in the
primary.
As a consequence of these changes, it is now possible that a secondary
command buffer that runs inside a render pass have any kind of job in
its job list, including partial CLs that need to be branched to and
full CLs that need to be submitted to the GPU as is, so we introduced
a new GPU job type V3DV_JOB_TYPE_GPU_CL_SECONDARY to identify partial
CLs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Event waits can be safely moved before a render pass start, since
event setting and resetting commands cannot happen inside one. We
don't need to go that far, but we can use this to record the wait
in its own separate job and then execute this job before the
binning commands recorded in the secondary command buffer when
we execute the secondary into a primary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We should always do this. So far we have been getting away with this
because we overallocate at v3dv_job_start_frame, but that won't do
for secondary command buffers for example, it is also unreliable
if CLs grow past that initial allocation.
In the future, we might want to fix our emit macros so they do the
allocation check implicitly, which would simplify the code and would
make this process a lot less error prone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We are currently able to clear any supported format using the TLB, but
this is more consistent with other parts of the code and is what we want
should we add any formats in the future where we can't get away with
TLB clears.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
During a command buffer reset we call cmd_buffer_init(), which will
add the command buffer to the pool, so make sure we remove it first
and that we use a safe iterator when resetting a pool.
Fixes:
dEQP-VK.api.command_buffers.pool_reset_reuse
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
That it is justconvert VkSpecializationInfo to
nir_spirv_specialization and pass it to spirv_to_nir.
The code is also basically the same used by anv, tu, and radv
Eventually it would make sense to move it to a common place.
Note that we are using calloc there to allocate the temporary
spec_entries. Trying to use vk_alloc2 causes some problems with the
nir_validate.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to properly check (and possibly compile) shader variants we
need a pipeline and a compatible descriptor set. So far we were trying
to do that check as early as possible, so we were trying to do it at
CmdBindPipeline or CmdBindDescriptorSets, and a combination of dirty
flags. This showed to not cover all the corners cases, and made the
code complex, as needed to handle cases where the descriptors were not
yet available, and return early. The latter also meant that we were
running several checks that failed in the middle.
This commit moves the variant check to CmdDraw, when we should have a
pipeline and compatible descriptor sets, and simplifies and makes more
strict the existing code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>