Since vkCmdClearAttachments executes inside a render pass, we would
benefit from converting it to a draw within the current subpass job to
improve batching and avoid expensive tile load/store operations.
This can dramatically improve performance for applications using this
command, however, we can only use this if we are clearing the base
layers of framebuffer attachments, since otherwise we would need to
use layered rendering, which we don't support yet.
This improves vkQuake3 performance dramatically (almost 100%
performance improvement at 1080p), which calls this twice per frame.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This gets vkQuake to render correctly, which creates render passes with
stencil load operations even when the depth/stencil attachment format
doesn't have a stencil aspect. While this is a bit weird, it seems to
be allowed by the spec:
"If the format has depth and/or stencil components, loadOp and storeOp
apply only to the depth data, while stencilLoadOp and stencilStoreOp
define how the stencil data is handled."
In our case we were not ignoring it and this was causing that we emitted a
Z buffer load that seemed to clobber the Z clear, preventing all draw calls
from passing the depth test.
While we are at it, also change the depth/stencil store operation (which
was already handling this scenario correctly) to use the format of the
render pass attachment description rather than the underlying image
format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with finishing the current job in the
presence of a pipeline barrier and relying on the RCL serialization,
but of course this is not always enough.
This patch addresses synchronization across different GPU units
(i.e. draw indirect after compute), as well as cases where we need to
sync before binning.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.barrier.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The order in which we emit the attributes is relevant, since
GL_SHADER_STATE_ATTRIBUTE_RECORD packets don't include an explicit
attribute index. This means that we need to emit them in driver
location order, since the compiler uses that location to compute
attribute offsets in the VPM.
Fixes ~1300 CTS tests in:
dEQP-VK.pipeline.vertex_input.multiple_attributes.out_of_order.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason, CTS expects E5B9G9R9 and B10G11R11 with
transparent black border clamping produce alpha 1 instead of 0.
Since border color takes precedence over the texture state swizzle,
the only way to fix this is to lower the texture swizzle in the shader
to set alpha to 1.
Fixes:
dEQP-VK.pipeline.sampler.view_type.*b10g11r11*clamp_to_border_transparent_black
dEQP-VK.pipeline.sampler.view_type.*e5b9g9r9*.clamp_to_border_transparent_black
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It seems that we only want to set the texture state's depth to the
number of 2D layers divided by 6 when sampling, not wen doing
load/store.
This means that we need to generate two different states and choose
the one to use depending on the descriptor.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In OpenGL, unnormalized coordinates are implicit based on the sampler
type (rectangle textures), so the compiler can set the flag when needed.
In Vulkan, however, this is configured explicitly in the sampler object,
so the compiler won't set it and we need to do it manually when we are
writing the P1 uniform.
Fixes:
dEQP-VK.pipeline.sampler.exact_sampling.*.unnormalized_coords
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Instead of asserting that users don't try to create images that
would require 4GB+ of memory, error out with the corresponding
OOM error when the user tries to actually allocate the memory
for the image.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the meta operation did not change descriptor state then we should keep it,
not reset it.
Fixes:
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Only free the underlying BO when the exported memory object is freed
to avoid multiple frees of the same memory.
The only exception is winsys BOs where we import a BO created in the
display device into the render device. In this case, we only have one
memory object referencing the BO and we want to destroy it with that
memory object.
Fixes:
dEQP-VK.api.external.memory.dma_buf.*
dEQP-VK.api.external.memory.opaque_fd.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do this, so we need to record a CPU job that will
map the indirect buffer at queue submission time, read the dispatch
parameters and then submit a regular dispatch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
New jobs need to re-emit all state. Typically, this is achieved
by resetting all dirty state flags when we start a new job, but
for index buffers we were not using a dirty bit because we always
emit them immediately. This patch adds the bit and only tries
to skip index buffer state if the bit is not dirty, which will
ensure that we will always emit it for new jobs.
This fixes a regression in the shadowmapping demo from Sascha Willems
introduced with "v3dv: try harder to skip emission of redundant state".
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
V3DV_MAX_BO_CACHE_SIZE can be used to configure it.
So one way to disable the bo cache is setting V3DV_MAX_BO_CACHE_SIZE
to zero. This would still run all the bo_cache size checks, but having
another envvar just to ensure that anything related to the cache is
used seemed like an overkill.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on the already existing for the v3d OpenGL driver, but
without references, and with some extra OOM checks (Vulkan CTS has
several OOM tests).
With this commit v3dv_bo_alloc and v3dv_bo_free became frontends to
the bo_cache. The former tries to get a BO from the cache if possible,
and the latter stores the BO on the cache if possible. The former also
adds a new parameter to point if the BO to allocate is private.
As v3d we are only caching private BOs, those created by the driver
for internal use (like CLs, tile_alloc, etc). They are the ones with
the highest change of being reused (for example, CL BOs are always
4KB, so they can always be reused). User-created BOs can have any
size, including some very large ones for buffers and images, which
makes them far less likely to be reused and would add a lot of memory
pressure if we decided to cache them.
In any case, in practice, we found that we could get a performance
improvement by caching also user-created BOs, but that would need more
care and an analysis to decide which ones makes sense. Would also
require to change how the cached BOs are stored by size. Right now
there are an array of list_head, that doesn't work well with big
BOs. If done, that would be handled on a separate commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.
This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).
We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows us to remove some individual bos for the image and
sampler, used to store the SAMPLER_STATE and TEXTURE_SHADER_STATE. Now
they are prepacked on static memory as part of the vulkan object
struct.
This commit introduces small descriptor structs, used to define what
the bo subregion would contain. It is used mostly to compute offsets
to that specific data, and define the size needed. Having said so, it
would be possible to replace them with some kind of flag (like anv) or
just compute the offset based on the context.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were saving all the descriptor info on the host memory. With
this commit we do the equivalent that other mesa vulkan drivers (Anvil
and Turnip) and create a bo on the descriptor pool that would be
suballocated for each descriptor.
This would allow to clean up individual bos from some vulkan objects,
reducing device memory fragmentation, and allowing to avoid to alloc
bos for that info. After all, pre-allocating needed memory is one of
the purposes of the descriptor pool.
This commit introduces all the infrastructure, but doesn't use it for
any descriptor yet, as if no descriptor needed data uploaded to a bo.
The idea to decide which info goes to the descriptor pool bo is info
that we would need to upload to a bo in any case, as it is referenced
as an address by any packet.
We could be more aggressive with that general rule, but that would be
enough for now. If in the future we support
VK_EXT_descriptor_indexing, we probably would need to store more info,
as under that extension, descriptors can be updated after being bound.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were assuming that if the command buffer state doesn't have any
attachments (as per the attachment count) the attachment state array
should not be allocated, however, during meta operations it is
possible that the attachment state grows (since meta operations can
emit render passes of their own). In that case, we would grow the
state for the meta operation but then pop the previous attachment
count and we would leak the state.
An example of that is a secondary command buffer which has no
attachment state by default since it doesn't execute a render pass
begin, but that executes one in a meta operation (for
vkCmdClearAttachments for example).
Fix this by making the attachment count an allocation count instead
and not popping it once we finish a meta operation. Also, always free
the state so long as there is a valid pointer, and assert that the
allocated count is not zero in that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The main change we are introducing here is that now we allow secondary
command buffers that execute in a render pass to have a job list with
more than one job.
The main issue with vkCmdClearAttachments is that we currently need
this to spawn multiple jobs to clear multilayered framebuffers, as we
need to setup a different 2D framebuffer for each layer to clear and
therefore emit a different RCL for each. We could avoid this
completely by used layered rendering with the "clear rect" path to
redirect the clear rects to appropriate layers of the primary
framebuffer, however, our hardware only supports layered rendering
with geometry shaders, which we don't support at present.
Because vkCmdClearAttachments relies on having framebuffer state
available (something we would not need if we used the geometry shader
implementation), if this is not available in the secondary we need to
postpone emission of the command until the secondary is executed
inside a primary. We do this by using a new CPU job
V3DV_JOB_TYPE_CPU_CLEAR_ATTACHMENTS that is processed during
vkCmdExecuteCommands by calling vkCmdClearAttachments directly in the
primary.
As a consequence of these changes, it is now possible that a secondary
command buffer that runs inside a render pass have any kind of job in
its job list, including partial CLs that need to be branched to and
full CLs that need to be submitted to the GPU as is, so we introduced
a new GPU job type V3DV_JOB_TYPE_GPU_CL_SECONDARY to identify partial
CLs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Event waits can be safely moved before a render pass start, since
event setting and resetting commands cannot happen inside one. We
don't need to go that far, but we can use this to record the wait
in its own separate job and then execute this job before the
binning commands recorded in the secondary command buffer when
we execute the secondary into a primary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to properly check (and possibly compile) shader variants we
need a pipeline and a compatible descriptor set. So far we were trying
to do that check as early as possible, so we were trying to do it at
CmdBindPipeline or CmdBindDescriptorSets, and a combination of dirty
flags. This showed to not cover all the corners cases, and made the
code complex, as needed to handle cases where the descriptors were not
yet available, and return early. The latter also meant that we were
running several checks that failed in the middle.
This commit moves the variant check to CmdDraw, when we should have a
pipeline and compatible descriptor sets, and simplifies and makes more
strict the existing code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
an therefore we no longer need to use a BO for the event state.
Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.
If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is generally very difficult to handle properly everywhere, but
at least this is good enough to make the few CTS tests for this happy.
Fixes (on Rpi4):
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were declaring the destroy callback function as taking a pointer for the
vulkan object handle and relying on an implicit conversion to the Vulkan
handle type, however that would be incorrect on 32-bit platforms, where
non-dispatchable Vulkan objects (the kind that we may allocate privately during
command buffer recording), are defined as uint64_t, so the signature of the
destry callback type doesn't match the signature of the actual Vulkan
function, leading to bogus results. Fix that by using uint64_t instead.
This fixes compilation warnings and also crashes in some tests when
compiling and executing natively in Rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were pre-packing the constants from the pipeline state and then
always emitting that at draw time, ignoring dynamic state. This makes
it so we don't prepack at pipeline creation time and we always emit
the correct constants directly the command buffer dynamic state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We can now implement all depth/stencil blits as compatible color blits,
so let's just have the blit shader interface simply convert any blit with
a depth/stencil format to a compatible color blit (like we were already
doing for s8d24) and get rid of the depth blitting path. This also allows
us to ignore the blit aspect in the blit pipeline cache key.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We implement this by blitting the requested region to a linear image
setup to use the buffer memory store at the requested offset.
Because we can't store linear depth/stencil images, we implement
copies of depth/stencil aspects by using a compatible color blit.
To do this, we also need to account for the fact that when we are
copying depth from a d24 format we need to copy them from the MSB
24-bits of each word as provided by the hardware and store them
in the LSB 24-bit of the buffer (as per Vulkan requirements). This
is achieved by expanding our blit interface to also accept a swizzle
to apply to the source texture.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To do this we just implement the stencil blit as a masked color bit
with uint8 format. This allows us to support blitting on combined
depth/stencil formats, and therefore, also partial image copies
for these formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were just asserting or aborting if any of the internal
method used during the pipeline creation failed.
We needed to change the return value of several methods, in order to
bubble up the proper memory allocation error.
Note that as the pipeline creation is complex and splitted in several
methods, if an error happens in the middle, it returns back, and rely
on the higher level to call PipelineDestroy. This method needs to take
into account that some of the resources could have not been allocated
when freeing it.
Also note that v3dv_get_shader_variant is used during the pipeline
bind, as with the new resources bound, we need to check if we need to
recompile a new variant. At that moment we are not creating a new
vulkan object so we can really return a OOM error. For now we just
assert on that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This uses the framework to register private commmand buffer objects
that get freed automatically when the command buffer is destroyed by
the application.
This change also moves the descriptor set pool that the meta blit path
uses to allocate descriptors for the blit source textures, from the
device to the command buffer, so we can have a descriptor pool per
command buffer. This is necessary to ensure correct behavior when
doing multi-threaded command buffer recording (alternatively, we would
have to lock around the descriptor set allocation code, which would be
undesirable).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows the driver to register private Vulkan objects it creates as part
of command buffer recording (usually for meta operations) in the command
buffer, so they can be destroyed together with it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were enabling VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT for
any format valid for texturing, but for example, right now we don't
support linear filtering on any depth format.
This is needed to get some hundreds of tests like this:
dEQP-VK.pipeline.sampler.view_type.1d.format.r32g32_sfloat.mag_filter.linear
properly skipped (those were all Crashes with the simulator, and
almost all Fails with the real device).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some texture operations (like mipmap query levels) that
doesn't require a sampler. In fact, you should ignore it. So we need
to take it into account when combining the
indexes. nir_tex_instr_src_index is returning a negative value to
identify that case, but as we are using a uint32_t to pack both values
(for convenience, easy to pack/unpack the hash table key), we just use
a uint value big enough to be a wrong sampler id.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
OpenGL doesn't have the concept of individual texture and sampler, so
texture and sampler indexes have the same value. v3d compiler uses
this assumption, so for example, the texture info at the v3d key
include values that you need to use the texture format and the sampler
to fill (like the return_size).
One option would be to adapt the v3d compiler to handle both, but then
we would need to adapt to the lowerings it uses, like nir_lower_tex,
that also take the same assumption.
We deal with this on the Vulkan driver, by reassigning the texture and
sampler index to a combined one. We add a hash table to map the
combined texture idx and sampler idx to this combined idx, and a
simple array to the opposite map. On the driver we work with the
separate indices to fill up the data, while the v3d compiler works
with the combined one.
As mentioned, this is needed to properly fill up the texture return
size, so as we are here, we fix that. This gets tests like the
following working:
dEQP-VK.glsl.texture_gather.basic.2d.depth32f.base_level.level_2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We must update our check for whether the render area is tile-aligned for
each subpass, since the hardware will update tile sizes for each RCL.
Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.8
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is the same as the subpass start version, only that it won't
emit subpass clears. This is necessary when resuming a subpass
from a partial clear to make sure we don't try to clear subpass
attachments again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since a meta partial clear starts a new render pass, we need to store
all state that can be changed with vkCmdBeginRenderPass.
Also, since the meta clear pipeline sets dynamic state, we also
have to restore that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
They are bound at the set layout, and cannot be changed. From
VkDescriptorSetLayoutBinding spec:
"pImmutableSamplers affects initialization of samplers. If
descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
pImmutableSamplers can be used to initialize a set of immutable
samplers. Immutable samplers are permanently bound into the set
layout and must not be changed; updating a
VK_DESCRIPTOR_TYPE_SAMPLER descriptor with immutable samplers is
not allowed and updates to a
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor with immutable
samplers does not modify the samplers (the image views are updated,
but the sampler updates are ignored)"
We stored them as part of the set layout. It also means that when we
need the sampler (like for texture operations) we can't just ask for a
descriptor, as it would not have the sampler. A new method is created.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The problem with this is that TLB clears always clear and store full
tiles, so if our render area is not perfectly aligned to tile boundaries
we end up clearing all pixels in tiles that are only partially covered.
In this scenario we have to avoid using TLB clears and instead fallback
to clearing by rendering a scissored quad in the clear color, like we do
for partial clears in vkCmdClearAttachments.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been caching the first pipeline we produced and always
reusing that, which is obviously incorrect.
This change implements a proper cache and also takes care of releasing
the cached resources when the device is destroyed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is achieved by rendering a quad in the clear color for each layer
of each attachment being cleared. Right now we emit each clear in a
separate job with a single attachment framebuffer, but in the future
we may be able to extend the solution to using multiple render targets
and clear multiple attachments with a single job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specially after CmdBindDescriptorSets, it is likely that we would need
a new shader variant, like for example if sampler descriptor sets are
bound.
At that moment a new v3d key is populated, using as base the one used
at pipeline creation, so only cmd_buffer depending values are changed.
Then a new variant is requested. Note that internally it is handled
with a cache, so no new compilation will be done if not needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, we were doing the compilation to qpu when the pipeline was
created (as part of vkCreateGraphicsPipeline).
But this would not be correct when some specific descriptors are
involved, like textures. For that case some nir lowerings depend on
the texture format, and that info is not available until the specific
descriptors are bound to the command buffer. In the same way, the same
command buffer with a given pipeline could get their descriptor bound
again.
So it would be needed to support compilation variants of the same
shader. So finally, the v3d_key would work as keys, as the variants
would be tracked with a hash table.
This commit introduces the new structures for that. What we were
building as the final qpu shader would become the initial default
variant for the pipeline. We are also saving the keys used at that
point, to avoid needing to fully regenerate them when a new variant is
created. Not just for performance, but also to avoid needing to track
the graphics pipeline create info structure.
The code to handle updating the current variant would be done on
following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This include SAMPLER, COMBINED_IMAGE_SAMPLER and SAMPLED_IMAGE
descriptors.
In order to support them we do the pre-packing of TEXTURE_SHADER_STATE
and SAMPLER_STATE when Images and Samplers (respectively) are
created. Those packets doesn't need to be tweaked later, so we upload
them to an bo.
A possible improvement of this would be that the descriptor pool
manages a bo for all descriptors, that suballocate for each descriptor
allocated. This is what other drivers do (and as far as I understand,
one of the reasons of having a descriptor pool).
Immutable samplers are not supported, will be handled on a following
patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
v3dv_descriptor is going to be expanded with more data, so it doesn't
make sense anymore to handle a fake descriptor for the push
constants. Introducing a new struct, that is just a pair
bo/offset. Initially named v3dv_resource, as it could be the base to
reuse bos for different resources (like assembly bo)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Not quite sure why this is required though. Conversion from/to
sRGB happens on tile loads and stores, with the tile buffer
being always linear, so there should be no difference.
Fixes all test failures in:
dEQP-VK.pipeline.blend.format.r8g8b8a8_srgb.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this mode, which can be activated with V3D_DEBUG=always_flush like
in the GL driver, we flush every draw call separately. For now this
is useful for debugging, but we can also set the flag internally on
specific jobs when we identify scenarios where we need the same behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is valid to submit with an empty list ofcommand buffers, however,
we still need to wait on the pWaitSemaphores provided and only signal
the pSignalSemaphores and fence once we have finished waiting on them
to honor the semantics of the submission.
Because waiting and signaling happens in the kernel, the easiest way
to do this is to submit a trivial no-op job to the GPU. To do this,
we need to refactor some of our code so that code that might have been
operating on a command buffer starts operating on a job instead, so we
can resuse most of our infrastructure to create the no-op job.
Additionally, because no-op jobs are created internally by the driver,
we are responsible for destroying them too. For this, we bind a fence
to each no-op job we submit and we test for completion of in-flight
no-op jobs (and destory them if completed) every time vkQueueSubmit
is called.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are a handful of tests that simulate 'out of memory' situations
during swapchain image creation, and these can lead to failed job
allocations when the driver is running on the prime blit path, as that
involves creating a command buffer. The tests expect us to handle this
scenario gracefully and return an appropriate OOM error as a result.
This make sure we don't try to dereference a job if we failed to allocate
it so we don't crash and can return the OOM error gracefully in the
process.
Fixes:
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Push constant tests were still working without taking this into
account because we can't really fine graine how much we allocate for
them.
For now we only use them to know if we should allocate/fill the ubo
for push constants or not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This puts all the information required to setup frame tiling into
v3dv_frame_tiling so we no longer need a framebuffer to start a
frame. This makes the code simpler, since frame tiling calculations
happen automatically when we start a new frame and simplifies
the implementation of copy and clear operations that used to
requiere that we setup a fake framebuffer with no actual attachments,
which was a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with computing frame tiling information for
the framebuffer object, but that is not correct, since different subpasses
may access different subsets of the framebuffer, with each requiring a
different configuration because the number of render targets and the maximum
bpp can change for each subpass.
This adds a v3dv_frame_tiling struct to keep the frame tiling information and
rewrites the code to compute this for every new job we start.
Fixes a bunch of tests in dEQP-VK.pipeline.render_to_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When we create a new job for a new subpass, we might have to finish
the current job for the previous subpass, so it is important that we
we don't get ahead of ourselves and increment the current subpass index
in the command buffer state until we are really done finishing the
previous job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were storing the format of the base miplevel in the image view and
we were typically using that instead of the taking the format from
the appropriate image slice. This was a problem when loading or storing
a miplevel other than the base which happened to have a different format.
This also removed the tiling field from the image view to avoid repeating
the same mistake in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
I am not quite certain that this is the way to go though. Here, we are
expecting that the GPU can set/reset the event inside a command buffer
as a 1x1 pixel clear for example, however, there is still the question
of how we get to implement the command buffer wait on an event, since
reading the docs I haven't found any such functionality to be available.
We could think of implementing this by splitting the command buffer
into multiple jobs at the wait command, and then using a separate
thread for job submissions that would poll the event UBO before sending
it to the kernel, but that looks like a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
By default they are trivially lowered to load_uniform.
We still need to allocate an UBO for push constants, used for those
that are accessed using a non-const index. This is automatically
handled by the compiler, as it cames back as asking a
QUNIFORM_UBO_ADDR. This is what already does for gallium.
Note that if needing the UBO, we are uploading the full push constant
data. An improvement would be to try to upload only the data that
needs to rely on the UBO (so non-const accesses to uniforms).
Also, the code is not handling getting out of space from the UBO
bo. This would be tackled at a different commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Some of these may need additional work to work for real, but we should
be able to support them.
We also include some formats that are not supported for images, but
that we want to support for buffers, such as R32G32B32 for a vertex
buffer. In the future we might want to expand the format table to
specify which formats are supported for buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
From the Vulkan spec:
"When an image view of a depth/stencil image is used as a
depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So in that scenario, we ignore the aspect mask on the view and go
check the actual format of the underlying image to decide if we
have depth or depth+stencil aspects.
This gets VkRunner's depth-buffer.shader_test to pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For now this only implements a fast path using the tile buffer, so it
can only be used when clearing full images, but this is good enough
for VkRunner.
The implementation is a bit tricky because this command executes
inside a render pass, and yet, since we are using the tile buffer to
clear, this needs to go in its own job. This means that with this, we
need to be able to split a subpass into multiple jobs which creates
some issues.
For example, certain operations, such as the subpass load operation
(particularly if it is a clear) should only happen on the first job of
the subpass and subsequent jobs in the same subpass should always
load.
Similarly, we should not discard the last store on an attachment
unless we know it is the last job for the last subpass that uses the
attachment.
To handle these cases we add two new flags to the job, one to know if
the job is not the first in a subpass (is_subpass_continue) and
another one to know if a job is the last in a subpass
(is_subpass_finish).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For that we include the array_index when asking for a ubo/ssbo index
from the descriptor_map.
Until now, array_index was not included, but the descriptor_map took
into account the array_size. This had the advantage that you only need
a entry on the descriptor map, and the index was properly return.
But this make it complex to get back the set, binding and array_index
back from the ubo/ssbo binding. So it was more easy to just add
array_index. Somehow now the "key" on the descriptor map is the
combination of (set, binding, array_index).
Note that this also make sense as the vulkan api identifies each array
index as a descriptor, so for example, from spec,
VkDescriptorSetLayoutBinding:descriptorCount
"descriptorCount is the number of descriptors contained in the
binding, accessed in a shader as an array"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Focused on getting the basic UBO and SSBO cases implemented. So no
dynamic offset, push contanst, samplers, and so on.
This include a initial implementation for CreatedescriptorPool,
CreateDescriptorSetLayout, AllocateDescriptorSets,
UpdateDescriptorSets, CreatePipelineLayout, and CmdBindDescriptorSets.
Also introduces lowering vulkan intrinsics. For now just
vulkan_resource_index.
We also introduce a descriptor_map, in this case for the ubos and
ssbos, used to assign a index for each set/binding combination, that
would be used when filling back the details of the ubo or ssbo on
other places (like QUNIFORM_UBO_ADDR or QUNIFORM_SSBO_OFFSET).
Note that at this point we don't need a bo for the descriptor pool, so
descriptor sets are not getting a piece of it. That would likely
change as we start to support more descriptor set types.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The general idea is that we always emit from our dynamic state, and when
a particular piece of state is not dynamic, we just set our dynamic state
from the pipeline state, however, the implementation was quite confusing:
the mask of dynamic states flagged states that were not dynamic and some
places woud mix dirty flags and dynamic state flags. We also were not
updating the dynamic state mask in the command buffer, etc.
This patch, hopefully, simplifies all this and makes it less confusing,
starting by making the dynamic state mask flag dynamic states, fixing
the places where we would confuse dirty state flags with dynamic state
flags, making sure that our command buffer state is setup correctly
and that we only emit state when it is actually dirty.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not computing viewport transform for static viewports provided with
the pipeline state. Also, we were not copying the transform into the command
buffer state when we bound the pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Otherwise we would lose updates relevant to subsequent subpasses in
the same renderpass that read or partially write the attachment.
The only scenario where we can safely do this is on the last subpass
that uses the attachment, so long as we don't need to emit the store
for the clear.
This also fixes a bug in the computation of the first subpass that
uses an attachment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This ignores stencil for now and focuses on depth testing without
support for early depth testing.
To implement this we need to start considering how many of our
framebuffer attachments are color attachments, since some of the
computations we use to determine tile sizes and binning configuration
depend on this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When running on the real hardware we have two devices: the v3d render
node and the vc4 display node. We need the latter to allocate
winsys BOs for v3d to render into. Since exporting these BOs is
a privileged operation, we need to obtain the fd for this device
through the display server. For now we only support doing this through
the XCB DRI3 platform.
Also, do not duplicate or re-open the DRM devices when creating logical
devices. The simulator checks that the file descriptor is exactly
the same we used to initialize it when we created the physical device
and aborts if it sees a different fd number, even if it points to the
same device.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This includes:
* Implementation for CmdBindVertexBuffers
* Gather vertex input info during CreateGraphicsPipelines
(pipeline_init) and SHADER_STATE_ATTRIBUTE_RECORD prepacking
* Final emission of such packet during CmdDraw
(cmd_buffer_emit_graphics_pipeline)
Default attributes values will be handled on a following patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We can't prepack all the record, as addresses need the job, and
uniforms depend on dynamic value.
Also due cl_emit_with_prepacked and v3dv_pack asserting correct
values, we need to define two values twice, that lead to move
vpm_config to the pipeline. In any case, the latter will be useful
when we start to prepack more stuff.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Generally, we can do this when they render to the same collection of
attachments and we only need to emit a single RCL for them.
To implement this, we need to track the first subpass that is included
in the job and rewrite our loads and stores in the RCL to refer to that
subpass instead of the current subpass (which would be the last included
in the RCL).
When we merge jobs we also reuse the tile state/alloc BOs and we only
emit the binning setup once.
The environment variable V3DV_NO_MERGE_JOBS can be set to disable
job merging and have each subpass be in a separate job. This can be
useful for debugging issues spawning from incorrect subpass merges.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we make progress towards more complex submissions we will need to split
our command buffers into smaller executable units (jobs) that we can
submit indepdently to the kernel. This will be required to implement
pipeline barriers, split subpasses that have depedencies on previous
subpasses, split render passes that use more than 4 render targets, etc.
For now we keep things simple and we only keep one job as current
recording target in the command buffer, and we generate a new one
with every subpass or with any commands we see outside of a render pass
(only vkCopyImageToBuffer for now). In the future we probably want to
optimize this by merging subpasses into the same job when possible,
etc.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We will use this when we implement copying images to buffers using the
TLB, where we'll need to setup a framebuffer and tiling configuration
for the TLB store to the destination buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Initial port of the equivalent v3d_write_uniforms, to be used by the
cmd_buffer when emitting the drawing packets.
Initially doesn't include all the quniform types, only those needed by
the initial basic vulkan tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Before that commit we were calling get_viewport_xform to get those
values twice (to emit scissor and viewport), and we found that we
would need that info even more times. So let's just compute that info
when setting the viewport, and reuse the values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Values still doesn't take into account having vertex elements data,
but keeps some of that half-done code in comments. It would be better
to do that when we get an example using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Starting with Viewport/Scissor data from VkGraphicsPipelineCreateInfo.
Note that initially this can be somewhat counter-intuitive. What we
are really doing it is filling up the structs with the dynamic stuff
from the pipeline, when such is not defined as dynamic. This is what
anv/radv does, and basically means that we treat both in the same way,
so easier after that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The basic to get the spirv built to nir, including calling some common
nir passes. Pending deep review if all those are needed or if we miss
some, but for that it would be better to be able to run existing
tests.
Enough to get assembly generated for simple tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So we have a lower level representation of a buffer object that we can
manipulate that is not tied to a Vulkan representation of memory. This
will be useful as we start allocating driver internal buffers, such as
command lists.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>