Subpass color clear pipelines are those used to emit partial attachment
clears as draw calls inside the render pass currently bound by the
application in the command buffer, leading to a huge performance improvement
compared to the case where we emit them in their own render pass.
Unfortunately, because the pipeline references the render pass
object in which it is used and the render pass object is owned by the
application (and can be destroyed at any point), we can't cache these
pipelines (unless we implement a refcounting mechanism or other
similar strategy).
Performance impact looks negligible based on experiments with vkQuake3,
probably because the underlying pipeline cache is preventing the
redundant shader recompiles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The lowering will get all the interpolateAt() functions from GLSL lowered to
the corresponding intrinsics we have just implemented in the compiler backend,
which was the last piece we needed to enable the feature.
This gets us to pass all the relevant tests in:
dEQP-VK.pipeline.multisample_interpolation.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
as we have just set proper values for point granularity etc, we can
enable largePoints. With this change tests like this:
dEQP-VK.rasterization.primitive_size.points.point_size_*
goes from Skip to Pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we are here, we also tweak some line-related limits, as some use
the same value that for point, and in order to use the enum we added
recently at common/v3d_limits.h
Fixes the following test:
dEQP-VK.glsl.builtin_var.simple.pointcoord
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asking the simulator the total memory it is using, instead of sysinfo
(that returned the host system memory).
Fixes the following CTS tests when using the simulator:
dEQP-VK.memory.allocation.basic.percent_1.forward.count_12
dEQP-VK.memory.allocation.basic.percent_1.reverse.count_12
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we understand that texture accesses should be aligned to the UIF
block size.
Fixes several of the CTS tests under this pattern:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*.offset_nonzero
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_texel_buffer.*.offset_nonzero
Note: for those tests, using a lower value (64) was enough to get them
working, but again, we understand that the real alignment is the UIF
block size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
That it would be used as fallback. Three advantages:
* Having a cache for user operations even if the user doesn't
provide it.
* Having a cache for internal operations. v3dv_meta_copy creates
pipelines for some copy path, so it is interesting to have them
cached.
* Testing: so now the pipeline cache is tested by more CTS tests.
As any other pipeline cache, it can be disabled with the
V3DV_ENABLE_PIPELINE_CACHE. It was suggested that would make sense to
have a specific envvar for the default pipeline cache, but for now
just one envvar is enough.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on anv nir caching. One of the bigger difference is that
we don't create the nir shader using a ralloc_context local to the
main compile graphics method. On anv, after compiling the shader, they
discard the nir shader. We need it as we could need it to build shader
variants later.
As anv, we introduce a environment variable to disable the cache:
V3DV_ENABLE_PIPELINE_CACHE
By default is enabled. The main purpose for this envvar is debugging,
in order to provide a easy way to discard a bug on the cache.
It is pending to serialize/deserialize the NIR shaders as part of
GetPipelineCacheData and PipelineCacheCreate. We also plan is to cache
too shader variants. We would do that on following patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since vkCmdClearAttachments executes inside a render pass, we would
benefit from converting it to a draw within the current subpass job to
improve batching and avoid expensive tile load/store operations.
This can dramatically improve performance for applications using this
command, however, we can only use this if we are clearing the base
layers of framebuffer attachments, since otherwise we would need to
use layered rendering, which we don't support yet.
This improves vkQuake3 performance dramatically (almost 100%
performance improvement at 1080p), which calls this twice per frame.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason, CTS expects E5B9G9R9 and B10G11R11 with
transparent black border clamping produce alpha 1 instead of 0.
Since border color takes precedence over the texture state swizzle,
the only way to fix this is to lower the texture swizzle in the shader
to set alpha to 1.
Fixes:
dEQP-VK.pipeline.sampler.view_type.*b10g11r11*clamp_to_border_transparent_black
dEQP-VK.pipeline.sampler.view_type.*e5b9g9r9*.clamp_to_border_transparent_black
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In OpenGL, unnormalized coordinates are implicit based on the sampler
type (rectangle textures), so the compiler can set the flag when needed.
In Vulkan, however, this is configured explicitly in the sampler object,
so the compiler won't set it and we need to do it manually when we are
writing the P1 uniform.
Fixes:
dEQP-VK.pipeline.sampler.exact_sampling.*.unnormalized_coords
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Instead of asserting that users don't try to create images that
would require 4GB+ of memory, error out with the corresponding
OOM error when the user tries to actually allocate the memory
for the image.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Only free the underlying BO when the exported memory object is freed
to avoid multiple frees of the same memory.
The only exception is winsys BOs where we import a BO created in the
display device into the render device. In this case, we only have one
memory object referencing the BO and we want to destroy it with that
memory object.
Fixes:
dEQP-VK.api.external.memory.dma_buf.*
dEQP-VK.api.external.memory.opaque_fd.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on the already existing for the v3d OpenGL driver, but
without references, and with some extra OOM checks (Vulkan CTS has
several OOM tests).
With this commit v3dv_bo_alloc and v3dv_bo_free became frontends to
the bo_cache. The former tries to get a BO from the cache if possible,
and the latter stores the BO on the cache if possible. The former also
adds a new parameter to point if the BO to allocate is private.
As v3d we are only caching private BOs, those created by the driver
for internal use (like CLs, tile_alloc, etc). They are the ones with
the highest change of being reused (for example, CL BOs are always
4KB, so they can always be reused). User-created BOs can have any
size, including some very large ones for buffers and images, which
makes them far less likely to be reused and would add a lot of memory
pressure if we decided to cache them.
In any case, in practice, we found that we could get a performance
improvement by caching also user-created BOs, but that would need more
care and an analysis to decide which ones makes sense. Would also
require to change how the cached BOs are stored by size. Right now
there are an array of list_head, that doesn't work well with big
BOs. If done, that would be handled on a separate commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.
Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows us to remove some individual bos for the image and
sampler, used to store the SAMPLER_STATE and TEXTURE_SHADER_STATE. Now
they are prepacked on static memory as part of the vulkan object
struct.
This commit introduces small descriptor structs, used to define what
the bo subregion would contain. It is used mostly to compute offsets
to that specific data, and define the size needed. Having said so, it
would be possible to replace them with some kind of flag (like anv) or
just compute the offset based on the context.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
an therefore we no longer need to use a BO for the event state.
Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.
If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To do this we just implement the stencil blit as a masked color bit
with uint8 format. This allows us to support blitting on combined
depth/stencil formats, and therefore, also partial image copies
for these formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This uses the framework to register private commmand buffer objects
that get freed automatically when the command buffer is destroyed by
the application.
This change also moves the descriptor set pool that the meta blit path
uses to allocate descriptors for the blit source textures, from the
device to the command buffer, so we can have a descriptor pool per
command buffer. This is necessary to ensure correct behavior when
doing multi-threaded command buffer recording (alternatively, we would
have to lock around the descriptor set allocation code, which would be
undesirable).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
OpenGL doesn't have the concept of individual texture and sampler, so
texture and sampler indexes have the same value. v3d compiler uses
this assumption, so for example, the texture info at the v3d key
include values that you need to use the texture format and the sampler
to fill (like the return_size).
One option would be to adapt the v3d compiler to handle both, but then
we would need to adapt to the lowerings it uses, like nir_lower_tex,
that also take the same assumption.
We deal with this on the Vulkan driver, by reassigning the texture and
sampler index to a combined one. We add a hash table to map the
combined texture idx and sampler idx to this combined idx, and a
simple array to the opposite map. On the driver we work with the
separate indices to fill up the data, while the v3d compiler works
with the combined one.
As mentioned, this is needed to properly fill up the texture return
size, so as we are here, we fix that. This gets tests like the
following working:
dEQP-VK.glsl.texture_gather.basic.2d.depth32f.base_level.level_2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead, which we were already computing
but that we were not actually using for indexing by mistake.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been caching the first pipeline we produced and always
reusing that, which is obviously incorrect.
This change implements a proper cache and also takes care of releasing
the cached resources when the device is destroyed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>