If we're doing a layout transition of depth-stencil aspects, we need to ensure all potential
accesses are made visible.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
- Honor resource barriers for resource states which cannot automatically
decay or promote. This includes COLOR_ATTACHMENT, UNORDERED_ACCESS and
VRS image. If SIMULTANEOUS_ACCESS is used, we can still promote, and
we handle that by setting common layout to GENERAL for these resources.
- Avoid redundant barriers in render passes since normal resource
barriers will always make sure we are already in
COLOR_ATTACHMENT_OPTIMAL.
- Do not force GENERAL layout if resource has UNORDERED_ACCESS flag set.
As this is not a promotable state, we have to explicitly transition
into it. I tested this on validation layers, where even COMMON state
refuses to promote to UAV state. The exception here of course is
SIMULTANEOUS_ACCESS, but we handle that properly now.
- Verify that UAV or SIMULTANEOUS access is not used together with DSV
state. This is explicitly banned in the API docs.
- Actually emit image barriers. Batch the image transitions as that's
what D3D12 docs encourage app developers to do, and it also expects
that drivers can optimize this. Ensure that we respect the in-order
resource barrier rules by splitting batches if there are overlaps in
the transitions.
- Ensure that correct image layout is used when clearing a suspended
render pass attachment.
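
The batch splitting rule described above can be sketched roughly as
follows. This is a minimal illustration, not the actual implementation:
the struct names, MAX_BATCH and the helper functions are all
hypothetical, and the flush only counts how often a barrier would be
recorded. A new transition that overlaps one already in the batch
forces a flush first, so in-order semantics are preserved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH 16 /* hypothetical batch capacity */

/* Hypothetical description of one image transition. */
struct image_transition
{
    const void *image;          /* identity of the image being transitioned */
    uint32_t first_subresource;
    uint32_t subresource_count;
};

struct transition_batch
{
    struct image_transition entries[MAX_BATCH];
    size_t count;
    unsigned int flush_count;   /* how many times the batch was emitted */
};

/* Two transitions overlap if they touch the same image and their
 * subresource ranges intersect. */
static bool transitions_overlap(const struct image_transition *a,
        const struct image_transition *b)
{
    if (a->image != b->image)
        return false;
    return a->first_subresource < b->first_subresource + b->subresource_count &&
            b->first_subresource < a->first_subresource + a->subresource_count;
}

static void batch_flush(struct transition_batch *batch)
{
    if (!batch->count)
        return;
    /* A real implementation would record a single pipeline barrier
     * covering all batched transitions here. */
    batch->flush_count++;
    batch->count = 0;
}

static void batch_add(struct transition_batch *batch,
        const struct image_transition *transition)
{
    size_t i;

    /* Split the batch if the new transition overlaps a pending one. */
    for (i = 0; i < batch->count; i++)
    {
        if (transitions_overlap(&batch->entries[i], transition))
        {
            batch_flush(batch);
            break;
        }
    }

    if (batch->count == MAX_BATCH)
        batch_flush(batch);

    batch->entries[batch->count++] = *transition;
}
```

Non-overlapping transitions accumulate into one barrier, while a second
transition of the same subresource ends up in a later barrier.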
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Avoid using the separate layouts if we're only using formats with a
single aspect. This makes it more likely that layouts match the common
layout, and we can avoid awkward transition barriers.
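
As a rough sketch of the check, assuming the decision is based purely
on the format's aspect mask: the enum and function name below are
hypothetical, with bit values chosen to mirror the Vulkan image aspect
bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aspect bits, mirroring the Vulkan image aspect values. */
enum aspect_flags
{
    ASPECT_COLOR   = 0x1,
    ASPECT_DEPTH   = 0x2,
    ASPECT_STENCIL = 0x4,
};

/* Separate depth/stencil layouts only make sense when the format has
 * both aspects; single-aspect formats use the combined layouts. */
static bool format_wants_separate_layouts(uint32_t aspect_mask)
{
    const uint32_t depth_stencil = ASPECT_DEPTH | ASPECT_STENCIL;
    return (aspect_mask & depth_stencil) == depth_stencil;
}
```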
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We recently dropped this from Mesa because ACO has been the default
compiler since August 2020, so it's implicit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The spec is pretty clear that it's invalid usage. Return E_INVALIDARG
like native drivers.
This is a workaround for the inventory GPU hang with Cyberpunk 2077
which is actually a game bug. Luckily the game handles this error
properly.
The problem is that the game always assumes that an image with 2 mips
is smaller than the same image with 6 mips. This is not always true if
the swizzle mode is different, and a recent Mesa update changed that.
The game then creates a D3D12 heap that is too small, which triggers a
memory violation and then a GPU hang with RADV.
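
A minimal sketch of such a validation, assuming the invalid usage in
question is a placed resource that does not fit in its heap. The
function name and the hresult_t typedef are hypothetical stand-ins;
only the E_INVALIDARG value (0x80070057) is taken from the Windows
headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for HRESULT, S_OK and E_INVALIDARG. */
typedef int32_t hresult_t;
#define HR_OK         ((hresult_t)0)
#define HR_INVALIDARG ((hresult_t)0x80070057u)

/* Reject a placed resource whose range [heap_offset, heap_offset +
 * resource_size) does not fit inside the heap. The comparison is
 * arranged so that heap_offset + resource_size cannot overflow. */
static hresult_t validate_placed_resource(uint64_t heap_size,
        uint64_t heap_offset, uint64_t resource_size)
{
    if (heap_offset > heap_size || resource_size > heap_size - heap_offset)
        return HR_INVALIDARG;
    return HR_OK;
}
```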
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In control flow, we can force LOD 0.0 to avoid undefined results when
games sample with implicit LOD in non-quad uniform control flow.
Behavior on different implementations is:
- Helper lanes come to life and interpolate shader input.
- LOD is clamped to 0.0 in divergent control flow.
This hack is not safe in general, since we force 0.0 even when the
control flow is quad uniform.
This is the most practical solution for the problem for now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Our internal copy shaders are fine, but we get benign errors about
sample count being wrong since we alias descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot use the memory requirement output, since we will zero-clear
memory with a size that might be larger than the VkBuffer size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is needed for VK_NVX_binary_import and VK_NVX_image_view_handle.
Signed-off-by: Roshan Chaudhari <rochaudhari@nvidia.com>
Reviewed-by: Liam Middlebrook <lmiddlebrook@nvidia.com>
Some games end up writing the wrong descriptor type when using null
descriptors, and to be robust against that, we have to clear out
all descriptors when creating null descriptors.
If we copy a null descriptor, we will also have to copy from all sets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In cases where acquire image is blocking, we should call that after
presentation to avoid latency when the app calls present.
This avoids weird inverse frame cadences with Mesa WSI right now,
as acquiring an image is always a blocking call until it is complete.
In cases when we aren't blocking, this kicks off the acquisition so
it can be waited upon by the next present blit pass.
Use another set of semaphores to wait for the image acquisition on the
GPU.
In the non-blocking vkAcquireNextImageKHR case, this means that a
potential bubble of time between waiting on the fence and submitting
the blit + presentation is eliminated.
Runaway presentation in this setup is avoided by frame latency objects
and normal frame latency which is always 3 according to documentation.
Be careful about handling SUBOPTIMAL. Semaphores will be signaled, but
we might want to tear down the swapchain. In these cases, we need to
wait for the semaphore to be signaled first, which can only be done by
submitting a wait, since QueueWaitIdle or DeviceWaitIdle don't cover
WSI.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Documentation says that this should always be 3 without WAITABLE_OBJECT,
unlike in D3D11 where it will use the DXGI device's frame latency.
This stops runaway presentations in the non-blocking acquire image case
with the new semaphore setup.
Signed-off-by: Joshua Ashton <joshua@froggi.es>