Some titles clear the same descriptors constantly.
This leads to unnecessary updates that can become costly.
This commit introduces a new flag to track when D3D12 descriptors are
not null, and skips clearing them if they are already null.
Descriptors are assumed to be null by default.
This fixes a performance regression introduced by
9983a1720f
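A minimal sketch of the tracking idea, assuming one "non-null" flag per descriptor slot. `struct descriptor_slot`, `write_descriptor`, `clear_descriptor` and the `clear_writes` counter are hypothetical and do not reflect the real vkd3d-proton data layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor slot; zero-initialized means null. */
struct descriptor_slot
{
    uint64_t payload;   /* stand-in for the Vulkan descriptor data */
    bool non_null;      /* set whenever a real descriptor is written */
};

static unsigned int clear_writes; /* counts the (costly) clears actually issued */

static void write_descriptor(struct descriptor_slot *slot, uint64_t payload)
{
    assert(slot);
    slot->payload = payload;
    slot->non_null = true;
}

static void clear_descriptor(struct descriptor_slot *slot)
{
    assert(slot);
    /* Already null: skip the redundant update entirely. */
    if (!slot->non_null)
        return;
    slot->payload = 0;
    slot->non_null = false;
    clear_writes++;
}
```

Repeated clears of the same slot then cost one real update at most.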
Signed-off-by: Rodrigo Locatti <rlocatti@nvidia.com>
Psychonauts 2 uses a SAMPLE_DESC.Count of 0 for some things, which
previously was forcing it down the MSAA alignment placement path.
Found by playing back a native D3D12 apitrace and seeing
the log spam.
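A minimal sketch of the fix, assuming it amounts to treating any sample count <= 1 (including the invalid 0) as single-sampled when picking a placement alignment. `placement_alignment` is hypothetical; the two constants carry the values of the D3D12 headers' default and MSAA placement alignments:

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_ALIGNMENT      65536u   /* D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT */
#define DEFAULT_MSAA_ALIGNMENT 4194304u /* D3D12_DEFAULT_MSAA_RESOURCE_PLACEMENT_ALIGNMENT */

/* Only a count strictly greater than 1 selects the MSAA placement path,
 * so SAMPLE_DESC.Count == 0 is handled like Count == 1. */
static uint32_t placement_alignment(uint32_t sample_count)
{
    return sample_count > 1 ? DEFAULT_MSAA_ALIGNMENT : DEFAULT_ALIGNMENT;
}
```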
Signed-off-by: Joshua Ashton <joshua@froggi.es>
The goal here is to avoid unnecessary image layout transitions when render
passes toggle depth-stencil PSO states. Since we cannot know which
states a resource is in, we have to be conservative, and assume that
shader reads *could* happen.
The best effort we can do is to detect when writes happen to a DSV
resource. In this scenario, we can deduce that the aspect cannot be
read, since combining DEPTH_WRITE with a SHADER_RESOURCE state is not allowed.
To make the tracking somewhat sane, we only promote to OPTIMAL if an
entire image's worth of subresources for a given aspect is transitioned.
The common case for depth-stencil images is 1 mip / 1 layer anyway.
Some other changes are required here:
- Instead of common_layout for the depth image, we need to consult the
command list, which might promote the layout to optimal.
- We make use of render pass compatibility rules which state that we can
change attachment reference layouts as well as initial/finalLayout.
To make this change, a pipeline will fill in a
vkd3d_render_pass_compat struct.
- A command list has a dsv_plane_optimal_mask which keeps track
of the plane aspects we have promoted to OPTIMAL, and we know cannot
be read by shaders.
The desired optimal mask is (existing optimal | PSO write).
The initial existing optimal is inherited from the command list's
tracker.
- RTV/DSV/views no longer keep track of VkImageLayout. This is
unnecessary since we always deduce image layout based on context.
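A rough sketch of the mask bookkeeping described above, under stated assumptions: plane bits are a hypothetical enum rather than VkImageAspectFlags, `update_optimal_mask` is hypothetical, and leaving the mask unchanged on a partial write transition is a guess, not necessarily what the command list tracker does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PLANE_DEPTH = 1u << 0, PLANE_STENCIL = 1u << 1 };

/* Desired optimal mask = (existing optimal | planes written by the PSO). */
static uint32_t desired_optimal_mask(uint32_t existing_optimal, uint32_t pso_write_mask)
{
    return existing_optimal | pso_write_mask;
}

/* New optimal mask for `plane` after a resource barrier on a DSV image. */
static uint32_t update_optimal_mask(uint32_t optimal_mask, uint32_t plane,
        uint32_t transitioned_subresources, uint32_t mip_levels, uint32_t array_layers,
        bool new_state_is_depth_write)
{
    if (!new_state_is_depth_write)
        return optimal_mask & ~plane; /* shader reads might follow: be conservative */
    if (transitioned_subresources == mip_levels * array_layers)
        return optimal_mask | plane;  /* whole plane is in DEPTH_WRITE: promote */
    return optimal_mask;              /* partial transition: leave tracking alone */
}
```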
Overall, this shows a massive gain in the HZD benchmark (RADV, 1440p ultimate, ~16% higher FPS on RX 6800).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The clamp is absolute, not relative to baseMip. This also avoids a validation
error and a potential crash when LODClamp > numLevels.
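A minimal sketch, assuming the fix clamps the absolute LOD clamp into the view's absolute mip range instead of adding baseMip to it. `effective_min_lod` is hypothetical and says nothing about how the value is then passed to Vulkan:

```c
#include <assert.h>
#include <stdint.h>

/* ResourceMinLODClamp is an absolute mip level, so clamp it into
 * [base_mip, base_mip + mip_count - 1] rather than offsetting it by base_mip. */
static float effective_min_lod(float lod_clamp, uint32_t base_mip, uint32_t mip_count)
{
    assert(mip_count > 0);
    float lo = (float)base_mip;
    float hi = (float)(base_mip + mip_count - 1);

    if (lod_clamp < lo)
        return lo;
    if (lod_clamp > hi)
        return hi; /* LODClamp > numLevels no longer produces an out-of-range value */
    return lod_clamp;
}
```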
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Not correct, will need spec additions to handle it properly.
Fixes ground rendering in DIRT 5.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
- Honor resource barriers for resource states which cannot automatically
decay or promote. This includes COLOR_ATTACHMENT, UNORDERED_ACCESS and
VRS image. If SIMULTANEOUS_ACCESS is used, we can still promote, and
we handle that by setting common layout to GENERAL for these resources.
- Avoid redundant barriers in render passes since normal resource
barriers will always make sure we are already in
COLOR_ATTACHMENT_OPTIMAL.
- Do not force GENERAL layout if resource has UNORDERED_ACCESS flag set.
As this is not a promotable state, we have to explicitly transition
into it. I tested this on validation layers, where even COMMON state
refuses to promote to UAV state. The exception here of course is
SIMULTANEOUS_ACCESS, but we handle that properly now.
- Verify that UAV or SIMULTANEOUS access is not used together with DSV
state. This is explicitly banned in the API docs.
- Actually emit image barriers. Batch the image transitions, as that's
what the D3D12 docs encourage app developers to do, and the docs also
expect drivers to optimize this. Ensure that we respect the in-order
resource barrier rules by splitting batches if there are overlaps in
the transitions.
- Ensure that correct image layout is used when clearing a suspended
render pass attachment.
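The batching rule above can be sketched as follows. This is a simplified model, assuming each transition covers one contiguous subresource range of one resource; `struct transition`, `struct barrier_batch` and the helper functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct transition
{
    uint32_t resource_id;
    uint32_t first_subresource;
    uint32_t subresource_count;
};

#define MAX_BATCH 16

struct barrier_batch
{
    struct transition pending[MAX_BATCH];
    size_t count;
    unsigned int flushes; /* how many times a batch was submitted */
};

static bool transitions_overlap(const struct transition *a, const struct transition *b)
{
    return a->resource_id == b->resource_id &&
            a->first_subresource < b->first_subresource + b->subresource_count &&
            b->first_subresource < a->first_subresource + a->subresource_count;
}

static void flush_batch(struct barrier_batch *batch)
{
    if (!batch->count)
        return;
    /* Real code would record one vkCmdPipelineBarrier with all pending image barriers. */
    batch->count = 0;
    batch->flushes++;
}

static void add_transition(struct barrier_batch *batch, const struct transition *t)
{
    /* D3D12 barriers execute in order: if this one touches a subresource that is
     * already pending, the earlier transition must land first, so split the batch. */
    for (size_t i = 0; i < batch->count; i++)
    {
        if (transitions_overlap(&batch->pending[i], t))
        {
            flush_batch(batch);
            break;
        }
    }
    if (batch->count == MAX_BATCH)
        flush_batch(batch);
    batch->pending[batch->count++] = *t;
}
```

Non-overlapping transitions accumulate into one batch; an overlap forces a split.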
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The spec is pretty clear that it's invalid usage. Return E_INVALIDARG
like native drivers.
This is a workaround for the inventory GPU hang with Cyberpunk 2077
which is actually a game bug. Luckily the game handles this error
properly.
The problem is that the game always assumes that an image with 2 mips
is smaller than the same image with 6 mips. This is not always
true if the swizzle mode is different, and a recent Mesa update changed
that. The game then creates a D3D12 heap that is too small, which
triggered a memory violation and then a GPU hang with RADV.
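A minimal sketch of the kind of check involved, assuming it happens when a resource is placed into a heap. `validate_placed_resource` and the `hresult` typedef are hypothetical stand-ins; the two HRESULT values match the Windows definitions:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t hresult;
#define HR_S_OK         ((hresult)0x00000000u)
#define HR_E_INVALIDARG ((hresult)0x80070057u) /* E_INVALIDARG */

/* Reject a placement that extends past the end of the heap instead of letting
 * the GPU fault later. Written to avoid overflow in heap_offset + resource_size. */
static hresult validate_placed_resource(uint64_t heap_size, uint64_t heap_offset,
        uint64_t resource_size)
{
    if (heap_offset > heap_size || resource_size > heap_size - heap_offset)
        return HR_E_INVALIDARG;
    return HR_S_OK;
}
```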
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Some games end up writing the wrong descriptor type when using null
descriptors, and to be robust against that, we have to clear out
all descriptors when creating null descriptors.
If we copy a null descriptor, we will also have to copy from all sets.
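A simplified model of the rule, assuming each heap slot is backed by one descriptor in each of several per-type descriptor sets. The set list, `struct heap_slot` and the helpers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SET_SAMPLED_IMAGE, SET_STORAGE_IMAGE, SET_UNIFORM_TEXEL, SET_STORAGE_TEXEL, SET_COUNT };

struct heap_slot
{
    uint64_t set_payload[SET_COUNT]; /* 0 means null */
    bool is_null;
};

static void write_descriptor(struct heap_slot *slot, int set, uint64_t payload)
{
    assert(set >= 0 && set < SET_COUNT);
    slot->set_payload[set] = payload;
    slot->is_null = false;
}

static void write_null_descriptor(struct heap_slot *slot)
{
    /* Clear every set, so a shader reading the "wrong" type still sees null. */
    for (int i = 0; i < SET_COUNT; i++)
        slot->set_payload[i] = 0;
    slot->is_null = true;
}

static void copy_descriptor(struct heap_slot *dst, const struct heap_slot *src, int set)
{
    assert(set >= 0 && set < SET_COUNT);
    if (src->is_null)
    {
        /* Null source: copy from all sets so stale data in other sets is cleared too. */
        for (int i = 0; i < SET_COUNT; i++)
            dst->set_payload[i] = src->set_payload[i];
    }
    else
    {
        dst->set_payload[set] = src->set_payload[set];
    }
    dst->is_null = src->is_null;
}
```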
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Replaces d3d12_device_get_vkd3d_queue when mapping D3D12
command queues to Vulkan device queues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Bindless CBV is *pretty* bad on NVIDIA, so add a code path which can
promote descriptor table CBVs into push descriptors.
We can safely do this with Root Signature 1.1 STATIC or
the somewhat obscure STATIC_KEEPING_BUFFER_BOUNDS_CHECKS.
With VOLATILE, which basically all titles are using,
we can still force this behavior through a config flag,
but this is an incorrect speed hack. It works in most
titles however, since bindless CBV is exceptionally rare.
We only hoist descriptors when the root signature range has 1 descriptor
anyway, so we should avoid any reasonable bindless scenario.
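The hoisting decision can be sketched like this. The flag enum only mirrors the relevant D3D12_DESCRIPTOR_RANGE_FLAGS bits by name; its numeric values are illustrative, not the SDK's. `struct cbv_range`, `should_hoist_cbv` and `force_hoist_config` are hypothetical, and treating Root Signature 1.0 ranges as volatile follows the D3D12 defaults:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum range_flags
{
    RANGE_DESCRIPTORS_VOLATILE = 1u << 0,
    RANGE_DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS = 1u << 1,
};

struct cbv_range
{
    uint32_t descriptor_count;
    uint32_t flags;
    bool root_signature_1_1;
};

static bool range_is_static(const struct cbv_range *range)
{
    if (!range->root_signature_1_1)
        return false; /* 1.0 descriptors behave as volatile */
    if (range->flags & RANGE_DESCRIPTORS_STATIC_KEEPING_BUFFER_BOUNDS_CHECKS)
        return true;
    return !(range->flags & RANGE_DESCRIPTORS_VOLATILE);
}

static bool should_hoist_cbv(const struct cbv_range *range, bool force_hoist_config)
{
    /* Single-descriptor ranges only, which rules out realistic bindless use. */
    if (range->descriptor_count != 1)
        return false;
    /* force_hoist_config is the (technically incorrect) speed hack for VOLATILE. */
    return range_is_static(range) || force_hoist_config;
}
```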
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There is no need to scan through the Vulkan format list,
especially since texel buffer creation happens in the hot path
in cases where we know we need to create R32UI texel buffer views.
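A toy illustration of the shortcut, assuming the generic path is a linear scan over a format table. The enum, table and both lookup functions are hypothetical; the point is only that a caller who already knows it wants R32UI can skip the scan:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dxgi_fmt { FMT_R8G8B8A8_UNORM, FMT_R16_FLOAT, FMT_R32_UINT };

struct format_info
{
    enum dxgi_fmt dxgi;
    const char *vk_name;
};

static const struct format_info format_table[] =
{
    { FMT_R8G8B8A8_UNORM, "VK_FORMAT_R8G8B8A8_UNORM" },
    { FMT_R16_FLOAT,      "VK_FORMAT_R16_SFLOAT" },
    { FMT_R32_UINT,       "VK_FORMAT_R32_UINT" },
};

/* Generic path: linear scan, fine when the format is not known up front. */
static const struct format_info *find_format(enum dxgi_fmt fmt)
{
    for (size_t i = 0; i < sizeof(format_table) / sizeof(format_table[0]); i++)
    {
        if (format_table[i].dxgi == fmt)
            return &format_table[i];
    }
    return NULL;
}

/* Hot path: the caller already knows it wants R32UI, so return a fixed entry. */
static const struct format_info *r32ui_format(void)
{
    static const struct format_info r32ui = { FMT_R32_UINT, "VK_FORMAT_R32_UINT" };
    return &r32ui;
}
```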
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
As per MSDN, SetName is just a wrapper around SetPrivateData and a specific GUID.
Some apps and tools will use this to retrieve their name back.
So instead, just forward the name to Vulkan in the SetPrivateData call.
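A minimal sketch of the forwarding, assuming a single private-data slot per object and narrow strings (D3D12 names are wide strings). `debug_name_guid` is a placeholder, not the real value of WKPDID_D3DDebugObjectNameW, and `vk_debug_name` only stands in for the name handed to Vulkan:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct guid { unsigned char bytes[16]; };
static const struct guid debug_name_guid = {{ 0x01 }}; /* placeholder GUID */

struct object
{
    struct guid key;
    char data[64];
    size_t size;
    char vk_debug_name[64]; /* stand-in for the name forwarded to Vulkan */
};

static int set_private_data(struct object *obj, const struct guid *key,
        const void *data, size_t size)
{
    size_t n;

    if (size > sizeof(obj->data))
        return -1;
    obj->key = *key;
    memcpy(obj->data, data, size);
    obj->size = size;

    /* Forward names here, so SetName and direct SetPrivateData callers match. */
    if (!memcmp(key->bytes, debug_name_guid.bytes, sizeof(key->bytes)))
    {
        n = size < sizeof(obj->vk_debug_name) ? size : sizeof(obj->vk_debug_name) - 1;
        memcpy(obj->vk_debug_name, data, n);
        obj->vk_debug_name[n] = '\0';
    }
    return 0;
}

static int set_name(struct object *obj, const char *name)
{
    /* Per MSDN, SetName is SetPrivateData with the debug-name GUID. */
    return set_private_data(obj, &debug_name_guid, name, strlen(name) + 1);
}
```

Since the name is stored as private data, apps reading it back through GetPrivateData would also see it.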
Signed-off-by: Joshua Ashton <joshua@froggi.es>
When building acceleration structures, we need to have a
VkAccelerationStructureKHR object, but the D3D12 API just uses a plain
VA = ID3D12Resource::GetGPUVirtualAddress() + offset.
For this to work, we need to resolve the VA back to VkBuffer + offset.
The only VkBuffer we can look up is the original backing memory
allocation in the VA map, and that allocation itself must own a view
map, since we cannot tie the VA to any specific ID3D12Resource.
Since creating an RTAS is not the common path, we allocate the view map
on-demand with CAS.
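The on-demand allocation can be sketched with C11 atomics. `struct view_map`, `struct va_allocation` and `va_allocation_get_view_map` are hypothetical; the pattern is the usual "allocate, try to publish with compare-and-swap, free ours if another thread won":

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct view_map
{
    int entries; /* placeholder for (offset, size) -> view lookups */
};

struct va_allocation
{
    _Atomic(struct view_map *) view_map; /* NULL until the first RTAS lookup */
};

static struct view_map *va_allocation_get_view_map(struct va_allocation *alloc)
{
    struct view_map *map = atomic_load_explicit(&alloc->view_map, memory_order_acquire);
    struct view_map *expected = NULL;

    if (map)
        return map;

    map = calloc(1, sizeof(*map));
    if (!map)
        return NULL;

    /* Publish our map unless another thread beat us to it. */
    if (!atomic_compare_exchange_strong_explicit(&alloc->view_map, &expected, map,
            memory_order_acq_rel, memory_order_acquire))
    {
        free(map);
        return expected; /* on failure, CAS stored the winning pointer in expected */
    }
    return map;
}
```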
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
RTAS must stay in this resource state forever. The only way to
synchronize them is UAV barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Allows local root signatures to work correctly and is also a good
optimization since we no longer need to dereference memory (potentially
cold cache lines) to figure out heap offset in command buffer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, when suballocating memory, GetHeapProperties may
not return the exact same set of flags if we ignore flags
when looking up suitable chunks.
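A minimal sketch of exact-match chunk selection. `struct memory_chunk` and `find_chunk` are hypothetical, and heap flags are modeled as a plain bitmask:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct memory_chunk
{
    uint32_t heap_flags; /* flags the chunk's heap was created with */
    uint64_t free_bytes;
};

static struct memory_chunk *find_chunk(struct memory_chunk *chunks, size_t count,
        uint32_t heap_flags, uint64_t size)
{
    for (size_t i = 0; i < count; i++)
    {
        /* Exact flag match: a chunk with different flags would make
         * GetHeapProperties report flags the app never asked for. */
        if (chunks[i].heap_flags == heap_flags && chunks[i].free_bytes >= size)
            return &chunks[i];
    }
    return NULL;
}
```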
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>