Replaces d3d12_device_get_vkd3d_queue when mapping D3D12
command queues to Vulkan device queues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The implemnentation is not complete enough to safely enable it, since
some games will try to create RTPSOs by default, leading to crashes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It's really supposed to load 4 components and ignore. RGB16 is not
mandatory, so just use the "expected" formats after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Compacts ranges and only issues one bind for buffer ranges and
full subresource updates, rather than one bind per tile.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Bindless CBV is *pretty* bad on NVIDIA, so add a code path which can
promote descriptor table CBVs into push descriptors.
We can safely do this with Root Signature 1.1 STATIC or
the somewhat obscure STATIC_KEEPING_BUFFER_BOUNDS_CHECKS.
With VOLATILE, which basically all titles are using,
we can still force this behavior through a config flag,
but this is an incorrect speed hack. It works in most
titles however, since bindless CBV is exceptionally rare.
We only hoist descriptors when the root signature range has 1 descriptor
anyway, so we should avoid any reasonable bindless scenario.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There is no need to scan through the Vulkan format list,
especially since texel buffer creation happens in the hot path
in cases where we know we need to create R32UI texel buffer views.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We might have to emit to different bind point than our binding entry
suggests due to DXR, so pass down information explicitly to leaf
functions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Refactor push constant invalidation to SetPipelineState,
it is technically more correct to only invalidate when actually pushing
constants, but we need to do full state invalidation when transitioning
between RT pipelines and non-RT pipelines due to bind point aliasing
shenanigans in D3D12, so it makes more sense to invalidate state based
on active bind point there.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Enabling VK_EXT_debug_utils comes at some overhead in Wine due to the object tracking required. There is also likely a non-zero overhead in some native implementations also.
By enabling this conditionally, we can also avoid additional overhead from apps that set debug labels on both the Vulkan and front-end side.
The default condition is to enable it when building with Renderdoc integration or in debug builds.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
This thing has no right to exist.
We don't get this information in D3D12 and it's getting in the way of me refactoring config flags.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Simplifies this to make it easier to add new properties/features
so we don't have a bunch of pointers to things that are just a child
of the device info structure.
Fixes warnings when compiling without traces too.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Gives a massive boost on NVIDIA for some reason.
RADV defers push constant update, so ALL_STAGES doesn't have
that much of a perf hit.
~20% uplift in RE2, ~5% uplift in CP77 from some quick and dirty testing.
Seems to be heavily content dependent either way.
Also a bug fix, since we would clobber graphics push constants from
compute and vice versa if both graphics and compute used the same root
signature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
As per MSDN, SetName is just a wrapper around SetPrivateData and a specific GUID.
Some apps and tools will use this to retrieve their name back.
So instead, just forward the name to Vulkan in the SetPrivateData call.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Instead, infer the required stages from the D3D12 shader visibility
field from all root parameters that we map to push constants.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There are pragmatic reasons for not following spec 100% here.
The only known case where UpdateAfterBind robustness is not exposed
seems to be somewhat bogus, and we cannot run D3D12 correctly without
robustness either way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Can only support a subset in Vulkan without extra heroics. The DXR API
lets you query things that you technically should know apriori in the
application. We might need to allocate some side-channel buffers on
demand, but let's defer that until actually needed ... :\
DXR is also very awkward in that we have a query which is resolved in
UNORDERED_ACCESS state instead of COPY_DEST state, so we'll have to
ping-pong through some barriers redundantly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When building acceleration structures, we need to have an
VkAccelerationStructureKHR object, but the D3D12 API just uses a plain
VA = ID3D12Resource::GetGPUVA() + offset.
For this to work, we need to resolve the VA back to VkBuffer + offset.
The only VkBuffer we can lookup is the original backing memory
allocation in the VA map, and that allocation itself must own a view
map, since we cannot tie the VA to any specific ID3D12Resource.
Since creating an RTAS is not the common path, we allocate the view map
on-demand with CAS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
RTAS must stay in this resource state forever. The only way to
synchronize them is UAV barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Allows local root signatures to work correctly and is also a good
optimization since we no longer need to dereference memory (potentially
cold cache lines) to figure out heap offset in command buffer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we're signalling and waiting on same physical queue (always true for
current SINGLE_QUEUE define), we can rely on submission boundary
synchronization which doesn't require any extra submissions to resolve.
Avoids awkward GPU driver bubbles with back to back signal -> wait pairs
with timeline.
Observed 2% GPU uplift on RE2 on AMD.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, when suballocating memory, GetHeapProperties may
not return the exact same set of flags if we ignore flags
when looking up suitable chunks.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The difference between a range's offset and the aligned
offset may be greater than the size of that range.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is still useful as a low-level memory allocation function when
we don't want to bother with buffer offsets or D3D12 validation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Our clear code assume that this is NULL for allocations owned
by a chunk, so we should actually do it that way. Fixes some
issues where we do not wait for clears to complete if a chunk
gets destroyed.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Has one bit set for each vkd3d_queue_family that points to a
unique queue. This can be used to iterate over device queues
without having to check for duplicates manually.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is necessary to keep the amount of allocated memory manageable
in games that allocate a lot of small heaps or committed resources.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is designed to work with actual device addresses if supported by
the Vulkan implementation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Apparently the docs are lying and RTPSO does not hold references to the
root signatures after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need it here since local root signatures need to know
the physical layout of the record buffer up front.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will use the same pointer buffer to handle acceleration structures,
so unify this buffer under a new name. Simplifies some of the binding
code since SRV path and UAV path looks more similar now.
Only difference is that UAV path uses BDA -> uint32_t,
and SRV uses BDA -> RTAccelerationStructure.
RT requires BDA, so the fallback descriptor set (storage texel buffer) is never used for RT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We do not support bundles, but advertizing WriteBufferImmediate
support for bundles is required for Feature Level 12_2.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Fixes a validation error. With VK_QUERY_RESULT_64_BIT we need
to use 8-byte alignment, but ssbo_alignment may be less.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
No longer requires BDA support since it's easier now to work
around buffer alignment issues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We will need separate descriptor sets to be able to handle typed vs
untyped buffer workarounds.
Also writes multiple descriptors for buffers views to make sure MUTABLE
and SSBO sets are filled (or TEXEL_BUFFER + SSBO for non-mutable).
Applications often get this wrong and use raw buffer in shader where
typed view was written and vice versa.
To mitigate this, just write a typed and untyped view together.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The first range will store the byte offset, the second one will
be the typed buffer range. Typed descriptors should write both.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This begins the refactor toward letting us to use both texel buffer and
SSBO descriptors for typed buffers, which is a better workaround than
force_bindless_texel_buffers.
In this new approach, we store a mask in metadata instead of
set/binding.
When copying a descriptor, we will iterate over the masks and look up
binding directly from device->bindless_state.set_info[].
The mask is represented in terms of info index rather than set index to
avoid needless lookups. Add some new helpers to make this process
easier.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We currently never reset occlusion queries. For some reason,
validation layers do not report this.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Unnecessary because the UAV counter buffer is a host memory
allocation anyway in case of host-only descriptor heaps, so
we will not read from uncached memory.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When reading GPU hang dumps, we can figure out what happened to
descriptor types along the way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Caused crash when using a driver that did not support
mutable_descriptor_type.
Was using the wrong enum bitfields ... Sigh, type safe enums would be nice.
Regression caused during refactor in review most likely.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The creation infos use the format, which potentially contains other
information as well.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
By resetting query pools in advance, we can reduce the number of
stalls between draw calls in passes with occlusion queries, which
is currently causing serious performance issues in some games.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Since we'll be inserting lots of single queries, we want to
avoid having to resize the range array since that is an O(n)
operation at worst.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The common case is that we find an entry, so taking a writer lock should
be the rare case. We need to optimize for the case where the application
hammers the view map with e.g. buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Official AMD drivers do not support VK_EXT_conditional_rendering,
so we'll use indirect draws instead to emulate the feature.
This also handles 64-bit predicates in combination with the
Vulkan extension, which was not possible previously.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The idea is to use indirect draws and dispatches to implement
predication. For predicated indirect draws, we'll use indirect
count.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Potentially avoids some unnecessary host memory access. Use BDA for
the compute shader so that we can ignore alignment restrictions on
some GPU architectures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Command lists may need to allocate temporary device memory for
certain operations. In order to avoid frequent alloc/free calls,
we'll recycle these scratch buffers until a certain threshold.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Realign VBO strides and offsets if we have to, for sake of
robustness. Violating these rules is against D3D12 spec, but it does not
cause crashes on native drivers. On RDNA we can hit hangs with unaligned
vertex attributes. It appears that native drivers apply some kind of
fixup here to avoid the crash, even if the result is not what we expect.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
BDA cannot map to their hardware, and we observe a large performance
loss in games which use root CBVs. For this reason, fall back to push
descriptors here.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Ensures that queries are always available and initialized
in the correct order on the GPU timeline.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Game renders the map with wrong descriptor type, which means we must
implement everything as texel buffers to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have observed a lot of large GPU bubbles when using back-to-back
timeline semaphores to synchronize GPU submissions. Use prebaked
pipeline barrier command buffers instead.
To resolve queue sparse serialization, use two binary semaphore pairs to
resolve this. There is no need to use timeline semaphores in this case.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The fix which enabled waveops detection broke HZD, since we never tested
with that feature enabled.
Keep it disabled until we can figure out what is going on.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
USE_PUSH_DESCRIPTORS may be misleading since it would be set even when
we're not using push descriptors at all due to root descriptors being
passed in via VAs. Instead, make the flag represent whether or not we
use a regular descriptor set for root parameters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We need to know the supported shader model to detect support
for certain features like wave ops correctly.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Previously this would make the user buffer count == 0, which obviously makes apps and assertions not happy.
Fixes a crash in Horizon Zero Dawn when minimized (therefore having a degenerate surface region)
Signed-off-by: Joshua Ashton <joshua@froggi.es>
The packed descriptor index is no longer needed, and causes issues in
case a game sets a root signature, then binds a root descriptor, and
then sets a different root signature which maps the given root parameter
index to a different descriptor since we may now read undefined data
when updating push descriptors.
Fixes#366.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
MSDN states that root signatures across multiple stages in a graphics
pipeline must be identical, but the D3D12 runtime does not validate
this and mixing different root signatures results in undefined
behaviour, so just taking this from the VS should be safe.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We only need to know the pipeline layout for pipeline variant
creation. We are not holding a strong reference to the root
signature anyway, which may be problematic, but this should
not introduce a regression.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Offset buffer state might be the only relevant difference between two
descriptors. We won't need to copy descriptors, but the offsets must be.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, we may run into issues with an app accessing stale resource
or pointers. NULL descriptors are handled in OMSetRenderTargets.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The struct definitions were identical anyway, and unifying
these will prevent unnecessary code duplication.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The only currently known use case for this requires us to actually
perform the dispatch operation. Executing more than one indirect
dispatch command is not meaningful, however there might be
differences in behaviour in case the indirect count is zero.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This logic has to be the same as in d3d12_command_list_update_descriptor_table_offsets,
since not all active descriptor tables are necessarily used by the root signature.
Fixes an assert in the StarsX IrradianceMap demo (Github issue #347).
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This makes headers a dependency rather than a generator target.
This also means we get proper dependency tracking of them between projects.
Supercedes: #225
Signed-off-by: Joshua Ashton <joshua@froggi.es>