Gives a massive boost on NVIDIA for some reason.
RADV defers push constant update, so ALL_STAGES doesn't have
that much of a perf hit.
~20% uplift in RE2, ~5% uplift in CP77 from some quick and dirty testing.
Seems to be heavily content dependent either way.
Also a bug fix, since we would clobber graphics push constants from
compute and vice versa if both graphics and compute used the same root
signature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
As per MSDN, SetName is just a wrapper around SetPrivateData and a specific GUID.
Some apps and tools will use this to retrieve their name back.
So instead, just forward the name to Vulkan in the SetPrivateData call.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Instead, infer the required stages from the D3D12 shader visibility
field from all root parameters that we map to push constants.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There are pragmatic reasons for not following spec 100% here.
The only known case where UpdateAfterBind robustness is not exposed
seems to be somewhat bogus, and we cannot run D3D12 correctly without
robustness either way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Can only support a subset in Vulkan without extra heroics. The DXR API
lets you query things that you technically should know apriori in the
application. We might need to allocate some side-channel buffers on
demand, but let's defer that until actually needed ... :\
DXR is also very awkward in that we have a query which is resolved in
UNORDERED_ACCESS state instead of COPY_DEST state, so we'll have to
ping-pong through some barriers redundantly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When building acceleration structures, we need to have an
VkAccelerationStructureKHR object, but the D3D12 API just uses a plain
VA = ID3D12Resource::GetGPUVA() + offset.
For this to work, we need to resolve the VA back to VkBuffer + offset.
The only VkBuffer we can lookup is the original backing memory
allocation in the VA map, and that allocation itself must own a view
map, since we cannot tie the VA to any specific ID3D12Resource.
Since creating an RTAS is not the common path, we allocate the view map
on-demand with CAS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
RTAS must stay in this resource state forever. The only way to
synchronize them is UAV barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Allows local root signatures to work correctly and is also a good
optimization since we no longer need to dereference memory (potentially
cold cache lines) to figure out heap offset in command buffer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we're signalling and waiting on same physical queue (always true for
current SINGLE_QUEUE define), we can rely on submission boundary
synchronization which doesn't require any extra submissions to resolve.
Avoids awkward GPU driver bubbles with back to back signal -> wait pairs
with timeline.
Observed 2% GPU uplift on RE2 on AMD.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, when suballocating memory, GetHeapProperties may
not return the exact same set of flags if we ignore flags
when looking up suitable chunks.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The difference between a range's offset and the aligned
offset may be greater than the size of that range.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is still useful as a low-level memory allocation function when
we don't want to bother with buffer offsets or D3D12 validation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Our clear code assume that this is NULL for allocations owned
by a chunk, so we should actually do it that way. Fixes some
issues where we do not wait for clears to complete if a chunk
gets destroyed.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Has one bit set for each vkd3d_queue_family that points to a
unique queue. This can be used to iterate over device queues
without having to check for duplicates manually.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is necessary to keep the amount of allocated memory manageable
in games that allocate a lot of small heaps or committed resources.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is designed to work with actual device addresses if supported by
the Vulkan implementation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Apparently the docs are lying and RTPSO does not hold references to the
root signatures after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need it here since local root signatures need to know
the physical layout of the record buffer up front.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will use the same pointer buffer to handle acceleration structures,
so unify this buffer under a new name. Simplifies some of the binding
code since SRV path and UAV path looks more similar now.
Only difference is that UAV path uses BDA -> uint32_t,
and SRV uses BDA -> RTAccelerationStructure.
RT requires BDA, so the fallback descriptor set (storage texel buffer) is never used for RT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We do not support bundles, but advertizing WriteBufferImmediate
support for bundles is required for Feature Level 12_2.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Fixes a validation error. With VK_QUERY_RESULT_64_BIT we need
to use 8-byte alignment, but ssbo_alignment may be less.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
No longer requires BDA support since it's easier now to work
around buffer alignment issues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We will need separate descriptor sets to be able to handle typed vs
untyped buffer workarounds.
Also writes multiple descriptors for buffers views to make sure MUTABLE
and SSBO sets are filled (or TEXEL_BUFFER + SSBO for non-mutable).
Applications often get this wrong and use raw buffer in shader where
typed view was written and vice versa.
To mitigate this, just write a typed and untyped view together.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The first range will store the byte offset, the second one will
be the typed buffer range. Typed descriptors should write both.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This begins the refactor toward letting us to use both texel buffer and
SSBO descriptors for typed buffers, which is a better workaround than
force_bindless_texel_buffers.
In this new approach, we store a mask in metadata instead of
set/binding.
When copying a descriptor, we will iterate over the masks and look up
binding directly from device->bindless_state.set_info[].
The mask is represented in terms of info index rather than set index to
avoid needless lookups. Add some new helpers to make this process
easier.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We currently never reset occlusion queries. For some reason,
validation layers do not report this.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Unnecessary because the UAV counter buffer is a host memory
allocation anyway in case of host-only descriptor heaps, so
we will not read from uncached memory.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When reading GPU hang dumps, we can figure out what happened to
descriptor types along the way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Caused crash when using a driver that did not support
mutable_descriptor_type.
Was using the wrong enum bitfields ... Sigh, type safe enums would be nice.
Regression caused during refactor in review most likely.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The creation infos use the format, which potentially contains other
information as well.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
By resetting query pools in advance, we can reduce the number of
stalls between draw calls in passes with occlusion queries, which
is currently causing serious performance issues in some games.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Since we'll be inserting lots of single queries, we want to
avoid having to resize the range array since that is an O(n)
operation at worst.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The common case is that we find an entry, so taking a writer lock should
be the rare case. We need to optimize for the case where the application
hammers the view map with e.g. buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Official AMD drivers do not support VK_EXT_conditional_rendering,
so we'll use indirect draws instead to emulate the feature.
This also handles 64-bit predicates in combination with the
Vulkan extension, which was not possible previously.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The idea is to use indirect draws and dispatches to implement
predication. For predicated indirect draws, we'll use indirect
count.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Potentially avoids some unnecessary host memory access. Use BDA for
the compute shader so that we can ignore alignment restrictions on
some GPU architectures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Command lists may need to allocate temporary device memory for
certain operations. In order to avoid frequent alloc/free calls,
we'll recycle these scratch buffers until a certain threshold.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Realign VBO strides and offsets if we have to, for sake of
robustness. Violating these rules is against D3D12 spec, but it does not
cause crashes on native drivers. On RDNA we can hit hangs with unaligned
vertex attributes. It appears that native drivers apply some kind of
fixup here to avoid the crash, even if the result is not what we expect.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
BDA cannot map to their hardware, and we observe a large performance
loss in games which use root CBVs. For this reason, fall back to push
descriptors here.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Ensures that queries are always available and initialized
in the correct order on the GPU timeline.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Game renders the map with wrong descriptor type, which means we must
implement everything as texel buffers to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have observed a lot of large GPU bubbles when using back-to-back
timeline semaphores to synchronize GPU submissions. Use prebaked
pipeline barrier command buffers instead.
To resolve queue sparse serialization, use two binary semaphore pairs to
resolve this. There is no need to use timeline semaphores in this case.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The fix which enabled waveops detection broke HZD, since we never tested
with that feature enabled.
Keep it disabled until we can figure out what is going on.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
USE_PUSH_DESCRIPTORS may be misleading since it would be set even when
we're not using push descriptors at all due to root descriptors being
passed in via VAs. Instead, make the flag represent whether or not we
use a regular descriptor set for root parameters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We need to know the supported shader model to detect support
for certain features like wave ops correctly.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Previously this would make the user buffer count == 0, which obviously makes apps and assertions not happy.
Fixes a crash in Horizon Zero Dawn when minimized (therefore having a degenerate surface region)
Signed-off-by: Joshua Ashton <joshua@froggi.es>
The packed descriptor index is no longer needed, and causes issues in
case a game sets a root signature, then binds a root descriptor, and
then sets a different root signature which maps the given root parameter
index to a different descriptor since we may now read undefined data
when updating push descriptors.
Fixes#366.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
MSDN states that root signatures across multiple stages in a graphics
pipeline must be identical, but the D3D12 runtime does not validate
this and mixing different root signatures results in undefined
behaviour, so just taking this from the VS should be safe.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We only need to know the pipeline layout for pipeline variant
creation. We are not holding a strong reference to the root
signature anyway, which may be problematic, but this should
not introduce a regression.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Offset buffer state might be the only relevant difference between two
descriptors. We won't need to copy descriptors, but the offsets must be.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, we may run into issues with an app accessing stale resource
or pointers. NULL descriptors are handled in OMSetRenderTargets.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The struct definitions were identical anyway, and unifying
these will prevent unnecessary code duplication.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The only currently known use case for this requires us to actually
perform the dispatch operation. Executing more than one indirect
dispatch command is not meaningful, however there might be
differences in behaviour in case the indirect count is zero.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This logic has to be the same as in d3d12_command_list_update_descriptor_table_offsets,
since not all active descriptor tables are necessarily used by the root signature.
Fixes an assert in the StarsX IrradianceMap demo (Github issue #347).
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This makes headers a dependency rather than a generator target.
This also means we get proper dependency tracking of them between projects.
Supercedes: #225
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Rename so objects we build so we don't conflict with vkd3d and don't
accidentially attempt to be built against Wine natively (it won't work).
Not quite ready for a 2.0 release yet, but bump the version to reflect
the intent. This creates a new timeline, completely separate from vkd3d.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Version string is used in logging for information purposes, but pipelines blobs and libraries use uint64_t–based commit hash. Using fixed–size integer silences warnings about string length and makes storing build info a little more efficient.
The hash is obtained separately from version string and is shifted to the left by 4 bits if the working tree is dirty.
Signed-off-by: Krzysztof Bogacki <krzysztof.bogacki@leancode.pl>
We will not have offset information for root descriptors, so
we can still only use them with four-byte aligned SSBOs.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Introduces 'extra' bindings to bindless sets which can be used to
bind additional storage buffers to the pipeline, which will occur
before the bindless descriptor array in the descriptor set.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We cannot rely on alignment analysis since games are buggy and screw up
RAW vs structured on occasion.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If the image itself is sRGB or some other format that does not support
STORAGE, we need this flag.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This can happen on Windows when windows are minimized.
Might not happen in winevulkan, but Vulkan spec outlines this Win32 case
explicitly and it happens on native Windows.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It is considered a "success", in that fences must be signalled, so make
sure we wait and reset it so we don't risk calling vkAcquireNextImageKHR
later with an already signalled fence.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Only way to implement a D3D12 swapchain.
For now, disable compute paths, we'll introduce it properly after refactor.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Relevant for swapchain since a swapchain resource can be presented right
away without ever having been touched by an API call.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It is broken by design and won't be needed by a swapchain
implementation which uses user buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Buffer views do not necessarily cover the entire resource, so we
should not spawn more workgroups than necessary to clear the view.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This will allow us to use the same bindless descriptor set for
different types of descriptor ranges.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>