This is designed to work with actual device addresses if supported by
the Vulkan implementation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Apparently the docs are lying and RTPSO does not hold references to the
root signatures after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need it here since local root signatures need to know
the physical layout of the record buffer up front.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will use the same pointer buffer to handle acceleration structures,
so unify this buffer under a new name. This simplifies some of the binding
code, since the SRV and UAV paths look more similar now.
The only difference is that the UAV path uses BDA -> uint32_t,
and the SRV path uses BDA -> RTAccelerationStructure.
RT requires BDA, so the fallback descriptor set (storage texel buffer) is never used for RT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will need separate descriptor sets to be able to handle typed vs
untyped buffer workarounds.
Also writes multiple descriptors for buffer views to make sure MUTABLE
and SSBO sets are filled (or TEXEL_BUFFER + SSBO for non-mutable).
Applications often get this wrong and use a raw buffer in the shader where
a typed view was written, and vice versa.
To mitigate this, just write a typed and an untyped view together.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The first range will store the byte offset, the second one will
be the typed buffer range. Typed descriptors should write both.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This begins the refactor toward letting us use both texel buffer and
SSBO descriptors for typed buffers, which is a better workaround than
force_bindless_texel_buffers.
In this new approach, we store a mask in metadata instead of
set/binding.
When copying a descriptor, we will iterate over the masks and look up
binding directly from device->bindless_state.set_info[].
The mask is represented in terms of info index rather than set index to
avoid needless lookups. Add some new helpers to make this process
easier.
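A minimal sketch of the mask walk described above, assuming the metadata holds a 32-bit mask of info indices and each set bit selects one entry of a set_info array. The struct, callback and helper names here are hypothetical stand-ins, not the actual vkd3d-proton types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of device->bindless_state.set_info[]. */
struct set_info_sketch
{
    uint32_t set_index;
    uint32_t binding_index;
};

/* Hypothetical callback invoked once per set that must receive the copy. */
typedef void (*copy_fn)(const struct set_info_sketch *info, void *userdata);

/* Visit every info index whose bit is set in the mask, lowest bit first.
 * Returns the number of sets visited. */
static unsigned int for_each_set_info(uint32_t mask,
        const struct set_info_sketch *set_info, unsigned int set_info_count,
        copy_fn fn, void *userdata)
{
    unsigned int visited = 0;

    while (mask)
    {
        /* Find the index of the lowest set bit. */
        unsigned int index = 0;
        while (!(mask & (1u << index)))
            index++;

        assert(index < set_info_count);
        fn(&set_info[index], userdata);
        visited++;
        mask &= mask - 1; /* Clear the lowest set bit. */
    }
    return visited;
}

/* Example callback: record which binding each visit would write. */
struct binding_recorder
{
    uint32_t bindings[32];
    unsigned int count;
};

static void record_binding(const struct set_info_sketch *info, void *userdata)
{
    struct binding_recorder *rec = userdata;
    rec->bindings[rec->count++] = info->binding_index;
}
```

Because the mask is indexed by info index, the binding is read straight from the array entry with no set-number lookup in between.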
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Unnecessary because the UAV counter buffer is a host memory
allocation anyway in case of host-only descriptor heaps, so
we will not read from uncached memory.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When reading GPU hang dumps, we can figure out what happened to
descriptor types along the way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Caused crash when using a driver that did not support
mutable_descriptor_type.
We were using the wrong enum bitfields... Sigh, type-safe enums would be nice.
This was most likely a regression introduced while refactoring during review.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The common case is that we find an entry, so taking a writer lock should
be the rare case. We need to optimize for the case where the application
hammers the view map with e.g. buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Official AMD drivers do not support VK_EXT_conditional_rendering,
so we'll use indirect draws instead to emulate the feature.
This also handles 64-bit predicates in combination with the
Vulkan extension, which was not possible previously.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The idea is to use indirect draws and dispatches to implement
predication. For predicated indirect draws, we'll use indirect
count.
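The core decision can be illustrated on the CPU, although in practice it would be evaluated on the GPU (e.g. by a shader writing indirect arguments). This sketch assumes the D3D12_PREDICATION_OP convention that EQUAL_ZERO skips work when the 64-bit value is zero and NOT_EQUAL_ZERO skips it otherwise; the enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring D3D12_PREDICATION_OP semantics. */
enum predication_op_sketch
{
    PREDICATION_OP_EQUAL_ZERO,     /* skip work if the value is zero */
    PREDICATION_OP_NOT_EQUAL_ZERO, /* skip work if the value is non-zero */
};

/* Same field order as VkDrawIndirectCommand (four uint32_t fields). */
struct draw_args_sketch
{
    uint32_t vertex_count;
    uint32_t instance_count;
    uint32_t first_vertex;
    uint32_t first_instance;
};

/* All 64 bits take part in the test, not just the low 32 bits. */
static bool predicate_passes(uint64_t value, enum predication_op_sketch op)
{
    bool is_zero = value == 0;

    assert(op == PREDICATION_OP_EQUAL_ZERO || op == PREDICATION_OP_NOT_EQUAL_ZERO);
    return op == PREDICATION_OP_EQUAL_ZERO ? !is_zero : is_zero;
}

/* Direct draw: emit indirect args that become a no-op if predicated away. */
static struct draw_args_sketch predicate_draw(struct draw_args_sketch args,
        uint64_t value, enum predication_op_sketch op)
{
    if (!predicate_passes(value, op))
        args.vertex_count = 0;
    return args;
}

/* Indirect draw: patch the draw count consumed by an indirect-count draw. */
static uint32_t predicate_draw_count(uint32_t max_draw_count,
        uint64_t value, enum predication_op_sketch op)
{
    return predicate_passes(value, op) ? max_draw_count : 0;
}
```

Zeroing the vertex count (or the draw count for indirect draws) turns the submission into a no-op without needing VK_EXT_conditional_rendering.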
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Potentially avoids some unnecessary host memory access. Use BDA for
the compute shader so that we can ignore alignment restrictions on
some GPU architectures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Command lists may need to allocate temporary device memory for
certain operations. In order to avoid frequent alloc/free calls,
we'll recycle these scratch buffers up to a certain threshold.
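A minimal recycling pool, assuming fixed-size scratch buffers and a hypothetical cap on how many freed buffers are kept. malloc() stands in for the real device memory allocation, and all names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define SCRATCH_POOL_MAX 4 /* Hypothetical recycling threshold. */
#define SCRATCH_SIZE 4096  /* Hypothetical fixed scratch buffer size. */

struct scratch_pool_sketch
{
    void *free_buffers[SCRATCH_POOL_MAX];
    unsigned int free_count;
    unsigned int real_allocs; /* Counts calls into the underlying allocator. */
};

/* Reuse a recycled buffer if one is available, otherwise allocate. */
static void *scratch_acquire(struct scratch_pool_sketch *pool)
{
    if (pool->free_count)
        return pool->free_buffers[--pool->free_count];
    pool->real_allocs++;
    return malloc(SCRATCH_SIZE);
}

/* Keep the buffer for reuse while under the threshold, otherwise free it. */
static void scratch_release(struct scratch_pool_sketch *pool, void *buffer)
{
    assert(buffer);
    if (pool->free_count < SCRATCH_POOL_MAX)
        pool->free_buffers[pool->free_count++] = buffer;
    else
        free(buffer);
}

static void scratch_pool_destroy(struct scratch_pool_sketch *pool)
{
    while (pool->free_count)
        free(pool->free_buffers[--pool->free_count]);
}
```

The threshold bounds how much idle memory a command list can hold on to, while still absorbing the common alloc/free churn.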
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Realign VBO strides and offsets if we have to, for the sake of
robustness. Violating these rules is against the D3D12 spec, but it does not
cause crashes on native drivers. On RDNA we can hit hangs with unaligned
vertex attributes. It appears that native drivers apply some kind of
fixup here to avoid the crash, even if the result is not what we expect.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Ensures that queries are always available and initialized
in the correct order on the GPU timeline.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Game renders the map with the wrong descriptor type, which means we must
implement everything as texel buffers to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have observed a lot of large GPU bubbles when using back-to-back
timeline semaphores to synchronize GPU submissions. Use prebaked
pipeline barrier command buffers instead.
To resolve queue sparse serialization, use two binary semaphore pairs.
There is no need to use timeline semaphores in this case.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
USE_PUSH_DESCRIPTORS may be misleading since it would be set even when
we're not using push descriptors at all due to root descriptors being
passed in via VAs. Instead, make the flag represent whether or not we
use a regular descriptor set for root parameters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The packed descriptor index is no longer needed, and it causes issues
when a game sets a root signature, binds a root descriptor, and then
sets a different root signature which maps the same root parameter
index to a different descriptor: in that case we may read undefined
data when updating push descriptors.
Fixes #366.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
MSDN states that root signatures across multiple stages in a graphics
pipeline must be identical, but the D3D12 runtime does not validate
this and mixing different root signatures results in undefined
behaviour, so just taking this from the VS should be safe.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We only need to know the pipeline layout for pipeline variant
creation. We are not holding a strong reference to the root
signature anyway, which may be problematic, but this should
not introduce a regression.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The struct definitions were identical anyway, and unifying
these will prevent unnecessary code duplication.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Version string is used in logging for information purposes, but pipeline blobs and libraries use a uint64_t-based commit hash. Using a fixed-size integer silences warnings about string length and makes storing build info a little more efficient.
The hash is obtained separately from the version string and is shifted to the left by 4 bits if the working tree is dirty.
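A sketch of turning a hex commit hash into a uint64_t and applying the dirty-tree shift described above. The parse_commit_hash() helper and the idea that the input comes from something like `git rev-parse --short HEAD` are assumptions for illustration; only the 4-bit shift comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: parse a hex commit hash string into a uint64_t.
 * Returns 0 on invalid input or if the value would not fit in 64 bits. */
static uint64_t parse_commit_hash(const char *hex)
{
    uint64_t value = 0;
    unsigned int digits = 0;

    assert(hex);
    for (; *hex; hex++, digits++)
    {
        unsigned int nibble;
        char c = *hex;

        if (c >= '0' && c <= '9')
            nibble = c - '0';
        else if (c >= 'a' && c <= 'f')
            nibble = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            nibble = c - 'A' + 10;
        else
            return 0;

        if (digits == 16) /* A 17th hex digit would overflow 64 bits. */
            return 0;
        value = (value << 4) | nibble;
    }
    return value;
}

/* A dirty working tree shifts the hash left by 4 bits. */
static uint64_t build_hash(uint64_t commit_hash, bool dirty)
{
    return dirty ? commit_hash << 4 : commit_hash;
}
```

With a short hash, the 4-bit shift still fits in 64 bits and guarantees that a dirty build never reports the same value as the clean commit it was built from.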
Signed-off-by: Krzysztof Bogacki <krzysztof.bogacki@leancode.pl>
We will not have offset information for root descriptors, so
we can still only use them with four-byte aligned SSBOs.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Introduces 'extra' bindings to bindless sets which can be used to
bind additional storage buffers to the pipeline, which will occur
before the bindless descriptor array in the descriptor set.
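A minimal sketch of the resulting binding numbering, assuming the extra storage buffer bindings occupy consecutive indices starting at 0 and the bindless array takes the next free binding. The struct and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one bindless descriptor set: `extra_count` plain
 * storage-buffer bindings come first, the bindless array follows them. */
struct bindless_set_layout_sketch
{
    uint32_t extra_count;
};

/* Binding number of the given extra storage buffer. */
static uint32_t extra_binding_index(const struct bindless_set_layout_sketch *layout, uint32_t extra)
{
    assert(extra < layout->extra_count);
    return extra;
}

/* The bindless array sits right after the last extra binding. */
static uint32_t bindless_array_binding(const struct bindless_set_layout_sketch *layout)
{
    return layout->extra_count;
}
```

Placing the fixed-size extra bindings first keeps the variable-sized bindless array as the last binding in the set.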
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We cannot rely on alignment analysis since games are buggy and screw up
RAW vs structured on occasion.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It is broken by design and won't be needed by a swapchain
implementation which uses user buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This will allow us to use the same bindless descriptor set for
different types of descriptor ranges.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is no longer performance-critical, so in order to simplify changing
the binding model, remove hard-coded descriptor set numbers and instead
look them up based on the requested descriptor properties.
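Looking a set up by properties can be as simple as a linear scan over a small table, which is acceptable on a path that is no longer performance-critical. The flag names and table layout below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor property flags. */
enum
{
    SET_FLAG_RAW_BUFFER   = 1u << 0,
    SET_FLAG_TEXEL_BUFFER = 1u << 1,
    SET_FLAG_IMAGE        = 1u << 2,
    SET_FLAG_UAV          = 1u << 3,
};

struct set_info_entry_sketch
{
    uint32_t flags;     /* Properties this set can hold. */
    uint32_t set_index; /* Vulkan descriptor set number. */
};

/* Return the first set whose flags contain every requested property,
 * or -1 if none matches. */
static int find_descriptor_set(const struct set_info_entry_sketch *infos,
        size_t count, uint32_t required)
{
    assert(infos || !count);
    for (size_t i = 0; i < count; i++)
    {
        if ((infos[i].flags & required) == required)
            return (int)infos[i].set_index;
    }
    return -1;
}
```

Since callers ask for properties rather than set numbers, the binding model can reshuffle sets without touching every call site.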
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Ignore any indexed draw calls that use a NULL index buffer.
This is not fully correct, but there is no easy way to emulate D3D12
behavior exactly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot compare resource pointers or view pointers,
since the pointers might have been recycled.
This leads to a scenario where we're not updating descriptors we're
supposed to, and the GPU reads a stale descriptor.
Fixes a GPU hang in Death Stranding (and possibly lots of other weird
crashes as well).
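One way to avoid comparing recycled pointers is to give every view a unique, monotonically increasing cookie and compare that instead. This is a sketch under that assumption; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Process-wide counter: every created view gets a fresh cookie.
 * Starting at 1 lets 0 mean "no view". */
static atomic_uint_fast64_t next_cookie = 1;

struct view_sketch
{
    uint64_t cookie;
};

static struct view_sketch *view_create(void)
{
    struct view_sketch *view = malloc(sizeof(*view));
    if (view)
        view->cookie = atomic_fetch_add(&next_cookie, 1);
    return view;
}

/* Hypothetical cached state for one descriptor slot. */
struct descriptor_slot_sketch
{
    uint64_t cookie;
};

/* Compare cookies, never pointers: a freed and reallocated view can reuse
 * the same address, but it always gets a new cookie. */
static int slot_needs_update(const struct descriptor_slot_sketch *slot,
        const struct view_sketch *view)
{
    assert(view);
    return slot->cookie != view->cookie;
}
```

Even if malloc() hands back the same address for the new view, the cookie check still forces the descriptor update.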
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For correctness, we will need to defer any initial resource state
handling to the queue timeline. Here, we will build an UNDEFINED ->
common layout barrier if (and only if):
- The resource is marked to care about initial layout transition.
- We are the first queue thread to observe that initial_transition
member is 1 (atomic exchange).
- The first use of the resource was not marked to be a discard.
E.g., if the first use of the resource is an alias barrier, we must
not emit an early barrier. The only thing we should do here is to clear the
initial_transition member, and leave it like that.
A command list maintains a list of d3d12_resources which *might* need a
transition. For the first frame a resource is used (or so), it will not
have the flag cleared yet, so multiple command lists might add the
d3d12_resource to their own transition lists. This is fine, as the queue
will resolve it.
If multiple queues see the same initial transition, there might be
shenanigans, but the application must ensure there is either a
submission boundary or fence boundary between the uses. Any initial
layout transition will only be submitted after a Wait() is observed, as
submission of the transition command buffer will be in-order with other
submissions.
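The queue-side claim can be sketched with C11 atomics, assuming a per-resource flag that starts at 1 and a separate bit saying whether the resource cares about the initial transition. The struct and function names are hypothetical simplifications of d3d12_resource.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slice of d3d12_resource state relevant to this logic. */
struct resource_sketch
{
    bool needs_initial_transition;  /* Resource cares about initial layout. */
    atomic_uint initial_transition; /* 1 until some queue thread claims it. */
};

/* Decide on the queue timeline whether to emit UNDEFINED -> common.
 * `first_use_is_discard` covers e.g. an alias barrier as the first use. */
static bool claim_initial_transition(struct resource_sketch *resource,
        bool first_use_is_discard)
{
    assert(resource);
    if (!resource->needs_initial_transition)
        return false;

    /* Only the first thread to observe 1 wins; later callers see 0. The flag
     * is cleared even for discards, so no barrier is ever emitted later. */
    if (atomic_exchange(&resource->initial_transition, 0) != 1)
        return false;

    return !first_use_is_discard;
}
```

Duplicate entries across command lists are harmless here: every call after the first returns false, so at most one barrier is emitted per resource.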
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>