This is necessary to keep the amount of allocated memory manageable
in games that allocate a lot of small heaps or committed resources.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is designed to work with actual device addresses if supported by
the Vulkan implementation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Apparently the docs are lying and RTPSO does not hold references to the
root signatures after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need it here since local root signatures need to know
the physical layout of the record buffer up front.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will use the same pointer buffer to handle acceleration structures,
so unify this buffer under a new name. Simplifies some of the binding
code since SRV path and UAV path looks more similar now.
Only difference is that UAV path uses BDA -> uint32_t,
and SRV uses BDA -> RTAccelerationStructure.
RT requires BDA, so the fallback descriptor set (storage texel buffer) is never used for RT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will need separate descriptor sets to be able to handle typed vs
untyped buffer workarounds.
Also writes multiple descriptors for buffers views to make sure MUTABLE
and SSBO sets are filled (or TEXEL_BUFFER + SSBO for non-mutable).
Applications often get this wrong and use raw buffer in shader where
typed view was written and vice versa.
To mitigate this, just write a typed and untyped view together.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The first range will store the byte offset, the second one will
be the typed buffer range. Typed descriptors should write both.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This begins the refactor toward letting us to use both texel buffer and
SSBO descriptors for typed buffers, which is a better workaround than
force_bindless_texel_buffers.
In this new approach, we store a mask in metadata instead of
set/binding.
When copying a descriptor, we will iterate over the masks and look up
binding directly from device->bindless_state.set_info[].
The mask is represented in terms of info index rather than set index to
avoid needless lookups. Add some new helpers to make this process
easier.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Unnecessary because the UAV counter buffer is a host memory
allocation anyway in case of host-only descriptor heaps, so
we will not read from uncached memory.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When reading GPU hang dumps, we can figure out what happened to
descriptor types along the way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Caused crash when using a driver that did not support
mutable_descriptor_type.
Was using the wrong enum bitfields ... Sigh, type safe enums would be nice.
Regression caused during refactor in review most likely.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The common case is that we find an entry, so taking a writer lock should
be the rare case. We need to optimize for the case where the application
hammers the view map with e.g. buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Official AMD drivers do not support VK_EXT_conditional_rendering,
so we'll use indirect draws instead to emulate the feature.
This also handles 64-bit predicates in combination with the
Vulkan extension, which was not possible previously.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The idea is to use indirect draws and dispatches to implement
predication. For predicated indirect draws, we'll use indirect
count.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Potentially avoids some unnecessary host memory access. Use BDA for
the compute shader so that we can ignore alignment restrictions on
some GPU architectures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Command lists may need to allocate temporary device memory for
certain operations. In order to avoid frequent alloc/free calls,
we'll recycle these scratch buffers until a certain threshold.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Realign VBO strides and offsets if we have to, for sake of
robustness. Violating these rules is against D3D12 spec, but it does not
cause crashes on native drivers. On RDNA we can hit hangs with unaligned
vertex attributes. It appears that native drivers apply some kind of
fixup here to avoid the crash, even if the result is not what we expect.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Ensures that queries are always available and initialized
in the correct order on the GPU timeline.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Game renders the map with wrong descriptor type, which means we must
implement everything as texel buffers to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have observed a lot of large GPU bubbles when using back-to-back
timeline semaphores to synchronize GPU submissions. Use prebaked
pipeline barrier command buffers instead.
To resolve queue sparse serialization, use two binary semaphore pairs to
resolve this. There is no need to use timeline semaphores in this case.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
USE_PUSH_DESCRIPTORS may be misleading since it would be set even when
we're not using push descriptors at all due to root descriptors being
passed in via VAs. Instead, make the flag represent whether or not we
use a regular descriptor set for root parameters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The packed descriptor index is no longer needed, and causes issues in
case a game sets a root signature, then binds a root descriptor, and
then sets a different root signature which maps the given root parameter
index to a different descriptor since we may now read undefined data
when updating push descriptors.
Fixes#366.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
MSDN states that root signatures across multiple stages in a graphics
pipeline must be identical, but the D3D12 runtime does not validate
this and mixing different root signatures results in undefined
behaviour, so just taking this from the VS should be safe.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We only need to know the pipeline layout for pipeline variant
creation. We are not holding a strong reference to the root
signature anyway, which may be problematic, but this should
not introduce a regression.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The struct definitions were identical anyway, and unifying
these will prevent unnecessary code duplication.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Version string is used in logging for information purposes, but pipelines blobs and libraries use uint64_t–based commit hash. Using fixed–size integer silences warnings about string length and makes storing build info a little more efficient.
The hash is obtained separately from version string and is shifted to the left by 4 bits if the working tree is dirty.
Signed-off-by: Krzysztof Bogacki <krzysztof.bogacki@leancode.pl>
We will not have offset information for root descriptors, so
we can still only use them with four-byte aligned SSBOs.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Introduces 'extra' bindings to bindless sets which can be used to
bind additional storage buffers to the pipeline, which will occur
before the bindless descriptor array in the descriptor set.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We cannot rely on alignment analysis since games are buggy and screw up
RAW vs structured on occasion.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It is broken by design and won't be needed by a swapchain
implementation which uses user buffers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This will allow us to use the same bindless descriptor set for
different types of descriptor ranges.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This is no longer performance-critical, so in order to simplify changing
the binding model, remove hard-coded descriptor set numbers and instead
look them up based on the requested descriptor properties.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Ignore any indexed draw calls which uses a NULL index buffer.
This is not fully correct, but there is no easy way to emulate D3D12
behavior exactly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot compare resource pointers or view pointers,
since the pointers might have been recycled.
This leads to a scenario where we're not updating descriptors we're
supposed to, and the GPU reads a stale descriptor.
Fixes a GPU hang in Death Stranding (and possibly lots of other weird
crashes as well).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For correctness, we will need to defer any initial resource state
handling to the queue timeline. Here, we will build an UNDEFINED ->
common layout barrier if (and only if):
- The resource is marked to care about initial layout transition.
- We are the first queue thread to observe that initial_transition
member is 1 (atomic exchange).
- The first use of the resource was not marked to be a discard.
E.g., if the first use of the resource is an alias barrier, we must
not emit an early barrier. The only we should do here is to clear the
initial_transition member, and leave it like that.
A command list maintains a list of d3d12_resources which *might* need a
transition. For the first frame a resource is used (or so), it will not
have the flag cleared yet, so multiple command lists might add the
d3d12_resource to its own transition list. This is fine, as the queue
will resolve it.
If multiple queues see the same initial transition, there might be
shenanigans, but the application must ensure there is either a
submission boundary or fence boundary between the uses. Any initial
layout transition will only be submitted after a Wait() is observed, as
submission of the transition command buffer will be in-order with other
submissions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is used extensively by Horizon Zero Dawn, and allows us
to skip the compile screen after the initial first run.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Unused now, instead we should implement D3D12 caching primitives
correctly and rely on the Vulkan driver otherwise.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There is no resource state associated with this, so emit the barrier at
the end of a command buffer based on trivial tracking.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to handle large (> 4G) jumps in timeline value, which is not
supported by all implementations.
There is no good way to handle that, so rewrite and clean up timeline
semaphore handling by separating the timeline into a virtual timeline
(which can rewind and jump around arbitrarely) and a physical timeline
which increments by one each time.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These memory types might end up being used as fallback memory types,
which is problematic due to their tiny sizes, and unexpected performance
behavior. Generally, when we want to fallback, we should cleanly fall
back to system memory rather than a different device local type.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Manages unique static samplers for now, in order to reduce duplicates.
Can be extended to also manage descriptor pools for static samplers in
the future.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
D3D12 allows much larger pools to be created for heaps that are not
shader-visible, which some games make use of. Fixes crashes on Nvidia.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Stores info about where exactly the descriptor is stored in the
Vulkan descriptor pool, and whether we have to worry about an
additional UAV counter descriptor.
This is meant to replace all the other non-static data stored
inside d3d12_desc.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We're not using these anywhere because we need formats to be correct
for image views. Buffer views are used for root descriptors and null
UAV counters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When we're using extended dynamic state, we will often end up with dummy
pipeline binds, which we should try to avoid if we can.
Also avoids having to rebind dynamic state redundantly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cleans up dynamic state such that we do not have to keep dynamic state
create infos around.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fall back when there is a mismatch, which can happen if application does
not declare inputs to hull shader (unlikely).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When using EXT_extended_dynamic_state, we will be able to compile a
master pipeline. Only in special cases will we have to fallback.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This can happen in the worst case where we have all bindless sets, and:
- Static samplers
- Packed descriptors (UAV counters on drivers without support for this)
- Root descriptors
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We need these for the upcoming swapchain factory implementation
for standalone D3D12.
They're also probably good to have around in future for the
d3d12 device.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
This commit moves the module handling code which was previously dumped in device.c and the code to retrieve the current executable path to its own file.
This also eliminates HAVE_DECL_PROGRAM_INVOCATION_NAME from config.h
Signed-off-by: Joshua Ashton <joshua@froggi.es>
debug_marker/debug_report are both deprecated in favor of debug_utils and vkd3d was using marker in a
buggy way anways, as debug_marker requires debug_report to work, but it was
only conditionally enabled.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Gets rid of the full barrier on command buffer end.
Instead, do what D3D12 wants, which is to serialize all
ExecuteCommandLists. Simplify the existing timeline sempahore setup for
sparse queues and use it for all submissions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We'll need this to more accurately select the memory type for D3D12
heaps based on which resources are allowed to be placed in it.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Need float16_int8 and subgroup with extended types to implement new SM
6.2 features. For now, skip over SM 6.1 features until someone makes use
of them.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Some sparse resource may have a metadata aspect on some drivers,
which needs to be bound before the image can be used in any way.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This will serve as a fallback if at least one queue family
does not support sparse binding.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This allows us to perform clears inside the render pass even if
the render pass hasn't been started at the time of the clear yet.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We need access to the resource in order to perform render pass
layout transitions, just the view handle isn't enough.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>