Rather than having to take writer lock on serialize calls from the
outside, we should just take locks when accessing the internal hashmaps
instead.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
On some implementations, it doesn't matter for performance what we use,
and we can avoid a lot of ugly barriers this way.
Opt-in to use this extensions on GPUs we know handles it well,
otherwise, keep using the tracking paths.
With VK_KHR_dynamic_rendering, this is now feasible to do since we no longer
have to deal with shenanigans related to VkRenderPass layouts and
complicated compatibility rules.
To make this work with the existing framework, just need to consider
that GENERAL can be a common layout alongside DEPTH_STENCIL_OPTIMAL,
which are both common layouts that do not need to be tracked at all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In pipeline libraries, the library holds on to private references of the
libraries so that they can be rapidly loaded on-demand.
This behavior is verifed by API tests.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When we store pipeline state in libraries we have to manage lifetime a
bit differently, which requires internal refcounts of some sort.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Required extension by VK_KHR_fragment_shading_rate and
VK_KHR_separate_depth_stencil_layouts, but we don't care about enabling
any features or use it directly.
Needed to silence validation errors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These only existed for VRS attachment, which is no longer
necessary with VK_KHR_dynamic_rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)
Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring in particular spam frees and allocates command pools despite
this being a very bad idea.
Add a simple 8-entry cache which seems to take care of it.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):
- Bitscan loop
- Read deep into d3d12_device guts (often a cache miss). The memory
index depends on the bitscan, which causes bubble.
When we have a single descriptor, we can just store the binding
information inline and avoid this jank.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Tune memory layout so that we can deduce various information without
making a single pointer dereference:
- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.
Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.
To make all of this work, use a somewhat clever encoding scheme for CPU
VA where lower bits store number of active bits used to encode
descriptor offset. From there, we can mask away bits to recover
d3d12_descriptor_heap. Metadata is stored inline in one big allocation,
and we can just offset from there based on extracted log2i_ceil(descriptor count).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is a more principled limit since that's the huge page size.
Avoids some allocation spam.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The common path that we really need to optimize for is CBV_SRV_UAV +
Simple + 1 descriptor.
Descriptor benchmark shows an almost 50% reduction in overhead now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This became basically a rewrite in the end, and it got too awkward to
split these commits in any meaningful way.
The goals here were primarily to:
- Support serializing SPIR-V and load SPIR-V.
To do this robustly requires a lot more validation and checks to make
sure end up compiling the same SPIR-V that we load from cache.
This is critical for performance when games have primed their pipeline
libraries and expect that loading a PSO should be fast. Without this,
we will hit vkd3d-shader for every PSO, causing very long load times.
- Implement the required validation for mismatched PSO descriptions.
- Rewrite the binary layout of the pipeline library for flexibility
concerns and performance.
If the pipeline library is mmap-ed from disk - which appears to be
the intended use - we only need to scan through the TOC to fully parse
the library contents.
From a flexibility concern, a blob needs to support inlined data,
but a library can use referential links. We introduce separate
hashmaps which store deduplicated SPIR-V and pipeline cache blobs,
which significantly drop memory and storage requirements.
For future improvements, it should be fairly easy to add information
which lets us avoid SPIR-V or pipeline cache data altogether if
relevant changes to Vulkan/drivers are made.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful when used together with pipeline library logging. Confirms that
we can load pipeline caches as expected.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For pipeline libraries and DXR to some extent later, we'll need an easy
way to compare root signature objects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We did not test the scenario where we first render with depth enabled,
and then bind a NULL DSV with the same pipeline.
Also fix issues if we bind NULL RTVs with same pipeline bound.
Fixes crash in Guardians of the Galaxy.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This key represents the variations of SPIR-V which would be generated
from otherwise identical inputs like DXBC blobs and root signatures.
Typically, changing VKD3D_CONFIG flags or enabled extensions will affect
this key. This ensures that we will not attempt to use a cached SPIR-V
file unless we can trust that the SPIR-V interface will match.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Wraps the D3D12 struct with a pipeline library handle.
This is needed if the blob contains references to external data,
which then needs to be resolved.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 expects drivers to implicitly synchronize transfer operations,
since there is no TRANSFER barrier ala UAV barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In DEATHLOOP, there is a render pass which renders out a simple image,
which is then directly followed by a compute dispatch, reading that
image. The image is still in RENDER_TARGET state, and color buffers are
*not* flushed properly on at least RADV, manifesting as a very
distracting glitch pattern. This is a game bug, but for the time being,
we have to workaround it, *sigh*.
For a simple workaround, we can detect patterns where we see these
events in succession:
- Color RT is started
- StateBefore == RENDER_TARGET is not observed
- Dispatch()
In particular, when entering the options menu, highly distracting
glitches are observed in the background.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>