Commit Graph

2661 Commits

Author SHA1 Message Date
Hans-Kristian Arntzen 9a63df07b8 vkd3d: Add punchthrough path for descriptor copies.
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)

Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-04 13:34:18 +01:00
Mike Blumenkrantz 1d76803aff vkd3d: optimize memory access pattern for sampler descriptors
this removes them from the bitscan path

Signed-off-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
2022-03-01 22:50:45 +01:00
Hans-Kristian Arntzen dc622fc715 vkd3d: Recycle command pools in Elden Ring.
Very churny.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 18:40:52 +01:00
Hans-Kristian Arntzen 9817c52d24 vkd3d: Add workaround to ignore mismatch driver/device in PSO library.
Elden Ring does not detect the proper error code and create a new
pipeline library. Instead, create a fresh new library, which works
around the issue.

The game has a pattern of LoadPipeline -> if fail -> CreatePSO ->
StorePipeline. Sometimes, in the same process it will LoadLibrary from
its own cache (could explain some stutters),
so it's very useful to have this either way.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:50:57 +01:00
Hans-Kristian Arntzen a8229390f9 vkd3d: Add more pipeline_library_log snippets.
Hook GetCachedBlob and various attempts to use LoadPipeline.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:50:57 +01:00
Hans-Kristian Arntzen 12c73ee18a swapchain: More gracefully handle SURFACE_LOST.
Just like handling min/maxImageExtent of 0, we can just fall back to
user buffers.

Elden Ring hits this case on application teardown.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:04:06 +01:00
Hans-Kristian Arntzen f39ece9a7c vkd3d: Enable performance workarounds for Elden Ring.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
Hans-Kristian Arntzen c19eaac376 vkd3d: Add VKD3D_CONFIG option for command pool recycling.
Normal behaving apps should not benefit from any of this.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
Hans-Kristian Arntzen 54fbadcc94 vkd3d: Recycle command pools.
Elden Ring in particular spam frees and allocates command pools despite
this being a very bad idea.

Add a simple 8-entry cache which seems to take care of it.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
Hans-Kristian Arntzen 4b07535909 vkd3d: Optimize memory access pattern for single descriptor copies.
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):

- Bitscan loop
- Read deep into d3d12_device guts (often a cache miss). The memory
  index depends on the bitscan, which causes bubble.

When we have a single descriptor, we can just store the binding
information inline and avoid this jank.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
Hans-Kristian Arntzen 84d632f194 vkd3d: Rewrite memory layout for resource descriptors.
Tune memory layout so that we can deduce various information without
making a single pointer dereference:

- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.

Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.

To make all of this work, use a somewhat clever encoding scheme for CPU
VA where lower bits store number of active bits used to encode
descriptor offset. From there, we can mask away bits to recover
d3d12_descriptor_heap. Metadata is stored inline in one big allocation,
and we can just offset from there based on extracted log2i_ceil(descriptor count).

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
Hans-Kristian Arntzen b309913b6d vkd3d: Use unsafe_impl in CopyDescriptorsSimple.
This is an ultra-hot path and seems to show up somehow on profile.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
Hans-Kristian Arntzen c29d005ef4 vkd3d: Don't enable fast descriptor copy path for descriptor QA.
The hooks are in the generic function.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 16:42:00 +01:00
Hans-Kristian Arntzen 8a46c21254 vkd3d: Add VKD3D_CONFIG to skip memory allocator clears.
For cases where games spam committed allocations and don't use
NOT_ZEROED. We still rely on zerovram behavior for initial backing which
should be enough in most cases.

Strictly speaking however, we are forced to clear the allocations every
time if application does not use the flag correctly.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:52:05 +01:00
Hans-Kristian Arntzen 76ca492a39 vkd3d: Add some debug logging for when clear passes happen.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:52:05 +01:00
Hans-Kristian Arntzen 83c4e62660 vkd3d: Bump suballocation limit to 2 MiB.
This is a more principled limit since that's the huge page size.

Avoids some allocation spam.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:14:22 +01:00
Hans-Kristian Arntzen 4bea653504 vkd3d: Fix CopyTiles for suballocated linear resources.
Forgot to offset buffer offset. Fun!
Found when bumping VA allocation limit to 2 MiB instead of 1 MiB.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:14:22 +01:00
Hans-Kristian Arntzen edbf49aad4 vkd3d: Support opt-in to single MUTABLE set.
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 17:08:25 +01:00
Hans-Kristian Arntzen e0af8f2810 vkd3d: Make error message for buffer alignment more direct.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:37:12 +01:00
Hans-Kristian Arntzen b066e72243 swapchain: Add env-var to override swapchain images.
For perf debug mostly.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:36:36 +01:00
Hans-Kristian Arntzen 15704b2419 vkd3d: Optimize descriptor copies for common code paths.
The common path that we really need to optimize for is CBV_SRV_UAV +
Simple + 1 descriptor.

Descriptor benchmark shows an almost 50% reduction in overhead now.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen c725c29bb6 vkd3d: Inline query for set/binding from set_index.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen 2f6a91e772 vkd3d: De-virtualize query for descriptor size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen 1cc8afcc8e vkd3d: Fix potential crashes when VK_KHR_dynamic_rendering is added.
Checking for pNext here is too brittle and causes crashes when dynamic
rendering path is added.
Also need to chain in existing pNexts.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-17 11:27:25 +01:00
Hans-Kristian Arntzen 5d345f47cc vkd3d: Rewrite the pipeline library implementation.
This became basically a rewrite in the end, and it got too awkward to
split these commits in any meaningful way.

The goals here were primarily to:

- Support serializing SPIR-V and load SPIR-V.
  To do this robustly requires a lot more validation and checks to make
  sure end up compiling the same SPIR-V that we load from cache.
  This is critical for performance when games have primed their pipeline
  libraries and expect that loading a PSO should be fast. Without this,
  we will hit vkd3d-shader for every PSO, causing very long load times.
- Implement the required validation for mismatched PSO descriptions.
- Rewrite the binary layout of the pipeline library for flexibility
  concerns and performance.
  If the pipeline library is mmap-ed from disk - which appears to be
  the intended use - we only need to scan through the TOC to fully parse
  the library contents.
  From a flexibility concern, a blob needs to support inlined data,
  but a library can use referential links. We introduce separate
  hashmaps which store deduplicated SPIR-V and pipeline cache blobs,
  which significantly drop memory and storage requirements.
  For future improvements, it should be fairly easy to add information
  which lets us avoid SPIR-V or pipeline cache data altogether if
  relevant changes to Vulkan/drivers are made.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-17 11:00:03 +01:00
Hans-Kristian Arntzen 33f17cc74d vkd3d: Add VK_EXT_pipeline_creation_feedback.
Useful when used together with pipeline library logging. Confirms that
we can load pipeline caches as expected.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-04 14:31:34 +01:00
Hans-Kristian Arntzen 47337d5e0b vkd3d: Add VKD3D_CONFIG flags for various pipeline library logging.
Additionally, add option to ignore cached SPIR-V.
Will be useful for debugging, and also required for VKD3D_SHADER_OVERRIDE.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-04 14:31:34 +01:00
Hans-Kristian Arntzen f03940ef4b vkd3d: Add global_pipeline_cache option.
Avoids saving out pipeline cache blobs which are likely going to be
cached by on-disk cache anyways.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-04 14:31:34 +01:00
Hans-Kristian Arntzen e5e662ce22 vkd3d: Record root signature compatibility hashes.
For pipeline libraries and DXR to some extent later, we'll need an easy
way to compare root signature objects.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-04 14:31:34 +01:00
Hans-Kristian Arntzen 1d39c25a59 vkd3d: Properly invalidate pipeline when binding NULL DSV.
We did not test the scenario where we first render with depth enabled,
and then bind a NULL DSV with the same pipeline.
Also fix issues if we bind NULL RTVs with same pipeline bound.

Fixes crash in Guardians of the Galaxy.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-04 13:10:16 +01:00
Hans-Kristian Arntzen 5e526d506b vkd3d: Remove warning for setting NULL index buffer.
This is benign and easily gets spammed a TON.
We will warn if an indexed draw is actually made like this.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-03 18:16:36 +01:00
Hans-Kristian Arntzen 81a215d0bf vkd3d: Implement COLOR -> STENCIL copy if stencil export is supported.
Fallback is a bit more involved. Cleans up the FIXME to not report
benign issues.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-03 15:43:41 +01:00
Hans-Kristian Arntzen 29d956c6c4 vkd3d: Fix memory leak of D3D12 device singleton.
Fairly trivial, caught by ASAN.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-02 13:56:36 +01:00
Hans-Kristian Arntzen 49d0eb37e3 vkd3d: Properly align d3d12_command_list allocations.
UBSAN found a bug here since we store RTV descriptors inline, the
compiler can assume the pointer is 64 byte aligned.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-02 13:56:36 +01:00
Philip Rebohle 8f81aaa710 vkd3d: Fix reporting of WriteBufferImmediateSupportFlags.
Oversight from when we added bundles.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-02-01 16:21:43 +01:00
Hans-Kristian Arntzen 833f56154c cache: Store shader interface key in pipeline library as well.
If we're going to create different SPIR-V files from what the
VkPipelineCache represents, it's meaningless to load it.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen 86f8f41490 vkd3d: Compute a global shader interface key for a D3D12 device.
This key represents the variations of SPIR-V which would be generated
from otherwise identical inputs like DXBC blobs and root signatures.

Typically, changing VKD3D_CONFIG flags or enabled extensions will affect
this key. This ensures that we will not attempt to use a cached SPIR-V
file unless we can trust that the SPIR-V interface will match.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen a3f1a0e3cd vkd3d-shader: Add mechanism to get vkd3d-shader implementation revision.
Not immediately useful, might be nuked later in development.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen e90b573896 vkd3d-shader: Use flag for vkd3d_shader_meta bools.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen 8196b85408 vkd3d-shader: Make vkd3d_shader_hash public.
Prepare for meta struct to be serialized to a cache.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen a2c1527acd vkd3d-shader: Reuse hashmap.h hasher for shader hash.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen 6e697a54b6 vkd3d: Add d3d12_cached_pipeline_state.
Wraps the D3D12 struct with a pipeline library handle.
This is needed if the blob contains references to external data,
which then needs to be resolved.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen 41c977d616 cache: Move cache implementation over to read-writer locks.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-25 14:07:07 +01:00
Hans-Kristian Arntzen 1409ebab1f vkd3d: Consider sparse buffers to alias any other buffer.
Technically cannot alias committed buffers, but 🤷 ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-20 15:14:27 +01:00
Hans-Kristian Arntzen 7d0743345a vkd3d: Remove useless buffer barrier tracking.
This copy is to a scratch buffer, which needs no tracking.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-20 15:14:27 +01:00
Philip Rebohle 1af62abfe7 vkd3d: Enable quirk for further UE4 shaders.
Fixes artifacts in The Ascent.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-01-19 16:49:42 +01:00
Hans-Kristian Arntzen 5c492e9e6c vkd3d: Handle overlapped transfer writes.
D3D12 expects drivers to implicitly synchronize transfer operations,
since there is no TRANSFER barrier ala UAV barriers.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-19 14:44:33 +01:00
Hans-Kristian Arntzen 68ce4b4116 vkd3d: MSVC build fix.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-19 14:21:09 +01:00
Hans-Kristian Arntzen 6cba8b9945 vkd3d: Workaround broken barriers in DEATHLOOP.
In DEATHLOOP, there is a render pass which renders out a simple image,
which is then directly followed by a compute dispatch, reading that
image. The image is still in RENDER_TARGET state, and color buffers are
*not* flushed properly on at least RADV, manifesting as a very
distracting glitch pattern. This is a game bug, but for the time being,
we have to workaround it, *sigh*.

For a simple workaround, we can detect patterns where we see these
events in succession:

- Color RT is started
- StateBefore == RENDER_TARGET is not observed
- Dispatch()

In particular, when entering the options menu, highly distracting
glitches are observed in the background.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-12 12:20:03 +01:00
Robin Kertels 35be1329ed vkd3d: Don't do layout transition in aliasing barrier.
HZD issues an aliasing barrier for an alias of a resource that it
still needs.
Because D3D12 requires you to call DiscardResource or a full resource
clear/copy, we can just rely on those to do the actual image layout
transition and treat the aliasing barrier as a pure sync + flush.

This behavior is also observed in a test case where D3D12 drivers
do not seem to discard / fast-clear anything in an aliasing barrier.

Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-01-12 12:16:52 +01:00