Tests various scenarios where we need to handle DSV layouts:
- Clears
- Discards
- Draw
- Transitions
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When performing a decay of a DSV resource, make sure to transition all
subresources, not just the particular aspect being transitioned.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We require separate DS layouts.
Fixes validation errors where we transition from read-only, but our
neighbor aspect might have been optimal.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These only existed for VRS attachment, which is no longer
necessary with VK_KHR_dynamic_rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When we require inter-stage fixups, we need a solution for partial
validity of the cache. Accept the modules all or nothing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)
Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring does not detect the proper error code and create a new
pipeline library. Instead, create a fresh new library, which works
around the issue.
The game has a pattern of LoadPipeline -> if fail -> CreatePSO ->
StorePipeline. Sometimes, in the same process it will LoadLibrary from
its own cache (could explain some stutters),
so it's very useful to have this either way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Just like handling min/maxImageExtent of 0, we can just fall back to
user buffers.
Elden Ring hits this case on application teardown.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring in particular spam frees and allocates command pools despite
this being a very bad idea.
Add a simple 8-entry cache which seems to take care of it.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):
- Bitscan loop
- Read deep into d3d12_device guts (often a cache miss). The memory
index depends on the bitscan, which causes bubble.
When we have a single descriptor, we can just store the binding
information inline and avoid this jank.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Tune memory layout so that we can deduce various information without
making a single pointer dereference:
- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.
Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.
To make all of this work, use a somewhat clever encoding scheme for CPU
VA where lower bits store number of active bits used to encode
descriptor offset. From there, we can mask away bits to recover
d3d12_descriptor_heap. Metadata is stored inline in one big allocation,
and we can just offset from there based on extracted log2i_ceil(descriptor count).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For cases where games spam committed allocations and don't use
NOT_ZEROED. We still rely on zerovram behavior for initial backing which
should be enough in most cases.
Strictly speaking however, we are forced to clear the allocations every
time if application does not use the flag correctly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is a more principled limit since that's the huge page size.
Avoids some allocation spam.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Forgot to offset buffer offset. Fun!
Found when bumping VA allocation limit to 2 MiB instead of 1 MiB.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The common path that we really need to optimize for is CBV_SRV_UAV +
Simple + 1 descriptor.
Descriptor benchmark shows an almost 50% reduction in overhead now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>