Some implementations support buffer markers but not explicit coherency.
Buffer markers are often uncached either way, so this should be fine.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The spec says that on device loss, the driver must return DEVICE_LOST in
finite time, but this does not happen on NV drivers. Use a long timeout
instead in this scenario.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
AMD path for this commit.
The idea is that we can automatically instrument markers with command
list information that we can make some sense of in vkd3d-proton.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Rather than having to take writer lock on serialize calls from the
outside, we should just take locks when accessing the internal hashmaps
instead.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If outer code has already taken a reader lock, we don't need to lock
again. This also allows a caller to do GetSerializedSize + Serialize
under a single reader lock.
This will be relevant for the magic cache implementation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It's redundant to add an UNDEFINED transition here for committed
resources. We need it for sparse and placed resources to handle aliasing
rules, but that's it.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
On some implementations, it doesn't matter for performance what we use,
and we can avoid a lot of ugly barriers this way.
Opt in to using this extension on GPUs we know handle it well;
otherwise, keep using the tracking paths.
With VK_KHR_dynamic_rendering, this is now feasible to do since we no longer
have to deal with shenanigans related to VkRenderPass layouts and
complicated compatibility rules.
To make this work with the existing framework, we just need to consider
that GENERAL can be a common layout alongside DEPTH_STENCIL_OPTIMAL,
which are both common layouts that do not need to be tracked at all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In pipeline libraries, the library holds on to private references of the
libraries so that they can be rapidly loaded on-demand.
This behavior is verified by API tests.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When we store pipeline state in libraries we have to manage lifetime a
bit differently, which requires internal refcounts of some sort.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This extension is required by VK_KHR_fragment_shading_rate and
VK_KHR_separate_depth_stencil_layouts, but we don't care about enabling
any of its features or using it directly.
Needed to silence validation errors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When performing a decay of a DSV resource, make sure to transition all
subresources, not just the particular aspect being transitioned.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We require separate DS layouts.
Fixes validation errors where we transition from read-only, but our
neighbor aspect might have been optimal.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These only existed for VRS attachment, which is no longer
necessary with VK_KHR_dynamic_rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When we require inter-stage fixups, we need a solution for partial
validity of the cache. Accept the modules all or nothing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)
Unfortunately, the overhead removal is too great to ignore on the target
platform. Makes use of a private (reserved) extension for now ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring does not detect the proper error code and create a new
pipeline library itself. Instead, we create a fresh library, which works
around the issue.
The game has a pattern of LoadPipeline -> if fail -> CreatePSO ->
StorePipeline. Sometimes, in the same process it will LoadLibrary from
its own cache (could explain some stutters),
so it's very useful to have this either way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Just like handling min/maxImageExtent of 0, we can just fall back to
user buffers.
Elden Ring hits this case on application teardown.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring in particular spams command pool frees and allocations,
despite this being a very bad idea.
Add a simple 8-entry cache which seems to take care of it.
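
A minimal sketch of what such a fixed-size cache could look like. This is
an illustration only, not the code from this commit: pool_handle,
struct pool_cache and both functions are hypothetical names, and a real
implementation would also reset a pool before handing it out again and
take whatever lock protects the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VkCommandPool handle. */
typedef unsigned long long pool_handle;

#define POOL_CACHE_SIZE 8

struct pool_cache
{
    pool_handle pools[POOL_CACHE_SIZE];
    size_t count;
};

/* Try to reuse a cached pool. Returns false if the cache is empty,
 * in which case the caller has to create a new pool. */
static bool pool_cache_acquire(struct pool_cache *cache, pool_handle *pool)
{
    assert(cache && pool);
    if (cache->count == 0)
        return false;
    *pool = cache->pools[--cache->count];
    return true;
}

/* Try to park a pool for later reuse. Returns false if the cache is
 * full, in which case the caller destroys the pool as before. */
static bool pool_cache_release(struct pool_cache *cache, pool_handle pool)
{
    assert(cache);
    if (cache->count == POOL_CACHE_SIZE)
        return false;
    cache->pools[cache->count++] = pool;
    return true;
}
```

With this shape, an application that frees and immediately reallocates a
pool only ever touches the small array instead of the driver.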
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):
- Bitscan loop
- Reading deep into d3d12_device guts (often a cache miss). The memory
index depends on the bitscan, which causes a bubble.
When we have a single descriptor, we can just store the binding
information inline and avoid this jank.
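
A rough sketch of the two paths described above, under stated
assumptions: the flag value, struct desc_meta, struct device_tables and
gather_copies() are all hypothetical and only illustrate the idea of
skipping the bitscan loop and the dependent table lookup when the
binding is stored inline.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_FLAG_SINGLE_DESCRIPTOR 0x1u /* hypothetical flag value */
#define MAX_SETS 8u

struct binding
{
    uint32_t set;
    uint32_t binding;
};

struct desc_meta
{
    uint32_t flags;
    uint32_t set_mask;             /* general path: one bit per set */
    struct binding inline_binding; /* fast path: stored inline */
};

/* Stand-in for the per-set table that lives deep inside the device. */
struct device_tables
{
    struct binding per_set[MAX_SETS];
};

/* Collects the bindings that need a descriptor copy into out[], which
 * must hold MAX_SETS entries. Returns the number of copies needed. */
static unsigned gather_copies(const struct desc_meta *meta,
        const struct device_tables *tables, struct binding *out)
{
    unsigned count = 0;
    uint32_t mask;

    /* Fast path: no loop and no load that depends on a bitscan. */
    if (meta->flags & DESC_FLAG_SINGLE_DESCRIPTOR)
    {
        out[0] = meta->inline_binding;
        return 1;
    }

    assert((meta->set_mask >> MAX_SETS) == 0);
    for (mask = meta->set_mask; mask; mask &= mask - 1)
    {
        unsigned index = 0;
        while (!((mask >> index) & 1u)) /* portable bitscan */
            index++;
        /* This load address depends on the bitscan result. */
        out[count++] = tables->per_set[index];
    }
    return count;
}
```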
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Tune memory layout so that we can deduce various information without
making a single pointer dereference:
- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.
Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.
To make all of this work, use a somewhat clever encoding scheme for the
CPU VA, where the lower bits store the number of active bits used to
encode the descriptor offset. From there, we can mask away bits to
recover the d3d12_descriptor_heap. Metadata is stored inline in one big
allocation, and we can just offset from there based on the extracted
log2i_ceil(descriptor count).
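
One way such an encoding could work, as a sketch only. The 5-bit field
width (which implies a 32 byte stride between descriptors), the alignment
requirement on the heap base, and the helper names other than
log2i_ceil are assumptions made for illustration, not the exact layout
used by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 5 bits of a descriptor VA hold the bit count. */
#define VA_LOG2_BITS 5u
#define VA_LOG2_MASK ((UINT64_C(1) << VA_LOG2_BITS) - 1)

/* Smallest n such that (1 << n) >= count. */
static unsigned log2i_ceil(uint32_t count)
{
    unsigned bits = 0;
    while ((UINT64_C(1) << bits) < count)
        bits++;
    return bits;
}

/* heap_base must be aligned to 1 << (VA_LOG2_BITS + index_bits) so
 * that the index and count fields never overlap the base address. */
static uint64_t encode_va(uint64_t heap_base, unsigned index_bits,
        uint32_t index)
{
    uint64_t span = UINT64_C(1) << (VA_LOG2_BITS + index_bits);
    assert(index_bits <= VA_LOG2_MASK);
    assert((heap_base & (span - 1)) == 0);
    assert(index < (UINT64_C(1) << index_bits));
    return heap_base | ((uint64_t)index << VA_LOG2_BITS) | index_bits;
}

/* Number of bits used to encode the descriptor offset. */
static unsigned va_index_bits(uint64_t va)
{
    return (unsigned)(va & VA_LOG2_MASK);
}

/* Descriptor offset within the heap, recovered without any load. */
static uint32_t va_heap_offset(uint64_t va)
{
    uint64_t index_mask = (UINT64_C(1) << va_index_bits(va)) - 1;
    return (uint32_t)((va >> VA_LOG2_BITS) & index_mask);
}

/* Heap base address, recovered by masking away the low fields. */
static uint64_t va_heap_base(uint64_t va)
{
    uint64_t span = UINT64_C(1) << (VA_LOG2_BITS + va_index_bits(va));
    return va & ~(span - 1);
}
```

For a heap of 1000 descriptors, log2i_ceil() yields 10 index bits, so
the heap base would need 32 KiB alignment in this sketch. Side data
could then be found at a fixed offset from the recovered base.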
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For cases where games spam committed allocations and don't use
NOT_ZEROED. We still rely on zerovram behavior for initial backing which
should be enough in most cases.
Strictly speaking, however, we are forced to clear the allocations every
time if the application does not use the flag correctly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is a more principled limit since that's the huge page size.
Avoids some allocation spam.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Forgot to offset buffer offset. Fun!
Found when bumping VA allocation limit to 2 MiB instead of 1 MiB.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>