CP77 relies on this to work somehow ...
The DXR spec seems to suggest this is allowed, but there is no direct
concept for this in Vulkan.
This seems to work on NVIDIA at least, but we're on very shaky ground
here ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The hash should only depend on the raw byte stream, not the entire DXBC
blob. Useful now since we can declare root signatures either through a
DXBC blob or as an RDAT object (which is raw).
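As a minimal sketch of the idea (not the actual vkd3d-proton code): hash
only the raw payload, so the same root signature hashes identically
whether or not it arrives inside a container. The struct, the function
names and the choice of FNV-1a are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a blob: optional wrapper bytes, then the raw payload. */
struct example_blob
{
    const uint8_t *data;   /* whole blob, wrapper included */
    size_t size;           /* size of the whole blob */
    size_t payload_offset; /* where the raw root signature bytes begin */
    size_t payload_size;   /* size of the raw root signature bytes */
};

/* 64-bit FNV-1a over a byte range. Any stable byte hash would do here. */
static uint64_t example_hash_bytes(const uint8_t *data, size_t size)
{
    uint64_t h = 0xcbf29ce484222325ull;
    size_t i;

    for (i = 0; i < size; i++)
    {
        h ^= data[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Hash the payload only, ignoring whatever container surrounds it. */
static uint64_t example_hash_root_signature(const struct example_blob *blob)
{
    return example_hash_bytes(blob->data + blob->payload_offset, blob->payload_size);
}
```

With this, a wrapped blob and a raw object carrying the same bytes map to
the same hash key.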
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If a game uses NOT_ZEROED, it might still rely on buffers being properly
cleared to 0.
Enable this and FORCE_RAW_VA_CBV for Halo Infinite.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For certain ExecuteIndirect() uses, we're forced to use this path
since we have no way to update push descriptors indirectly yet.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For now, just keep the NV path as well. It's basically the exact same
extension as the KHR one.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With VKD3D_SHADER_CACHE_PATH, we can add automatic serialization of pipeline
blobs to disk, even for games which do not make any use of GetCachedBlob
or ID3D12PipelineLibrary interfaces. Most applications expect drivers to
have some kind of internal caching.
This is implemented as a system where a disk thread will manage a private
ID3D12PipelineLibrary, and new PSOs are automatically committed to this
library. PSO creation will also consult this internal pipeline library if
applications do not provide their own blob.
The strategy for updating the cache is based on a read-only cache which
is mmapped from disk, with an exclusive write-only portion for new blobs,
which ensures some degree of safety if there are multiple
concurrent processes using the same cache.
The memory layout of the disk cache is optimized to be very efficient
for appending new blobs, just simple fwrites + fflush.
The format is also robust against sliced files, which solves the problem
where applications tear down without destroying the D3D12 device
properly.
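To illustrate why an append-only layout tolerates a sliced file, here is
a rough sketch. It is not the real on-disk format: the record layout
(size, checksum, payload), the toy checksum and all names are
hypothetical, and an in-memory buffer stands in for the file so the
example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: 4-byte LE payload size, 4-byte checksum, payload. */
#define EXAMPLE_HEADER_SIZE 8u

/* Toy checksum, only meant to detect a damaged payload in this sketch. */
static uint32_t example_checksum(const uint8_t *data, size_t size)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < size; i++)
        sum = sum * 31u + data[i];
    return sum;
}

static uint32_t example_read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
            ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void example_write_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Appends one record at offset *size. Returns 0 if it does not fit. */
static int example_append(uint8_t *buf, size_t capacity, size_t *size,
        const void *payload, uint32_t payload_size)
{
    size_t need = EXAMPLE_HEADER_SIZE + (size_t)payload_size;

    if (*size > capacity || capacity - *size < need)
        return 0;

    example_write_u32(buf + *size, payload_size);
    example_write_u32(buf + *size + 4, example_checksum(payload, payload_size));
    memcpy(buf + *size + EXAMPLE_HEADER_SIZE, payload, payload_size);
    *size += need;
    return 1;
}

/* Counts complete records and stops at the first truncated or damaged one. */
static size_t example_count_valid(const uint8_t *buf, size_t size)
{
    size_t offset = 0, count = 0;

    while (size - offset >= EXAMPLE_HEADER_SIZE)
    {
        uint32_t payload_size = example_read_u32(buf + offset);
        uint32_t checksum = example_read_u32(buf + offset + 4);
        const uint8_t *payload = buf + offset + EXAMPLE_HEADER_SIZE;

        if (size - offset - EXAMPLE_HEADER_SIZE < payload_size)
            break; /* payload was cut off */
        if (example_checksum(payload, payload_size) != checksum)
            break; /* payload is damaged */

        offset += EXAMPLE_HEADER_SIZE + payload_size;
        count++;
    }
    return count;
}
```

Since every record is self-describing, a reader can keep everything
before the cut and simply drop the incomplete tail.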
This structure is very similar to Fossilize, and in fact the idea is to
move towards actually using the Fossilize format directly later.
This implementation prepares us for this scenario where e.g. Steam could
potentially manage the vkd3d-proton cache.
The main complication in this implementation is that we have to merge
the read-only and write caches.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Supports more advanced file operations than we'd normally need.
Intended to be used by the magic disk cache.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The elected lane must be able to perform side effects, so make sure
helper lanes don't participate.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we know the input is wave uniform (progress markers for example),
no need to spam the log.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
AMD path for this commit.
The idea is that we can automatically instrument markers with command
list information that we can make some sense of in vkd3d-proton.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Elden Ring does not detect the proper error code and then create a new
pipeline library itself. Instead, we create a fresh new library, which
works around the issue.
The game has a pattern of LoadPipeline -> if fail -> CreatePSO ->
StorePipeline. Sometimes, in the same process, it will LoadLibrary from
its own cache (could explain some stutters), so it's very useful to have
this either way.
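The load, create, store pattern described above can be sketched roughly
as follows. This is a toy in-memory stand-in, not D3D12 code: the library
struct, the integer "pipelines" and every function name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_ENTRIES 16

/* Hypothetical stand-in for a pipeline library: names mapped to pipelines. */
struct example_library
{
    const char *names[EXAMPLE_MAX_ENTRIES];
    int pipelines[EXAMPLE_MAX_ENTRIES];
    size_t count;
};

/* Counts full compiles so a caller can see how often the slow path ran. */
static int example_compile_count;

/* Stand-in for LoadPipeline: returns 1 and writes *pipeline on a hit. */
static int example_load(const struct example_library *lib, const char *name, int *pipeline)
{
    size_t i;

    for (i = 0; i < lib->count; i++)
    {
        if (strcmp(lib->names[i], name) == 0)
        {
            *pipeline = lib->pipelines[i];
            return 1;
        }
    }
    return 0;
}

/* Stand-in for StorePipeline: returns 0 if the library is full. */
static int example_store(struct example_library *lib, const char *name, int pipeline)
{
    if (lib->count == EXAMPLE_MAX_ENTRIES)
        return 0;

    lib->names[lib->count] = name;
    lib->pipelines[lib->count] = pipeline;
    lib->count++;
    return 1;
}

/* Stand-in for a full PSO compile, which is the expensive step. */
static int example_create_pso(const char *name)
{
    example_compile_count++;
    return (int)strlen(name);
}

/* LoadPipeline -> if fail -> CreatePSO -> StorePipeline. */
static int example_get_pipeline(struct example_library *lib, const char *name)
{
    int pipeline;

    if (example_load(lib, name, &pipeline))
        return pipeline;

    pipeline = example_create_pso(name);
    example_store(lib, name, pipeline);
    return pipeline;
}
```

The second request for the same name is served from the library, so the
compile only happens once as long as the library itself is usable.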
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For cases where games spam committed allocations and don't use
NOT_ZEROED. We still rely on zerovram behavior for initial backing which
should be enough in most cases.
Strictly speaking, however, we are forced to clear the allocations every
time if the application does not use the flag correctly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
To be used for upcoming disk driver cache implementation which needs to
live on a thread.
Need a separate wrapper since the pthread and SRWLock interfaces are
quite different. Similar rationale as rwlock_t.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Additionally, add option to ignore cached SPIR-V.
Will be useful for debugging, and also required for VKD3D_SHADER_OVERRIDE.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Avoids saving out pipeline cache blobs which are likely going to be
cached by the on-disk cache anyway.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In DEATHLOOP, there is a render pass which renders out a simple image,
which is then directly followed by a compute dispatch, reading that
image. The image is still in RENDER_TARGET state, and color buffers are
*not* flushed properly on at least RADV, manifesting as a very
distracting glitch pattern. This is a game bug, but for the time being,
we have to work around it, *sigh*.
For a simple workaround, we can detect patterns where we see these
events in succession:
- Color RT is started
- StateBefore == RENDER_TARGET is not observed
- Dispatch()
In particular, when entering the options menu, highly distracting
glitches are observed in the background.
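The detection described above amounts to a small state machine, roughly
like the sketch below. The tracker struct and function names are
hypothetical, and what the caller does when the pattern is detected is
left open here, since this only illustrates the detection itself.

```c
#include <stdbool.h>

/* Hypothetical per-command-list tracking state. */
struct example_tracker
{
    bool color_rt_pending;         /* color RT was started, no barrier out of it yet */
    unsigned int workaround_count; /* how many times the pattern was detected */
};

/* Called when rendering to a color render target starts. */
static void example_on_color_render(struct example_tracker *t)
{
    t->color_rt_pending = true;
}

/* Called when a barrier with StateBefore == RENDER_TARGET is observed. */
static void example_on_barrier_from_render_target(struct example_tracker *t)
{
    t->color_rt_pending = false;
}

/* Called on Dispatch(). Returns true if the workaround should be applied first. */
static bool example_on_dispatch(struct example_tracker *t)
{
    if (!t->color_rt_pending)
        return false;

    t->color_rt_pending = false;
    t->workaround_count++;
    return true;
}
```

A well-behaved application always issues the barrier before the dispatch,
so it never takes the workaround path.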
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>