For internal debug shaders, it is helpful to ensure that logs come out
in order when sorted for later inspection.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The hash should only depend on the raw byte stream, not the entire
DXBC blob. This is useful now that we can declare root signatures
either through a DXBC blob or as an RDAT object (which is raw).
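The hashing-only-the-payload idea can be sketched as follows. This is a minimal stand-in, not vkd3d-proton's actual hash function; FNV-1a and the helper name are illustrative assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* Hash only the raw byte stream of the root signature payload, not the
 * surrounding container. Two containers (DXBC blob vs. RDAT object)
 * carrying the same payload then hash identically.
 * FNV-1a is used here purely as a stand-in hash. */
static uint64_t hash_root_signature_payload(const void *data, size_t size)
{
    const uint8_t *bytes = data;
    uint64_t h = 0xcbf29ce484222325ull;
    size_t i;

    for (i = 0; i < size; i++)
    {
        h ^= bytes[i];
        h *= 0x100000001b3ull;
    }
    return h;
}
```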
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
To keep things simple, outer code is responsible for keeping the
string alive. Intended to be used for RTPSO entry point name
debugging.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is barely implementable, and relies on implementations doing
roughly what we want.
To make this work in practice, we need to allow two pipelines per state
object. One that is created with LIBRARY and one that can be bound. When
incrementing the PSO, we use the LIBRARY one.
It seems to be allowed to create a new library from an old library.
It is more convenient for us if we're allowed to do this, so do this
until we're forced to do otherwise.
DXR 1.1 requires that shader identifiers remain invariant for child
pipelines if the parent pipeline also has them.
Vulkan has no such guarantee, but we can speculate that it works and
validate that identifiers remain invariant. This seems to work fine on
NVIDIA at least ... It probably makes sense that it works for
implementations where pipeline libraries are compiled at that time.
The basic implementation of AddToStateObject() is to consider
the parent pipeline as a COLLECTION pipeline. This composes well and
avoids a lot of extra implementation cruft.
Also adds validation to ensure that COLLECTION global state matches
that of other COLLECTION objects and the parent. We also inherit
global state such as root signatures, pipeline config, shader configs,
etc., when using AddToStateObject().
The tests pass on NVIDIA at least.
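The runtime validation of identifier invariance described above could look roughly like this. Handle buffers would come from vkGetRayTracingShaderGroupHandlesKHR on the parent and child pipelines; the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Compare shader group handles queried from the parent pipeline against
 * the same groups queried from the child created via AddToStateObject().
 * DXR 1.1 requires these to match; Vulkan gives no such guarantee, so we
 * speculate and validate at runtime. */
static bool validate_group_handle_invariance(const uint8_t *parent_handles,
        const uint8_t *child_handles, size_t group_count, size_t handle_size)
{
    size_t i;

    for (i = 0; i < group_count; i++)
    {
        if (memcmp(parent_handles + i * handle_size,
                child_handles + i * handle_size, handle_size) != 0)
            return false;
    }
    return true;
}
```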
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The D3D12 docs outline this as an implementation detail explicitly, so
we should do the same thing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Implements the most basic iteration, where we don't try to take
advantage of an index LUT, hoist CS patching, or attempt to reuse the
application's indirect buffer directly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Currently we are translating the index type. This will change in a
follow-up commit where we move over to an index LUT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Separate scratch pools by their intended usage.
This allows e.g. preprocess buffers to be allocated differently from
normal buffers, which is necessary on implementations that use special
memory types to implement preprocess buffers.
Potentially, this also allows for separate pools for host-visible
scratch memory down the line.
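A rough shape of the separation, with hypothetical enum and helper names:

```c
#include <stdbool.h>

/* Hypothetical pool kinds: preprocess buffers may need special memory
 * types on some implementations, so give them their own pool rather
 * than sharing one generic scratch pool. */
enum scratch_pool_kind
{
    SCRATCH_POOL_KIND_DEVICE_STORAGE = 0,
    SCRATCH_POOL_KIND_INDIRECT_PREPROCESS,
    SCRATCH_POOL_KIND_COUNT
};

/* Select the pool from intended usage instead of using one pool for
 * everything. */
static enum scratch_pool_kind scratch_pool_for_usage(bool is_preprocess)
{
    return is_preprocess ? SCRATCH_POOL_KIND_INDIRECT_PREPROCESS
                         : SCRATCH_POOL_KIND_DEVICE_STORAGE;
}
```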
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Scratch buffers are 1 MiB blocks which end up being suballocated.
This was not intended, and is fallout from the earlier change where
VA_SIZE was bumped to 2 MiB for Elden Ring.
Introduce a memory allocation flag, INTERNAL_SCRATCH, which disables
suballocation and VA map insertion.
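The flag check might look like this. The flag name is taken from the message above; the bit position and surrounding struct are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position is hypothetical; the flag name comes from the commit. */
#define VKD3D_ALLOCATION_FLAG_INTERNAL_SCRATCH (1u << 7)

struct allocation_info
{
    uint32_t flags;
};

/* Internal scratch blocks are handed out whole: suballocating them
 * again was an accident of the 2 MiB VA_SIZE bump, so the flag opts
 * out of suballocation (and VA map insertion). */
static bool allocation_allows_suballocation(const struct allocation_info *info)
{
    return !(info->flags & VKD3D_ALLOCATION_FLAG_INTERNAL_SCRATCH);
}
```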
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The runtime is specified to validate certain things.
Also, be more robust against unsupported command signatures, since we
might need to draw/dispatch at an offset. This avoids hard GPU
crashes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Just use VK_NULL_HANDLE. We rely on the disk cache to exist anyway.
We never serialize the global pipeline cache, so if anything it might
just confuse drivers into disabling their own disk cache.
This also reduces memory bloat.
Also gets rid of a very old NV driver workaround where we forced a
global pipeline cache.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The transfer batcher buffers CopyTextureRegion calls so that they can
be submitted together.
The flushes need to happen in a few places:
1. ResourceBarrier: This is where a transition from COPY_DEST to
   another state might happen, at which point the writes must be
   visible. This might also transition away from COPY_SRC, which
   invalidates the precondition.
2. Copy operations. Copies to the same resource are implicitly ordered.
3. Draws and dispatches. These are not strictly necessary, but we don't
want too much command reordering so flushing here seems good.
4. Close. So that we don't throw commands into the void.
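The batching logic sketched below shows the shape of the mechanism; struct and function names are hypothetical, and the real code would record vkCmdCopyBufferToImage / vkCmdCopyImage where the comment indicates:

```c
#include <stddef.h>

#define MAX_BATCHED_COPIES 64

struct copy_batch
{
    size_t count; /* deferred CopyTextureRegion arguments live here */
};

static size_t flush_count; /* instrumentation for this sketch only */

/* Emit the deferred copies and reset the batch. Called from
 * ResourceBarrier, other copy operations, draws/dispatches and Close,
 * per the four flush points above. */
static void copy_batch_flush(struct copy_batch *batch)
{
    if (!batch->count)
        return;
    /* vkCmdCopyBufferToImage / vkCmdCopyImage would be recorded here. */
    flush_count++;
    batch->count = 0;
}

/* Defer one CopyTextureRegion call, flushing if the batch is full. */
static void copy_batch_add(struct copy_batch *batch)
{
    if (batch->count == MAX_BATCHED_COPIES)
        copy_batch_flush(batch);
    batch->count++;
}
```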
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Execution is split into a parameter preparation stage, a pre-execution
barrier stage, and finally the execution and post-execution barrier
stage.
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
For now, just keep the NV path as well. It's basically the exact same
extension as the KHR one.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When the disk cache is used, the cache we give back to applications is a
dummy. Therefore, try to use the disk cache blob if we detect a useless
application blob.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With VKD3D_SHADER_CACHE_PATH, we can add automatic serialization of
pipeline blobs to disk, even for games which make no use of
GetCachedBlob or the ID3D12PipelineLibrary interfaces. Most
applications expect drivers to have some kind of internal caching.
This is implemented as a system where a disk
thread will manage a private ID3D12PipelineLibrary, and new PSOs are
automatically committed to this library. PSO creation will also consult
this internal pipeline library if applications do not provide their own
blob.
The strategy for updating the cache is based on a read-only cache which
is mmaped from disk, with an exclusive write-only portion for new blobs,
which ensures some degree of safety if there are multiple
concurrent processes using the same cache.
The memory layout of the disk cache is optimized to be very efficient
for appending new blobs, just simple fwrites + fflush.
The format is also robust against sliced files, which solves the problem
where applications tear down without destroying the D3D12 device
properly.
This structure is very similar to Fossilize, and in fact the idea is to
move towards actually using the Fossilize format directly later.
This implementation prepares us for this scenario where e.g. Steam could
potentially manage the vkd3d-proton cache.
The main complication in this implementation is that we have to merge
the read-only and write caches.
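The append path and the slice-tolerant read path can be sketched as below. This is a simplified stand-in for the real on-disk format (a bare size prefix, no checksums or Fossilize framing), with hypothetical helper names:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Append one blob: size prefix, payload, then fflush, so a crash at
 * worst slices the final entry rather than corrupting the file. */
static bool cache_append_blob(FILE *f, const void *data, uint32_t size)
{
    if (fwrite(&size, sizeof(size), 1, f) != 1)
        return false;
    if (fwrite(data, 1, size, f) != size)
        return false;
    return fflush(f) == 0;
}

/* Scan the file and stop cleanly at a truncated tail, which is how the
 * format stays robust against sliced files from improper teardown.
 * Returns the number of complete blobs. */
static size_t cache_count_complete_blobs(FILE *f)
{
    uint32_t size;
    size_t count = 0;
    long end, pos;

    fseek(f, 0, SEEK_END);
    end = ftell(f);
    rewind(f);
    while (fread(&size, sizeof(size), 1, f) == 1)
    {
        pos = ftell(f);
        if (pos + (long)size > end)
            break; /* sliced tail: header claims more payload than exists */
        fseek(f, (long)size, SEEK_CUR);
        count++;
    }
    return count;
}
```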
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For internal pipeline libraries, we want a somewhat different strategy.
- PSOs are keyed by hash instead of user key.
- We want the option to conditionally store SPIR-V and PSO blobs.
For internal caches, there isn't much of a reason to store PSO blobs
since the disk cache is going to be primed anyways.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Makes sure that we drop private root signature device references when
public pipeline state refcount hits 0.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This was off by one, which could cause a stack buffer overrun, which
is naughty. Replace the hardcoded size with ARRAY_SIZE of
dynamic_state_list.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Needed to support SM 6.0 CBufferLoad.
This path is mostly unused since it's opt-in in DXC and horribly
broken ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Primitive restart is only used for strip primitive types, and must be
ignored for lists. Use and require extended_dynamic_state2 for this
purpose.
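The restart-enable decision reduces to a topology classification, whose result would feed vkCmdSetPrimitiveRestartEnableEXT from extended_dynamic_state2. A sketch, with the D3D topology enum subset declared locally (values as in d3dcommon.h):

```c
#include <stdbool.h>

/* Subset of D3D_PRIMITIVE_TOPOLOGY, values as in d3dcommon.h. */
enum d3d_primitive_topology
{
    D3D_PRIMITIVE_TOPOLOGY_LINELIST = 2,
    D3D_PRIMITIVE_TOPOLOGY_LINESTRIP = 3,
    D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST = 4,
    D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP = 5,
    D3D_PRIMITIVE_TOPOLOGY_LINESTRIP_ADJ = 11,
    D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP_ADJ = 13
};

/* Primitive restart applies only to strip topologies and must be
 * ignored for lists; the result would be passed to
 * vkCmdSetPrimitiveRestartEnableEXT. */
static bool topology_uses_primitive_restart(enum d3d_primitive_topology t)
{
    switch (t)
    {
        case D3D_PRIMITIVE_TOPOLOGY_LINESTRIP:
        case D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP:
        case D3D_PRIMITIVE_TOPOLOGY_LINESTRIP_ADJ:
        case D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP_ADJ:
            return true;
        default:
            return false;
    }
}
```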
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For EXTENDED_USAGE, we still need to restrict image usage when creating
concrete views.
Use VkImageViewUsageCreateInfo to restrict usage flags to the kind of
view we're creating.
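The usage restriction itself is a mask computation; the real code would chain the result into VkImageViewCreateInfo via VkImageViewUsageCreateInfo::usage. A sketch with the relevant VkImageUsageFlagBits values declared locally and a hypothetical view-kind enum:

```c
#include <stdint.h>

/* VkImageUsageFlagBits values, declared locally for this sketch. */
#define VK_IMAGE_USAGE_SAMPLED_BIT                  0x00000004u
#define VK_IMAGE_USAGE_STORAGE_BIT                  0x00000008u
#define VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT         0x00000010u
#define VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT 0x00000020u

enum view_kind { VIEW_KIND_SRV, VIEW_KIND_UAV, VIEW_KIND_RTV, VIEW_KIND_DSV };

/* Restrict the image's full (EXTENDED_USAGE) mask to what this
 * particular view actually needs; the result goes into
 * VkImageViewUsageCreateInfo::usage. */
static uint32_t view_usage_for_kind(uint32_t image_usage, enum view_kind kind)
{
    uint32_t mask = 0;

    switch (kind)
    {
        case VIEW_KIND_SRV: mask = VK_IMAGE_USAGE_SAMPLED_BIT; break;
        case VIEW_KIND_UAV: mask = VK_IMAGE_USAGE_STORAGE_BIT; break;
        case VIEW_KIND_RTV: mask = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT; break;
        case VIEW_KIND_DSV: mask = VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT; break;
    }
    return image_usage & mask;
}
```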
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Found some validation errors where rt_count != rtv_active_mask, and
blending used rt_count instead of rtv_active_mask. If a shader renders
to a NULL attachment, we must make sure that it's still part of the
PSO interface.
Also, use rt_count rather than the active mask when beginning a render
pass.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is basically required to avoid horrible stutter and bad
performance, and it is widely supported.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For this case, we want to block and teardown the debug ring thread.
It's okay to fish for dead messages in the ring, since we know there
won't be more GPU work submitted.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we expect device loss (breadcrumb debug), we need to use DEVICE
uncached/coherent memory, since we might not be able to flush GPU
caches properly.
We also need to remove the idea of copying the control block back to
the host. This is too brittle; we should instead just place the
control block in PCI-e BAR memory, and rethink how we pass messages
from GPU to CPU to make it more robust.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This commit adds the AMD path.
The idea is that we can automatically instrument markers with command
list information that we can make some sense of in vkd3d-proton.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>