Just use VK_NULL_HANDLE. We rely on the disk cache to exist anyways
here. We never serialize the global pipeline cache, so it might just
confuse drivers into disable disk cache if anything.
Also reduce memory bloat.
Also gets rid of very old NV driver workaround where we forced global
pipeline cache.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When the disk cache is used, the cache we give back to applications is a
dummy. Therefore, try to use the disk cache blob if we detect a useless
application blob.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Makes sure that we drop private root signature device references when
public pipeline state refcount hits 0.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This was off by one, at some point, which could cause a stack buffer overrun which is naughty.
Replace this with just an ARRAY_SIZE on the dynamic_state_list for the array size.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Primitive restart is only used for strip primitive types, and must be
ignored for lists. Use and require extended_dynamic_state2 for this
purpose.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are strict limits on number of descriptors which can be used,
and we have to use MUTABLE + single set to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
{depth,stencil}AttachmentFormat and p{Depth,Stencil}Attachment are only
allowed if the format contains that aspect. Check this explicitly.
Fixes some validation errors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We need to check RTVFormats and IO signature.
If both RTVFormat uses non-null format and IO signature has an active
entry, we must fail compilation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Found some validation errors where rt_count != rtv_active_mask,
and blending used rt_count instead of rtv_active_mask. If shader renders
to a NULL attachment, we must make sure that it's part of the PSO
interface.
Also, use rt_count rather than active mask when beginning render pass.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is basically required for not horrible stutter and performance and
is widely supported.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In pipeline libraries, the library holds on to private references of the
libraries so that they can be rapidly loaded on-demand.
This behavior is verifed by API tests.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When we store pipeline state in libraries we have to manage lifetime a
bit differently, which requires internal refcounts of some sort.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These only existed for VRS attachment, which is no longer
necessary with VK_KHR_dynamic_rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When we require inter-stage fixups, we need a solution for partial
validity of the cache. Accept the modules all or nothing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)
Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Checking for pNext here is too brittle and causes crashes when dynamic
rendering path is added.
Also need to chain in existing pNexts.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This became basically a rewrite in the end, and it got too awkward to
split these commits in any meaningful way.
The goals here were primarily to:
- Support serializing SPIR-V and load SPIR-V.
To do this robustly requires a lot more validation and checks to make
sure end up compiling the same SPIR-V that we load from cache.
This is critical for performance when games have primed their pipeline
libraries and expect that loading a PSO should be fast. Without this,
we will hit vkd3d-shader for every PSO, causing very long load times.
- Implement the required validation for mismatched PSO descriptions.
- Rewrite the binary layout of the pipeline library for flexibility
concerns and performance.
If the pipeline library is mmap-ed from disk - which appears to be
the intended use - we only need to scan through the TOC to fully parse
the library contents.
From a flexibility concern, a blob needs to support inlined data,
but a library can use referential links. We introduce separate
hashmaps which store deduplicated SPIR-V and pipeline cache blobs,
which significantly drop memory and storage requirements.
For future improvements, it should be fairly easy to add information
which lets us avoid SPIR-V or pipeline cache data altogether if
relevant changes to Vulkan/drivers are made.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful when used together with pipeline library logging. Confirms that
we can load pipeline caches as expected.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For pipeline libraries and DXR to some extent later, we'll need an easy
way to compare root signature objects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Wraps the D3D12 struct with a pipeline library handle.
This is needed if the blob contains references to external data,
which then needs to be resolved.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Guardians of the Galaxy hits this case. Fallback is to disable depth
attachment entirely in a fallback pipeline.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In d3d12, input element alignment needs to be the _minimum_ of 4 and the size of
the type. See the D3D11 spec, section 4.4.6, which behaves similarly:
https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#4.4.6%20Element%20Alignment
This is correctly taken into account when generating, e.g., the
vertex_buffer_stride_align_mask used for validation, but is not taken
into account when D3D12_APPEND_ALIGNED_ELEMENT is used to automatically
place input elements. Currently, vkd3d always assumes the alignment is
4.
This means that, for example, bytes or shorts should be packed tightly
together when D3D12_APPEND_ALIGNED_ELEMENT is used, but are instead
padded to 4 bytes.
Fixing this makes units appear in Age of Empires IV (see vkd3d-proton
issue #880 for examples.)
Signed-off-by: David Gow <david@ingeniumdigital.com>
Wine VKD3D version of my original commit.
Co-authored-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
If we need to fallback in both VRS and non-VRS scenarios, we need to key
on it. Fixes segfault in DIRT5 when toggling VRS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
parameter_count == NumParameters for local RS since
hoisting is explicitly ignored for those.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With RTPSOs we might have to create static sampler sets for local root
signatures. In this case we will have to create a compatible pipeline
layout which is equal to global pipeline layout, except for an extra
set.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>