Checking for pNext here is too brittle and causes crashes when dynamic
rendering path is added.
Also need to chain in existing pNexts.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This became basically a rewrite in the end, and it got too awkward to
split these commits in any meaningful way.
The goals here were primarily to:
- Support serializing SPIR-V and load SPIR-V.
To do this robustly requires a lot more validation and checks to make
sure end up compiling the same SPIR-V that we load from cache.
This is critical for performance when games have primed their pipeline
libraries and expect that loading a PSO should be fast. Without this,
we will hit vkd3d-shader for every PSO, causing very long load times.
- Implement the required validation for mismatched PSO descriptions.
- Rewrite the binary layout of the pipeline library for flexibility
concerns and performance.
If the pipeline library is mmap-ed from disk - which appears to be
the intended use - we only need to scan through the TOC to fully parse
the library contents.
From a flexibility concern, a blob needs to support inlined data,
but a library can use referential links. We introduce separate
hashmaps which store deduplicated SPIR-V and pipeline cache blobs,
which significantly drop memory and storage requirements.
For future improvements, it should be fairly easy to add information
which lets us avoid SPIR-V or pipeline cache data altogether if
relevant changes to Vulkan/drivers are made.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful when used together with pipeline library logging. Confirms that
we can load pipeline caches as expected.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Additionally, add option to ignore cached SPIR-V.
Will be useful for debugging, and also required for VKD3D_SHADER_OVERRIDE.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Avoids saving out pipeline cache blobs which are likely going to be
cached by on-disk cache anyways.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For pipeline libraries and DXR to some extent later, we'll need an easy
way to compare root signature objects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We did not test the scenario where we first render with depth enabled,
and then bind a NULL DSV with the same pipeline.
Also fix issues if we bind NULL RTVs with same pipeline bound.
Fixes crash in Guardians of the Galaxy.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is benign and easily gets spammed a TON.
We will warn if an indexed draw is actually made like this.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
UBSAN found a bug here since we store RTV descriptors inline, the
compiler can assume the pointer is 64 byte aligned.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we're going to create different SPIR-V files from what the
VkPipelineCache represents, it's meaningless to load it.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This key represents the variations of SPIR-V which would be generated
from otherwise identical inputs like DXBC blobs and root signatures.
Typically, changing VKD3D_CONFIG flags or enabled extensions will affect
this key. This ensures that we will not attempt to use a cached SPIR-V
file unless we can trust that the SPIR-V interface will match.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Wraps the D3D12 struct with a pipeline library handle.
This is needed if the blob contains references to external data,
which then needs to be resolved.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 expects drivers to implicitly synchronize transfer operations,
since there is no TRANSFER barrier ala UAV barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In DEATHLOOP, there is a render pass which renders out a simple image,
which is then directly followed by a compute dispatch, reading that
image. The image is still in RENDER_TARGET state, and color buffers are
*not* flushed properly on at least RADV, manifesting as a very
distracting glitch pattern. This is a game bug, but for the time being,
we have to workaround it, *sigh*.
For a simple workaround, we can detect patterns where we see these
events in succession:
- Color RT is started
- StateBefore == RENDER_TARGET is not observed
- Dispatch()
In particular, when entering the options menu, highly distracting
glitches are observed in the background.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
HZD issues an aliasing barrier for an alias of a resource that it
still needs.
Because D3D12 requires you to call DiscardResource or a full resource
clear/copy, we can just rely on those to do the actual image layout
transition and treat the aliasing barrier as a pure sync + flush.
This behavior is also observed in a test case where D3D12 drivers
do not seem to discard / fast-clear anything in an aliasing barrier.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Mesa RADV translates these legacy entrypoints to the 2 variants. Using
them directly will cost a bit less CPU cycles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Mesa RADV translates these legacy entrypoints to the 2 variants. Using
them directly will cost a bit less CPU cycles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Otherwise, we accidentally merge ranges from different pools if
the indices happen to align.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
NVIDIA drivers apparently cannot support EXTENDED_USAGE linear
images for whatever reason, so attempt to create these images without
the creation flag.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For 64-bit image atomics, we should at the very least add 64-bit format
to compatibility list to avoid potential problems.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can bind texel buffers at scalar alignment now.
The warning is misleading for placed resources, since 64k never aligns
with a float3.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The creation with those extensions may fail in few cases:
* older 32 bit drivers
* missing or inaccessible /dev/nvidia-uvm
There's also a mysterious crash that some Debian users experience with
64bit titles and a correct /dev/nvidia-uvm.
Signed-off-by: Arkadiusz Hiler <ahiler@codeweavers.com>
We previously did not take into account the new relaxed format compatibility
rules that we allow with CastingFullyTypedFormatSupported being supported.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Halo Infinite uses &desc->Width for total_bytes.
We can't set total_bytes early because code after this relies on desc->Width.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Guardians of the Galaxy hits this case. Fallback is to disable depth
attachment entirely in a fallback pipeline.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The 16-byte requirement is kind of a lie. The real requirement is tied
to how vectorized load-store instructions are emitted in the shader
itself since I guess it allows compiler to assume something about
alignment of the base pointer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In d3d12, input element alignment needs to be the _minimum_ of 4 and the size of
the type. See the D3D11 spec, section 4.4.6, which behaves similarly:
https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#4.4.6%20Element%20Alignment
This is correctly taken into account when generating, e.g., the
vertex_buffer_stride_align_mask used for validation, but is not taken
into account when D3D12_APPEND_ALIGNED_ELEMENT is used to automatically
place input elements. Currently, vkd3d always assumes the alignment is
4.
This means that, for example, bytes or shorts should be packed tightly
together when D3D12_APPEND_ALIGNED_ELEMENT is used, but are instead
padded to 4 bytes.
Fixing this makes units appear in Age of Empires IV (see vkd3d-proton
issue #880 for examples.)
Signed-off-by: David Gow <david@ingeniumdigital.com>
Wine VKD3D version of my original commit.
Co-authored-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
The Vulkan spec update 1.2.195 restricted these features to a very limited
format subset, and somehow this is supposed to not be an API break?
Anyway, let's follow the new rules.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes a number of issues observed in tessellation shaders,
and potentially geometry shaders, when inputs and/or outputs
are array variables.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
It's common enough that new games break on RDNA2 because of this that we
should enable this by default. This matches DXVK behavior.
SOTTR gets a special weird exception, just like DXVK. The shaders are
broken enough that the proper fix is actually precise, not invariant.
This will be addressed at some later point.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This function fails if the counter overflows.
CP77 hits this case a lot and we should just warn the specific failure
instead of a random error.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Potentially reduces the size of the query map, and makes each entry
versioned so that we no longer have to clear the entire map for multiple
dispatches even if it is sparsely populated.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
If we need to fallback in both VRS and non-VRS scenarios, we need to key
on it. Fixes segfault in DIRT5 when toggling VRS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The ordinals except for D3D12CreateDevice and GetDebugInterface are not
part of the ABI apparently.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we don't find a clear association to an entry point,
we can also find it in the hit group.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
parameter_count == NumParameters for local RS since
hoisting is explicitly ignored for those.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With RTPSOs we might have to create static sampler sets for local root
signatures. In this case we will have to create a compatible pipeline
layout which is equal to global pipeline layout, except for an extra
set.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for test suite since a test can be comprised of several smaller
submissions, and it's easier to debug if we have one trace.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we deduce that fallback heap allocation is impossible, we will accept
this, and defer allocation to CreatePlacedResource() instead where we make a committed resource.
This breaks aliasing, but in practice, this situation will only arise for render
targets, and it's not like we have a choice in the matter here on NV :\
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When allocating dedicated memory, ignore heap_flag requirements we
deduce from memory info. Any memory type is allowed. This is important
on NV when allocating fallback render targets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are situations where we cannot fallback to system memory, so don't
log that we're going to do so.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Don't attempt to enter memory allocation when we can invalidate a heap
allocation up front. Avoids some dumb edge cases later.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Many UE4 games have this broken bloom shader that samples a texture with implicit lod in divergent control flow.
Fixes Bus Simulator 21
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Width + offset must not overflow in SPIR-V. SM 5+ is well-defined here.
It's enough to just clamp the width against 32 - offset in all cases.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to use fallback pipeline system here.
Keep track of active masks for PSO and current render target.
The intersection of those sets are the attachments which should be
active in the render pass.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fix failure in test_create_heap where a TIER_2 host visible heap was
attempted, but failed due to recent DEATHLOOP fixes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Windows returns E_INVALIDARG at least on AMD and Intel.
Psychonaughts 2 seems to use this as a de facto "do not create"
value, and reasonable vram usage depends on the call failing.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Game attempts to create a host visible resource with
ALLOW_RENDER_TARGET flag. We cannot make this work on NVIDIA, but the
game never seems to actually create an RTV, so as a workaround, nop out
the flag, which does make it work after all :3
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For resizable BAR, we don't want to endlessly promote UPLOAD heaps to
BAR since VRAM is precious. The aim is to set a fixed budget where we
can keep allocating until full, at which point we fall back to plain HOST.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With BAR budgets, what will happen is that
- Small allocation is requested
- A new chunk is requested
- try_suballocate_memory will end up calling allocate_memory, which
allocates a fallback memory type
- Subsequent small allocators will always end up allocating a new
fallback memory block, never reusing existing blocks.
- System memory is rapidly exhausted once apps start hitting against
budget.
The fix is to add flags which explicitly do not attempt to fallback
allocate. This makes it possible to handle fallbacks at the appropriate
level in try_suballocate_memory instead.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will need to consider some form of budgeting, so make sure that all
allocation and freeing is done in a central place.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 validation layers complain if you try to map mipmapped 3D volumes
for ... some reason. The error is very explicit, so I assume it's
intentional :)
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to consider that based on host visibility requirements, we need to
select either LINEAR or OPTIMAL image types, and those tiling modes can
have different memory requirements.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to initialize the set mask so that copies happen properly
on default-initialized descriptors. Also, move the current_null_type to
metadata so that it's properly copied on descriptor copy.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are titles clearing the same descriptors constantly.
This leads to unnecessary updates that can become costly.
This commit introduces a new flag to track when D3D12 descriptors are
not null, and skips clearing them if they are already null.
Descriptors are assumed to be null by default.
This fixes a performance regression introduced by
9983a1720f
Signed-off-by: Rodrigo Locatti <rlocatti@nvidia.com>
Emitting render pass clears while we're in the process of starting
a render pass overrides dsv layout tracking info.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
D3D12 validation layer errors out, so unless we can prove that specific
behavior is relied upon, we should be okay to just ignore.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Get information directly from vkd3d_format and allow for subsampled
formats in the future.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Psychonauts 2 uses a SAMPLE_DESC.Count of 0 for some things, which
previously was forcing it down the MSAA alignment placement path.
Found from playing a native D3D12 apitrace back and seeing
the log spam.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Since we added validation here for FH4, this crashes now as vkd3d-compiler passes a NULL shader_interface_info.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Consider we have declarations of CB0 of size 36 and CB1 of size 153.
Previously we'd just return the struct of CB0 when accessing CB1 because it came first as we didn't consider the size.
Psychonauts 2 indexes into CB1 by constant values above 36.
There is no reason a compiler could not eliminate these reads as it is technically out of bounds for the underlying array type.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Adds the "upload_hvv" config flag, which will make D3D12_HEAP_TYPE_UPLOAD attempt to use host-visible VRAM for allocations.
This takes advantage of large or resizable BAR if available.
I see a perf delta of 83-84 -> 92-94 (~12%) when using this in Horizon Zero Dawn.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
FloatControlProperties struct appears to be broken, and it does seem to
work just fine.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
WaveMatch and WaveMultiPrefix are implemented and pass test.
Other features are gated behind feature bits.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
From native testing, we can expose higher shader models if
cap bits features are not supported. E.g. Polaris exposes SM 6.5, even
when 16-bit and barycentrics are not supported.
With latest dxil-spirv updates we can support the required SM 6.4
features.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This was passing through flags of the root signature not the shader interface flags of it.
Need to get the shader interface flags of the root signature instead.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Goal here is to avoid unnecessary image layout transitions when render
passes toggle depth-stencil PSO states. Since we cannot know which
states a resource is in, we have to be conservative, and assume that
shader reads *could* happen.
The best effort we can do is to detect when writes happen to a DSV
resource. In this scenario, we can deduce that the aspect cannot be
read, since DEPTH_WRITE | RESOURCE state is not allowed.
To make the tracking somewhat sane, we only promote to OPTIMAL if an
entire image's worth of subresources for a given aspect is transitioned.
The common case for depth-stencil images is 1 mip / 1 layer anyways.
Some other changes are required here:
- Instead of common_layout for the depth image, we need to consult the
command list, which might promote the layout to optimal.
- We make use of render pass compatibility rules which state that we can
change attachment reference layouts as well as initial/finalLayout.
To make this change, a pipeline will fill in a
vkd3d_render_pass_compat struct.
- A command list has a dsv_plane_optimal_mask which keeps track
of the plane aspects we have promoted to OPTIMAL, and we know cannot
be read by shaders.
The desired optimal mask is (existing optimal | PSO write).
The initial existing optimal is inherited from the command list's
tracker.
- RTV/DSV/views no longer keep track of VkImageLayout. This is
unnecessary since we always deduce image layout based on context.
Overall, this shows a massive gain in HZD benchmark (RADV, 1440p ultimate, ~16% FPS on RX 6800).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Idea is to keep track of scenarios where we know a resource's aspect is
known to be in a OPTIMAL state. Based on this, we can override the image
layout from the common_layout in order to avoid unnecessary full
barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For copies, we can always use the intended aspects, since we have
separate DS layouts now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When clearing a DSV, we must get aliasing guarantees, so we must
transition away from UNDEFINED. This is only possible when using
separate_ds_layouts and for render pass clears we need to use
renderpass2 mechanisms to do this.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The clamp is absolute, not relative to baseMip. Also avoids validation
error and potential crash when LODClamp > numLevels.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Not correct, will need spec additions to handle it properly.
Fixes ground rendering in DIRT 5.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>