We have no way of expressing size / alignment requirements to
applications since the API query does not provide us with heap
information. Reuse the fallback path for promoting placed to committed.
Guardians of the Galaxy hits a case where it tries to place 3x
host-visible 3D images in one heap, and they end up overlapping in
memory due to a 16x16x80 3D texture taking up far less space in optimal
tiling compared to linear tiling on AMD.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Docs explicitly specify that placed RTV / DSV resource must be properly
initialized before use, either on first use or after aliasing barriers,
so there should be no need to perform initial layout transition.
Fixes spurious GPU hangs in Hitman III where application aliases
an indirect buffer and a DSV. The DSV is cleared after the indirect
buffer is consumed, but the initial_layout_transition is triggered and
HTILE init clobbered the buffer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If game uses NOT_ZEROED, it might still rely on buffers being properly
cleared to 0.
Enable this and FORCE_RAW_VA_CBV for Halo Infinite.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
5.3.9.5 in D3D11 spec explicit outlines when we can
cast to R32{U,I,F}. The D3D12 validation layers
seem to have missed this.
Fixes assertions in RADV when running test under debug.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For EXTENDED_USAGE, we still need to restrict image usage when creating
concrete views.
Use VkImageViewUsageCreateInfo to restrict usage flags to the kind of
view we're creating.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
On some implementations, it doesn't matter for performance what we use,
and we can avoid a lot of ugly barriers this way.
Opt-in to use this extensions on GPUs we know handles it well,
otherwise, keep using the tracking paths.
With VK_KHR_dynamic_rendering, this is now feasible to do since we no longer
have to deal with shenanigans related to VkRenderPass layouts and
complicated compatibility rules.
To make this work with the existing framework, just need to consider
that GENERAL can be a common layout alongside DEPTH_STENCIL_OPTIMAL,
which are both common layouts that do not need to be tracked at all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)
Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):
- Bitscan loop
- Read deep into d3d12_device guts (often a cache miss). The memory
index depends on the bitscan, which causes bubble.
When we have a single descriptor, we can just store the binding
information inline and avoid this jank.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Tune memory layout so that we can deduce various information without
making a single pointer dereference:
- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.
Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.
To make all of this work, use a somewhat clever encoding scheme for CPU
VA where lower bits store number of active bits used to encode
descriptor offset. From there, we can mask away bits to recover
d3d12_descriptor_heap. Metadata is stored inline in one big allocation,
and we can just offset from there based on extracted log2i_ceil(descriptor count).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The common path that we really need to optimize for is CBV_SRV_UAV +
Simple + 1 descriptor.
Descriptor benchmark shows an almost 50% reduction in overhead now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Mesa RADV translates these legacy entrypoints to the 2 variants. Using
them directly will cost a bit less CPU cycles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
NVIDIA drivers apparently cannot support EXTENDED_USAGE linear
images for whatever reason, so attempt to create these images without
the creation flag.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For 64-bit image atomics, we should at the very least add 64-bit format
to compatibility list to avoid potential problems.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We can bind texel buffers at scalar alignment now.
The warning is misleading for placed resources, since 64k never aligns
with a float3.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We previously did not take into account the new relaxed format compatibility
rules that we allow with CastingFullyTypedFormatSupported being supported.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
If we deduce that fallback heap allocation is impossible, we will accept
this, and defer allocation to CreatePlacedResource() instead where we make a committed resource.
This breaks aliasing, but in practice, this situation will only arise for render
targets, and it's not like we have a choice in the matter here on NV :\
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When allocating dedicated memory, ignore heap_flag requirements we
deduce from memory info. Any memory type is allowed. This is important
on NV when allocating fallback render targets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fix failure in test_create_heap where a TIER_2 host visible heap was
attempted, but failed due to recent DEATHLOOP fixes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Windows returns E_INVALIDARG at least on AMD and Intel.
Psychonaughts 2 seems to use this as a de facto "do not create"
value, and reasonable vram usage depends on the call failing.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Game attempts to create a host visible resource with
ALLOW_RENDER_TARGET flag. We cannot make this work on NVIDIA, but the
game never seems to actually create an RTV, so as a workaround, nop out
the flag, which does make it work after all :3
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>