Docs explicitly specify that placed RTV / DSV resources must be properly
initialized before use, either on first use or after aliasing barriers,
so there should be no need to perform an initial layout transition.
Fixes spurious GPU hangs in Hitman III where the application aliases
an indirect buffer and a DSV. The DSV is cleared after the indirect
buffer is consumed, but the initial_layout_transition is triggered and
the HTILE init clobbers the buffer.
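
A minimal sketch of the resulting rule, assuming a simple flag-based check.
The flag and function names here are hypothetical and only illustrate the
decision; they are not the actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, for illustration only. */
enum sketch_resource_flags
{
    SKETCH_RESOURCE_PLACED              = 1u << 0,
    SKETCH_RESOURCE_ALLOW_RENDER_TARGET = 1u << 1,
    SKETCH_RESOURCE_ALLOW_DEPTH_STENCIL = 1u << 2,
};

/* Returns true if we should record our own initial layout transition. */
static bool sketch_needs_initial_layout_transition(uint32_t flags)
{
    const uint32_t rtv_dsv = SKETCH_RESOURCE_ALLOW_RENDER_TARGET |
            SKETCH_RESOURCE_ALLOW_DEPTH_STENCIL;

    /* Placed RTV / DSV resources must be initialized by the application
     * before use, so touching their memory here could clobber whatever
     * currently aliases it. */
    if ((flags & SKETCH_RESOURCE_PLACED) && (flags & rtv_dsv))
        return false;

    return true;
}
```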
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Also be a bit more uniform with using break/return on fail conditions.
Otherwise, the indirect command will read data from the count buffer
instead, which may lead to bugs or GPU hangs.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
A transfer batch can clobber the graphics pipeline, e.g. for depth->color
copies. Hence, flushing the batches after applying the graphics pipeline
set by the app can cause correctness issues.
To prevent that, flush the transfer batch before we apply any
render-related states.
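
The ordering can be sketched roughly as below. The struct, enum and
function names are hypothetical stand-ins; the only point is that the flush
runs before the app pipeline is applied.

```c
#include <stddef.h>

/* Hypothetical command list state, for illustration only. */
enum sketch_pipeline
{
    SKETCH_PIPELINE_NONE,
    SKETCH_PIPELINE_APP_GRAPHICS,
    SKETCH_PIPELINE_INTERNAL_COPY,
};

struct sketch_list
{
    enum sketch_pipeline bound; /* pipeline currently bound */
    size_t pending_copies;      /* copies waiting in the transfer batch */
};

/* Flushing may bind an internal pipeline, e.g. for depth->color copies. */
static void sketch_flush_transfer_batch(struct sketch_list *list)
{
    if (list->pending_copies)
    {
        list->bound = SKETCH_PIPELINE_INTERNAL_COPY;
        list->pending_copies = 0;
    }
}

/* Flush first, then apply render state, so the draw sees the app pipeline.
 * Returns the pipeline that is in effect for the draw. */
static enum sketch_pipeline sketch_draw(struct sketch_list *list)
{
    sketch_flush_transfer_batch(list);
    list->bound = SKETCH_PIPELINE_APP_GRAPHICS;
    return list->bound;
}
```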
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
offset_component_count was set to 0 for cubes, but GRAD path also
uses the variable to check how many components to use for GRAD.
OFFSET is not supported for cubes, so that's likely why it was bugged.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If a game uses NOT_ZEROED, it might still rely on buffers being properly
cleared to 0.
Enable this and FORCE_RAW_VA_CBV for Halo Infinite.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The D3D12 docs outline this as an implementation detail explicitly, so
we should do the same thing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Implements the most basic iteration where we don't try to take advantage
of index LUT, hoisting CS patching or attempting to reuse application
indirect buffer directly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Currently we are translating the index type. This will be changed in a
follow up commit where we move over to index LUT.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Separate scratch pools by their intended usage.
Allows e.g. preprocess buffers to be allocated differently from normal
buffers, which is necessary on implementations that use special memory
types to implement preprocess buffers.
Potentially can also allow for separate pools for host visible scratch
memory down the line.
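
A rough sketch of per-usage pools, assuming one pool per kind indexed by an
enum. All names below are hypothetical and only illustrate the layout.

```c
#include <stddef.h>

/* Hypothetical pool kinds, for illustration only. */
enum sketch_scratch_pool_kind
{
    SKETCH_SCRATCH_POOL_DEVICE_STORAGE = 0,
    SKETCH_SCRATCH_POOL_INDIRECT_PREPROCESS,
    SKETCH_SCRATCH_POOL_COUNT
};

struct sketch_scratch_pool
{
    unsigned int memory_type_mask; /* memory types this pool may use */
    size_t block_count;            /* recycled blocks currently held */
};

struct sketch_device
{
    /* One pool per usage, so each can pick its own memory types. */
    struct sketch_scratch_pool scratch_pools[SKETCH_SCRATCH_POOL_COUNT];
};

/* Returns the pool for the given usage, or NULL for an invalid kind. */
static struct sketch_scratch_pool *sketch_get_scratch_pool(
        struct sketch_device *device, enum sketch_scratch_pool_kind kind)
{
    if ((unsigned int)kind >= SKETCH_SCRATCH_POOL_COUNT)
        return NULL;
    return &device->scratch_pools[kind];
}
```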
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Scratch buffers are 1 MiB blocks which will end
up being suballocated. This was not intended and is fallout from the
earlier change where VA_SIZE was bumped to 2 MiB for Elden Ring.
Introduce a memory allocation flag INTERNAL_SCRATCH which disables
suballocation and VA map insert.
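
A sketch of the intended behavior, assuming that allocations smaller than
VA_SIZE are otherwise suballocated. The flag's bit value and the helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VA_SIZE (2u * 1024u * 1024u) /* 2 MiB, as described above */
#define SKETCH_ALLOCATION_FLAG_INTERNAL_SCRATCH (1u << 0) /* hypothetical bit */

struct sketch_allocation_info
{
    uint64_t size;
    uint32_t flags;
};

/* Internal scratch never suballocates, regardless of size. */
static bool sketch_is_dedicated(const struct sketch_allocation_info *info)
{
    if (info->flags & SKETCH_ALLOCATION_FLAG_INTERNAL_SCRATCH)
        return true;
    return info->size >= SKETCH_VA_SIZE;
}

/* Internal scratch is also kept out of the VA map. */
static bool sketch_inserts_into_va_map(const struct sketch_allocation_info *info)
{
    return !(info->flags & SKETCH_ALLOCATION_FLAG_INTERNAL_SCRATCH);
}
```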
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The runtime is specified to validate certain things.
Also, be more robust against unsupported command signatures, since we
might need to draw/dispatch at an offset. Avoids hard GPU crashes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For certain ExecuteIndirect() uses, we're forced to use this path
since we have no way to update push descriptors indirectly yet.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This function doesn't indicate failure and the possibility of a return
causes -Wmaybe-uninitialized warnings.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Just use VK_NULL_HANDLE. We rely on the disk cache to exist anyways
here. We never serialize the global pipeline cache, so it might just
confuse drivers into disabling the disk cache if anything.
Also reduce memory bloat.
Also gets rid of very old NV driver workaround where we forced global
pipeline cache.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We rely on zerovram behavior in drivers. Opt in to this path where we
know the implementation does what we want (backed up by testing).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The suballocation test should also try to allocate >= 2 MiB buffers so
we can verify VRAM clear behavior for dedicated allocations as well.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Transfer batches buffer CopyTextureRegion calls for batching.
The flushes need to happen in a few places:
1. ResourceBarrier: This is where the transition from COPY_DEST to other
   states might happen, at which point the writes must be visible. This
   might also transition away from COPY_SRC which invalidates the
   precondition.
2. Copy operations. Copies to the same resource are implicitly ordered.
3. Draws and dispatches. These are not strictly necessary, but we don't
   want too much command reordering so flushing here seems good.
4. Close. So that we don't throw commands into the void.
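
The flush points listed above can be sketched as follows. This is a
simplified model that only counts copies; the struct and function names
are hypothetical and do not mirror the real command list code.

```c
#include <stddef.h>

/* Hypothetical model of a command list with a transfer batch. */
struct batch_cmd_list
{
    size_t batched_copies;  /* CopyTextureRegion calls not yet emitted */
    size_t executed_copies; /* copies already emitted */
    size_t flush_count;     /* number of non-empty flushes */
};

static void batch_flush(struct batch_cmd_list *list)
{
    if (!list->batched_copies)
        return;
    list->executed_copies += list->batched_copies;
    list->batched_copies = 0;
    list->flush_count++;
}

/* Batched: nothing is emitted until a flush point is reached. */
static void batch_copy_texture_region(struct batch_cmd_list *list)
{
    list->batched_copies++;
}

/* 1. Barriers may leave COPY_DEST / COPY_SRC, so pending copies go first. */
static void batch_resource_barrier(struct batch_cmd_list *list) { batch_flush(list); }
/* 2. Other copy operations must stay ordered against batched ones. */
static void batch_copy_resource(struct batch_cmd_list *list) { batch_flush(list); }
/* 3. Not strictly necessary, but limits command reordering. */
static void batch_draw_or_dispatch(struct batch_cmd_list *list) { batch_flush(list); }
/* 4. Nothing may be left pending when the list is closed. */
static void batch_close(struct batch_cmd_list *list) { batch_flush(list); }
```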
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
A parameter preparation stage, a pre-execution barrier stage, then finally
the execution and post-execution barrier stage.
Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Section 5.3.9.5 in the D3D11 spec explicitly outlines when we can
cast to R32{U,I,F}. The D3D12 validation layers
seem to have missed this.
Fixes assertions in RADV when running the test under debug.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Gets better codegen, since the compiler no longer has to assume
that negative indices can be generated, which means full 64-bit sign
extension and addressing math (slow).
Based on experiments, no native driver lets -1 indices work,
so it's safe to make the u32 assumption.
See test_root_descriptor_offset_sign as a justification for this change.
Also, see https://gitlab.freedesktop.org/mesa/mesa/-/issues/6562
for discussion on InBounds.
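
To illustrate the difference in addressing math, here is a small sketch in
plain C. The helper names are hypothetical and this is not the shader code
being generated; it only contrasts sign-extended and zero-extended indices.

```c
#include <stdint.h>

/* Signed index: must be sign-extended to 64 bits before scaling, so a
 * negative index can address memory below the base. */
static uint64_t sketch_address_signed(uint64_t base, int32_t index, uint32_t stride)
{
    return base + (uint64_t)((int64_t)index * (int64_t)stride);
}

/* Unsigned index: a plain zero-extension, the offset is never negative. */
static uint64_t sketch_address_unsigned(uint64_t base, uint32_t index, uint32_t stride)
{
    return base + (uint64_t)index * stride;
}
```

For non-negative indices both produce the same address. They only differ
when the top bit of the index is set, which is the case the u32 assumption
rules out.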
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For now, just keep the NV path as well. It's basically the exact same
extension as the KHR one.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>