Command lists may need to allocate temporary device memory for
certain operations. In order to avoid frequent alloc/free calls,
we'll recycle these scratch buffers until a certain threshold.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Realign VBO strides and offsets if we have to, for sake of
robustness. Violating these rules is against D3D12 spec, but it does not
cause crashes on native drivers. On RDNA we can hit hangs with unaligned
vertex attributes. It appears that native drivers apply some kind of
fixup here to avoid the crash, even if the result is not what we expect.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
BDA cannot map to their hardware, and we observe a large performance
loss in games which use root CBVs. For this reason, fall back to push
descriptors here.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Ensures that queries are always available and initialized
in the correct order on the GPU timeline.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
- fail/success memory orders exist for a reason, we can't
e.g. do release on fail since it's a read-only operation
- silence some warnings about pointer->integer casts
- fix linker errors on mingw by marking functions as static
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Game renders the map with wrong descriptor type, which means we must
implement everything as texel buffers to make this work.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have observed a lot of large GPU bubbles when using back-to-back
timeline semaphores to synchronize GPU submissions. Use prebaked
pipeline barrier command buffers instead.
To resolve queue sparse serialization, use two binary semaphore pairs to
resolve this. There is no need to use timeline semaphores in this case.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is undefined behaviour in SPIR-V, but well-defined in
DXBC, so we should explicitly 'and' the shift amount with 31.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Can just use uvec2. Also improves performance on ACO since ACO cannot
promote uint64_t to SGPR yet, u32x2 however, works fine and can be
bitcast to pointer as well.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The fix which enabled waveops detection broke HZD, since we never tested
with that feature enabled.
Keep it disabled until we can figure out what is going on.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
USE_PUSH_DESCRIPTORS may be misleading since it would be set even when
we're not using push descriptors at all due to root descriptors being
passed in via VAs. Instead, make the flag represent whether or not we
use a regular descriptor set for root parameters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>