On GFX8 the number of records is in bytes while on other chips
it's in units of "stride".
Fixes dEQP-VK.robustness.vertex_access.*.draw.vertex_* on RAVEN.
Tested on GFX6, GFX8, GFX10 and RAVEN.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.
This just prevents a crash and rbplus probaby needs more work.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.
Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
e.g. a VERTEX only flush with tess on Vega should look at the TCS
to see which bits are needed.
CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1953
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The buffer isn't necessarily used before.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use the fastest way only if both aspects are used. Oops.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111728
Fixes: 218ce34962 ("radv: add mipmap support for the clear depth/stencil values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise the next streamout operation will overwrite GDS. This
can be improved by tracking if there is a streamout operation in
flight. Currently the driver unconditionally flushes but that
doesn't matter much as NGG streamout is disabled by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
NGG streamout uses GDS and we have to make sure that another
process isn't going to overwrite GDS while our shaders are busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's used to determined the max emit per buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allocates two BOs for GFX10 NGG streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS as RadeonSI.
That option will be turned on once all issues are resolved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Having two different structs is useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
16-bit and 32-bit values match hardware values but 8-bit doesn't.
This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.
Fixes: 372c3dcfdb ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
Test that found this: dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer
Fixes: 49e6c2fb78 "radv: Store color/depth surface info in attachment info instead of framebuffer."
Reviewed-by: Dave Airlie <airlied@redhat.com>
This makes it cheaper to just change the dynamic offsets with
the same descriptor sets.
This optimization has been reverted a while back because of
random GPU hangs on GFX9, no it looks fine, at least CTS no longer
hangs on GFX9 and it doesn't hang on GFX10 as well.
It fixes a performance problem with Wolfenstein Youngblood.
Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It should be possible to build it on-demand too but it requires
more work. On GFX10, the GS copy shader is required when tess
is enabled with extreme geometry.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This includes metadata as well. On GFX10, we have to invalidate
the L2 metadata cache when shaders read DCC.
Note that we still have to implement GFX10 coherency by
introducing INV_L2_METATADA but for now just flush L2.
This fixes a corruption with DCC and Talos.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>