Commit Graph

175 Commits

Author SHA1 Message Date
Samuel Pitoiset 994253b400 radv/gfx10: add missing conversions for 16-bit exports
This fixes
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_*

Found with RADV_DEBUG=checkir

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-16 08:12:34 +02:00
Samuel Pitoiset d8844533af radv: remove unused code in radv_export_param()
It was hack for geometry shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-16 08:12:20 +02:00
Samuel Pitoiset b650f3d197 radv/gfx10: export the PrimitiveID for ES stages (VS or TES)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:10 +02:00
Samuel Pitoiset 8175f6269b radv/gfx10: declare an external symbol for the ESGS ring
It will be used for stream output but for now only declares it
if VS and if the PrimitiveID needs to be exported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:08 +02:00
Samuel Pitoiset 4478f14327 radv/gfx10: fix crash when emitting NGG GS prologue
ac_nir_context is initialized after the driver emits the NGG GS
prologue so it's likely to crash.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 08:51:53 +02:00
Samuel Pitoiset 958ee4c21a radv: report shader stage name when dumping LLVM IR
For debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset c6fa4de15d radv/gfx10: do not set alignment on the ngg_emit pointer
This is invalid and this fixes a crash in LLVM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset df0a23ad1e radv/gfx10: fix exporting clip/cull distances for GS
This fixes dEQP-VK.clipping.user_defined.clip_distance.*geom*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset edcd2bc833 radv/gfx10: fix exporting the subpass view index for GS
This fixes dEQP-VK.multiview.*geometry*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:20 +02:00
Bas Nieuwenhuizen 0a8ef756d3 radv/gfx10: Fix NGG GS output mask handlings for LDS indexing.
In emit_vertex we optimize storage if the output mask does not
have all bits set. Do the same in the epilogue so the indices
actually match up.

Fixes dEQP-VK.geometry.input.basic_primitive.points because it
outputs PSIZE with an output mask of 1, which cause the generic
attribute for the color to be loaded from the wrong indices.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:59 +02:00
Bas Nieuwenhuizen f5982917ff radv/gfx10: Simplify output mask handling for NGG GS.
We only ever get in this function for a NGG GS proper.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:58 +02:00
Bas Nieuwenhuizen 7515f41c78 radv/gfx10: Do GS prologue outside of gs_threads if.
Mirror radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:56 +02:00
Samuel Pitoiset 5bbcb3f5bc radv/gfx10: implement support for GS as NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:53 +02:00
Samuel Pitoiset 51e2124a4b radv: switch to the new VS exports path
It will help for GS as NGG on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:37:02 +02:00
Samuel Pitoiset f616d80a7a radv: set the slot_index correctly for VARYING_SLOT_CLIP_DIST1
For selecting a different SQ_EXP_POS target.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:59 +02:00
Samuel Pitoiset c4ab33378a radv: add a new function for exporting VS outputs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:57 +02:00
Samuel Pitoiset ac0edc369c radv: implement new path for exporting generic varyings
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:55 +02:00
Samuel Pitoiset 0b368fc8c3 radv: use the generic export path for clip/cull distances
When they are exported to the next stage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:52 +02:00
Samuel Pitoiset f653e5c1d6 radv: remove an extra memcpy when exporting clip/cull distances
Cleanup.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 23:36:50 +02:00
Samuel Pitoiset 3303bc8b74 radv: remove extra code for exporting LayerID to the next stage
Now that the output usage mask is set to 0x1 the LayerID is
correctly exported in the loop above.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 15:17:08 +02:00
Bas Nieuwenhuizen 14291342ec radv: Add a common member in the union to make things more clear.
This clarifies that the struct can be used when the shader can be
one of VS/TES.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-09 09:59:07 +00:00
Bas Nieuwenhuizen f9070743a9 Revert "radv: keep track of whether NGG is used for GS on GFX10"
This reverts commit 63e0675d98.

The GS is merged with the preceding shader and since the preceding
shader will have as_ngg set the final binary will have is_ngg set.

So we do not need the gs key here.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-09 09:59:07 +00:00
Samuel Pitoiset 63e0675d98 radv: keep track of whether NGG is used for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:19 +02:00
Samuel Pitoiset 2974df819e radv: set max workgroup size to 128 for TES as NGG on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:12 +02:00
Samuel Pitoiset 53c75f17ec radv: fix allocating USER SGPRs on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:11 +02:00
Bas Nieuwenhuizen 9a8e4a07ad radv/gfx10: Add tess eval ngg shader support.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:20 +10:00
Connor Abbott 118a66df99 radv: Use NIR barycentric intrinsics
We have to add a few lowering to deal with things that used to be dealt
with inline when creating inputs. We also move the code that fills out
the radv_shader_variant_info struct for linking purposes to
radv_shader.c, as it's no longer tied to the NIR->LLVM lowering.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:18:25 +02:00
Connor Abbott 27f0c3c15e radv: Make FragCoord a sysval
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann e41e932e57 radv: Lower input attachments in NIR.
v2 (Connor)
- Fix warning in release mode using MAYBE_UNUSED

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann c65e880a65 radv: Implement nir_intrinsic_load_layer_id().
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:14:53 +02:00
Bas Nieuwenhuizen 4d118ad44a radv/gfx10: Move NGG output handling outside of giant if-statement.
In merged shaders we put a big if around each shader, so both stages
can have a different number of threads. However, the NGG output code
still needs to run if the first shader is not executed.

This can happen when there are more gs threads than vs/es threads, or
when there are 0 es/vs threads (why? no clue).

Fixes: ee21bd7440 "radv/gfx10: implement NGG support (VS only)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-08 01:49:54 +02:00
Samuel Pitoiset ee21bd7440 radv/gfx10: implement NGG support (VS only)
This needs to be cleaned up a bit, and it probably contains
missing stuff and/or bugs.

This doesn't fix the "half of the triangles" issue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen 9e37609d0b radv: Combine vs and tes output keys parts.
That way the same deref is valid for both shader stages.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset ce3b5d4c17 radv/gfx10: do not declare streamout SGPRS
Streamout is completely different on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset 4c31f3dcc0 radv/gfx10: fix PS exports for SPI_SHADER_32_AR
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Samuel Pitoiset 34b185cc43 radv/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:38 +02:00
Marek Olšák 8a71f60194 ac: replace glc,slc with cache_policy for loads
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:56 -04:00
Marek Olšák a29e781961 ac: replace glc,slc with cache_policy for stores
cosmetic change

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-04 15:38:54 -04:00
Bas Nieuwenhuizen 6a220e67ce radv: Switch to using rtld.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 5ff651c0a7 radv: Move more stuff to variant create time.
Due to them depending on the linker result.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 726a31df70 radv: Add the concept of radv shader binaries.
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
 for the cache.
(2) creating a shader_variant creation helper.

This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 43f2f01cc8 radv: Add export_prim_id to the shader variant info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 15046ef7c8 radv: use last nir shader to determine stage in postprocessing
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Samuel Pitoiset d8b079e4c7 radv: rework how the number of VGPRs is computed
Just a cleanup, it shouldn't change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:27 +02:00
Samuel Pitoiset d5004f60be radv: only export clip/cull distances if PS reads them
The only exception is the GS copy shader which emits them
unconditionally.

Totals from affected shaders:
SGPRS: 71320 -> 71008 (-0.44 %)
VGPRS: 54372 -> 54240 (-0.24 %)
Code Size: 2952628 -> 2941368 (-0.38 %) bytes
Max Waves: 9689 -> 9723 (0.35 %)

This helps Dota2, Doom, GTAV and Hitman 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-27 08:56:37 +02:00
Connor Abbott 3bf8981c51 ac,radeonsi: Always mark buffer stores as inaccessiblememonly
inaccessiblememonly means that it doesn't modify memory accesible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but this just means that now nothing is inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.

Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.

statistics with nir enabled:

Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

The only difference was a manhattan31 shader.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-19 14:08:27 +02:00
Samuel Pitoiset 33f4e04d5a ac,radv: do not emit vec3 for raw load/store on SI
It's unsupported, only load/store format with vec3 are supported.

Fixes: 6970a9a6ca ("ac,radv: remove the vec3 restriction with LLVM 9+")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-04 08:47:26 +02:00
Nicolai Hähnle f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Marek Olšák 486bc1e17e ac: use amdgpu-flat-work-group-size
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-03 14:32:47 -04:00
Samuel Pitoiset 6970a9a6ca ac,radv: remove the vec3 restriction with LLVM 9+
This changes requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-03 11:30:08 +02:00