Commit Graph

185 Commits

Author SHA1 Message Date
Timothy Arceri a20a9d0c5e radv: dont store disasm string unless keep_shader_info flag set
This fixes the memory use regression from bug 111107.

Fixes: 726a31df70 ("radv: Add the concept of radv shader binaries.")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107
2019-07-18 00:25:55 +00:00
Samuel Pitoiset 07ff367442 radv/gfx10: implement VK_EXT_post_depth_coverage
I did implement this extension a while ago but it didn't work
on pre GFX10 for some reasons. Now all CTS pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:39 +02:00
Samuel Pitoiset edf1af696f radv/gfx10: fallback to the legacy path if tess and extreme geometry
This is unsupported and hangs.

This fixes GPU hangs with
dEQP-VK.tessellation.geometry_interaction.limits.output_required_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:33 +02:00
Samuel Pitoiset ed12be1b8f radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 20:05:21 +02:00
Samuel Pitoiset f0a90eddb6 radv/gfx10: allocate ESGS ring space for exporting PrimitiveID
Only VS needs that. We shouldn't hardcode these values but
that's complicated to not do that for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:05 +02:00
Samuel Pitoiset e68b55f5e3 radv/gfx10: set HS/GS/CS.WGP_MODE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset 2b6a089813 radv: tidy up radv_get_shader_name() and add NGG stages
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset 5bbcb3f5bc radv/gfx10: implement support for GS as NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:53 +02:00
Bas Nieuwenhuizen 7286865f6d radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:51 +02:00
Connor Abbott f18b8a1174 radv: Don't optimize after lowering FS inputs
Currently this is done rather late in radv, after lowering booleans, so
it isn't safe to run additional optimizations that may add e.g. 1-bit
booleans. We could move the lowering parts earlier, but since right now
we only lower FS inputs and by this point all indirects have been
lowered away, there's no reason we should need to optimize anything.

One shader from Devil May Cry 5 was getting optimized, but only because
the optimization loop was working on 32-bit booleans which revealed an
opportunity that was hidden with 1-bit booleans, and we generated a
1-bit boolean which is invalid.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092
Fixes: 118a66df99
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 10:10:20 +02:00
Samuel Pitoiset 3f50007ad8 radv: set correct number of VGPRs for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:27 +02:00
Samuel Pitoiset d2a8b63a2c radv: fix computing the number of ES VGPRS for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:14 +02:00
Bas Nieuwenhuizen 795adbbadd radv/gfx10: Add pipeline state support for tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:26 +10:00
Connor Abbott 118a66df99 radv: Use NIR barycentric intrinsics
We have to add a few lowering to deal with things that used to be dealt
with inline when creating inputs. We also move the code that fills out
the radv_shader_variant_info struct for linking purposes to
radv_shader.c, as it's no longer tied to the NIR->LLVM lowering.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:18:25 +02:00
Connor Abbott 27f0c3c15e radv: Make FragCoord a sysval
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann e41e932e57 radv: Lower input attachments in NIR.
v2 (Connor)
- Fix warning in release mode using MAYBE_UNUSED

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Samuel Pitoiset ee21bd7440 radv/gfx10: implement NGG support (VS only)
This needs to be cleaned up a bit, and it probably contains
missing stuff and/or bugs.

This doesn't fix the "half of the triangles" issue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen aeb5b1a998 radv/gfx10: Set MEM_ORDERED flags on shaders.
Scattered because depending on stage they are at offset 24/25/27/30.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset 352365c5e2 radv/gfx10: do not set stream output shader config
Transform feedback is really different on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset 4c82094b7b radv/gfx10: implement radv_fill_shader_variant()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Bas Nieuwenhuizen 5ff651c0a7 radv: Move more stuff to variant create time.
Due to them depending on the linker result.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 726a31df70 radv: Add the concept of radv shader binaries.
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
 for the cache.
(2) creating a shader_variant creation helper.

This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 43f2f01cc8 radv: Add export_prim_id to the shader variant info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 7469516244 radv: Merge rsrc1/rsrc2 fields with the config fields.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle 74a26af913 amd/common/gfx10: add register JSON
A small number of fields now need new disambiguation.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Sagar Ghuge 456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Samuel Pitoiset d8b079e4c7 radv: rework how the number of VGPRs is computed
Just a cleanup, it shouldn't change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:27 +02:00
Samuel Pitoiset f4d2c47cf6 radv: the number of VGPR_COMP_CNT for GS is expected to be 0 on GFX8
Just move around the switch case. GFX9+ is handled below.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:19 +02:00
Samuel Pitoiset b4477fa4d4 radv: reduce number of VGPRs for TESS_EVAL if primitive ID is not used
We only need to 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:17 +02:00
Daniel Schürmann 0daeb1d127 amd/common: lower bitfield_extract to ubfe/ibfe.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann 48a75e7af0 amd/common: lower bitfield_insert to bfm & bitfield_select
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann c58dff753c radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot')
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann 7a858f274c spirv/nir: add support for AMD_shader_ballot and Groups capability
This commit also renames existing AMD capabilities:
 - gcn_shader -> amd_gcn_shader
 - trinary_minmax -> amd_trinary_minmax

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Nicolai Hähnle f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Caio Marcelo de Oliveira Filho e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho a3bfdacb6c radv: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with checking for is_entrypoint and calling
nir_shader_get_entrypoint().

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho 31a7476335 spirv, radv, anv: Replace ptr_type with addr_format
Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Samuel Pitoiset d7501834cd radv: add a workaround for Monster Hunter World and LLVM 7&8
The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that are affected by this
issue.

This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-17 11:41:19 +02:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Jonathan Marek d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Ian Romanick 1f1007a4ed nir: Initialize lower_flrp_progress everywhere
I don't know why I thought NIR_PASS always set the progress variable.
Derp.

Fixes: d41cdef2a5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
2019-05-09 10:03:51 -07:00
Timothy Arceri e19a8fe033 radv: call constant folding before opt algebraic
The pattern of calling opt algebraic first seems to have originated
in i965. The order in OpenGL drivers generally doesn't matter
because the GLSL IR optimisations do constant folding before
opt algebraic.

However in Vulkan drivers calling opt algebraic first can result
in missed constant folding opportunities.

vkpipeline-db results (VEGA64):

Totals from affected shaders:
SGPRS: 3160 -> 3176 (0.51 %)
VGPRS: 3588 -> 3580 (-0.22 %)
Spilled SGPRs: 52 -> 44 (-15.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 12 -> 12 (0.00 %) dwords per thread
Code Size: 261812 -> 261036 (-0.30 %) bytes
LDS: 7 -> 7 (0.00 %) blocks
Max Waves: 346 -> 348 (0.58 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-08 19:45:01 +10:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Juan A. Suarez Romero 06c9d7f9f9 radv: enable descriptor indexing capabilities
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.

Fixes: 0e10790558 "radv: Enable VK_EXT_descriptor_indexing."

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-30 09:23:23 +02:00
Bas Nieuwenhuizen 5c3467e74a radv: Run the new ycbcr lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Samuel Pitoiset b3e3440c87 radv: add VK_NV_compute_shader_derivates support
Only computeDerivativeGroupLinear is supported for now.

All crucible tests pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-22 14:51:57 +02:00
Samuel Pitoiset 9cf55b022d radv: add VK_KHR_shader_atomic_int64 but disable it for now
No support for 64-bit compare&swap atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-17 21:59:56 +02:00
Samuel Pitoiset ecbe6cb805 radv: sort the shader capabilities alphabetically
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-16 09:14:22 +02:00
Samuel Pitoiset 14f03978ed radv: enable VK_KHR_shader_float16_int8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-15 10:43:55 +02:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00