Commit Graph

589 Commits

Author SHA1 Message Date
Rhys Perry a87b0f5141 radv/aco,aco: set lower_fmod
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).

Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)

At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.

v2: remove fmod/frem handling in init_context()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-04 14:00:46 +00:00
Samuel Pitoiset 5ebe1a17e9 radv: enable lower_fmod for the LLVM path
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.

This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-03 18:15:14 +02:00
Samuel Pitoiset a2a68d551c radv/gfx10: fix the ESGS ring size symbol
Random hangs no longer happen, I'm actually not sure if they were
related to this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-02 21:50:40 +02:00
Daniel Schürmann 0fb27f1e5a radv/aco: Don't lower subtractions
40228 shaders in 20236 tests
Totals:
SGPRS: 2045512 -> 2046496 (0.05 %)
VGPRS: 1430856 -> 1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 77202840 -> 77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Timur Kristóf 30f0c0ea7d radv: Add debug option to dump meta shaders.
This new option can help debug shader compiler problems when
there are issues with the meta shaders.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 13:36:49 +00:00
Timur Kristóf a4fd8ba7e3 amd/common: Introduce ac_get_fs_input_vgpr_cnt.
Add a function called ac_get_fs_input_vgpr_cnt which will return
the number of input VGPRs used by an AMD shader. Previously,
radv and radeonsi had the same code duplicated, but this commit also
allows them to share this code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-26 13:36:49 +00:00
Timur Kristóf 83eebdb507 radv: Set shared VGPR count in radv_postprocess_config.
This commit allows RADV to set the shared VGPR count according to
the shader config.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 13:36:49 +00:00
Rhys Perry 3c966fd688 aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:47 +01:00
Rhys Perry ec8ced9123 radv/aco: return a correct name and description for the backend IR
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:43 +01:00
Rhys Perry 6613b81327 aco,radv/aco: get dissassembly for release builds if requested
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-26 11:08:09 +01:00
Daniel Schürmann 8b78cce433 radv: remove dead shared variables
LLVM does this anyway, but for ACO we need to do it in NIR.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00
Daniel Schürmann 281262281b radv/aco: enable VK_EXT_shader_demote_to_helper_invocation
For now, this extension will only be enabled for ACO.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00
Daniel Schürmann a70a998718 radv/aco: Setup alternate path in RADV to support the experimental ACO compiler
LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-19 12:10:00 +02:00
Marek Olšák 0692ae34e9 ac: move ac_get_num_physical_sgprs into radeon_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-18 14:39:06 -04:00
Marek Olšák ca43006fd2 ac: move ac_get_max_wave64_per_simd into radeon_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-18 14:39:06 -04:00
Samuel Pitoiset 5ebc76471c radv/gfx10: adjust the GS NGG scratch size for streamout
It needs more space for multiple streams.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-16 12:08:22 +02:00
Samuel Pitoiset a15b3bcf1a radv/gfx10: add an option to switch from legacy to NGG streamout
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS as RadeonSI.

That option will be turned on once all issues are resolved.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-16 12:08:22 +02:00
Samuel Pitoiset 538766792d radv/gfx10: declare a LDS symbol for the NGG emit space
This fixes some interactions when NGG GS is enabled. It fixes:

- dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.*geom*
- dEQP-VK.tessellation.geometry_interaction.passthrough.*

For some reasons, using the computed ESGS ring size randomly hangs
with CTS. For now, just use the maximum LDS size for ESGS.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-10 09:27:01 +02:00
Samuel Pitoiset a9af11f1fa radv: fill shader info for all stages in the pipeline
This shouldn't be in NIR->LLVM because ACO also needs the shader
info. This will also help for computing some NGG values that are
necessary for declaring LDS symbols.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-10 09:26:45 +02:00
Marek Olšák d95afd8b9e radeonsi/gfx10: fix wave occupancy computations
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-09-09 23:43:03 -04:00
Samuel Pitoiset 83499ac765 radv: merge radv_shader_variant_info into radv_shader_info
Having two different structs is useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-09-06 15:52:03 +02:00
Vasily Khoruzhick 9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Connor Abbott 3f5b541fc8 radv: Call nir_propagate_invariant()
Without this, invariant qualifiers don't do anything. Together with a
fix to the game, this fixes flickering in No Man's Sky.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-09-05 14:05:46 +02:00
Connor Abbott 71a6794200 ac/nir: Enable nir_opt_large_constants
vkpipeline-db numbers:

Totals:
SGPRS: 1740306 -> 1741322 (0.06 %)
VGPRS: 1331124 -> 1331712 (0.04 %)
Spilled SGPRs: 21201 -> 21316 (0.54 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size: 79022628 -> 78694788 (-0.41 %) bytes
LDS: 6500 -> 6500 (0.00 %) blocks
Max Waves: 301413 -> 301302 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 53633 -> 54649 (1.89 %)
VGPRS: 53000 -> 53588 (1.11 %)
Spilled SGPRs: 3454 -> 3569 (3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5284232 -> 4956392 (-6.20 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 4239 -> 4128 (-2.62 %)
Wait states: 0 -> 0 (0.00 %)

(The biggest VGPR and max wave regression is due to unrolling a loop,
which made the scheduler more aggressive, but in this case it's able to
effectively hide latency so it's actually probably a win.)

shader-db numbers with radeonsi NIR:

Totals:
SGPRS: 3526496 -> 3526512 (0.00 %)
VGPRS: 2198576 -> 2198576 (0.00 %)
Spilled SGPRs: 10463 -> 10463 (0.00 %)
Spilled VGPRs: 86 -> 86 (0.00 %)
Private memory VGPRs: 3182 -> 2528 (-20.55 %)
Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread
Code Size: 74117280 -> 74106140 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 775846 -> 775844 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 856 -> 872 (1.87 %)
VGPRS: 680 -> 680 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 654 -> 0 (-100.00 %)
Scratch size: 668 -> 0 (-100.00 %) dwords per thread
Code Size: 49652 -> 38512 (-22.44 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 182 -> 180 (-1.10 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-05 12:21:46 +02:00
Connor Abbott 5dadbabb47 radv/radeonsi: Don't count read-only data when reporting code size
We usually use these counts as a simple way to figure out if a change
reduces the number of instructions or shrinks an instruction. However,
since .rodata sections aren't executed, we shouldn't be counting their
size for this analysis. Make the linker return the total executable
size, and use it to report the more useful size in both drivers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-09-05 12:21:35 +02:00
Samuel Pitoiset cc3d36b5dd radv: remove radv_init_llvm_target() helper
RADV no longer uses specific LLVM options compared to the common code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-30 09:33:21 +02:00
Samuel Pitoiset 8d44f83844 radv: move lowering PS inputs/outputs at the right place
At shaders creation, just after NIR linking.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-30 09:29:31 +02:00
Samuel Pitoiset 151d6990ec radv: gather info about PS inputs in the shader info pass
It's the right place to do that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-30 09:29:29 +02:00
Samuel Pitoiset 49f5ddd3ae radv: make use of has_ls_vgpr_init_bug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-27 08:04:51 +02:00
Samuel Pitoiset 2b9c371575 ac: add cpdma_prefetch_writes_memory to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-27 08:04:29 +02:00
Samuel Pitoiset 1fd60db4a1 ac,radv,radeonsi: remove LLVM 7 support
Now that LLVM 9 will be released soon, we will only support
LLVM 8, 9 and master (10).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-23 08:12:34 +02:00
Samuel Pitoiset e73d863a66 radv: allow to enable VK_AMD_shader_ballot only on GFX8+
Scans aren't implemented on SI/CIK.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-21 15:14:29 +02:00
Bas Nieuwenhuizen 2e763f7c87 radv: Use correct vgpr_comp_cnt for VS if both prim_id and instance_id are needed.
Should take the max of the 2.

Fixes: ea337c8b7e "radv/gfx10: fix VS input VGPRs with the legacy path"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-21 09:38:46 +00:00
Rhys Perry 7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Bas Nieuwenhuizen 8874af8ef4 radv: Keep shader info when needed.
This allows enabling the shader info keeping on a per shader basis.
Also disables the cache on a per shader basis.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen 5444d3e0c2 radv: Use string for nir dumping.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Allows us to easily dump all nir shaders for combined variants in
vega and simplifies ownership.
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen 739a2880f5 radv: Get max workgroup size without nir.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen 290ca0c4dd radv: Add utility function to calculate max waves.
Not AC because a lot of it is data extraction out of radv structs.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen ba8d3c362b radv: Properly use Wave64 for non-NGG GS and copy shader.
Fixes: 8a86908e9a "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen 035406ecf7 radv: Put wave size in shader options/info.
Instead of having the three values everywhere. This is also more
future proof if we want the driver to make those decisions eventually.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen 72e7b7a00b ac/nir,radv: Optimize bounds check for 64 bit CAS.
When the application does not ask for robust buffer access.

Only implemented the check in radv.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-08-02 21:21:55 +02:00
Samuel Pitoiset 96a5445559 radv/gfx10: use the correct target machine for Wave32
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:38 +02:00
Samuel Pitoiset 8a86908e9a radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders
It can be enabled with RADV_PERFTEST=gewave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:36 +02:00
Samuel Pitoiset 953bbacc23 radv/gfx10: add Wave32 support for fragment shaders
It can be enabled with RADV_PERFTEST=pswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-08-02 09:37:34 +02:00
Samuel Pitoiset ea38565011 radv/gfx10: add Wave32 support for compute shaders
It can be enabled with RADV_PERFTEST=cswave32.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-31 09:35:04 +02:00
Daniel Schürmann 45638e14fb radv: Don't include radv_private.h from radv_shader.h
This patch decouples radv_shader.h from any LLVM dependency.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-30 10:29:11 +02:00
Connor Abbott a69ab1b7d2 radv: Delete unused local variables in optimization loop
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 620 -> 560 (-9.68 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 20024 -> 20144 (0.60 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 25 -> 25 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-29 11:37:46 +02:00
Samuel Pitoiset 09abe571a2 radv/gfx10: emit streamout shader config
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:32 +02:00
Samuel Pitoiset ea337c8b7e radv/gfx10: fix VS input VGPRs with the legacy path
For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-24 08:23:21 +02:00
Samuel Pitoiset 9343c93e34 radv: fix dumping disassembly with RADV_DEBUG=shaders
Fixes: a20a9d0c5e ("radv: dont store disasm string unless keep_shader_info flag set")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-23 10:22:29 +02:00
Daniel Schürmann 64b7386ee8 radv: move nir_opt_conditional_discard out of optimization loop
This late optimization pass is only affected by nir_opt_if() and handles all cases
in a single pass. It's enough to call it once after the optimization loop.
No changes on vkpipeline-db.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-22 08:12:18 +02:00
Bas Nieuwenhuizen 451f030c06 radv: Fix uninitialized warning.
For es_vgpr_comp_cnt.

Fixes: 795adbbadd "radv/gfx10: Add pipeline state support for tess."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-21 01:39:08 +02:00
Marek Olšák 921c1d24d5 ac/rtld: add support for Wave32
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Dave Airlie 2ac2b98780 radv: fix crash in shader tracing.
Enabling tracing, and then having a vmfault, can leads to a segfault
before we print out the traces, as if a meta shader is executing
and we don't have the NIR for it.

Just pass the stage and give back a default.

Fixes: 9b9ccee4d6 ("radv: take LDS into account for compute shader occupancy stats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 11:00:25 +10:00
Timothy Arceri a20a9d0c5e radv: dont store disasm string unless keep_shader_info flag set
This fixes the memory use regression from bug 111107.

Fixes: 726a31df70 ("radv: Add the concept of radv shader binaries.")

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107
2019-07-18 00:25:55 +00:00
Samuel Pitoiset 07ff367442 radv/gfx10: implement VK_EXT_post_depth_coverage
I did implement this extension a while ago but it didn't work
on pre GFX10 for some reasons. Now all CTS pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:39 +02:00
Samuel Pitoiset edf1af696f radv/gfx10: fallback to the legacy path if tess and extreme geometry
This is unsupported and hangs.

This fixes GPU hangs with
dEQP-VK.tessellation.geometry_interaction.limits.output_required_*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-17 08:32:33 +02:00
Samuel Pitoiset ed12be1b8f radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 20:05:21 +02:00
Samuel Pitoiset f0a90eddb6 radv/gfx10: allocate ESGS ring space for exporting PrimitiveID
Only VS needs that. We shouldn't hardcode these values but
that's complicated to not do that for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-15 11:30:05 +02:00
Samuel Pitoiset e68b55f5e3 radv/gfx10: set HS/GS/CS.WGP_MODE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 17:47:12 +02:00
Samuel Pitoiset 2b6a089813 radv: tidy up radv_get_shader_name() and add NGG stages
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-12 08:19:53 +02:00
Samuel Pitoiset 5bbcb3f5bc radv/gfx10: implement support for GS as NGG
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-11 15:45:53 +02:00
Bas Nieuwenhuizen 7286865f6d radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-11 15:45:51 +02:00
Connor Abbott f18b8a1174 radv: Don't optimize after lowering FS inputs
Currently this is done rather late in radv, after lowering booleans, so
it isn't safe to run additional optimizations that may add e.g. 1-bit
booleans. We could move the lowering parts earlier, but since right now
we only lower FS inputs and by this point all indirects have been
lowered away, there's no reason we should need to optimize anything.

One shader from Devil May Cry 5 was getting optimized, but only because
the optimization loop was working on 32-bit booleans which revealed an
opportunity that was hidden with 1-bit booleans, and we generated a
1-bit boolean which is invalid.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092
Fixes: 118a66df99
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-10 10:10:20 +02:00
Samuel Pitoiset 3f50007ad8 radv: set correct number of VGPRs for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:27 +02:00
Samuel Pitoiset d2a8b63a2c radv: fix computing the number of ES VGPRS for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 09:54:14 +02:00
Bas Nieuwenhuizen 795adbbadd radv/gfx10: Add pipeline state support for tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 12:04:26 +10:00
Connor Abbott 118a66df99 radv: Use NIR barycentric intrinsics
We have to add a few lowering to deal with things that used to be dealt
with inline when creating inputs. We also move the code that fills out
the radv_shader_variant_info struct for linking purposes to
radv_shader.c, as it's no longer tied to the NIR->LLVM lowering.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:18:25 +02:00
Connor Abbott 27f0c3c15e radv: Make FragCoord a sysval
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Daniel Schürmann e41e932e57 radv: Lower input attachments in NIR.
v2 (Connor)
- Fix warning in release mode using MAYBE_UNUSED

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Samuel Pitoiset ee21bd7440 radv/gfx10: implement NGG support (VS only)
This needs to be cleaned up a bit, and it probably contains
missing stuff and/or bugs.

This doesn't fix the "half of the triangles" issue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen aeb5b1a998 radv/gfx10: Set MEM_ORDERED flags on shaders.
Scattered because depending on stage they are at offset 24/25/27/30.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset 352365c5e2 radv/gfx10: do not set stream output shader config
Transform feedback is really different on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:51:32 +02:00
Samuel Pitoiset 4c82094b7b radv/gfx10: implement radv_fill_shader_variant()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-07 17:03:39 +02:00
Bas Nieuwenhuizen 5ff651c0a7 radv: Move more stuff to variant create time.
Due to them depending on the linker result.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 726a31df70 radv: Add the concept of radv shader binaries.
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
 for the cache.
(2) creating a shader_variant creation helper.

This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 43f2f01cc8 radv: Add export_prim_id to the shader variant info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen 7469516244 radv: Merge rsrc1/rsrc2 fields with the config fields.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-04 10:52:26 +00:00
Nicolai Hähnle 74a26af913 amd/common/gfx10: add register JSON
A small number of fields now need new disambiguation.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Sagar Ghuge 456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Samuel Pitoiset d8b079e4c7 radv: rework how the number of VGPRs is computed
Just a cleanup, it shouldn't change anything.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:27 +02:00
Samuel Pitoiset f4d2c47cf6 radv: the number of VGPR_COMP_CNT for GS is expected to be 0 on GFX8
Just move around the switch case. GFX9+ is handled below.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:19 +02:00
Samuel Pitoiset b4477fa4d4 radv: reduce number of VGPRs for TESS_EVAL if primitive ID is not used
We only need to 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-01 14:59:17 +02:00
Daniel Schürmann 0daeb1d127 amd/common: lower bitfield_extract to ubfe/ibfe.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann 48a75e7af0 amd/common: lower bitfield_insert to bfm & bitfield_select
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann c58dff753c radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot')
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann 7a858f274c spirv/nir: add support for AMD_shader_ballot and Groups capability
This commit also renames existing AMD capabilities:
 - gcn_shader -> amd_gcn_shader
 - trinary_minmax -> amd_trinary_minmax

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Nicolai Hähnle f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Caio Marcelo de Oliveira Filho e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho a3bfdacb6c radv: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with checking for is_entrypoint and calling
nir_shader_get_entrypoint().

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho 31a7476335 spirv, radv, anv: Replace ptr_type with addr_format
Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Samuel Pitoiset d7501834cd radv: add a workaround for Monster Hunter World and LLVM 7&8
The load/store optimizer pass doesn't handle WaW hazards correctly
and this is the root cause of the reflection issue with Monster
Hunter World. AFAIK, it's the only game that are affected by this
issue.

This is fixed with LLVM r361008, but we need a workaround for older
LLVM versions unfortunately.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-17 11:41:19 +02:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Jonathan Marek d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Ian Romanick 1f1007a4ed nir: Initialize lower_flrp_progress everywhere
I don't know why I thought NIR_PASS always set the progress variable.
Derp.

Fixes: d41cdef2a5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
2019-05-09 10:03:51 -07:00
Timothy Arceri e19a8fe033 radv: call constant folding before opt algebraic
The pattern of calling opt algebraic first seems to have originated
in i965. The order in OpenGL drivers generally doesn't matter
because the GLSL IR optimisations do constant folding before
opt algebraic.

However in Vulkan drivers calling opt algebraic first can result
in missed constant folding opportunities.

vkpipeline-db results (VEGA64):

Totals from affected shaders:
SGPRS: 3160 -> 3176 (0.51 %)
VGPRS: 3588 -> 3580 (-0.22 %)
Spilled SGPRs: 52 -> 44 (-15.38 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 12 -> 12 (0.00 %) dwords per thread
Code Size: 261812 -> 261036 (-0.30 %) bytes
LDS: 7 -> 7 (0.00 %) blocks
Max Waves: 346 -> 348 (0.58 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-08 19:45:01 +10:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Juan A. Suarez Romero 06c9d7f9f9 radv: enable descriptor indexing capabilities
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.

Fixes: 0e10790558 "radv: Enable VK_EXT_descriptor_indexing."

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-30 09:23:23 +02:00
Bas Nieuwenhuizen 5c3467e74a radv: Run the new ycbcr lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Samuel Pitoiset b3e3440c87 radv: add VK_NV_compute_shader_derivates support
Only computeDerivativeGroupLinear is supported for now.

All crucible tests pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-22 14:51:57 +02:00
Samuel Pitoiset 9cf55b022d radv: add VK_KHR_shader_atomic_int64 but disable it for now
No support for 64-bit compare&swap atomic operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-17 21:59:56 +02:00
Samuel Pitoiset ecbe6cb805 radv: sort the shader capabilities alphabetically
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-16 09:14:22 +02:00
Samuel Pitoiset 14f03978ed radv: enable VK_KHR_shader_float16_int8
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-15 10:43:55 +02:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Samuel Pitoiset c25f63872b radv: partially enable VK_KHR_shader_float16_int8
Only 8-bit integers for now, float16 requires a bit more work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-01 18:53:59 +02:00
Rhys Perry 0af95f0ffc radv: lower 16-bit flrp
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-01 09:58:48 +02:00
Samuel Pitoiset 8a6e61cc52 radv: do not lower frexp_exp and frexp_sig
Hardware has two instructions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-28 13:02:51 +01:00
Samuel Pitoiset 23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Rhys Perry 037f11d42e radv: enable VK_KHR_8bit_storage
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-21 09:02:27 +01:00
Jason Ekstrand 08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Samuel Pitoiset 71ffa00fc6 radv: enable lower_mul_2x32_64
Fixes: 58bcebd987 ("spirv: Allow [i/u]mulExtended to use new nir opcode")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-06 22:41:20 +01:00
Bas Nieuwenhuizen 13ab63bb62 radv: Implement VK_EXT_buffer_device_address.
v2: Also update the release notes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:37:38 +01:00
Samuel Pitoiset 5e7f800f32 radv: fix build
Fixes: 9b9ccee4d6 ("radv: take LDS into account for compute shader occupancy stats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-01 15:31:55 +01:00
Timothy Arceri 9b9ccee4d6 radv: take LDS into account for compute shader occupancy stats
Ported from d205faeb6c.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-01 22:25:30 +11:00
Timothy Arceri a53d68d318 ac/radv/radeonsi: add ac_get_num_physical_sgprs() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-01 22:25:30 +11:00
Bas Nieuwenhuizen ead54d4a42 radv/winsys: Set winsys bo priority on creation.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-01-29 15:56:41 +01:00
Karol Herbst 9b24028426 nir: rename nir_var_function to nir_var_function_temp
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-19 20:01:41 +01:00
Karol Herbst d0c6ef2793 nir: rename global/local to private/function memory
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.

v2: rename local to function as well
v3: rename vtn_variable_mode_local as well

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 18:51:46 +01:00
Jason Ekstrand 05d72d6d48 spirv: Sort supported capabilities
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-07 18:41:15 -06:00
Jason Ekstrand 63b9aa2e25 spirv: Add support for using derefs for UBO/SSBO access
For now, it's hidden behind a cap.  Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-08 00:38:30 +00:00
Jason Ekstrand adc155a815 spirv: Add explicit pointer types
Instead of baking in uvec2 for UBO and SSBO pointers and uint for push
constant and shared memory pointers, make it configurable.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand fc9c4f89b8 nir: Move propagation of cast derefs to a new nir_opt_deref pass
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them.  Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Samuel Pitoiset 9606310081 radv: enable shaderStorageImageMultisample feature on GFX8+
Untested on older chips.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-20 18:01:19 +01:00
Samuel Pitoiset 0a7e767e58 radv: drop the amdgpu-skip-threshold=1 workaround for LLVM 8
This workaround has been introduced by 135e4d434f for fixing
DXVK GPU hangs with many games. It is no longer needed since
LLVM r345718.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-20 12:09:57 +01:00
Ian Romanick 378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Samuel Pitoiset 3fbdcd942f amd: remove support for LLVM 6.0
User are encouraged to switch to LLVM 7.0 released in September 2018.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-06 14:02:56 +01:00
Nicolai Hähnle 8c97abc066 radv: include LLVM IR in the VK_AMD_shader_info "disassembly"
Helpful for debugging compiler backend problems: this allows us to
easily retrieve the LLVM IR from RenderDoc.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-09 14:54:37 +01:00
Samuel Pitoiset b4eb029062 radv: implement VK_EXT_transform_feedback
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:10:58 +01:00
Jason Ekstrand 28bb6abd1d nir/validate: Print when the validation failed
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-10-26 11:45:29 -05:00
Timothy Arceri 3a95396f3c radv: use nir_shrink_vec_array_vars()
Totals from affected shaders:
SGPRS: 1096 -> 1096 (0.00 %)
VGPRS: 1192 -> 1056 (-11.41 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 100940 -> 94384 (-6.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 100 -> 112 (12.00 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri 8086fa1bcd radv: use nir_split_array_vars()
We call in the opt loop in case another pass results in an
array with indirect access being turned into direct access.

Totals from affected shaders:
SGPRS: 512 -> 496 (-3.12 %)
VGPRS: 456 -> 452 (-0.88 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 40040 -> 39664 (-0.94 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 41 -> 43 (4.88 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri 06675711e7 radv: use nir_opt_find_array_copies()
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from "Batman: Arkham City" over DXVK.

The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri 9d5b106b2e radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
Totals from affected shaders:
SGPRS: 2856 -> 2856 (0.00 %)
VGPRS: 3236 -> 3248 (0.37 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 236560 -> 233548 (-1.27 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 277 -> 283 (2.17 %)
Wait states: 0 -> 0 (0.00 %)

Even in the cases were we have increased VGPR use it appears
the NIR is improved significantly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Timothy Arceri 72e4287e8f radv: make use of nir_lower_load_const_to_scalar()
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example in radeonsi more loops are unrolled in
Civilization Beyond Earth.

The actual pipeline-db stats are not overwhelming but even in the
negatively affected shaders the NIR is clearly better. It just
happens that the code shuffling and in some cases calls to max
rather than a flt result in the final output from LLVM not
giving as good numbers.

However this is an incremental opt that further passes build off
so the change should be made IMO.

Totals from affected shaders:
SGPRS: 20192 -> 20184 (-0.04 %)
VGPRS: 19516 -> 19524 (0.04 %)
Spilled SGPRs: 437 -> 444 (1.60 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1527444 -> 1522276 (-0.34 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 1018 -> 1016 (-0.20 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-25 09:31:22 +10:00
Samuel Pitoiset 35656823b9 radv: enable VK_SUBGROUP_FEATURE_ARITHMETIC_BIT
All CTS pass on Polaris/Vega with LLVM 6, 7 and master, so
I think it's safe to enable the feature.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-19 13:36:10 +02:00
Samuel Pitoiset 08103c5f65 radv: enable shaderInt16 capability
Not sure if this is all wired up. CTS does pass and the Tangrams
demo works fine on Vega. There are corruption issues on Polaris
but not sure if that related to 16-bit support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:39 +02:00
Bas Nieuwenhuizen d97c892584 radv: Set the user SGPR MSB for Vega.
Otherwise using 32 user SGPRs would be broken.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-09-16 12:50:58 +02:00
Jason Ekstrand 44ec31cd75 nir: Drop the vs_inputs_dual_locations option
It was very inconsistently handled; the only things that made use of it
were glsl_to_nir, glspirv, and nir_gather_info.  In particular,
nir_lower_io completely ignored it so anyone using nir_lower_io on
64-bit vertex attributes was going to be in for a shock.  Also, as of
the previous commit, it's set by every driver that supports 64-bit
vertex attributes.  There's no longer any reason to have it be an option
so let's just delete it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-09-06 16:07:50 -05:00
Samuel Pitoiset 24ee53231d radv: remove dead variables after splitting per member structs
Otherwise, nir_lower_clip_cull_distance_arrays might report
wrong number of output clips/culls because it relies on
shader output variables and some of them might be dead.

This fixes a rendering issue with Dolphin and Super Mario
Sunshine.

Fixes: b0c643d8f5 ("spirv: Use NIR per-member splitting")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107610
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-08-22 13:57:18 +02:00
Dave Airlie b88468f15c radv: return binary code_size not variant code size to cache
The code sizes return here get passed to the cache shader insert function,
which then memcpy from the code ptr, and causes all sorts of valgrind
errors like:
==6755== Invalid read of size 8
==6755==    at 0x4C32FEE: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1021)
==6755==    by 0x2305D4C7: radv_pipeline_cache_insert_shaders (radv_pipeline_cache.c:416)
==6755==    by 0x2305791D: radv_create_shaders (radv_pipeline.c:2158)
==6755==    by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404)
==6755==    by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515)
==6755==    by 0x230188AB: radv_device_init_meta_blit_color (radv_meta_blit.c:871)
==6755==    by 0x2301D50E: radv_device_init_meta_blit_state (radv_meta_blit.c:1278)
==6755==    by 0x23011893: radv_device_init_meta (radv_meta.c:352)
==6755==    by 0x2300744B: radv_CreateDevice (radv_device.c:1576)
==6755==    by 0x5187D0F: ??? (in /usr/lib64/libvulkan.so.1.1.77)
==6755==    by 0x518F6A3: ??? (in /usr/lib64/libvulkan.so.1.1.77)
==6755==    by 0x5192A42: vkCreateDevice (in /usr/lib64/libvulkan.so.1.1.77)
==6755==  Address 0x22a58548 is 4 bytes after a block of size 116 alloc'd
==6755==    at 0x4C2EBAB: malloc (vg_replace_malloc.c:299)
==6755==    by 0x23089DC4: ac_elf_read (ac_binary.c:144)
==6755==    by 0x23090A60: ac_compile_module_to_binary (ac_llvm_helper.cpp:162)
==6755==    by 0x23053F06: compile_to_memory_buffer (radv_llvm_helper.cpp:58)
==6755==    by 0x23053F06: radv_compile_to_binary (radv_llvm_helper.cpp:98)
==6755==    by 0x23052769: ac_llvm_compile (radv_nir_to_llvm.c:3394)
==6755==    by 0x23052823: ac_compile_llvm_module (radv_nir_to_llvm.c:3418)
==6755==    by 0x23053C05: radv_compile_nir_shader (radv_nir_to_llvm.c:3542)
==6755==    by 0x23061B4E: shader_variant_create (radv_shader.c:580)
==6755==    by 0x23061CFD: radv_shader_variant_create (radv_shader.c:634)
==6755==    by 0x23057765: radv_create_shaders (radv_pipeline.c:2123)
==6755==    by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404)
==6755==    by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515)

Since we are just inserting the code into the cache, we can avoid these
bad reads and data in the cache by just using the binary code size here.

Fixes: 939e5a382 (radv: add padding for the UMR disassembler)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-07-28 06:20:20 +10:00
Daniel Schürmann 62024fa775 radv: enable VK_KHR_16bit_storage extension / 16bit storage features
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:26 +02:00
Danylo Piliaiev 494a206229 radv: Fix incorrect assumption about ternary operator precedence
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-07-19 10:04:27 +02:00
Dave Airlie 6f3aee40f9 radv: using tls to store llvm related info and speed up compiles (v10)
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.

This should remove all the fixed overheads setup costs of creating
the pass manager each time.

This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)

v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-10 07:58:03 +10:00
Dave Airlie 7398913a62 ac/radv: move llvm compiler info to struct and init in one place
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.

 Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 10:29:16 +10:00
Dave Airlie 35c82af539 radv/radeonsi: add a check ir tm options
This doesn't do much yet, but it makes it easier to move the code
to a common shared code base.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:32:35 +10:00
Dave Airlie e1387eaf12 radv: create/destroy passmgr at the higher level.
This is prep work for moving this to a per-thread struct

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:05 +10:00
Dave Airlie f2b3e96e75 radv: drop copy of ac_create_target_machine.
Once we split the init once stuff out, this can be shared again.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:15:35 +10:00
Dave Airlie 473be16c74 ac/radv: split the non-common init_once code from the common target code. (v2)
This just splits out the non-shared code and reuses ac_get_llvm_target in radv.

v2: rebase on Marek's patch - fixup brace position/whitespace

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:15:23 +10:00
Samuel Pitoiset 939e5a3823 radv: add padding for the UMR disassembler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-02 10:42:17 +02:00
Samuel Pitoiset bcbd8dd6c9 radv: enable VK_EXT_shader_stencil_export
The driver already supports exporting the stencil value.

The following CTS test now pass:
dEQP-VK.pipeline.shader_stencil_export.op_replace

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-26 10:40:10 +02:00
Bas Nieuwenhuizen 8c4f430d43 radv: Enable lower_io_to_temporaries after deref changes.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 21:23:06 -07:00
Rob Clark d143f6c856 move lower_deref_instrs
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:54:00 -07:00
Bas Nieuwenhuizen 3573570afe radv: Disable lower_io_to_temporaries during deref changes.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:54:00 -07:00
Jason Ekstrand c11833ab24 nir,spirv: Rework function calls
This commit completely reworks function calls in NIR.  Instead of having
a set of variables for the parameters and return value, nir_call_instr
now has simply has a number of sources which get mapped to load_param
intrinsics inside the functions.  It's up to the client API to build an
ABI on top of that.  In SPIR-V, out parameters are handled by passing
the result of a deref through as an SSA value and storing to it.

This virtue of this approach can be seen by how much it allows us to
delete from core NIR.  In particular, nir_inline_functions gets halved
and goes from a fairly difficult pass to understand in detail to almost
trivial.  It also simplifies spirv_to_nir somewhat because NIR functions
never were a good fit for SPIR-V.

Unfortunately, there is no good way to do this without a mega-commit.
Core NIR and SPIR-V have to be changed at the same time.  This also
requires changes to anv and radv because nir_inline_functions couldn't
handle deref instructions before this change and can't work without them
after this change.

Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:58 -07:00
Jason Ekstrand b0c643d8f5 spirv: Use NIR per-member splitting
Before, we were doing structure splitting in spirv_to_nir.
Unfortunately, this doesn't really work when you think about passing
struct pointers into functions.  Doing it later in NIR is a much better
plan.

Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:57 -07:00
Jason Ekstrand 74212c2414 anv,i965,radv,st,ir3: Call nir_lower_deref_instrs
This inserts a call to nir_lower_deref_instrs at every call site of
glsl_to_nir, spirv_to_nir, and prog_to_nir.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-22 20:15:54 -07:00
Eric Engestrom d85fef1e34 radv: fix reported number of available VGPRs
It's a bit late to round up after an integer division.

Fixes: de88979413 "radv: Implement VK_AMD_shader_info"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
2018-06-18 17:08:22 +01:00
Samuel Pitoiset bfca15e16a radv: add RADV_DEBUG=checkir
This allows to run the LLVM verifier pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:08 +02:00
Samuel Pitoiset 135e4d434f radv: add a workaround for DXVK hangs by setting amdgpu-skip-threshold
Workaround for bug in llvm that causes the GPU to hang in presence
of nested loops because there is an exec mask issue. The proper
solution is to fix LLVM but this might require a bunch of work.

This fixes a bunch of GPU hangs that happen with DXVK.

Vega10:
Totals from affected shaders:
SGPRS: 110456 -> 110456 (0.00 %)
VGPRS: 122800 -> 122800 (0.00 %)
Spilled SGPRs: 7478 -> 7478 (0.00 %)
Spilled VGPRs: 36 -> 36 (0.00 %)
Code Size: 9901104 -> 9922928 (0.22 %) bytes
Max Waves: 7143 -> 7143 (0.00 %)

Code size slightly increases because it inserts more branch
instructions but that's expected. I don't see any real performance
changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-09 14:16:49 +02:00
Bas Nieuwenhuizen 38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Samuel Pitoiset 38a8c5903b radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS
Do not lower FS inputs because this moves all load_var
instructions at beginning of shaders and because
interp_var_at_sample (and friends) seem broken. That might
be eventually enabled later on if we really want to preload
all FS inputs at beginning.

Polaris10:
Totals from affected shaders:
SGPRS: 54072 -> 54264 (0.36 %)
VGPRS: 38580 -> 38124 (-1.18 %)
Spilled SGPRs: 652 -> 652 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2128116 -> 2127380 (-0.03 %) bytes
Max Waves: 8048 -> 8086 (0.47 %)

Vega10:
Totals from affected shaders:
SGPRS: 52616 -> 52656 (0.08 %)
VGPRS: 37536 -> 37116 (-1.12 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Code Size: 2043756 -> 2042672 (-0.05 %) bytes
Max Waves: 9176 -> 9254 (0.85 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:57 +02:00
Samuel Pitoiset ded1509587 radv: call nir_split_var_copies() before nir_lower_var_copies()
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-24 09:18:54 +02:00
Samuel Pitoiset d8a61d3232 radv: set amdgpu-32bit-address-high-bits LLVM attribute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:15 +02:00
Samuel Pitoiset 6211799aff radv: remove the radv_finishme() when compiling shaders
Having an entrypoint different than "main" doesn't mean we
have multiple shaders per module.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:24 +02:00
Samuel Pitoiset 1e86eaf7d8 radv: remove radv_device::llvm_supports_spill
It's always true.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 13:48:21 +02:00
Samuel Pitoiset 8ade3e4684 radv: allow to dump the GS copy shader with RADV_DEBUG="shaders"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-14 12:38:00 +02:00
Timothy Arceri ce188813bf radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.

We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.

With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.

V2: add bit to radv_pipeline_key

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
2018-05-13 09:58:33 +10:00
Samuel Pitoiset 3a410f0afc radv: minor cleanups in radv_fill_shader_variant()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-11 12:35:05 +02:00
Iago Toral Quiroga 2d648e5ba3 compiler/lower_64bit_packing: rename the pass to be more generic
It can do 32-bit packing too now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-05-03 11:40:26 +02:00
Marek Olšák 43f0a10051 radeonsi: add triple into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Dave Airlie f77caa7411 ac/radv/radeonsi: refactor max simd waves into common code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-24 09:08:33 +10:00
Bas Nieuwenhuizen dffdef6737 radv: Add Vega M support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-19 16:36:21 +02:00
Bas Nieuwenhuizen 0e10790558 radv: Enable VK_EXT_descriptor_indexing.
This adds everything except non-uniform indexing, which needs a bit
more work and testing.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Daniel Schürmann f2c6a55061 radv: enable subgroup capabilities
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:15 +02:00
Samuel Pitoiset 466aba9fa2 radv: add RADV_NUM_PHYSICAL_VGPRS constant
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 14:28:13 +02:00
Samuel Pitoiset 2f7bb93146 radv: add radv_get_num_physical_sgprs() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 14:28:13 +02:00
Samuel Pitoiset acf60abc54 radv: enable VK_EXT_shader_viewport_index_layer
The driver already supports exporting the Layer and ViewportIndex
built-ins from vertex or tessellation shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-03 14:05:46 +02:00
Daniel Schürmann b91cd5dba4 radv: enable VK_AMD_shader_trinary_minmax extension
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 01:29:39 +02:00
Timothy Arceri 9a243eccae radv: don't lower indirects until after opts have run
Noticed while passing by. Not sure if it impacts anything, but
likely to impact GFX9 more than anything else since we lower
inputs, outputs and locals there.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-20 15:01:44 +11:00
Dave Airlie e8d9b7ab02 radv: lower constant initializers on output variables earlier
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.

Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.float

This is ported from anv:
99b57daf4a anv/pipeline: lower constant initializers on output variables earlier
from Iago Toral Quiroga <itoral@igalia.com>

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-19 19:29:40 +00:00
Samuel Pitoiset e96a1d27dc radv: run nir_opt_move_load_ubo
Polaris10:
SGPRS: 108560 -> 107856 (-0.65 %)
VGPRS: 74576 -> 74520 (-0.08 %)
Spilled SGPRs: 7375 -> 7113 (-3.55 %)
Code Size: 4273464 -> 4274364 (0.02 %) bytes
Max Waves: 9434 -> 9446 (0.13 %)

Vega10:
Totals from affected shaders:
SGPRS: 108264 -> 107576 (-0.64 %)
VGPRS: 69068 -> 69000 (-0.10 %)
Spilled SGPRs: 7221 -> 6959 (-3.63 %)
Code Size: 3800796 -> 3801496 (0.02 %) bytes
Max Waves: 10687 -> 10709 (0.21 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-16 09:58:19 +01:00
Dave Airlie 010d055aae radv: drop tess offchip layout for tcs.
This removes the last TCS specific user sgpr.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-03-16 05:22:54 +00:00
Samuel Pitoiset 81818662a5 radv: record LLVM IR when debugging shaders
If AMD_shader_info or RADV_TRACE_FILE is used we might need to
keep trace of LLVM IR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-15 17:20:03 +01:00
Samuel Pitoiset d07edf5fdf radv: add dump_shader to the NIR compiler options
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-15 17:20:00 +01:00
Alejandro Piñeiro 50767214a7 spirv/radv: add AMD_gcn_shader capability, remove current extensions
So now, during spirv_to_nir, it uses the capability instead of the
extension. Note that we are really doing here is treating
SPV_AMD_gcn_shader as other supported extensions. SPV_AMD_gcn_shader
is not the first SPV extension supported. For example, the capability
draw_parameters infers if the extension SPV_KHR_shader_draw_parameters
is supported or not.

This could be seen as counter-intuitive, and that it would be easier
to define which extensions are supported, and based our checks on
that, but we need to take into account that some capabilities are
optional from core, and others came from new extensions.

Also this commit would make the implementation of ARB_spirv_extensions
easier.

v2: AMD_gcn_shader capability renamed to gcn_shader (Daniel Schürmann)

Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-15 12:08:25 +01:00
Samuel Pitoiset fbe694562b ac/nir: move ac_nir_compiler_options and friends to radv folder
Also replace ac_ by radv_.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 16:54:23 +01:00
Samuel Pitoiset 237229430f ac: move ac_shader_info to radv folder
This is RADV specific code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 16:54:21 +01:00
Samuel Pitoiset b2653007b9 ac/nir: move all RADV related code to radv_nir_to_llvm.c
Now the "ac/nir" prefix will really be the shared code between
RadeonSI and RADV, that might avoid confusions in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Daniel Schürmann ffbf75cde4 radv: enable AMD_gcn_shader extension
Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-07 23:09:58 +01:00
Bas Nieuwenhuizen 5240fddb9d radv: Add trivial device group implementation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-07 21:18:35 +01:00
Bas Nieuwenhuizen 8f9af587a2 radv: Add minimal subgroup support.
Deliberately not implementing workgroup scopes as that is not needed
for core vulkan.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-07 21:18:35 +01:00
Samuel Pitoiset e96e6f60f7 radv: report the scratch private memory size with shader stats
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-06 10:38:42 +01:00
Timothy Arceri 0f2c7341e8 ac/radv: move lower_indirect_derefs() to ac_nir_to_llvm.c
Until llvm handles indirects better we will need to use these
workarounds in the radeonsi backend also.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-05 14:09:23 +11:00
Samuel Pitoiset 7aa008d1d7 radv: enable lowering of fpow to fexp2 and flog2
There is no fpow in hardware, so it's always lowered somewhere,
but it appears that lowering at NIR level is better. Figured while
comparing compute shaders between RadeonSI and RADV.

Polaris10:
Totals from affected shaders:
SGPRS: 18936 -> 18904 (-0.17 %)
VGPRS: 12240 -> 12220 (-0.16 %)
Spilled SGPRs: 2809 -> 2809 (0.00 %)
Code Size: 718116 -> 719848 (0.24 %) bytes
Max Waves: 1409 -> 1410 (0.07 %)

Vega10:
Totals from affected shaders:
SGPRS: 18392 -> 18392 (0.00 %)
VGPRS: 12008 -> 11920 (-0.73 %)
Spilled SGPRs: 3001 -> 2981 (-0.67 %)
Code Size: 777444 -> 778788 (0.17 %) bytes
Max Waves: 1503 -> 1504 (0.07 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:47 +01:00
Bas Nieuwenhuizen 05d84ed68a radv: Always lower indirect derefs after nir_lower_global_vars_to_local.
Otherwise new local variables can cause hangs on vega.

CC: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105098
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-02-15 23:45:59 +01:00
Samuel Pitoiset 3488a3f033 radv: run nir_opt_shrink_load
LLVM can't shrink loads.

Polaris10:
Totals from affected shaders:
SGPRS: 62528 -> 59955 (-4.11 %)
VGPRS: 44708 -> 44616 (-0.21 %)
Spilled SGPRs: 16 -> 8 (-50.00 %)
Code Size: 1355504 -> 1355172 (-0.02 %) bytes
Max Waves: 11710 -> 11670 (-0.34 %)

Vega10:
Totals from affected shaders:
SGPRS: 51448 -> 50371 (-2.09 %)
VGPRS: 39140 -> 39048 (-0.24 %)
Spilled SGPRs: 16 -> 16 (0.00 %)
Code Size: 1307188 -> 1304296 (-0.22 %) bytes
Max Waves: 11312 -> 11292 (-0.18 %)

This reduces SGPRs spilling in MadMax, and it also reduces
number of SGPRs in DOW3 and F12017. The number of waves slightly
decreases in F1 but I don't see any performance changes after
benchmarking it. Talos and Serious Sam are not affected because
they don't use any push constants.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-06 23:08:44 +01:00
Timothy Arceri 5b8de4bdff nir: add vs_inputs_dual_locations compiler option
Allows nir drivers to either use a single or dual locations for
vs double inputs.

i965 uses dual locations for both OpenGL and Vulkan drivers, for
now gallium OpenGL drivers only use a single location.

The following patch will also make use of this option when
calling nir_shader_gather_info().

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-01-30 09:08:47 +11:00
Samuel Pitoiset 33e6e5e6a4 radv: add an option that allows to dump pre-optimization ir
With RADV_DEBUG=preoptir.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-01-22 12:28:33 +01:00
Bas Nieuwenhuizen 0f89f9b8eb radv: Replace an assert with unreachable.
Otherwise we get uninitialized variable warnings for es_vgpr_comp_cnt.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-19 00:38:45 +01:00
Timothy Arceri f0d74ecce8 radv/radeonsi/nir: lower 64bit flrp
Fixes a bunch of arb_gpu_shader_fp64 piglit tests for example:

generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-mix-double-double-double.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-13 18:04:40 +11:00
Samuel Pitoiset 4e701cf75c radv/gfx9: calculate the number of ES VGPRs for merged shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:53 +01:00
Samuel Pitoiset 232c418af5 radv/gfx9: enable LDS for GS only if the ES type is TES
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-10 12:31:51 +01:00
Samuel Pitoiset b462ceb482 radv/gfx9: do not load VGPR1 when GS uses points or lines
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:53 +01:00
Samuel Pitoiset a3c2a86757 radv: make shader BOs read-only for the GPU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-08 21:24:51 +01:00
Samuel Pitoiset 2670ebb584 radv/gfx9: reduce the number of input VGPRs for the GS stage
This can still be improved, but let's start with this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-04 18:43:25 +01:00
Samuel Pitoiset 4237c3d645 radv: properly load unused gl_LocalInvocationID/gl_WorkGroupID components
F1 2017 looks good now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-19 21:26:25 +01:00
Samuel Pitoiset bb01661918 Revert "radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components"
This reverts commit 2294d35b24.

We can't do this without adjusting the input SGPRs/VGPRs logic.
For now, just revert it. I will send a proper solution later.

It fixes a rendering issue in F1 2017 that CTS didn't catch up.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-18 11:50:02 +01:00
Samuel Pitoiset 90c3bf0789 radv: do not load the local invocation index when it's unused
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:26 +01:00
Samuel Pitoiset 2294d35b24 radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components
We should also not load the input SGPRs and VGPRS, but
let's start with this for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:22:06 +01:00
Jason Ekstrand e19c623128 spirv: Convert the supported_extensions struct to spirv_options
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2017-12-02 08:09:11 -08:00
Samuel Pitoiset 921986b580 radv: do not dump meta shaders with RADV_DEBUG=shaders
It's really annoying and this pollutes the output especially
when a bunch of non-meta shaders are compiled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-01 11:38:26 +01:00
Samuel Pitoiset cd64a4f705 radv: use vk_error() everywhere an error is returned
For consistency and it might help for debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-13 11:05:26 +01:00
Dave Airlie 3bf8be41b8 radv: pre-calculate user_data_0 registers and store in pipeline
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-06 21:44:49 +00:00
Alex Smith 134a40d2a6 radv: Fix -Wformat-security issue
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513
Fixes: de88979413 ("radv: Implement VK_AMD_shader_info")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-10-30 10:58:56 +01:00
Alex Smith de88979413 radv: Implement VK_AMD_shader_info
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.

When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.

v2: Improvements to resource usage reporting
v3: Disassembly string must be null terminated (string_buffer's length
    does not include the terminator)
v4: Fixed LDS reporting. (Bas)

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-29 00:28:45 +02:00
Dave Airlie a639d40f13 radv: add support for local bos. (v3)
This uses the new kernel interfaces for reduced cs overhead,
We only set the local flag for memory allocations that don't have
 a dedicated allocation and ones that aren't imports.

v2: add to all the internal buffer creation paths.
v3: missed some command submission paths, handle 0/empty bo lists.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 23:59:28 +01:00
Timothy Arceri f0a2bbd1a4 radv: move nir print after linking is done
We now have linking optimisations so we want to delay dumping the
nir until after these are complete.

Fixes: 06f05040eb (radv: Link shaders)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-24 10:41:38 +11:00
Bas Nieuwenhuizen c07d719e8b radv: Disallow indirect outputs for GS on GFX9 as well.
Since it also uses the output vector before writing to memory.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-10-23 00:27:44 +02:00
Bas Nieuwenhuizen 6ce550453f radv: Don't use vgpr indexing for outputs on GFX9.
Due to LLVM bugs. Fixes a bunch of dEQP-VK.glsl.indexing.*
tests.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-22 02:36:37 +02:00
Jason Ekstrand 59fb59ad54 nir: Get rid of nir_shader::stage
It's redundant with nir_shader::info::stage.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-10-20 12:49:17 -07:00
Bas Nieuwenhuizen 73749caf0e radv: calculate and emit GFX9 GS registers to pipeline state.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-20 06:23:47 +01:00
Timothy Arceri 087e010b2b radv: copy indirect lowering settings from radeonsi
It looks the original indirect mask was probably copied from
ANV.

Sascha Willems demo results:

tessellation ~4000 -> ~4200 fps

V2: continue lowering local indirects due to llvm deficiencies.

Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-20 08:01:26 +11:00
Bas Nieuwenhuizen 228325f4b7 radv: Modify rsrc1/rsrc2 generation for merged tess.
No OC_LDS_EN for HS, and the included LS vgpr_comp_cnt is at
a different offset.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-19 22:25:44 +02:00
Bas Nieuwenhuizen 91b033f4f6 radv: Update GFX9 user data regs for GS/tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-19 22:25:27 +02:00
Bas Nieuwenhuizen ce03c119ce radv: Add code to compile merged shaders.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-19 22:25:23 +02:00
Bas Nieuwenhuizen a996ed1f9b ac/nir: Change interface to allow multiple source shaders.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-19 22:24:47 +02:00
Bas Nieuwenhuizen 06f05040eb radv: Link shaders.
Here we make use of NIR the linking helpers to remove unused
varyings.

Sascha Willems demo results:

computecullandlod 39 -> 41 fps
pipelines ~6100 -> ~6200 fps

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2017-10-18 09:19:35 +11:00
Timothy Arceri 7664aaf331 radv: remove duplicate debug_flags field
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-12 08:52:38 +11:00
Dave Airlie 2c61594d84 radv: lower ffma in nir.
So it appears the Vulkan SPIR-V fma opcode can be equivalent to a
mad operation, and the fma hw opcode on AMD hw is issued like a double
opcode so is slower. Also the radeonsi stack does this.

This appears to improve performance on a number of games from Feral,
and thanks to Feral for noticing the problem.

I'm reposting this one as Marek indicated he thinks this is what
we should be doing on AMD hw.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-11 07:31:27 +10:00
Marek Olšák 7b697c8b78 amd: move r600d_common.h into r600g
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:27:06 +02:00
Samuel Pitoiset 844ae722c4 radv: dump SPIRV when a GPU hang is detected
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-04 19:37:08 +02:00
Samuel Pitoiset a2a350a3be radv: dump NIR when a GPU hang is detected
This looks a bit ugly to me, but the existing codepath
is not terribly elegant as well.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-04 19:37:08 +02:00
Samuel Pitoiset 80b8d9f7e7 radv: add radv_shader_dump_stats() helper
To dump the shader stats when a hang is detected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Dave Airlie 64d9bd149a radv/nir: call opt_remove_phis after trivial continues.
With the shaders in the ssao demo, the nir_opt_if wasn't
working properly without this, after this the if gets optimised
so that loop unrolling gets called.

(loop unrolling fails due to instruction count, but at least
it gets to do that.)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 21:13:03 +01:00
Samuel Pitoiset 885d75760b radv: keep track of the disasm string in debug mode only
This will allow to dump the active shaders when a hang is
detected. Only the ASM will be dumped for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-08 17:18:17 +02:00
Samuel Pitoiset 92db23f3f9 radv: add shader_variant_create() helper function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-08 17:17:40 +02:00
Samuel Pitoiset 47efc5264a radv: drop 'dump' parameters from some shader related functions
The device object contains the debug flags.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-08 17:17:40 +02:00
Samuel Pitoiset d4d777317b radv: move shaders related code to radv_shader.c
Reduce size of radv_pipeline.c and improve code isolation. More
code can probably moved but it's a start.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-08 17:17:40 +02:00