Commit Graph

79 Commits

Author SHA1 Message Date
Bas Nieuwenhuizen 0dd8189f15 radv: Only allow 16 user SGPRs for compute on GFX9+.
Apparently for compute there are only 16 instead of the 32 for the
graphics path.

Fixes dEQP-VK.binding_model.descriptorset_random.sets16.noarray.ubolimitlow.sbolimitlow.imglimitlow.noiub.comp.0

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-09-16 12:50:58 +02:00
Samuel Pitoiset 9de062ef20 radv: fix setting global locations for indirect descriptors
Indirect descriptors only need one entry, we don't have to
emit a location for every descriptors.

Fixes GPU hangs with new CTS:
dEQP-VK.binding_model.descriptorset_random.*

CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset aa30205929 radv: handle loc->indirect correctly for the first descriptor
This was wrong for descriptor #0 when all of them are indirect.
This is because indirect_offset was 0 and we emitted a
"normal" descriptor pointer for nothing.

While we are at it remove
radv_userdata_info::indirect_offset which is useless.

CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset b9f6521157 radv: bump the maximum number of arguments to 64
Bumping to 64 should be safe enough.

Fixes some crashes with new CTS:
dEQP-VK.binding_model.descriptorset_random.*

CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset c28ea92947 radv: tidy up ac_setup_rings() for the GSVS rings
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset 40fb8c7fca radv: fix setting the number of entries for GSVS on VI+
According to RadeonSI, it's unnecessary to multiply by
the stride. That field seems to always be 64.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset a006c24237 radv: always compute the number of components from the output mask
That removes two special cases for clip/cull distances.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset 9447e91329 radv: emit data contiguously in the GS->VS ring buffer
Instead of having holes. The other ring parameters like
offset and stride can be updated later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset fbc064a5b4 radv: make use of the output usage mask in GS copy shader
This is just for consistency because LLVM can detect and
remove unused loads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset 18464d298b radv: make use of ac_unpack_param() instead of ac_build_bfe()
Same code is generated because LLVM ends up by using bfe, but
that seems cleaner to me.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-14 10:59:52 +02:00
Samuel Pitoiset 7355e9326b radv: remove dead code in scan_shader_output_decl()
Never used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-31 17:34:41 +02:00
Samuel Pitoiset e9acf069b2 radv: remove radv_shader_context::num_output_{clips,culls}
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-31 17:34:41 +02:00
Samuel Pitoiset a6a6441c75 radv: adjust the cull dist mask in scan_shader_output_decl()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-31 17:34:41 +02:00
Samuel Pitoiset ea778e760c radv: get length of the clip/cull distances array from usage mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-31 17:34:41 +02:00
Samuel Pitoiset 732679c25e radv: do not recompute the output usage mask for clipdist twice
The shader info pass takes care of this now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-31 17:34:41 +02:00
Samuel Pitoiset 6f47df3129 radv: fix passing clip/cull distances from VS to PS
CTS doesn't test input clip/cull distances for the fragment
shader stage, which explains why this was totally broken. I
wrote a simple test locally that works now.

This fixes a crash with GTA V and DXVK.

Note that we are exporting unused parameters from the vertex
shader now, but this can't be optimized easily because we don't
keep the fragment shader info...

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107477
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-08-31 17:34:36 +02:00
Samuel Pitoiset 0608349232 radv: use ac_build_imad()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-08-22 09:17:40 +02:00
Bas Nieuwenhuizen 011a811652 radv: Revert divisor = 0 case for vertex attribute extension.
Seems like DXVK depends on that and it might get reverted
upstream. Since apps are not supposed to use 0 in v2 anyway,
we should be safe implementing the old behavior there.

Fixes: 66e12451ac "radv: Update to new VK_EXT_vertex_attribute_divisor to version 2."
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-16 11:13:19 +02:00
Bas Nieuwenhuizen 66e12451ac radv: Update to new VK_EXT_vertex_attribute_divisor to version 2.
Behavior wrt firstInstance got changed, and a divisor of 0 has been
disallowed.

The new version of the ext got published in specification 1.1.81.

Sending to stable since the only known user is DXVK, which needs
this for correctness.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.2 <mesa-stable@lists.freedesktop.org>
2018-08-14 22:13:09 +02:00
Samuel Pitoiset ff0d553818 radv: fix adjusting vertex fetches since 16bit support
Move the integer conversion after the fixup.

This fixes some regressions with
dEQP-VK.pipeline.vertex_input.single_attribute.mat4.as_a2r10g10b10*

Fixes: b722b29f10 ("radv: add support for 16bit input/output")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-26 08:57:43 +02:00
Daniel Schürmann b722b29f10 radv: add support for 16bit input/output
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Dave Airlie 6f3aee40f9 radv: using tls to store llvm related info and speed up compiles (v10)
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.

This should remove all the fixed overheads setup costs of creating
the pass manager each time.

This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)

v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-10 07:58:03 +10:00
Samuel Pitoiset 5e5a28d52a radv: reduce CPU overhead in radv_flush_descriptors()
The number of enabled descriptors for a given pipeline stage
can be computed at compile time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-09 13:56:58 +02:00
Marek Olšák 4695984dbc ac: fold LLVMContext creation into ac_llvm_context_init
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-04 15:48:18 -04:00
Dave Airlie 7398913a62 ac/radv: move llvm compiler info to struct and init in one place
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.

 Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 10:29:16 +10:00
Dave Airlie e1387eaf12 radv: create/destroy passmgr at the higher level.
This is prep work for moving this to a per-thread struct

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:05 +10:00
Dave Airlie 97d9b88447 radv: port to use common passmgr code.
This adds a inline always pass, but otherwise should work the
same.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-04 05:30:34 +10:00
Marek Olšák 32e413ca59 ac: move all LLVM module initialization into ac_create_module
This removes some ugly code around module initialization.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-02 14:34:39 -04:00
Samuel Pitoiset 70c1bee187 radv: do not use an user SGPR for the sample position offset
We know the number of samples at compile time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 20170865db radv: don't store the number of samples as log2
Needed for the following patch.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 5917761e3d radv: fix emitting the TCS regs on GFX9
The primitive ID is NULL and this generates an invalid
select instruction which crashes because one operand is NULL.

This fixes crashes in The Long Journey Home, Quantum Break
and Just Cause 3 with DXVK.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-16 10:18:51 +02:00
Samuel Pitoiset bfca15e16a radv: add RADV_DEBUG=checkir
This allows to run the LLVM verifier pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:08 +02:00
Samuel Pitoiset 45eb24fedf radv: run the EarlyCSEMemSSA LLVM pass
It's recommended by the instruction combining pass, and
RadeonSI also runs it.

This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.

Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)

Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-25 14:24:14 +02:00
Samuel Pitoiset 75e919c045 radv: fix computation of user sgprs for 32-bit pointers
With 32-bit pointers we only need one user SGPR per desc set.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:29 +02:00
Samuel Pitoiset c5536fc813 radv: drop user_sgpr_info::sgpr_count
It's only used inside allocate_user_sgprs().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:26 +02:00
Samuel Pitoiset 36a4d6d081 radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.

Vega10:
Totals from affected shaders:
SGPRS: 1008722 -> 1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size: 34819208 -> 35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)

Polaris10:
Totals from affected shaders:
SGPRS: 1029849 -> 1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size: 34915336 -> 35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:22 +02:00
Samuel Pitoiset b654ef5808 radv: add set_loc_shader_ptr() helper
This helper will hep for switching to 32-bit GPU pointers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:20 +02:00
Samuel Pitoiset d8a61d3232 radv: set amdgpu-32bit-address-high-bits LLVM attribute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:15 +02:00
Samuel Pitoiset 73df16dcee radv: fix centroid interpolation
It's legal to set the centroid and sample interpolation modes
when MSAA disabled. So, we have to initialize the centroid
inputs because the hardware doesn't.

This fixes rendering issues with DXVK and The Witness, World of
Warcraft, Trackmania and probably more games.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-21 13:57:46 +02:00
Samuel Pitoiset 03c4816093 radv: pass radv_nir_compiler_options directly to create_llvm_function()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-18 11:07:01 +02:00
Marek Olšák f9eb1ef870 amd: remove support for LLVM 4.0
It doesn't support GFX9.

Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-17 14:54:41 -04:00
Samuel Pitoiset 1fba2e10b3 radv: only declare the ESGS rings for pre GFX9 chips
GFX9 uses LDS instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:20 +02:00
Samuel Pitoiset 56d53ed1d6 radv: do not emit unnecessary ES output stores
GFX9:
Totals from affected shaders:
SGPRS: 472 -> 464 (-1.69 %)
VGPRS: 576 -> 584 (1.39 %)
Code Size: 45432 -> 44324 (-2.44 %) bytes
Max Waves: 40 -> 40 (0.00 %)

VI:
SGPRS: 720 -> 720 (0.00 %)
VGPRS: 728 -> 728 (0.00 %)
Code Size: 45348 -> 43992 (-2.99 %) bytes
Max Waves: 120 -> 120 (0.00 %)

This affects Rise of Tomb Raider and the three Vulkan demos
that use a geometry shader (geometryshader, deferredshadows
and viewportarray).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:13 +02:00
Samuel Pitoiset a6e44d1271 radv: do not emit unnecessary GS output stores
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 14:14:11 +02:00
Samuel Pitoiset 97b179570c radv: reduce the number of parameters export by the GS copy shader
By using the geometry shader output usage mask.

This improves all Vulkan demos that use a geometry shader
(ie. geometryshader, deferredshadows, viewportarray).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:23 +02:00
Samuel Pitoiset ea43d935ab radv: run the shader info pass before emitting the GS copy shader
For further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-14 21:38:19 +02:00
Bas Nieuwenhuizen 3d4d388e39 radv: Fix up 2_10_10_10 alpha sign.
Pre-Vega HW always interprets the alpha for this format as unsigned,
so we have to implement a fixup to do the sign correctly for signed
formats.

v2: Improve indexing mess.

CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 18:58:20 +02:00
Samuel Pitoiset efc10949cc radv: move ac_build_if_state on top of radv_nir_to_llvm.c
These helpers will be needed for future work.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-11 12:35:07 +02:00
Grazvydas Ignotas 4fdce205dd radv: assorted typo fixes
Trivial.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-10 11:50:46 +03:00
Bas Nieuwenhuizen 6ff98dbf7c radv: Implement VK_EXT_vertex_attribute_divisor.
Pretty straight forward, just pass the divisors through the shader
key and then do a LLVM divide.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-12 22:57:23 +02:00