137 Commits

Author SHA1 Message Date
Marek Olšák b878444c3a amd: drop support for LLVM 10
It doesn't support RDNA 2.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Samuel Pitoiset 936b58378c amd: drop support for LLVM 8
It doesn't support Navi1x and the removal enables this nice code cleanup.

v2: rebase - mareko

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
2021-04-16 09:25:19 +00:00
Marek Olšák 1dff495057 ac/llvm: implement 16-bit packed VS outputs and FS inputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9051>
2021-04-13 21:10:43 -04:00
Marek Olšák fdbcb58c06 ac/llvm: handle demote in LLVM 13 that just added support for it
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9362>
2021-03-17 00:42:27 +00:00
Marek Olšák 230a6dc55d ac,radeonsi: add sampler changes for Aldebaran
- no 3D and cube textures
- no mipmapping
- no border color
- image_sample is the only supported opcode with a sampler (behaves like _lz)

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
2021-03-10 18:02:27 +00:00
Marek Olšák 18c1c1404d ac/llvm: add type parameter into ac_build_buffer_load to fix 16-bit TES inputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395>
2021-03-03 20:06:09 +00:00
Marek Olšák 3475c79328 ac/llvm: add support for 16-bit source operands for samplers
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395>
2021-03-03 20:06:09 +00:00
Rhys Perry 6d5e26752c ac/nir: implement sparse image/texture loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
2021-01-08 14:27:07 +00:00
Rhys Perry c5973ede01 ac/nir: use llvm.readcyclecounter for LLVM9+
Unlike llvm.amdgcn.s.memtime, this works on GFX10.3

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4033
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8306>
2021-01-05 10:27:00 +00:00
James Park 31b4fdc008 amd: Cast to int for %d snprintf argument
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7791>
2020-11-27 20:49:00 -08:00
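Note on 31b4fdc008: the log does not show which argument was cast, but the general issue is that `%d` expects exactly an `int`, and a variadic call does not convert a wider or unsigned argument for you. A minimal sketch, assuming the value is a `size_t` count; `format_count` and its parameters are hypothetical names, not taken from the Mesa source:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "v<count>" into buf.
 * The explicit cast makes the argument type match the %d conversion. */
static int format_count(char *buf, size_t buf_size, size_t count)
{
   assert(count <= INT_MAX); /* the cast below must not truncate */
   return snprintf(buf, buf_size, "v%d", (int)count);
}
```

Switching the format to `%zu` would be another option where the C runtime supports it; the commit chose the cast.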
Marek Olšák d425d765bf ac: add build_alloca with an initializer
combining alloca_undef + BuildStore.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7542>
2020-11-18 06:19:59 +00:00
Marek Olšák aa757f4f8c ac/llvm: fix demote inside conditional branches
The big comment explains it.

v2: don't kill if subgroup ops are used

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7586>
2020-11-12 21:02:05 +00:00
Rhys Perry 9f43268772 ac/nir: implement 64-bit images
64-bit image atomics only work with LLVM 11+ because of an LLVM bug.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7234>
2020-11-09 18:28:59 +00:00
James Park 4bd18e772a amd/llvm,aco: Replace VLA with alloca
MSVC will never support VLA, so use alloca instead.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7157>
2020-11-03 07:44:02 +00:00
Marek Olšák e690a1b78b ac/llvm: don't lower bool to int32, switch to native i1 bool
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7077>
2020-10-20 10:21:39 +00:00
James Park 28d02b9d3e ac,amd/llvm,radv: Initialize structs with {0}
Necessary to compile with MSVC.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7123>
2020-10-14 12:15:23 +00:00
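Note on 28d02b9d3e: the log does not show the form that was replaced (presumably an empty-brace initializer, which is not standard C before C23). The sketch below only illustrates what `{0}` does; `struct image_args` and its fields are hypothetical, not the real Mesa structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument struct; field names are illustrative only. */
struct image_args {
   unsigned dmask;
   const void *resource;
   float lod;
};

static struct image_args image_args_init(void)
{
   /* {0} initializes the first member explicitly; the remaining
    * members are zero-initialized (0, NULL, 0.0f). */
   struct image_args args = {0};
   assert(args.dmask == 0 && args.resource == NULL);
   return args;
}
```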
Samuel Pitoiset 31a0574b96 ac/nir: implement nir_op_fsat
With fmed3 if available, otherwise fall back to fmin/fmax.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6932>
2020-10-08 12:38:04 +00:00
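Note on 31a0574b96: saturating to [0, 1] is the same as taking the median of x, 0 and 1, which is why either fmed3 or an fmin/fmax pair can be used. A scalar sketch of both forms, assuming ordinary finite inputs; the function names are hypothetical, and the code does not model the hardware's NaN or denormal behavior:

```c
#include <assert.h>

static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Median of three values built from min/max only. */
static float med3f(float a, float b, float c)
{
   return maxf(minf(a, b), minf(maxf(a, b), c));
}

/* Saturate through the median: med3(x, 0, 1). */
static float fsat_med3(float x) { return med3f(x, 0.0f, 1.0f); }

/* Fallback form: clamp below with max, then above with min. */
static float fsat_minmax(float x) { return minf(maxf(x, 0.0f), 1.0f); }

/* Debug helper: returns the saturated value and checks both forms agree. */
static float fsat_checked(float x)
{
   float r = fsat_med3(x);
   assert(r == fsat_minmax(x));
   return r;
}
```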
Samuel Pitoiset 7a8f5eab71 ac/llvm: adjust dmask when image stores are shrinked using the format
It looks like GFX10 doesn't care about dmask if it's greater than
the number of components stored but it matters on GFX8-9 (I haven't
checked older gens).

Fixes: 1b4d968106 ("ac/llvm: fix invalid IR if image stores are shrinked using the format")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6982>
2020-10-05 08:13:24 +02:00
Samuel Pitoiset 1b4d968106 ac/llvm: fix invalid IR if image stores are shrinked using the format
It's not always v4f32 (or v4f16 for 16-bit) when image stores are
shrinked using the format.

This fixes a ton of crashes with RADV_DEBUG=checkir,llvm.

Fixes: e4d75c22be ("nir/opt_shrink_vectors: shrink image stores using the format")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6882>
2020-09-29 07:40:42 +00:00
Marek Olšák 98a52fecda radeonsi: implement 16-bit FS color outputs
This removes type conversions from 16 bits to 32 bits in the main function
and then back to 16 bits in the epilog.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6622>
2020-09-22 02:44:53 +00:00
Vinson Lee 50f1cd4076 ac/llvm: Fix nonportable sizeof.
Fix defect reported by Coverity.

Sizeof not portable (SIZEOF_MISMATCH)
suspicious_sizeof: Passing argument vec_size * 8UL /* sizeof
(LLVMValueRef *) */ to function __builtin_alloca and then casting
the return value to LLVMValueRef * is suspicious. In this
particular case sizeof (LLVMValueRef *) happens to be equal to
sizeof (LLVMValueRef), but this is not a portable assumption.

Fixes: ca74603b4f ("ac/llvm: add better code for isign")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6682>
2020-09-14 15:25:45 -07:00
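Note on 50f1cd4076: the Coverity report is about sizing an array of handles with the size of a pointer to a handle. Writing `sizeof(*ptr)` avoids the mismatch because the size always follows the element type. A minimal sketch; `value_ref` is a stand-in for a handle type such as LLVMValueRef, `alloc_values` is a hypothetical helper, and `malloc` is used here in place of the `alloca` call in the original code:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an opaque handle type such as LLVMValueRef. */
typedef struct opaque_value *value_ref;

/* Allocates count handles, all set to NULL. The caller frees the result. */
static value_ref *alloc_values(size_t count)
{
   assert(count > 0);
   /* sizeof(*values) is the element size; sizeof(value_ref *) would be
    * the size of a pointer to an element, which only matches by luck. */
   value_ref *values = malloc(count * sizeof(*values));
   if (!values)
      return NULL;
   for (size_t i = 0; i < count; i++)
      values[i] = NULL;
   return values;
}
```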
Pierre-Eric Pelloux-Prayer 82d2d73e03 amd/llvm: switch to 3-spaces style
Follow-up of !4319 using the same clang-format config.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5310>
2020-09-07 10:00:20 +02:00
Marek Olšák e8d55e6db3 ac/llvm: fix b2f for v2f16
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284>
2020-09-06 14:36:21 +00:00
Marek Olšák d9a77f9ca3 ac/llvm: add better code for fsign
There are 2 improvements:
- better code for 16, 32, and 64 bits
- vector support for 16 and 32 bits

Totals:
SGPRS: 2639738 -> 2625882 (-0.52 %)
VGPRS: 1534120 -> 1533916 (-0.01 %)
Spilled SGPRs: 3541 -> 3557 (0.45 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 55640332 -> 55384892 (-0.46 %) bytes
Max Waves: 964785 -> 964857 (0.01 %)

Totals from affected shaders:
SGPRS: 377352 -> 363496 (-3.67 %)
VGPRS: 209800 -> 209596 (-0.10 %)
Spilled SGPRs: 1979 -> 1995 (0.81 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size: 12549300 -> 12293860 (-2.04 %) bytes
Max Waves: 105762 -> 105834 (0.07 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284>
2020-09-06 14:36:21 +00:00
Marek Olšák ca74603b4f ac/llvm: add better code for isign
There are 2 improvements:
- select v_med3_i32
- support vectors

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284>
2020-09-06 14:36:21 +00:00
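Note on ca74603b4f: one way to see why a median instruction fits isign is that the integer sign of x equals x clamped to [-1, 1], i.e. med3(x, -1, 1). The scalar sketch below illustrates that identity only; it is not the LLVM IR the commit emits, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

static int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Median of three signed integers built from min/max only. */
static int32_t med3_i32(int32_t a, int32_t b, int32_t c)
{
   return max_i32(min_i32(a, b), min_i32(max_i32(a, b), c));
}

/* Sign of x as -1, 0 or 1: the median of x, -1 and 1. */
static int32_t isign(int32_t x)
{
   int32_t r = med3_i32(x, -1, 1);
   assert(r >= -1 && r <= 1);
   return r;
}
```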
Marek Olšák 7acc7ec33b ac/llvm: fix unaligned VS input loads on gfx10.3
Fixes: a23802bcb9

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6595>
2020-09-04 11:07:41 +00:00
Daniel Schürmann a79dad950b nir,amd: remove trinary_minmax opcodes
These consist of the variations nir_op_{i|u|f}{min|max|med}3 which are either
lowered in the backend (LLVM) anyway or can be recombined by the backend (ACO).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6421>
2020-08-24 20:56:11 +00:00
Bas Nieuwenhuizen 40e00c800c amd/llvm: Mark pointer function arguments as 32-byte aligned.
Otherwise LLVM does not see the pointers as allowing speculative
loads.

The pipeline-db results are pretty wild, but mostly what is to be
expected from allowing more code movement in LLVM:

Totals from affected shaders:
SGPRS: 157728 -> 168336 (6.73 %)
VGPRS: 158628 -> 158664 (0.02 %)
Spilled SGPRs: 10845 -> 24753 (128.24 %)
Spilled VGPRs: 13 -> 13 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8 -> 8 (0.00 %) dwords per thread
Code Size: 17189180 -> 17313712 (0.72 %) bytes
LDS: 204 -> 204 (0.00 %) blocks
Max Waves: 5700 -> 5687 (-0.23 %)
Wait states: 0 -> 0 (0.00 %)

This gives some boosts for shaders where we can move a descriptor load
outside a loop.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3159>
2020-07-08 23:47:06 +00:00
Marek Olšák 2b8b62c55b ac/nir: fix 64-bit division for GL CTS
This fixes: KHR-GL45.gpu_shader_fp64.builtin.mod_*

Fixes: ba2ec1f3 "ac/nir: use llvm.amdgcn.rcp in ac_build_fdiv()"

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5531>
2020-06-23 04:46:55 +00:00
Pierre-Eric Pelloux-Prayer 993c64e6fe ac/llvm: load 1 byte at a time if unaligned on gfx10
If buffer or stride is unaligned we use the same trick as on gfx6:
load 1 byte at a time and recompose the output if needed.
This change fixes lots of deqp/glcts tests:
  - dEQP-GLES2.functional.draw.random.1, 10, ...
  - dEQP-GLES2.functional.vertex_arrays.multiple_attributes.stride.3_float2_0_float2_0_float2_17, ...
  - dEQP-GLES2.functional.vertex_arrays.single_attribute.first.byte_first24_offset1_stride2_quads256, ...
  - dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_17_byte2_vec4_dynamic_draw_quads_1, ...
  - dEQP-GLES31.functional.draw_indirect.random.14, ...

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5502>
2020-06-19 09:20:16 +02:00
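Note on 993c64e6fe: "load 1 byte at a time and recompose" can be pictured on the CPU as follows. This is a sketch of the idea only, assuming little-endian byte order; `load_u32_bytewise` is a hypothetical name and the real change builds LLVM IR for buffer loads rather than C loads:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian value from an address with no alignment
 * guarantee by fetching each byte separately and recombining them. */
static uint32_t load_u32_bytewise(const uint8_t *p)
{
   assert(p != NULL);
   return (uint32_t)p[0] |
          ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) |
          ((uint32_t)p[3] << 24);
}
```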
Samuel Pitoiset 008b0d1701 ac/nir: adjust an assertion for D16 on GFX6-GFX7
16-bit types can be used with MUBUF on GFX6-GFX7.

Fixes: c3e0ba52a0 ("ac/nir: support 16-bit data in buffer_load_format opcodes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5325>
2020-06-08 08:45:32 +02:00
Marek Olšák c6c8a9bd55 ac/nir: support v2f16 derivatives
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Marek Olšák 70b6d54011 ac/nir: support 16-bit data in image opcodes
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Marek Olšák c3e0ba52a0 ac/nir: support 16-bit data in buffer_load_format opcodes
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Marek Olšák b819ba949b ac/nir: remove type and num_channels args from ac_build_buffer_store_common
They were only used for type overloading where we can just use
the type of data.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Marek Olšák b98df7bf50 ac/nir: support vector types in the type suffix of overloaded intrinsics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Marek Olšák e5ea87cde8 ac/nir: use more types from ac_llvm_context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>
2020-06-02 16:29:25 -04:00
Samuel Pitoiset 14292310d9 ac/nir: implement nir_intrinsic_shader_clock with device scope
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117>
2020-05-24 20:37:58 +02:00
Samuel Pitoiset b034f6cf2a ac/nir: fix shader clock with subgroup scope
The compiler should emit s_memtime instead of s_memrealtime for
the subgroup scope. I don't know what this LLVM 9 check was for,
but LLVM 8 also has this amdgcn intrinsic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117>
2020-05-24 20:37:54 +02:00
Michel Dänzer 2a6811f0f9 Revert "ac,radeonsi: fix compilations issues with LLVM 11"
This reverts commit 42b1696ef6.

The corresponding LLVM changes were reverted.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5087>
2020-05-19 07:19:35 +00:00
Samuel Pitoiset 0d63a1a84d ac/llvm: add support for texturing with clamped LOD
This is a requirement for the shaderResourceMinLod feature, which
allows clamping the LOD. This uses all image_sample_*_cl variants.

All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>
2020-05-14 10:05:44 +00:00
Pierre-Eric Pelloux-Prayer 7e7bb38bd8 radeonsi: fix export count
Fixes: 17acff01a0 ("radeonsi: skip vs output optimizations for some outputs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2877
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4871>
2020-05-04 15:11:09 +02:00
Samuel Pitoiset 42b1696ef6 ac,radeonsi: fix compilations issues with LLVM 11
Latest LLVM replaced LLVMVectorTypeKind.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2826
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4755>
2020-04-27 17:13:36 +00:00
Marek Olšák b4fd8f1919 ac,radeonsi: simplify checking for Navi1x chips
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4698>
2020-04-24 10:38:54 +00:00
Pierre-Eric Pelloux-Prayer 17acff01a0 radeonsi: skip vs output optimizations for some outputs
If PT_SPRITE_TEX is enabled, PS inputs are overridden at runtime so
we can't apply the vs output optim.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2747
Fixes: 3ec9975555 ("radeonsi: eliminate trivial constant VS outputs")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4559>
2020-04-20 08:45:16 +02:00
Samuel Pitoiset ba2ec1f369 ac/nir: use llvm.amdgcn.rcp in ac_build_fdiv()
Instead of emitting 1.0 / x which includes a slow division that
LLVM doesn't always optimize even if the metadata is correctly set.

No pipeline-db changes with VEGA10/LLVM 9.

pipeline-db (VEGA10/LLVM 10):
Totals from affected shaders:
SGPRS: 6672 -> 6672 (0.00 %)
VGPRS: 6652 -> 6652 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 561780 -> 561692 (-0.02 %) bytes
Max Waves: 1043 -> 1043 (0.00 %)

pipeline-db (VEGA10/LLVM 11 - 92744f62478):
Totals from affected shaders:
SGPRS: 84608 -> 83768 (-0.99 %)
VGPRS: 106768 -> 106636 (-0.12 %)
Spilled SGPRs: 1625 -> 1713 (5.42 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 10850936 -> 10726712 (-1.14 %) bytes
Max Waves: 3152 -> 3180 (0.89 %)

LLVM 11 (master) is more affected than previous versions, but
based on the small impact with LLVM 9/10, I decided to emit it
unconditionally.

Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4326>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4326>
2020-03-27 08:05:43 +01:00
Pierre-Eric Pelloux-Prayer 5533c41541 ac: fix ac_build_is_helper_invocation when postponed_kill is null
If there was no demote() in the shader, ac_build_is_helper_invocation
behaves exactly the same as ac_build_load_helper_invocation, i.e.
the helper lanes are the same as they were at the beginning of the shader.

Fixes: de57ea2a3d ("amd/llvm: implement nir_intrinsic_demote(_if) and nir_intrinsic_is_helper_invocation")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4301>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4301>
2020-03-25 08:19:38 +01:00
Marek Olšák 303842b2db ac: fix fast division
This stopped working with LLVM 11 and might occasionally have been broken
on older LLVM, because the metadata was set on the mul, not on the rcp.

Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4268>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4268>
2020-03-21 22:34:17 +00:00
Marek Olšák 63a5051ea6 ac: set new LLVM denormal flags
See: https://reviews.llvm.org/D71358

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4196>
2020-03-17 20:47:48 +00:00
Samuel Pitoiset cc320ef9af ac/llvm: add missing optimization barrier for 64-bit readlanes
Otherwise, LLVM optimizes it but it's actually incorrect.

Fixes: 0f45d4dc2b ("ac: add ac_build_readlane without optimization barrier")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3585>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3585>
2020-03-12 08:46:42 +01:00
Marek Olšák fc65df5651 ac: add a bug workaround for the 100% NGG culling case
Fixes: 8db00a51f8 - radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4079>
2020-03-09 16:08:11 -04:00
Daniel Schürmann de57ea2a3d amd/llvm: implement nir_intrinsic_demote(_if) and nir_intrinsic_is_helper_invocation
The current implementation uses a temporary helper variable
to ensure correct behavior until LLVM provides an intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
2020-03-09 12:29:32 +00:00
Samuel Pitoiset 9e5d2a73c5 ac/llvm: flush denorms for nir_op_fmed3 on GFX8 and older gens
The hardware doesn't flush denorms, exactly like fmin/fmax, so
we have to do it manually. This doesn't fix anything known.

Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
2020-02-27 08:04:33 +01:00
Samuel Pitoiset 30ac733680 ac/llvm: fix 16-bit fmed3 on GFX8 and older gens
16-bit med3 is only supported on GFX9+.

Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f16.*.

Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
2020-02-27 08:04:30 +01:00
Samuel Pitoiset 50b8c25274 ac/llvm: fix 64-bit fmed3
Lower 64-bit fmed3 because LLVM doesn't expose an intrinsic.

Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.mid3.f64.*.

Fixes: d6a07732c9 ("ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3962>
2020-02-27 08:04:28 +01:00
Samuel Pitoiset a31bcf2be6 ac/llvm: fix missing casts in ac_build_readlane()
Because ac_build_optimization_barrier() overwrites the original
src_type, we have to keep track of it before emitting that barrier.
Otherwise, wrong conversions are expected for pointers or small
bitsizes.

By doing this, we no longer need to do the cast dance in
ac_build_readlane_no_opt_barrier(), it was just necessary for
ac_build_optimization_barrier().

This fixes a bunch of crashes with subgroups related tests when
RADV_DEBUG=checkir is enabled, and it also fixes a compiler crash
with The Surge 2.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2395
Fixes: 0f45d4dc2b ("ac: add ac_build_readlane without optimization barrier")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3535>
2020-01-24 07:40:07 +01:00
Marek Olšák 4e4b2d13f0 ac: add helper ac_build_triangle_strip_indices_to_triangle
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 0f45d4dc2b ac: add ac_build_readlane without optimization barrier
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 77393cf39b ac: add prefix bitcount functions
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Timur Kristóf eccac46cdc ac/llvm: Fix ac_build_reduce in wave32 mode.
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2020-01-10 12:30:44 +01:00
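Note on eccac46cdc: the fix amounts to treating a cluster size of 0 as "the whole wave" instead of a hard-coded 64. A minimal sketch of that rule; `effective_cluster_size` and its parameters are hypothetical and do not mirror the actual ac_build_reduce signature:

```c
#include <assert.h>

/* Resolves the cluster size used by a reduction.
 * A requested size of 0 means "reduce across the whole wave". */
static unsigned effective_cluster_size(unsigned cluster_size, unsigned wave_size)
{
   assert(wave_size == 32 || wave_size == 64);
   return cluster_size == 0 ? wave_size : cluster_size;
}
```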
Marek Olšák 9b71041627 ac: add ac_build_s_endpgm
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 16:03:48 -05:00
Marek Olšák 1c44480538 ac: add 128-bit bitcount
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 16:00:41 -05:00
Marek Olšák d1c8aeb24f ac: unify primitive export code
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 16:00:38 -05:00
Marek Olšák 1c77a18cc2 ac: unify build_sendmsg_gs_alloc_req
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 16:00:36 -05:00
Marek Olšák e5e3ffa6b9 ac: fix ac_get_i1_sgpr_mask for Wave32
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
2019-12-16 20:06:07 +00:00
Bas Nieuwenhuizen e09426ad6b amd/llvm: Refactor ac_build_scan.
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-28 11:35:11 +01:00
Samuel Pitoiset 0812dbd403 ac: add 8-bit and 16-bit supports to ac_build_permlane16()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:42:42 +01:00
Samuel Pitoiset c9aa843961 radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-27 07:39:26 +01:00
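Note on c9aa843961: for reference, an exclusive scan gives each lane the combination of all lanes before it, so lane 0 receives the identity. The scalar loop below only states that expected result for addition; it is not the DPP/permlane sequence the commit implements, and `exclusive_scan_add` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for an exclusive add-scan over n lanes:
 * out[i] = in[0] + ... + in[i-1], with out[0] = 0 (the identity). */
static void exclusive_scan_add(const uint32_t *in, uint32_t *out, size_t n)
{
   assert(n == 0 || (in != NULL && out != NULL));
   uint32_t acc = 0;
   for (size_t i = 0; i < n; i++) {
      out[i] = acc; /* value before including lane i */
      acc += in[i];
   }
}
```

An inclusive scan would store `acc` after the addition instead of before it.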
Samuel Pitoiset f6770b9726 ac/llvm: fix warning in ac_build_canonicalize()
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 4567 |  return ac_build_intrinsic(ctx, intr, type, params, 1,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4568 |       AC_FUNC_ATTR_READNONE);
      |       ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-26 08:35:10 +01:00
Marek Olšák f671cc4d95 ac: set swizzled bit in cache policy as a hint not to merge loads/stores
LLVM now merges loads and stores for all opcodes, so this must be set.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 16:48:27 -05:00
Connor Abbott 9885af3bdf ac: Add a shared interface between radv, radeonsi, LLVM and ACO
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:

- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.

- Stop using radv-specific structures for things like determining the
chip generation in ACO.

In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-25 14:12:46 +01:00
Daniel Schürmann 0cbcfc071e amd/llvm: Add Subgroup Scan functions for SI
The idea of this implementation is taken from the ROCm Device Libs:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-11-20 20:31:45 +00:00
Samuel Pitoiset 80c71cbbd8 ac: add 16-bit float support to ac_build_alu_op()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 670aa24c69 ac: add 8-bit and 16-bit supports to ac_build_optimization_barrier()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 21a9243f5e ac: add 8-bit and 16-bit supports to ac_build_wwm()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset ef352a2466 ac: add 8-bit and 16-bit supports to get_reduction_identity()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset c8af1d51d4 ac: add 8-bit and 16-bit supports to ac_build_swizzle()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 1565118d8f ac: add 8-bit and 16-bit supports to ac_build_dpp()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 2113867f0c ac: add 8-bit and 16-bit supports to ac_build_set_inactive()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset c29514bd22 ac: add 8-bit and 16-bit supports to ac_build_readlane()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 58d5ab98a3 ac: add 8-bit and 16-bit supports to ac_build_shuffle()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset 204cf54b70 ac: remove useless cast in ac_build_set_inactive()
The return type is always the src type (32 or 64 bits).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-11-19 18:01:13 +00:00
Samuel Pitoiset bef7b2f805 ac: handle pointer types to LDS in ac_get_elem_bits()
This fixes crashes with some
dEQP-VK.spirv_assembly.instruction.spirv1p4.* tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-12 08:32:15 +01:00
Samuel Pitoiset 39760793b5 ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers
This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.

Most of these tests still fail because of an LLVM bug (they work
with ACO).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 22:32:01 +02:00
Samuel Pitoiset 7dfb15fff1 ac/llvm: add AC_FLOAT_MODE_ROUND_TO_ZERO
Because some instructions will be optimized by the backend compiler,
the driver has to manually flush to zero to keep the result exact.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-18 16:55:51 +02:00
Samuel Pitoiset d94bd4e512 ac/llvm: add ac_build_canonicalize() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-18 16:55:48 +02:00
Timur Kristóf 3a08110d43 amd: Move all amd/common code that depends on LLVM to amd/llvm.
This commit is a step towards the goal of being able to build RADV
without LLVM. In the future we would like to offer the option to
use RADV solely with ACO. There is still a need for the common AMD
code located in amd/common but the LLVM specific parts need to be
separated.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-08 00:44:08 +00:00
Renamed from src/amd/common/ac_llvm_build.c