Rhys Perry
bbbfdef683
ac/nir: make ac_build_clamp work on all bit sizes
...
v2: don't use ac_get_zerof() and ac_get_onef()
v3: rename "intr" to "name"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:58 +00:00
Samuel Pitoiset
2cf5433b99
ac: use new LLVM 8 intrinsic when loading 16-bit values
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:20 +01:00
Samuel Pitoiset
f0223143a8
ac: add ac_build_llvm8_tbuffer_load() helper
...
It uses the new LLVM intrinsics.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:17 +01:00
Samuel Pitoiset
2154fac6f3
ac: make use of ac_build_expand_to_vec4() in visit_image_store()
...
And make ac_build_expand() a static function.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:09:48 +01:00
Bas Nieuwenhuizen
58c8dadd32
amd/common: Implement ptr->int casts in ac_to_integer.
...
For the implicit casts inherent in nir.
This should probably have been done for shared memory for
VK_KHR_variable_pointers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:40 +01:00
Bas Nieuwenhuizen
e00d9a9a72
amd/common: Add gep helper for pointer increment.
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:36 +01:00
Michel Dänzer
1a20b56798
amd/common: Restore v4i32 suffix for llvm.SI.load.const intrinsic
...
It was accidentally dropped in commit e4803ab7d2
"amd/common: use
llvm.amdgcn.s.buffer.load for LLVM 8.0", breaking the universe with LLVM
7.
Trivial.
2019-01-14 12:52:52 +01:00
Nicolai Hähnle
7fbd48fdc0
amd/common/vi+: enable SMEM loads with GLC=1
...
Only on LLVM 8.0+, which supports the new intrinsic.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-14 08:30:15 +01:00
Nicolai Hähnle
e4803ab7d2
amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0
...
llvm.SI.load.const is deprecated.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-14 08:30:12 +01:00
Marek Olšák
492ad9a402
ac: remove unused variable from ac_build_ddxy
...
trivial
2019-01-07 14:51:25 -05:00
Nicolai Hähnle
8efaffa893
amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan
...
Allow for a unified but efficient treatment of adding a bitmask over a
wave or an entire threadgroup.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:19 +01:00
Nicolai Hähnle
300876a9a7
amd/common: scan/reduce across waves of a workgroup
...
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:17 +01:00
Nicolai Hähnle
3963402fd3
amd/common: add ac_build_ifcc
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:15 +01:00
Nicolai Hähnle
3c77f26ccc
amd/common: whitespace fixes
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:12 +01:00
Rhys Perry
12dc7cb202
ac: refactor visit_load_buffer
...
This is so that we can split different types of loads more easily.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-16 14:56:10 +00:00
Samuel Pitoiset
3fbdcd942f
amd: remove support for LLVM 6.0
...
User are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-06 14:02:56 +01:00
Dave Airlie
ec9fe8abc7
ac: avoid casting pointers on bcsel and stores
...
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-21 08:54:25 +10:00
Bas Nieuwenhuizen
dd0172e865
radv: Use structured intrinsics instead of indexing workaround for GFX9.
...
These force the index to be used in the instruction so we don't need the
workaround.
Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-19 23:36:00 +01:00
Marek Olšák
8676af12c8
ac: fix ac_build_fdiv for f64
...
trivial
Fixes: a5f35aa742
2018-10-29 17:24:21 -04:00
Connor Abbott
59535b05cf
ac: Introduce ac_build_expand()
...
And implement ac_bulid_expand_to_vec4() on top of it.
Fixes: 7e7ee82698
("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-22 09:44:51 +02:00
Marek Olšák
bfc795670e
ac: add helpers for fast integer division by a constant
2018-10-16 17:23:25 -04:00
Samuel Pitoiset
416013b4f5
radv: emit the GLC bit for SSBO loads/stores when needed
...
This fixes some new memory model tests:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-12 08:42:08 +02:00
Marek Olšák
77903c8cfb
ac: add ac_build_round
2018-10-06 21:50:09 -04:00
Marek Olšák
82f5f89bf6
ac: simplify LLVM alloca helpers
2018-10-06 21:50:09 -04:00
Marek Olšák
a668c8d6ba
ac: define all address spaces properly
2018-10-06 21:50:09 -04:00
Samuel Pitoiset
cd76ce0078
ac: add 16-bit support to ac_build_bitfield_reverse()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:37 +02:00
Samuel Pitoiset
fc398f4d67
ac: add 16-bit support to ac_build_bit_count()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:34 +02:00
Samuel Pitoiset
94dd08eb7c
ac: add 16-bit support to ac_find_lsb()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:32 +02:00
Samuel Pitoiset
5a6c8ca3e8
ac: add 16-bit support to ac_build_umsb()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:30 +02:00
Samuel Pitoiset
3e7f3e2cd1
ac: add 16-bit support to ac_build_isign()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:28 +02:00
Samuel Pitoiset
cfd6314cfe
ac: add 16-bit constant values for zero and one
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:26 +02:00
Samuel Pitoiset
074e29183c
ac: add ac_build_bifield_reverse() helper
...
Are we missing 64-bit support?
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:23 +02:00
Samuel Pitoiset
371c35e5bb
ac: add ac_build_bit_count() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:20 +02:00
Marek Olšák
be0bd95abf
radeonsi: fix GPU hangs with bindless textures and LLVM 7.0
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák
cc36ebbdc3
ac: use iN_0/1 constants
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák
a5f35aa742
ac: revert new LLVM 7.0 behavior for fdiv
...
Cc: 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák
60beac9efc
ac,radeonsi: use ac_build_fmad
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák
659f2e0fcb
ac: add imad & fmad helpers
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák
2276f8f064
ac: add ac_build_s_barrier
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák
fd1121e839
amd: remove support for LLVM 5.0
...
Users are encouraged to switch to LLVM 6.0 released in March 2018.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-03 18:36:11 -04:00
Daniel Schürmann
a6a21e651d
ac: add support for 16bit UBO loads
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Daniel Schürmann
f582367d49
ac: add 16bit conversion operations
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Marek Olšák
4695984dbc
ac: fold LLVMContext creation into ac_llvm_context_init
...
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-04 15:48:18 -04:00
Marek Olšák
e5e57c3a5e
ac: handle undefined EQAA samples in ac_apply_fmask_to_sample
...
RADV might wanna use this helper too.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:12 -04:00
Timothy Arceri
fae3b38770
ac: fix possible truncation of intrinsic name
...
Fixes the gcc warning:
snprintf’ output between 26 and 33 bytes into a destination of size 32
Fixes: d5f7ebda3e
("ac: add LLVM build functions for subgroup instrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-08 09:24:15 +10:00
Bas Nieuwenhuizen
4fc2d5e141
amd/common: Fix number of coords for getlod.
...
The LLVM 6 code reduced it to a non-array call. We need to do that
with the new code too.
This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.
Fixes: a9a7993441
"amd/common: use the dimension-aware image intrinsics on LLVM 7+"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-07 23:59:52 +02:00
Nicolai Hähnle
a9a7993441
amd/common: use the dimension-aware image intrinsics on LLVM 7+
...
Requires LLVM trunk r329166.
Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-06-04 21:34:59 +02:00
Bas Nieuwenhuizen
699e1f5aac
ac: Use DPP for build_ddxy where possible.
...
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.
This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.
v2: Use ac_build_quad_swizzle.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-23 21:02:45 +02:00
Marek Olšák
f9eb1ef870
amd: remove support for LLVM 4.0
...
It doesn't support GFX9.
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-17 14:54:41 -04:00
Dave Airlie
eba4cf797c
ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic
...
Drop the use of the old intrinsic.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-17 11:46:53 +10:00
Nicolai Hähnle
c0acb596f4
amd/common: use llvm.amdgcn.wqm for explicit derivatives
...
To comply with an upcoming change in LLVM, see
https://reviews.llvm.org/D46051
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-04 11:02:48 +02:00
Samuel Pitoiset
d136a5fad9
ac: fix the number of coordinates for ac_image_get_lod and arrays
...
This fixes crashes for the following CTS:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*
Cubemaps are the same as 2D arrays.
Fixes: 625dcbbc45
("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-23 21:48:38 +02:00
Nicolai Hähnle
24fb3e6aa1
ac/nir: use ac_build_image_opcode for image intrinsics
...
So that we'll use the dimension-aware intrinsics in the future.
Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:30:07 +02:00
Nicolai Hähnle
74063431f1
radeonsi: generate image load/store/atomic ops using ac_build_image_opcode
...
In preparation of dimension-aware LLVM image intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:29:57 +02:00
Nicolai Hähnle
625dcbbc45
amd/common: pass address components individually to ac_build_image_intrinsic
...
This is in preparation for the new image intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:52 +02:00
Nicolai Hähnle
f931583828
amd/common: pass new enum ac_image_dim to ac_build_image_opcode
...
This is in preparation for the new, dimension-aware LLVM image
intrinsics.
Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:40 +02:00
Daniel Schürmann
d5f7ebda3e
ac: add LLVM build functions for subgroup instrinsics
...
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:09 +02:00
Daniel Schürmann
d19f20e793
ac: make ballot and umsb capable of 64bit inputs
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 00:52:22 +02:00
Marek Olšák
dc04e4bba2
radeonsi: move FMASK shader logic to shared code
...
We'll need it for FBFETCH in both TGSI and NIR paths.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:22 -04:00
Bas Nieuwenhuizen
4503ff760c
ac/nir: Add workaround for GFX9 buffer views.
...
On GFX9 whether the buffer size is interpreted as elements or bytes
depends on whether IDXEN is enabled in the instruction. If the index
is a constant zero, LLVM optimizes IDXEN to 0.
Now the size in elements is interpreted in bytes which of course
results in out of bounds accesses.
The correct fix is most likely to disable the LLVM optimization,
but we need something to work with LLVM <= 6.0.
radeonsi does the max between stride and element count on the CPU
but that results in the size intrinsics returning the wrong size
for the buffer. This would cause CTS errors for radv.
v2: Also include the store changes.
Fixes: e38685cc62
'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 00:03:03 +02:00
Samuel Pitoiset
61a91ca3f5
ac/nir: move unpack_param() to ac_llvm_build.c
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset
28bb6873ec
ac/nir: move trim_vector to ac_llvm_build.c
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset
895632baef
ac/nir: move cast_ptr() to ac_llvm_build.c
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset
bf6368297b
ac/nir: move ac_build_alloca() to ac_llvm_build.c
...
As well as si_build_alloca_undef() and drop the si prefix.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Timothy Arceri
42627dabb4
ac: add if/loop build helpers
...
These have been ported over from radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-08 10:12:34 +11:00
Samuel Pitoiset
675dde13b2
ac: update enabled channels mask when optimizing PARAM exports
...
When the mask is not 0xf we need to update the number of
enabled channels, otherwise the hardware won't emit the
components that are combined.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-06 10:37:52 +01:00
Samuel Pitoiset
322a51b549
ac: add ac_build_fsign()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:36 +01:00
Samuel Pitoiset
e8bdde2289
ac: add ac_build_isign()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:32 +01:00
Samuel Pitoiset
459e33900f
ac: add ac_build_fract()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:30 +01:00
Marek Olšák
931ec80eeb
radeonsi: implement 32-bit pointers in user data SGPRs (v2)
...
User SGPRs changes:
VS: 14 -> 9
TCS: 14 -> 10
TES: 10 -> 6
GS: 8 -> 4
GSCOPY: 2 -> 1
PS: 9 -> 5
Merged VS-TCS: 24 -> 16
Merged VS-GS: 18 -> 11
Merged TES-GS: 18 -> 11
SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)
v2: - the shader cache needs to take address32_hi into account
- set amdgpu-32bit-address-high-bits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
2018-02-17 04:52:17 +01:00
Timothy Arceri
12a2350e6d
ac: add 64bit support to ac_find_lsb()
...
v2: use LLVMBuildTrunc()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Timothy Arceri
a9f6b392c7
ac: move get_elem_bits() to ac_llvm_build.c
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Samuel Pitoiset
bd9f7b7635
ac: add ac_build_export_null() helper
...
Imported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-08 22:11:42 +01:00
Timothy Arceri
b7b89bbddb
ac/radeonsi: create ac_build_shader_clock() helper
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Marek Olšák
3bf1e036e8
amd: remove support for LLVM 3.9
...
Only these are supported:
- LLVM 4.0
- LLVM 5.0
- LLVM 6.0
- master (7.0)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-02 23:47:40 +01:00
Marek Olšák
847d0a393d
radeonsi: use pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-02 16:46:22 +01:00
Marek Olšák
bac9fa9f17
ac: add glc parameter to ac_build_buffer_load_format
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Marek Olšák
be973ed21f
radeonsi: load the right number of components for VS inputs and TBOs
...
The supported counts are 1, 2, 4. (3=4)
The following snippet loads float, vec2, vec3, and vec4:
Before:
buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904
buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507
After:
buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04
buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805
buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Dave Airlie
16dd0eb517
ac/llvm: bump the number of results to 8.
...
This function can get access for a 64-bit dvec4, which means we
have to load 8 components.
This fixes:
R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 05:37:16 +10:00
Marek Olšák
b633999a4e
ac: rename and move si_const_array into common code
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-27 02:09:09 +01:00
Samuel Pitoiset
51e14bc3c0
ac: pass the number of channels to ac_build_buffer_load_format()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Samuel Pitoiset
d7c93b558a
ac: add ac_build_buffer_load_common() helper
...
For both versions of llvm.amdgcn.buffer.load.{format}.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Timothy Arceri
5b9362c248
ac: fix ac_build_varying_gather_values() for packed layouts
...
This fixes a segfault for varyings not starting at component 0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-23 10:00:52 +11:00
Timothy Arceri
38876c88d1
ac: add i64_0 and i64_1 to llvm build context
...
These will be used in the following patch.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-14 11:40:03 +11:00
Timothy Arceri
d7b6b8ba52
ac: add f64_0 to the llvm build context
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:18 +11:00
Timothy Arceri
c0eb304acd
ac: add f64_1 to the llvm build context
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:17 +11:00
Samuel Pitoiset
7239e265eb
amd/common: import get_{load,store}_intr_attribs() from RadeonSI
...
v2: move those helpers to the header and use static inline
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
2018-01-10 19:02:23 +01:00
Marek Olšák
a140aeb619
ac: add ac_build_fmin/fmax helpers
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-06 09:51:43 +01:00
Timothy Arceri
4a0c24f2dd
ac: rework ac_llvm_extract_elem()
...
Simplifies the logic a little and asserts index is 0.
Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 12:20:38 +11:00
Timothy Arceri
b99ebaa4fd
ac: move some helpers to ac_llvm_build.c
...
We will call these from the radeonsi NIR backend.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Samuel Pitoiset
03ef264146
amd/common: pass the family to ac_llvm_context_init()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-22 10:38:44 +01:00
Samuel Pitoiset
225b198802
amd/common: add ac_build_waitcnt()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00
Samuel Pitoiset
d43e72fd8c
radeonsi: make use of ac_build_fdiv()
...
And move the comment to amd/common.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:38 +01:00
Timothy Arceri
caf15ce670
ac: move build_varying_gather_values() to ac_llvm_build.h and expose
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-04 12:52:19 +11:00
Timothy Arceri
7f4966731f
ac: add v2f32 to the common code and make use of it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:46 +11:00
Timothy Arceri
ee376ac6f4
ac: add v3i32 to the common code and make use of it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:45 +11:00
Timothy Arceri
309a51411d
ac: add v2i32 to the common code and use it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:45 +11:00
Dave Airlie
82d47b9d38
ac/llvm: consolidate find lsb function.
...
This was the same between si and ac.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:31 +10:00
Dave Airlie
a76b6c2192
ac/llvm: add i1false/i1true to common code.
...
These get used in fair few places.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:18 +10:00
Dave Airlie
f925f5b074
ac/nir: move lds declaration/load/store into shared code.
...
This was duplicated between both drivers, share here.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:11 +10:00