Commit Graph

225 Commits

Author SHA1 Message Date
Samuel Pitoiset 045fae0f73 ac: add ac_build_{struct,raw}_tbuffer_load() helpers
The struct version sets IDXEN=1, while the raw version sets IDXEN=0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 14:15:05 +01:00
Samuel Pitoiset 489dac0d21 ac: rework typed buffers loads for LLVM 7
Be more generic, this will be used by an upcoming series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 13:31:06 +01:00
Bas Nieuwenhuizen a1fdd4a4a7 radv: Fix float16 interpolation set up.
float16 types can have non-flat interpolation so set up the HW
correctly for that.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Rhys Perry 4261edc067 ac/nir: make ac_build_fdiv support 16-bit floats
v2: don't use ac_get_onef()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:29 +00:00
Rhys Perry 6790b3a8db ac/nir: make ac_build_isign work on all bit sizes
v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:20 +00:00
Rhys Perry bbbfdef683 ac/nir: make ac_build_clamp work on all bit sizes
v2: don't use ac_get_zerof() and ac_get_onef()
v3: rename "intr" to "name"

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:58 +00:00
Samuel Pitoiset 2cf5433b99 ac: use new LLVM 8 intrinsic when loading 16-bit values
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:20 +01:00
Samuel Pitoiset f0223143a8 ac: add ac_build_llvm8_tbuffer_load() helper
It uses the new LLVM intrinsics.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:17 +01:00
Samuel Pitoiset 2154fac6f3 ac: make use of ac_build_expand_to_vec4() in visit_image_store()
And make ac_build_expand() a static function.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:09:48 +01:00
Bas Nieuwenhuizen 58c8dadd32 amd/common: Implement ptr->int casts in ac_to_integer.
For the implicit casts inherent in nir.

This should probably have been done for shared memory for
VK_KHR_variable_pointers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:40 +01:00
Bas Nieuwenhuizen e00d9a9a72 amd/common: Add gep helper for pointer increment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:36 +01:00
Michel Dänzer 1a20b56798 amd/common: Restore v4i32 suffix for llvm.SI.load.const intrinsic
It was accidentally dropped in commit e4803ab7d2 "amd/common: use
llvm.amdgcn.s.buffer.load for LLVM 8.0", breaking the universe with LLVM
7.

Trivial.
2019-01-14 12:52:52 +01:00
Nicolai Hähnle 7fbd48fdc0 amd/common/vi+: enable SMEM loads with GLC=1
Only on LLVM 8.0+, which supports the new intrinsic.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-14 08:30:15 +01:00
Nicolai Hähnle e4803ab7d2 amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0
llvm.SI.load.const is deprecated.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-14 08:30:12 +01:00
Marek Olšák 492ad9a402 ac: remove unused variable from ac_build_ddxy
trivial
2019-01-07 14:51:25 -05:00
Nicolai Hähnle 8efaffa893 amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan
Allow for a unified but efficient treatment of adding a bitmask over a
wave or an entire threadgroup.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:19 +01:00
Nicolai Hähnle 300876a9a7 amd/common: scan/reduce across waves of a workgroup
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:17 +01:00
Nicolai Hähnle 3963402fd3 amd/common: add ac_build_ifcc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:15 +01:00
Nicolai Hähnle 3c77f26ccc amd/common: whitespace fixes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-19 12:01:12 +01:00
Rhys Perry 12dc7cb202 ac: refactor visit_load_buffer
This is so that we can split different types of loads more easily.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-16 14:56:10 +00:00
Samuel Pitoiset 3fbdcd942f amd: remove support for LLVM 6.0
User are encouraged to switch to LLVM 7.0 released in September 2018.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-06 14:02:56 +01:00
Dave Airlie ec9fe8abc7 ac: avoid casting pointers on bcsel and stores
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-21 08:54:25 +10:00
Bas Nieuwenhuizen dd0172e865 radv: Use structured intrinsics instead of indexing workaround for GFX9.
These force the index to be used in the instruction so we don't need the
workaround.

Totals:
SGPRS: 1321642 -> 1321802 (0.01 %)
VGPRS: 943664 -> 943788 (0.01 %)
Spilled SGPRs: 28468 -> 28480 (0.04 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 52415292 -> 52338932 (-0.15 %) bytes
LDS: 400 -> 400 (0.00 %) blocks
Max Waves: 233903 -> 233803 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 238344 -> 238504 (0.07 %)
VGPRS: 232732 -> 232856 (0.05 %)
Spilled SGPRs: 13125 -> 13137 (0.09 %)
Spilled VGPRs: 88 -> 89 (1.14 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 80 -> 80 (0.00 %) dwords per thread
Code Size: 15752712 -> 15676352 (-0.48 %) bytes
LDS: 139 -> 139 (0.00 %) blocks
Max Waves: 31680 -> 31580 (-0.32 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-19 23:36:00 +01:00
Marek Olšák 8676af12c8 ac: fix ac_build_fdiv for f64
trivial

Fixes: a5f35aa742
2018-10-29 17:24:21 -04:00
Connor Abbott 59535b05cf ac: Introduce ac_build_expand()
And implement ac_bulid_expand_to_vec4() on top of it.

Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-22 09:44:51 +02:00
Marek Olšák bfc795670e ac: add helpers for fast integer division by a constant 2018-10-16 17:23:25 -04:00
Samuel Pitoiset 416013b4f5 radv: emit the GLC bit for SSBO loads/stores when needed
This fixes some new memory model tests:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-12 08:42:08 +02:00
Marek Olšák 77903c8cfb ac: add ac_build_round 2018-10-06 21:50:09 -04:00
Marek Olšák 82f5f89bf6 ac: simplify LLVM alloca helpers 2018-10-06 21:50:09 -04:00
Marek Olšák a668c8d6ba ac: define all address spaces properly 2018-10-06 21:50:09 -04:00
Samuel Pitoiset cd76ce0078 ac: add 16-bit support to ac_build_bitfield_reverse()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:37 +02:00
Samuel Pitoiset fc398f4d67 ac: add 16-bit support to ac_build_bit_count()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:34 +02:00
Samuel Pitoiset 94dd08eb7c ac: add 16-bit support to ac_find_lsb()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:32 +02:00
Samuel Pitoiset 5a6c8ca3e8 ac: add 16-bit support to ac_build_umsb()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:30 +02:00
Samuel Pitoiset 3e7f3e2cd1 ac: add 16-bit support to ac_build_isign()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:28 +02:00
Samuel Pitoiset cfd6314cfe ac: add 16-bit constant values for zero and one
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:26 +02:00
Samuel Pitoiset 074e29183c ac: add ac_build_bifield_reverse() helper
Are we missing 64-bit support?

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:23 +02:00
Samuel Pitoiset 371c35e5bb ac: add ac_build_bit_count() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-09-17 15:18:20 +02:00
Marek Olšák be0bd95abf radeonsi: fix GPU hangs with bindless textures and LLVM 7.0
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák cc36ebbdc3 ac: use iN_0/1 constants
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák a5f35aa742 ac: revert new LLVM 7.0 behavior for fdiv
Cc: 18.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák 60beac9efc ac,radeonsi: use ac_build_fmad
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák 659f2e0fcb ac: add imad & fmad helpers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák 2276f8f064 ac: add ac_build_s_barrier
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-21 20:50:37 -04:00
Marek Olšák fd1121e839 amd: remove support for LLVM 5.0
Users are encouraged to switch to LLVM 6.0 released in March 2018.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-03 18:36:11 -04:00
Daniel Schürmann a6a21e651d ac: add support for 16bit UBO loads
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Daniel Schürmann f582367d49 ac: add 16bit conversion operations
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-23 23:16:25 +02:00
Marek Olšák 4695984dbc ac: fold LLVMContext creation into ac_llvm_context_init
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-04 15:48:18 -04:00
Marek Olšák e5e57c3a5e ac: handle undefined EQAA samples in ac_apply_fmask_to_sample
RADV might wanna use this helper too.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:12 -04:00
Timothy Arceri fae3b38770 ac: fix possible truncation of intrinsic name
Fixes the gcc warning:
snprintf’ output between 26 and 33 bytes into a destination of size 32

Fixes: d5f7ebda3e ("ac: add LLVM build functions for subgroup instrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-08 09:24:15 +10:00
Bas Nieuwenhuizen 4fc2d5e141 amd/common: Fix number of coords for getlod.
The LLVM 6 code reduced it to a non-array call. We need to do that
with the new code too.

This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.

Fixes: a9a7993441 "amd/common: use the dimension-aware image intrinsics on LLVM 7+"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-07 23:59:52 +02:00
Nicolai Hähnle a9a7993441 amd/common: use the dimension-aware image intrinsics on LLVM 7+
Requires LLVM trunk r329166.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-06-04 21:34:59 +02:00
Bas Nieuwenhuizen 699e1f5aac ac: Use DPP for build_ddxy where possible.
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.

This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.

v2: Use ac_build_quad_swizzle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-23 21:02:45 +02:00
Marek Olšák f9eb1ef870 amd: remove support for LLVM 4.0
It doesn't support GFX9.

Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-17 14:54:41 -04:00
Dave Airlie eba4cf797c ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic
Drop the use of the old intrinsic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-05-17 11:46:53 +10:00
Nicolai Hähnle c0acb596f4 amd/common: use llvm.amdgcn.wqm for explicit derivatives
To comply with an upcoming change in LLVM, see
https://reviews.llvm.org/D46051

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-04 11:02:48 +02:00
Samuel Pitoiset d136a5fad9 ac: fix the number of coordinates for ac_image_get_lod and arrays
This fixes crashes for the following CTS:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*

Cubemaps are the same as 2D arrays.

Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-23 21:48:38 +02:00
Nicolai Hähnle 24fb3e6aa1 ac/nir: use ac_build_image_opcode for image intrinsics
So that we'll use the dimension-aware intrinsics in the future.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:30:07 +02:00
Nicolai Hähnle 74063431f1 radeonsi: generate image load/store/atomic ops using ac_build_image_opcode
In preparation of dimension-aware LLVM image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:29:57 +02:00
Nicolai Hähnle 625dcbbc45 amd/common: pass address components individually to ac_build_image_intrinsic
This is in preparation for the new image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:52 +02:00
Nicolai Hähnle f931583828 amd/common: pass new enum ac_image_dim to ac_build_image_opcode
This is in preparation for the new, dimension-aware LLVM image
intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
2018-04-20 09:23:40 +02:00
Daniel Schürmann d5f7ebda3e ac: add LLVM build functions for subgroup instrinsics
Co-authored-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 01:03:09 +02:00
Daniel Schürmann d19f20e793 ac: make ballot and umsb capable of 64bit inputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-14 00:52:22 +02:00
Marek Olšák dc04e4bba2 radeonsi: move FMASK shader logic to shared code
We'll need it for FBFETCH in both TGSI and NIR paths.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:22 -04:00
Bas Nieuwenhuizen 4503ff760c ac/nir: Add workaround for GFX9 buffer views.
On GFX9 whether the buffer size is interpreted as elements or bytes
depends on whether IDXEN is enabled in the instruction. If the index
is a constant zero, LLVM optimizes IDXEN to 0.

Now the size in elements is interpreted in bytes which of course
results in out of bounds accesses.

The correct fix is most likely to disable the LLVM optimization,
but we need something to work with LLVM <= 6.0.

radeonsi does the max between stride and element count on the CPU
but that results in the size intrinsics returning the wrong size
for the buffer. This would cause CTS errors for radv.

v2: Also include the store changes.

Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-29 00:03:03 +02:00
Samuel Pitoiset 61a91ca3f5 ac/nir: move unpack_param() to ac_llvm_build.c
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset 28bb6873ec ac/nir: move trim_vector to ac_llvm_build.c
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset 895632baef ac/nir: move cast_ptr() to ac_llvm_build.c
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Samuel Pitoiset bf6368297b ac/nir: move ac_build_alloca() to ac_llvm_build.c
As well as si_build_alloca_undef() and drop the si prefix.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-13 14:05:06 +01:00
Timothy Arceri 42627dabb4 ac: add if/loop build helpers
These have been ported over from radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-08 10:12:34 +11:00
Samuel Pitoiset 675dde13b2 ac: update enabled channels mask when optimizing PARAM exports
When the mask is not 0xf we need to update the number of
enabled channels, otherwise the hardware won't emit the
components that are combined.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-06 10:37:52 +01:00
Samuel Pitoiset 322a51b549 ac: add ac_build_fsign()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:36 +01:00
Samuel Pitoiset e8bdde2289 ac: add ac_build_isign()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:32 +01:00
Samuel Pitoiset 459e33900f ac: add ac_build_fract()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-05 11:04:30 +01:00
Marek Olšák 931ec80eeb radeonsi: implement 32-bit pointers in user data SGPRs (v2)
User SGPRs changes:
    VS:     14 ->  9
    TCS:    14 -> 10
    TES:    10 ->  6
    GS:      8 ->  4
    GSCOPY:  2 ->  1
    PS:      9 ->  5
    Merged VS-TCS: 24 -> 16
    Merged VS-GS:  18 -> 11
    Merged TES-GS: 18 -> 11

SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)

v2: - the shader cache needs to take address32_hi into account
    - set amdgpu-32bit-address-high-bits

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
2018-02-17 04:52:17 +01:00
Timothy Arceri 12a2350e6d ac: add 64bit support to ac_find_lsb()
v2: use LLVMBuildTrunc()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Timothy Arceri a9f6b392c7 ac: move get_elem_bits() to ac_llvm_build.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Samuel Pitoiset bd9f7b7635 ac: add ac_build_export_null() helper
Imported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-08 22:11:42 +01:00
Timothy Arceri b7b89bbddb ac/radeonsi: create ac_build_shader_clock() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Marek Olšák 3bf1e036e8 amd: remove support for LLVM 3.9
Only these are supported:
- LLVM 4.0
- LLVM 5.0
- LLVM 6.0
- master (7.0)

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-02 23:47:40 +01:00
Marek Olšák 847d0a393d radeonsi: use pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-02 16:46:22 +01:00
Marek Olšák bac9fa9f17 ac: add glc parameter to ac_build_buffer_load_format
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Marek Olšák be973ed21f radeonsi: load the right number of components for VS inputs and TBOs
The supported counts are 1, 2, 4. (3=4)

The following snippet loads float, vec2, vec3, and vec4:

Before:
    buffer_load_format_x v9, v4, s[0:3], 0 idxen          ; E0002000 80000904
    buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen  ; E00C2000 80020005
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen   ; E00C2000 80010507

After:
    buffer_load_format_x v10, v4, s[0:3], 0 idxen         ; E0002000 80000A04
    buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen    ; E0042000 80020805
    buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen   ; E00C2000 80010307

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Dave Airlie 16dd0eb517 ac/llvm: bump the number of results to 8.
This function can get access for a 64-bit dvec4, which means we
have to load 8 components.

This fixes:
R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 05:37:16 +10:00
Marek Olšák b633999a4e ac: rename and move si_const_array into common code
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-27 02:09:09 +01:00
Samuel Pitoiset 51e14bc3c0 ac: pass the number of channels to ac_build_buffer_load_format()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Samuel Pitoiset d7c93b558a ac: add ac_build_buffer_load_common() helper
For both versions of llvm.amdgcn.buffer.load.{format}.*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Timothy Arceri 5b9362c248 ac: fix ac_build_varying_gather_values() for packed layouts
This fixes a segfault for varyings not starting at component 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-23 10:00:52 +11:00
Timothy Arceri 38876c88d1 ac: add i64_0 and i64_1 to llvm build context
These will be used in the following patch.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-14 11:40:03 +11:00
Timothy Arceri d7b6b8ba52 ac: add f64_0 to the llvm build context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:18 +11:00
Timothy Arceri c0eb304acd ac: add f64_1 to the llvm build context
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:17 +11:00
Samuel Pitoiset 7239e265eb amd/common: import get_{load,store}_intr_attribs() from RadeonSI
v2: move those helpers to the header and use static inline

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
2018-01-10 19:02:23 +01:00
Marek Olšák a140aeb619 ac: add ac_build_fmin/fmax helpers
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-06 09:51:43 +01:00
Timothy Arceri 4a0c24f2dd ac: rework ac_llvm_extract_elem()
Simplifies the logic a little and asserts index is 0.

Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 12:20:38 +11:00
Timothy Arceri b99ebaa4fd ac: move some helpers to ac_llvm_build.c
We will call these from the radeonsi NIR backend.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Samuel Pitoiset 03ef264146 amd/common: pass the family to ac_llvm_context_init()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-22 10:38:44 +01:00
Samuel Pitoiset 225b198802 amd/common: add ac_build_waitcnt()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00
Samuel Pitoiset d43e72fd8c radeonsi: make use of ac_build_fdiv()
And move the comment to amd/common.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:38 +01:00
Timothy Arceri caf15ce670 ac: move build_varying_gather_values() to ac_llvm_build.h and expose
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-04 12:52:19 +11:00
Timothy Arceri 7f4966731f ac: add v2f32 to the common code and make use of it
Reviewed-by: Marek Olšák <marek.olsak@amd.com
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:46 +11:00