KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Bas Nieuwenhuizen	699e1f5aac	ac: Use DPP for build_ddxy where possible. WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM. This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark. v2: Use ac_build_quad_swizzle. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-05-23 21:02:45 +02:00
Marek Olšák	f9eb1ef870	amd: remove support for LLVM 4.0 It doesn't support GFX9. Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-17 14:54:41 -04:00
Dave Airlie	eba4cf797c	ac/llvm: use amdgcn.tbuffer.store instead of SI.tbuffer.store intrinsic Drop the use of the old intrinsic. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-05-17 11:46:53 +10:00
Nicolai Hähnle	c0acb596f4	amd/common: use llvm.amdgcn.wqm for explicit derivatives To comply with an upcoming change in LLVM, see https://reviews.llvm.org/D46051 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-04 11:02:48 +02:00
Samuel Pitoiset	d136a5fad9	ac: fix the number of coordinates for ac_image_get_lod and arrays This fixes crashes for the following CTS: dEQP-VK.glsl.texture_functions.query.texturequerylod.* Cubemaps are the same as 2D arrays. Fixes: `625dcbbc45` ("amd/common: pass address components individually to ac_build_image_intrinsic") Cc: 18.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-23 21:48:38 +02:00
Nicolai Hähnle	24fb3e6aa1	ac/nir: use ac_build_image_opcode for image intrinsics So that we'll use the dimension-aware intrinsics in the future. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:30:07 +02:00
Nicolai Hähnle	74063431f1	radeonsi: generate image load/store/atomic ops using ac_build_image_opcode In preparation of dimension-aware LLVM image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:29:57 +02:00
Nicolai Hähnle	625dcbbc45	amd/common: pass address components individually to ac_build_image_intrinsic This is in preparation for the new image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:23:52 +02:00
Nicolai Hähnle	f931583828	amd/common: pass new enum ac_image_dim to ac_build_image_opcode This is in preparation for the new, dimension-aware LLVM image intrinsics. Acked-by: Marek Olšák <marek.olsak@amd.com>	2018-04-20 09:23:40 +02:00
Daniel Schürmann	d5f7ebda3e	ac: add LLVM build functions for subgroup instrinsics Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-14 01:03:09 +02:00
Daniel Schürmann	d19f20e793	ac: make ballot and umsb capable of 64bit inputs Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-14 00:52:22 +02:00
Marek Olšák	dc04e4bba2	radeonsi: move FMASK shader logic to shared code We'll need it for FBFETCH in both TGSI and NIR paths. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-04-02 13:55:22 -04:00
Bas Nieuwenhuizen	4503ff760c	ac/nir: Add workaround for GFX9 buffer views. On GFX9 whether the buffer size is interpreted as elements or bytes depends on whether IDXEN is enabled in the instruction. If the index is a constant zero, LLVM optimizes IDXEN to 0. Now the size in elements is interpreted in bytes which of course results in out of bounds accesses. The correct fix is most likely to disable the LLVM optimization, but we need something to work with LLVM <= 6.0. radeonsi does the max between stride and element count on the CPU but that results in the size intrinsics returning the wrong size for the buffer. This would cause CTS errors for radv. v2: Also include the store changes. Fixes: `e38685cc62` 'Revert "radv: disable support for VEGA for now."' Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-29 00:03:03 +02:00
Samuel Pitoiset	61a91ca3f5	ac/nir: move unpack_param() to ac_llvm_build.c Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00
Samuel Pitoiset	28bb6873ec	ac/nir: move trim_vector to ac_llvm_build.c Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00
Samuel Pitoiset	895632baef	ac/nir: move cast_ptr() to ac_llvm_build.c Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00
Samuel Pitoiset	bf6368297b	ac/nir: move ac_build_alloca() to ac_llvm_build.c As well as si_build_alloca_undef() and drop the si prefix. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00
Timothy Arceri	42627dabb4	ac: add if/loop build helpers These have been ported over from radeonsi. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-08 10:12:34 +11:00
Samuel Pitoiset	675dde13b2	ac: update enabled channels mask when optimizing PARAM exports When the mask is not 0xf we need to update the number of enabled channels, otherwise the hardware won't emit the components that are combined. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-06 10:37:52 +01:00
Samuel Pitoiset	322a51b549	ac: add ac_build_fsign() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-05 11:04:36 +01:00
Samuel Pitoiset	e8bdde2289	ac: add ac_build_isign() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-05 11:04:32 +01:00
Samuel Pitoiset	459e33900f	ac: add ac_build_fract() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-05 11:04:30 +01:00
Marek Olšák	931ec80eeb	radeonsi: implement 32-bit pointers in user data SGPRs (v2) User SGPRs changes: VS: 14 -> 9 TCS: 14 -> 10 TES: 10 -> 6 GS: 8 -> 4 GSCOPY: 2 -> 1 PS: 9 -> 5 Merged VS-TCS: 24 -> 16 Merged VS-GS: 18 -> 11 Merged TES-GS: 18 -> 11 SGPRS: 2170102 -> 2158430 (-0.54 %) VGPRS: 1645656 -> 1641516 (-0.25 %) Spilled SGPRs: 9078 -> 8810 (-2.95 %) Spilled VGPRs: 130 -> 114 (-12.31 %) Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread Code Size: 52094872 -> 52692540 (1.15 %) bytes Max Waves: 371848 -> 372723 (0.24 %) v2: - the shader cache needs to take address32_hi into account - set amdgpu-32bit-address-high-bits Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)	2018-02-17 04:52:17 +01:00
Timothy Arceri	12a2350e6d	ac: add 64bit support to ac_find_lsb() v2: use LLVMBuildTrunc() Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-09 09:42:59 +11:00
Timothy Arceri	a9f6b392c7	ac: move get_elem_bits() to ac_llvm_build.c Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-09 09:42:59 +11:00
Samuel Pitoiset	bd9f7b7635	ac: add ac_build_export_null() helper Imported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-08 22:11:42 +01:00
Timothy Arceri	b7b89bbddb	ac/radeonsi: create ac_build_shader_clock() helper Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-07 08:43:08 +11:00
Marek Olšák	3bf1e036e8	amd: remove support for LLVM 3.9 Only these are supported: - LLVM 4.0 - LLVM 5.0 - LLVM 6.0 - master (7.0) Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-02 23:47:40 +01:00
Marek Olšák	847d0a393d	radeonsi: use pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-02 16:46:22 +01:00
Marek Olšák	bac9fa9f17	ac: add glc parameter to ac_build_buffer_load_format Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-01 16:20:19 +01:00
Marek Olšák	be973ed21f	radeonsi: load the right number of components for VS inputs and TBOs The supported counts are 1, 2, 4. (3=4) The following snippet loads float, vec2, vec3, and vec4: Before: buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904 buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507 After: buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04 buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805 buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006 s_waitcnt vmcnt(0) ; BF8C0F70 buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-02-01 16:20:19 +01:00
Dave Airlie	16dd0eb517	ac/llvm: bump the number of results to 8. This function can get access for a 64-bit dvec4, which means we have to load 8 components. This fixes: R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-31 05:37:16 +10:00
Marek Olšák	b633999a4e	ac: rename and move si_const_array into common code Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-01-27 02:09:09 +01:00
Samuel Pitoiset	51e14bc3c0	ac: pass the number of channels to ac_build_buffer_load_format() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-26 12:14:27 +01:00
Samuel Pitoiset	d7c93b558a	ac: add ac_build_buffer_load_common() helper For both versions of llvm.amdgcn.buffer.load.{format}.*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-26 12:14:27 +01:00
Timothy Arceri	5b9362c248	ac: fix ac_build_varying_gather_values() for packed layouts This fixes a segfault for varyings not starting at component 0. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-01-23 10:00:52 +11:00
Timothy Arceri	38876c88d1	ac: add i64_0 and i64_1 to llvm build context These will be used in the following patch. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-14 11:40:03 +11:00
Timothy Arceri	d7b6b8ba52	ac: add f64_0 to the llvm build context Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-01-12 09:29:18 +11:00
Timothy Arceri	c0eb304acd	ac: add f64_1 to the llvm build context Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-01-12 09:29:17 +11:00
Samuel Pitoiset	7239e265eb	amd/common: import get_{load,store}_intr_attribs() from RadeonSI v2: move those helpers to the header and use static inline Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)	2018-01-10 19:02:23 +01:00
Marek Olšák	a140aeb619	ac: add ac_build_fmin/fmax helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-01-06 09:51:43 +01:00
Timothy Arceri	4a0c24f2dd	ac: rework ac_llvm_extract_elem() Simplifies the logic a little and asserts index is 0. Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-05 12:20:38 +11:00
Timothy Arceri	b99ebaa4fd	ac: move some helpers to ac_llvm_build.c We will call these from the radeonsi NIR backend. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-05 11:58:55 +11:00
Samuel Pitoiset	03ef264146	amd/common: pass the family to ac_llvm_context_init() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-22 10:38:44 +01:00
Samuel Pitoiset	225b198802	amd/common: add ac_build_waitcnt() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:24:44 +01:00
Samuel Pitoiset	d43e72fd8c	radeonsi: make use of ac_build_fdiv() And move the comment to amd/common. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:24:38 +01:00
Timothy Arceri	caf15ce670	ac: move build_varying_gather_values() to ac_llvm_build.h and expose Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-12-04 12:52:19 +11:00
Timothy Arceri	7f4966731f	ac: add v2f32 to the common code and make use of it Reviewed-by: Marek Olšák <marek.olsak@amd.com Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-03 14:54:46 +11:00
Timothy Arceri	ee376ac6f4	ac: add v3i32 to the common code and make use of it Reviewed-by: Marek Olšák <marek.olsak@amd.com Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-03 14:54:45 +11:00
Timothy Arceri	309a51411d	ac: add v2i32 to the common code and use it Reviewed-by: Marek Olšák <marek.olsak@amd.com Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-11-03 14:54:45 +11:00

1 2 3

123 Commits