KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Timur Kristóf	637c5a1dd9	aco/wave32: Fix reductions. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Timur Kristóf	e0bcefc3a0	aco/wave32: Use lane mask regclass for exec/vcc. Currently all usages of exec and vcc are hardcoded to use s2 regclass. This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-12-04 10:36:01 +00:00
Rhys Perry	56c06c79fc	aco: implement 64-bit integer reductions The multiplication reduction is larger than it could be, but it should be easier to implement this way. No failures with dEQP-VK.subgroups.int64 except those caused by LLVM being used for other stages. v2: don't call setFixed() for v_add carry-out, since setHint sets physReg v3: add and use emit_vadd32() helper v4: use num_opcodes instead of last_opcode Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-11-19 18:58:04 +00:00
Rhys Perry	33277bd66e	aco: refactor reduction lowering helpers Should make 64-bit integer reductions easier to implement. v4: use num_opcodes instead of last_opcode Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)	2019-11-19 18:56:21 +00:00
Rhys Perry	df645fa369	aco: implement VK_KHR_shader_float_controls This actually supports more of the extension than the LLVM backend but we can't enable it because ACO doesn't work with all stages yet. With more of it enabled, some CTS tests fail because our 64-bit sqrt is very imprecise. I can't find any precision requirements for it anywhere, so I'm thinking it might be a CTS issue. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-15 17:36:21 +00:00
Rhys Perry	3204e83768	aco: use DPP instead of exec modification when lowering GFX10 shuffles Seems we can use DPP's row_mask field to get an effect similar to modifying exec. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-12 17:21:38 +00:00
Timur Kristóf	c52ebbcea4	aco: Introduce vgpr_limit to keep track of available VGPRs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Timur Kristóf	d59f702e26	aco: Implement subgroup shuffle in GFX10 wave64 mode. Previously subgroup shuffle was implemented using the bpermute instruction, which only works accross half-waves, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode. This commit adds a trick using shared VGPRs that allows to implement subgroup shuffle still relatively effectively in this mode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	c2eebfe3ea	aco: Remove dead code in reduction lowering. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	3865448012	aco: Fix reductions on GFX10. Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan with all reduction operations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-28 23:52:50 +00:00
Rhys Perry	2ea9e59e8d	aco: move s_andn2_b64 instructions out of the p_discard_if And use a new p_discard_early_exit instruction. This fixes some cases where a definition having the same register as an operand causes issues. v2: rename instruction to p_exit_early_if v2: modify the existing instruction instead of creating a new one v3: merge the "i == num - 1" IFs Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-09 16:19:02 +00:00
Daniel Schürmann	93c8ebfa78	aco: Initial commit of independent AMD compiler ACO (short for AMD Compiler) is a new compiler backend with the goal to replace LLVM for Radeon hardware for the RADV driver. ACO currently supports only VS, PS and CS on VI and Vega. There are some optimizations missing because of unmerged NIR changes which may decrease performance. Full commit history can be found at https://github.com/daniel-schuermann/mesa/commits/backend Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Co-authored-by: Connor Abbott <cwabbott0@gmail.com> Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com> Co-authored-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00

12 Commits