KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Gert Wollny	e20a83eb86	r600/sb: fall back to un-optimized byte code when ra_init fails Some emulated fp64 piglit create code that seems to make the register allocation with sb worse than the original shader created from NIR, so fall back to using the un-optimized shader. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8563>	2021-01-20 12:12:07 +00:00
Dave Airlie	3bb2b2cc45	r600/sb: use different stacks for tracking lds and queue usage. The normal ssa renumbering isn't sufficient for LDS queue access, this uses two stacks, one for the lds queue, and one for the lds r/w ordering. The LDS oq values are incremented in their use in a linear fashion. The LDS rw values are incremented in their definitions and used in the next lds operation to ensure reordering doesn't occur. Acked-By: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-01-18 03:38:09 +00:00
Dave Airlie	5002dd4052	r600/sb: add gcm support to avoid clause between lds read/queue read You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP in the same basic block/clause. This makes sure once we've issues and MOV we don't add another block until we balance it with an LDS read. Acked-By: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-01-18 03:37:42 +00:00
Heiko Przybyl	e933246013	r600/sb: Fix loop optimization related hangs on eg Make sure unused ops and their references are removed, prior to entering the GCM (global code motion) pass, to stop GCM from breaking the loop logic and thus hanging the GPU. Turns out, that sb has problems with loops and node optimizations regarding associative folding: - the global code motion (gcm) pass moves ops up a loop level/basic block until they've fulfilled their total usage count - if there are ops folded into others, the usage count won't be fulfilled and thus the op moved way up to the top - within GCM the op would be visited and their deps would be moved alongside it, to fulfill the src constaints - in a loop, an unused op is moved out of the loop and GCM would move the src value ops up as well - now here arises the problem: if the loop counter is one of the src values it would get moved up as well, the loop break condition would never get hit and the shader turn into an endless loop, resulting in the GPU hanging and being reset A reduced (albeit nonsense) piglit example would be: [require] GLSL >= 1.20 [fragment shader] uniform int SIZE; uniform vec4 lights[512]; void main() { float x = 0; for(int i = 0; i < SIZE; i++) x += lights[2i+1].x; } [test] uniform int SIZE 1 draw rect -1 -1 2 2 Which gets optimized to: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 42 dw ===== 1 gprs ===== 2 stack ========================================= ALU 3 @24 1 y: MOV R0.y, 0 t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z LOOP_START_DX10 @22 PUSH @6 ALU 1 @30 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 2 @32 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x TEX 1 @36 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 1 @40 4 y: ADD R0.y, R0.y, R0.x LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Notice R0.z being the loop counter/break condition relevant register and being never incremented at all. Also some of the loop content has been moved out of it, to fulfill the requirements for the one unused op. With a debug build of mesa this would produce an error like error at : PRED_SETGE_INT __, __, EM.2, R1.x.2\|\|FP@R0.z, C0.x : operand value R1.x.2\|\|FP@R0.z was not previously written to its gpr and the compilation would fail due to this. On a release build it gets passed to the GPU. When using this patch, the loop remains intact: ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== ===== 48 dw ===== 1 gprs ===== 2 stack ========================================= ALU 2 @24 1 y: MOV R0.y, 0 z: MOV R0.z, 0 LOOP_START_DX10 @22 PUSH @6 ALU 1 @28 KC0[CB0:0-15] 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x JUMP @14 POP:1 LOOP_BREAK @20 POP @14 POP:1 ALU 4 @30 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x TEX 1 @40 VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] ALU 2 @44 5 y: ADD R0.y, R0.y, R0.x z: ADD_INT R0.z, R0.z, 1 LOOP_END @4 EXPORT_DONE PIXEL 0 R0.____ EOP ===== SHADER_END =============================================================== Piglit: ./piglit summary console -d results/_gpu_noglx name: unpatched_gpu_noglx patched_gpu_noglx ---- ------------------- ----------------- pass: 18016 18021 fail: 748 743 crash: 7 7 skip: 1124 1124 timeout: 0 0 warn: 13 13 incomplete: 0 0 dmesg-warn: 0 0 dmesg-fail: 0 0 changes: 0 5 fixes: 0 5 regressions: 0 0 total: 19908 19908 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900 Tested-by: Heiko Przybyl <lil_tux@web.de> Tested-on: Barts PRO HD6850 Signed-off-by: Heiko Przybyl <lil_tux@web.de> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-01-03 21:58:52 +01:00
Dave Airlie	3c8ef3a74b	r600g/sb: implement r600 gpr index workaround. (v3.1) r600, rv610 and rv630 all have a bug in their GPR indexing and how the hw inserts access to PV. If the base index for the src is the same as the dst gpr in a previous group, then it will use PV instead of using the indexed gpr correctly. The workaround is to insert a NOP when you detect this. v2: add second part of fix detecting DST rel writes followed by same src base index reads. v3: forget adding stuff to structs, just iterate over the previous node group again, makes it more obvious. v3.1: drop local_nop. Fixes ~200 piglit regressions on rv635 since SB was introduced. Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-12-16 12:44:45 +10:00
Glenn Kennard	dfa10ed264	r600g: Fix missing SET_TEXTURE_OFFSETS SB needs a bit of special handling to handle instructions without obvious side effects, to avoid it deleting them. Fixes failing non-const ARB_gpu_shader5 textureOffsets piglits with sb enabled. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2014-08-19 16:30:13 +02:00
Vadim Girlin	4cb04aa0df	r600g/sb: work around hw issues with stack on eg/cm v2: make it actually work, improve condition Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68503 Cc: "10.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-11-17 01:36:28 +04:00
Vadim Girlin	62c8149472	r600g/sb: fix issue with DCE between GVN and GCM (v2) We can't perform DCE using the liveness pass between GVN and GCM because it relies on the correct schedule, but GVN doesn't care about preserving correctness - it's rescheduled later by GCM. This patch makes dce_cleanup pass perform simple DCE between GVN and GCM instead of relying on liveness pass. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088 Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-10-17 07:57:49 +04:00
Vadim Girlin	07baf9cfd1	r600g/sb: improve alu packing on cayman Scheduler/register allocator in r600-sb was developed and optimized on evergreen (VLIW-5) hardware, so currently it's not optimal for VLIW-4 chips. This patch should improve performance on cayman gpus due to better alu packing, but also it tends to increase register usage, so overall positive effect on performance has to be proven by real benchmarks yet. Some results with bfgminer kernel on cayman: source bytecode: 60 gprs, 3905 alu groups, sbcl before the patch: 45 gprs, 4088 alu groups, sbcl with this patch: 55 gprs, 3474 alu groups. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-07-17 18:29:56 +04:00
Vinson Lee	81d3881367	r600g/sb: Initialize ra_checker member variables. Fixes "Uninitialized scalar field" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2013-07-17 18:27:30 +04:00
Vadim Girlin	725671a83a	r600g/sb: improve optimization of conditional instructions Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-05-27 15:19:20 +04:00
Vadim Girlin	63d09a0cb7	r600g/sb: improve handling of KILL instructions This patch improves handling of unconditional KILL instructions inside the conditional blocks, uncovering more opportunities for if-conversion. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-05-27 01:45:07 +04:00
Vadim Girlin	ff2a611699	r600g/sb: fix scheduling of PRED_SET instructions PRED_SET instructions that update exec mask should be scheduled immediately prior to the "if-then-else" block, because any instruction that is inserted after alu clause with PRED_SET and before conditional block is also conditionally executed by hw (exec mask is already updated at that moment). Propbably it's better to make PRED_SET a part of conditional "if-then-else" block in the IR to handle this more cleanly, but for now this temporary solution should prevent the problem. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-05-27 01:45:07 +04:00
Vadim Girlin	ecde4b07e2	r600g/sb: get rid of standard c++ streams Static initialization of internal libstdc++ data related to iostream causes segfaults with some apps. This patch replaces all uses of std::ostream and std::ostringstream in sb with custom lightweight classes. Prevents segfaults with ut2004demo and probably some other old apps. Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-05-14 17:36:25 +04:00
Vadim Girlin	3e476c311f	r600g/sb: use simple heuristic to limit register pressure It's not a complete register pressure tracking, yet it helps to prevent register allocation problems in some cases where they were observed. The problems are uncovered by false dependencies between fetch instructions introduced by some recent changes in TGSI and/or default backend. Sometimes we have code like this: ... SAMPLE R5.xyzw, R5.xyzw ... store R5.xyzw somewhere MOV R5.x, <next x coord> MOV R5.y, <next y coord> SAMPLE R5.xyzw, R5.xyzw ... <may be repeated a lot of times> With 2D resources, z and w in SAMPLE src reg aren't used and can be simply masked, but shader backend doesn't have this information, so it's considered as data dependency by optimization algorithms.	2013-04-30 21:50:48 +04:00
Vadim Girlin	2cd7691793	r600g/sb: initial commit of the optimizing shader backend	2013-04-30 21:50:47 +04:00

16 Commits