KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Connor Abbott	65660622a1	ir3: Split out variant-specific lowering and optimizations It seems a lot of the lowerings being run the second time were unnecessary. In addition, when const_state is moved to the variant, then it will become impossible to know ahead of time whether a variant needs additional optimizing, which means that ir3_key_lowers_nir() needs to go away. The new approach should have the same effect, since it skips running lowerings that are unnecessary and then skips the opt loop if no optimizations made progress, but it will work better when we move ir3_nir_analyze_ubo_ranges() to be after variant creation. The one maybe controversial thing I did is to make nir_opt_algebraic_late() always happen during variant lowering. I wanted to avoid code duplication, and it seems to me that we should push the _late variants as far back as possible so that later opt_algebraic runs don't miss out on optimization opportunities. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	f484d63617	freedreno/ir3: add helpers to deal with src/dst types Add some helpers to properly maintain src/dst types, and in the cases where opcode depends on src or dst type, maintain that as well. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>	2020-05-19 16:06:17 +00:00
Rob Clark	7b86b5ed7d	freedreno/ir3: fix immed type in create_addr0() We can also remove a bunch of manual src/dst flag munging, since the instruction builders handle this automatically now. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>	2020-05-19 16:06:17 +00:00
Rob Clark	3db5d146e9	freedreno/ir3: fix mismatched flags on split We have to fixup the meta:split half flag, because `ir3_split_dest()` is called before we fixup the dest type. But we should fixup both the split src and dest, as well as the thing it is splitting. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>	2020-05-19 16:06:17 +00:00
Rob Clark	f351e1d137	freedreno/ir3: limit # of tex prefetch by shader size It seems for short frag shaders, too much prefetch can be detrimental. I think what we really want to do is decide after pre-RA sched, when we also know about nop's and what the actual ir3 instruction count is. But that will require re-working how prefetch lowering works. For now this is a super crude heuristic to attempt to approximate a good solution. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>	2020-05-13 03:28:40 +00:00
Eric Anholt	0e51082cfa	freedreno/ir3: Leave bools as 1-bit, storing them in full regs. If we use NIR's 1-bit bool representation, we get exactly the bool behavior the hardware provides: CMPS produces true or false, AND/OR/XOR work as intended without extra absnegs, and we can pass those half values directly to other CMPS. We emit an absneg for b2b1 ("turn a memory load into a 1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the use of that value anyway. The most awkward bit is that inot(a@1) is now a sub(1, a), but we can encode the 1 as an immediate so it's fine. No significant changes to GL_TIME_ELAPSED on my set of traces (n=21). instructions in affected programs: 1570638 -> 1548702 (-1.40%) nops in affected programs: 624053 -> 611381 (-2.03%) non-nops in affected programs: 959061 -> 949797 (-0.97%) mov in affected programs: 5258 -> 5252 (-0.11%) cov in affected programs: 15099 -> 15902 (5.32%) dwords in affected programs: 469600 -> 452768 (-3.58%) last-baryf in affected programs: 162211 -> 154726 (-4.61%) full in affected programs: 4881 -> 4797 (-1.72%) sstall in affected programs: 173953 -> 174545 (0.34%) (ss) in affected programs: 10922 -> 10934 (0.11%) (sy) in affected programs: 728 -> 745 (2.34%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4518>	2020-04-30 23:36:09 +00:00
Eric Anholt	95d4a956c0	freedreno/ir3: CSE the up/downconversion of SEL's cond's size. Not many programs hit this, but if you were, say, selecting between vec4s, you'd convert the cond 4 times. instructions in affected programs: 2957 -> 2717 (-8.12%) nops in affected programs: 989 -> 899 (-9.10%) non-nops in affected programs: 1968 -> 1818 (-7.62%) dwords in affected programs: 3232 -> 2752 (-14.85%) last-baryf in affected programs: 102 -> 90 (-11.76%) full in affected programs: 5 -> 4 (-20.00%) sstall in affected programs: 329 -> 329 (0.00%) (ss) in affected programs: 86 -> 105 (22.09%) (sy) in affected programs: 14 -> 12 (-14.29%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4516>	2020-04-13 19:24:52 +00:00
Connor Abbott	de7d90ef53	ir3: Plumb through support for a1.x This will need to be used in some cases for the upcoming bindless support, plus ldc.k instructions which push data from a UBO to const registers. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>	2020-04-09 15:56:55 +00:00
Neil Roberts	61f7a1dfc5	freedreno/ir3: Lower bools to bitsize Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>	2020-03-24 23:21:21 +00:00
Hyunjun Ko	c822460f85	freedreno/ir3: handle half registers for arrays during register allocation. So far we only handle full regs of arrays during pre-allocation. This patch is to handle half regs of arrays and also consider the size of half regs when finding out conflicts. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>	2020-02-24 17:24:13 +00:00
Rob Clark	752aeb7b3f	freedreno/ir3: simplify split from collect In some cases we need to split components out from what was already a collect. That was making it hard to DCE unused components of the collect. (Ie. unused components of fragcoord, etc) So just detect this case and skip the chained collect+split. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>	2020-02-01 02:40:22 +00:00
Rob Clark	c1194e10b2	freedreno/ir3: cleanup after lower_locals_to_regs Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>	2020-02-01 02:40:22 +00:00
Rob Clark	3b8feefd9c	freedreno/ir3: add iterator macros So many open coded list iterators were getting annoying. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-12-13 09:25:40 -08:00
Rob Clark	611258d578	freedreno/ir3: rename fanin/fanout to collect/split If I'm going to refactor a bit to use these meta instructions to also handle input/output, then might as well cleanup the names first. Nouveau also uses collect/split for names of these meta instructions, and I like those names better. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-12 13:55:03 -08:00
Rob Clark	5da10704bb	freedreno/ir3: use SSA flag on dest register too We did this in some places before, but not consistently. But it will be useful for two-pass RA, to identify which registers have already been assigned. While we are cleaning this up, use __ssa_src() and new __ssa_dst() helper more consistently. (If nothing else, this reduces the # of callers of ir3_reg_create() to audit that we didn't miss something) Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-11-09 02:49:14 +00:00
Rob Clark	9e211b57b8	freedreno/ir3: propagate dest flags for collect/fanin We did this properly already for split/fanout. But collect was missed. Extract out a helper to share. This way we avoid copy propagating a mov from high or half reg into an instruction which cannot consume a high/half reg. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-24 13:08:56 -07:00
Rob Clark	0f395f0933	freedreno/ir3: debug cleanup 1) deduplicate IR3_SHADER_DEBUG=disasm versus fs/vs/etc handling 2) standardize shader stage name prints, in particular VERT vs BVERT 3) don't mix stderr and stdout Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-24 13:08:56 -07:00
Rob Clark	f30c256ec0	freedreno/ir3: enable pre-fs texture fetch for a6xx Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-18 21:11:54 +00:00
Rob Clark	73cc2dc084	freedreno/ir3: fix for array/reg store vs meta instructions fishgl.com has a shader which does roughly: foo = texture(...); if (bar) foo = texture(...); after lowering phi webs to regs we end up w/ a vec4 reg (array). But since it was not an indirect access, we try to skip the extra mov. As a result, the per-component fanout (split) meta instructions store directly to the reg (array), which doesn't work out in RA. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-07-29 15:15:31 -07:00
Eric Anholt	01d0bad9ef	freedreno: Remove silly return from ir3_optimize_nir(). We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Tapani Pälli	287b58f827	ir3: initialize progress false before ir3_nir_lower_imul Removes a compiler warning about uninitialized variable. Fixes: `c02ffd2700` "ir3: Use the new NIR lowering pass for integer multiplication" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Rob Clark <robclark@gmail.com> Reviewed-by: Eduardo Lima <elima@igalia.com>	2019-06-14 08:21:42 +03:00
Eduardo Lima Mitev	c02ffd2700	ir3: Use the new NIR lowering pass for integer multiplication Shader-db stats courtesy of Eric Anholt: total instructions in shared programs: 6480215 -> 6475457 (-0.07%) instructions in affected programs: 662105 -> 657347 (-0.72%) helped: 1209 HURT: 13 total constlen in shared programs: 1432704 -> 1427769 (-0.34%) constlen in affected programs: 100063 -> 95128 (-4.93%) helped: 512 HURT: 0 total max_sun in shared programs: 875561 -> 873387 (-0.25%) max_sun in affected programs: 46179 -> 44005 (-4.71%) helped: 1087 HURT: 0 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Hyunjun Ko	43d80a3e20	freedreno/ir3: adjust the bitsize of regs when an array loading. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Rob Clark	b15c46e6bf	freedreno/ir3: move const_state to ir3_shader For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were not (obvious) mismatches between the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess is added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immed's it lowers to the end of the immediate state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	5690f83bb5	freedreno/ir3: split out const_state setup Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	23e7a34466	freedreno/ir3: consolidate const state Combine the offsets of different parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	ef3eecd66b	freedreno/ir3: move ir3_pointer_size() Move to ir3_compiler so it doesn't depend on the compile context. Prep work for moving constant state from variant (where we have compile context) to shader (where we do not). Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	ca3eb5db66	freedreno/ir3: add some ubo range related asserts And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Kristian H. Kristensen	18ce6ac632	freedreno/ir3: Mark ir3_context_error() as NORETURN Fixes a few warnings. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	893425a607	freedreno/ir3: Push UBOs to constant file We have a rather big constant file and it seems that the best way to use it is to upload all UBOs and lower UBO access to load_uniform. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Rob Clark	0df0fc28a5	freedreno/ir3: rename put_dst() This was overlooked when it moved to ir3_context.c and ceased to be static.. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	7fe9e790e7	freedreno: fix crash w/ masked non-SSA dst Fixes dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read regression. Fixes: `c1a27ba9ba` freedreno/ir3: HIGH reg w/a for a6xx Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	f5ee8c54ed	freedreno/ir3: fix legalize for vecN inputs The wrmask is handled in regmask_get()/regmask_set(), but it wasn't being propagated from SSA src to dst. So for example, an SSBO read value that is passed in as src2.y component to atomic op, wasn't getting the (sy) flag set. Causing lots of fail. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	c1a27ba9ba	freedreno/ir3: HIGH reg w/a for a6xx It seems like some instructions (noticed this w/ cat3), cannot read HIGH regs.. cat1 (mov/cov) can, and possibly some/all of cat2. The blob seems to stick w/ an extra mov into low regs. So lets do the same. This fixes WGID on a6xx, which unsurprisingly is related to a lot of deqp compute fails. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	947848524d	freedreno/ir3: add a6xx+ SSBO/image support Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	2e0ea3f09c	freedreno/ir3: add image/ssbo <-> ibo/tex mapping Images and SSBOs don't map directly to the hw. They end up being part texture and part something else. Starting with a6xx, the hack used for a5xx to smash the image tex state into hw texture state starting from MAX counting down won't work, because we start using tex state also for SSBO read. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	feee3050d3	freedreno/ir3: split out a4xx+ instructions Note that image/ssbo support is currently only implemented for a5xx. But the instruction encoding is the same for a4xx. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	3453814622	freedreno/ir3: fix fallout of extra assert Fixes the following crash that happened after `d6110d4d` The problem happens if we first compile a "vanilla" shader with nothing lowered in NIR, which performs the final lowering passes on so->shader->nir (including nir_lower_locals_to_regs()), and then later we compile a shader with some lowering. The second time through we would have already done nir_lower_locals_to_regs(). Arguably this was already a bug, just one we hadn't noticed yet. Fixes: `d6110d4d54` intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-21 19:04:22 -05:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Rob Clark	9517037bdc	freedreno/ir3: code-motion Split up ir3_compiler_nir.c a bit before starting to add new stuff for a6xx SSBO/image instructions. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00

40 Commits