KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Christian Gmeiner	e67bca3fe7	nir: make lower_sample_tex_compare a common pass This pass was originally written for d3d12, but is useful for hardware that lacks sample compare support like some etnaviv GPU models. Also rename the lowering pass and some surrounding code to nir_lower_tex_shadow as suggested by Emma. I'd like to use the pass that's already in tree. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>	2022-01-28 18:40:53 +00:00
Emma Anholt	61400f8a2d	nir/lower_locals_to_regs: Do an ad-hoc copy propagate on our generated MOV. I noticed the inefficiency in NIR-to-TGSI output while trying to debug a failure handling some arrays in r600. While this makes reading CTS shaders easier, the effect in the real world is pretty limited. From softpipe shader-db: total instructions in shared programs: 2929840 -> 2929836 (<.01%) instructions in affected programs: 118 -> 114 (-3.39%) Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>	2022-01-25 06:01:13 +00:00
Bas Nieuwenhuizen	d1530a3f3b	Revert "nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants" This reverts commit `a1af902531`. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5423 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14532>	2022-01-21 16:58:11 +00:00
Rhys Perry	af51efe195	nir/builder: assume scalar alignment if not provided Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>	2022-01-21 13:45:33 +00:00
Rhys Perry	e9e1a44872	nir/builder: set write mask if not provided Zero isn't really a valid write mask. If it's provided, use a full write mask. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>	2022-01-21 13:45:33 +00:00
Rhys Perry	495debebad	nir/algebraic: optimize expressions using fmulz/ffmaz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	14b8227083	nir: add some missing nir_alu_type_get_base_type Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	f2fbba7920	nir/algebraic: optimize open-coded fmulz/ffmaz This pattern will be found in future versions of D3D9 DXVK. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	312a284980	nir/algebraic: add ignore_exact() wrapper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	7f05ea3793	nir: add nir_op_fmulz and nir_op_ffmaz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Emma Anholt	cac6f633b2	nir/opt_offsets: Use nir_ssa_scalar to chase offset additions. For nir_to_tgsi, I want to be able to fold into the base from a vector load_const, which the ad-hoc scalar chasing couldn't handle. r300: total instructions in shared programs: 1278731 -> 1256502 (-1.74%) instructions in affected programs: 457909 -> 435680 (-4.85%) total flowcontrol in shared programs: 8316 -> 8313 (-0.04%) flowcontrol in affected programs: 5 -> 2 (-60.00%) total temps in shared programs: 213687 -> 213774 (0.04%) temps in affected programs: 13140 -> 13227 (0.66%) total consts in shared programs: 952850 -> 949929 (-0.31%) consts in affected programs: 386352 -> 383431 (-0.76%) Fixes: #5781 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>	2022-01-19 22:28:34 +00:00
Emma Anholt	645ca56425	nir/opt_offsets: Also apply the max offset to top-level constant folding. nir_to_tgsi wants this for disabling folding into shared var accesses at all. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>	2022-01-19 22:28:34 +00:00
Emma Anholt	ec4b9909f0	nir/opt_offsets: Disable unsigned wrap checks on non-native-integers HW. Since we don't have 32-bit ints, these checks for 32-bit unsigned wrapping don't help and just reduce optimization opportunities (particularly for DX9 addressing math). Doesn't affect any current consumers. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>	2022-01-19 22:28:34 +00:00
Emma Anholt	700d2fbd0a	nir: Add a .base field to nir_load_ubo_vec4. This lets nir-to-tgsi fold the constant offset of addressing calculations into the CONST[] reference, which is important for D3D9-era compatibility: HW of that age has limited uniform space, and if we do the addressing math as math in the shader for dynamic indexing, the nir_load_consts end up taking up uniforms we don't have available. r300: total instructions in shared programs: 1279699 -> 1279167 (-0.04%) instructions in affected programs: 134796 -> 134264 (-0.39%) total instructions in shared programs: 1279699 -> 1279167 (-0.04%) instructions in affected programs: 134796 -> 134264 (-0.39%) total temps in shared programs: 213912 -> 213736 (-0.08%) temps in affected programs: 2166 -> 1990 (-8.13%) total consts in shared programs: 953237 -> 952973 (-0.03%) consts in affected programs: 45980 -> 45716 (-0.57%) Acked-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>	2022-01-19 22:28:34 +00:00
Dave Airlie	ccbf700d6c	nir: remove gl.h include from nir headers. This saves a lot of pointless gl.h includes across the board, it moves the one place that needs GLenum into a separate file only used in those passes that require it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Dave Airlie	1352e0ba0c	mesa/*: add a shader primitive type to get away from GL types. This creates an internal shader_prim enum, I've fixed up most users to use it instead of GL types. don't store the enum in shader_info as it changes size, and confuses other things. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Connor Abbott	9c9e8c3349	nir: Reorder ffma and fsub combining It's relatively common to do something like "a * b - c", which on most GPUs can be implemented in a single instruction. Before opt_algebraic_late this will be something like "fadd(fmul(a, b), fneg(c))", and we want to turn it into "ffma(a, b, fneg(c))". But because the fsub pattern was first we instead turned it into "fsub(fmul(a, b), c)". Fix this by reordering them. Selected shader-db results on freedreno: total instructions in shared programs: 1561330 -> 1551619 (-0.62%) instructions in affected programs: 780272 -> 770561 (-1.24%) helped: 1941 HURT: 491 helped stats (abs) min: 1 max: 147 x̄: 7.98 x̃: 4 helped stats (rel) min: 0.07% max: 30.77% x̄: 4.36% x̃: 3.17% HURT stats (abs) min: 1 max: 307 x̄: 11.76 x̃: 5 HURT stats (rel) min: 0.09% max: 18.71% x̄: 2.26% x̃: 1.38% 95% mean confidence interval for instructions value: -4.57 -3.41 95% mean confidence interval for instructions %-change: -3.21% -2.84% Instructions are helped. total nops in shared programs: 358926 -> 356263 (-0.74%) nops in affected programs: 167116 -> 164453 (-1.59%) helped: 1395 HURT: 859 helped stats (abs) min: 1 max: 108 x̄: 6.80 x̃: 3 helped stats (rel) min: 0.17% max: 100.00% x̄: 19.15% x̃: 10.57% HURT stats (abs) min: 1 max: 307 x̄: 7.95 x̃: 3 HURT stats (rel) min: 0.00% max: 381.82% x̄: 20.04% x̃: 10.00% 95% mean confidence interval for nops value: -1.77 -0.59 95% mean confidence interval for nops %-change: -5.55% -2.87% Nops are helped. 
total non-nops in shared programs: 1202404 -> 1195356 (-0.59%) non-nops in affected programs: 496682 -> 489634 (-1.42%) helped: 1951 HURT: 265 helped stats (abs) min: 1 max: 39 x̄: 4.02 x̃: 3 helped stats (rel) min: 0.07% max: 15.38% x̄: 2.97% x̃: 1.96% HURT stats (abs) min: 1 max: 22 x̄: 2.97 x̃: 2 HURT stats (rel) min: 0.05% max: 10.00% x̄: 1.14% x̃: 0.75% 95% mean confidence interval for non-nops value: -3.38 -2.99 95% mean confidence interval for non-nops %-change: -2.60% -2.36% Non-nops are helped. total systall in shared programs: 288317 -> 292975 (1.62%) systall in affected programs: 87876 -> 92534 (5.30%) helped: 388 HURT: 431 helped stats (abs) min: 1 max: 214 x̄: 14.39 x̃: 8 helped stats (rel) min: 0.25% max: 100.00% x̄: 22.12% x̃: 11.96% HURT stats (abs) min: 1 max: 232 x̄: 23.77 x̃: 7 HURT stats (rel) min: 0.00% max: 1300.00% x̄: 51.71% x̃: 17.30% 95% mean confidence interval for systall value: 3.07 8.30 95% mean confidence interval for systall %-change: 9.49% 23.97% Systall are HURT. (The systall hurt is probably just due to having fewer instructions to hide latency with.) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14554>	2022-01-18 17:44:50 +00:00
Qiang Yu	2cee73f0f7	nir: fix nir_tex_instr hash not count is_sparse field This fixes nir_opt_cse miss replace a non-sparse tex instruction with a sparse tex instruction and fail the nir_validate_shader(). Fixes: `3a7972f72a` ("nir,spirv: add sparse texture fetches") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14362>	2022-01-18 16:10:35 +08:00
Rhys Perry	d95a0b52e4	nir/unsigned_upper_bound: don't follow 64-bit f2u32() Fixes Doom Eternal crash. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Fixes: `72ac3f6026` ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14555>	2022-01-17 10:59:21 +00:00
Emma Anholt	f6ffefba3e	nir: Apply nir_opt_offsets to nir_intrinsic_load_uniform as well. Doing this for ir3 required adding a struct for limits of how much base to fold in (which NTT wants as well for its case of shared vars), otherwise the later work to lower to the 1<<9 word limit would emit more instructions. The shader-db results are that sometimes the reduction in NIR instruction count results in the fewer sampler prefetches due to the shader being estimated to be shorter (dota2, nexuiz): total instructions in shared programs: 8996651 -> 8996776 (<.01%) total cat5 in shared programs: 86561 -> 86577 (0.02%) Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>	2022-01-16 19:11:29 +00:00
Emma Anholt	b024102d7c	freedreno/ir3: Use nir_opt_offset for removing constant adds for shared vars. Saves some work in carchase and manhattan31: instructions in affected programs: 2842 -> 2818 (-0.84%) nops in affected programs: 1131 -> 1105 (-2.30%) non-nops in affected programs: 1236 -> 1238 (0.16%) mov in affected programs: 57 -> 61 (7.02%) dwords in affected programs: 2144 -> 2150 (0.28%) cat0 in affected programs: 1195 -> 1169 (-2.18%) cat1 in affected programs: 151 -> 155 (2.65%) cat2 in affected programs: 142 -> 140 (-1.41%) sstall in affected programs: 190 -> 178 (-6.32%) (ss) in affected programs: 63 -> 63 (0.00%) systall in affected programs: 532 -> 511 (-3.95%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>	2022-01-16 19:11:29 +00:00
Daniel Schürmann	79a987ad2a	nir/opt_if: also merge break statements with ones after the branch This optimization turns loop { ... if (cond1) { if (cond2) { do_work_1(); break; } else { do_work_2(); } do_work_3(); break; } else { ... } } into: loop { ... if (cond1) { if (cond2) { do_work_1(); } else { do_work_2(); do_work_3(); } break; } else { ... } } As this optimization moves code into the NIF statement, it re-iterates on the branch legs in case of success. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>	2022-01-13 02:30:32 +00:00
Daniel Schürmann	dad609d152	nir/opt_if: merge two break statements from both branch legs This optimization turns loop { ... if (cond) { do_work_1(); break; } else { do_work_2(); break; } } into: loop { ... if (cond) { do_work_1(); } else { do_work_2(); } break; } Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>	2022-01-13 02:30:32 +00:00
Daniel Schürmann	8a78706643	nir: refactor nir_opt_move This patch is a rewrite of nir_opt_move. Unlike the previous version, each instruction is checked to see if it can be moved downwards and is then inserted before the first user of the definition. The advantage is that fewer insert operations are performed, the original order is kept if two movable instructions have the same first user, and instructions without a user in the same block are moved towards the end. v2: Only return true if an instruction really changed position. Don't care about discards, this will be handled by another MR. v3: fix self-referring phis and update according to nir_can_move_instr(). v4: use nir_can_move_instr() and nir_instr_ssa_def() v5: deduplicate some code Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3657>	2022-01-12 13:41:54 +00:00
Marcin Ślusarz	f286ecf906	nir: handle per-view clip/cull distances Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>	2022-01-11 22:45:23 +00:00
Marcin Ślusarz	0d6f83cbf1	nir: remove invalid assert affecting per-view variables per-view variables can have arbitrary (but > 0) number of array levels Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>	2022-01-11 22:45:23 +00:00
Marcin Ślusarz	4fed440724	nir: add load_mesh_view_count and load_mesh_view_indices intrinsics Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14263>	2022-01-11 22:45:23 +00:00
Rhys Perry	67fc7a1763	nir/uniform_atomics: fix is_atomic_already_optimized without workgroups dims_needed would have been zero, so this would always returned true for non-compute stages. Also fix this for variable workgroup sizes. Improves Shadow of the Tomb Raider RX 6800 performance by 10.6%, 11.5% and 4.5% (day_of_dead, jungle and paititi scenes). radv_perf before and after: {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '62.913333333333334', 'min_fps': '62.81', 'max_fps': '62.98', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '64.02666666666666', 'min_fps': '63.93', 'max_fps': '64.11', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '74.81666666666666', 'min_fps': '74.72', 'max_fps': '74.88', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '69.57', 'min_fps': '69.52', 'max_fps': '69.63', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '71.41000000000001', 'min_fps': '71.31', 'max_fps': '71.5', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '78.16666666666667', 'min_fps': '78.07', 'max_fps': '78.23', 'interations': '3'} Performance now seems slightly better than AMDVLK 2021.Q4.3: {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'day_of_dead', 'avg_fps': '68.02666666666666', 'min_fps': '67.95', 'max_fps': '68.16', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'jungle', 'avg_fps': '70.24666666666667', 'min_fps': 
'69.83', 'max_fps': '70.51', 'interations': '3'} {'app': 'SotTR', 'resolution': '3840x2160', 'preset': 'VeryHigh', 'antialiasing': 'off', 'scene': 'paititi', 'avg_fps': '77.19', 'min_fps': '77.18', 'max_fps': '77.2', 'interations': '3'} fossil-db (Sienna Cichlid): Totals from 40 (0.03% of 134621) affected shaders: CodeSize: 62676 -> 65996 (+5.30%) Instrs: 11372 -> 12111 (+6.50%) Latency: 144122 -> 142848 (-0.88%); split: -1.09%, +0.21% InvThroughput: 19686 -> 19847 (+0.82%); split: -0.06%, +0.87% VClause: 304 -> 306 (+0.66%) SClause: 603 -> 604 (+0.17%); split: -0.83%, +1.00% Copies: 780 -> 858 (+10.00%) Branches: 235 -> 329 (+40.00%) PreSGPRs: 1072 -> 1083 (+1.03%); split: -0.37%, +1.40% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14407>	2022-01-10 19:57:38 +00:00
Rhys Perry	b00138090e	nir/lower_shader_calls: fix store_scratch write_mask Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14447>	2022-01-10 19:01:04 +00:00
Danylo Piliaiev	b8d486f298	nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8 Adreno GPUs have native instructions for unsigned and mixed dot_4x8 but not for the signed dot product. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Gert Wollny	6f348d9c99	nir_lower_io: propagate the "invariant" flag to outputs Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer to correctly declare output variables. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>	2022-01-07 16:35:43 +00:00
Emma Anholt	558a600629	nir_to_tgsi: Enable fdot_replicates flag. That's how the TGSI math opcodes work. This lets lower_vec_to_regs coalesce the DP output into the .yzw channels, giving an impressive shader-db win on softpipe: total instructions in shared programs: 2929840 -> 2794036 (-4.64%) instructions in affected programs: 1651438 -> 1515634 (-8.22%) total temps in shared programs: 372730 -> 332744 (-10.73%) temps in affected programs: 118151 -> 78165 (-33.84%) and a minor one on r300: total instructions in shared programs: 51238 -> 51149 (-0.17%) instructions in affected programs: 2621 -> 2532 (-3.40%) total vinst in shared programs: 15655 -> 15618 (-0.24%) vinst in affected programs: 468 -> 431 (-7.91%) total temps in shared programs: 9838 -> 9828 (-0.10%) temps in affected programs: 59 -> 49 (-16.95%) and a bigger one on i915g: total instructions in shared programs: 398064 -> 395901 (-0.54%) instructions in affected programs: 29271 -> 27108 (-7.39%) total tex_indirect in shared programs: 12261 -> 12233 (-0.23%) tex_indirect in affected programs: 98 -> 70 (-28.57%) LOST: 0 GAINED: 5 The r300 change is less impressive because it does some backend copy-prop, but also because intermediate storage of DPs now takes a vec4 instead of a scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>	2022-01-07 09:58:24 +00:00
Jesse Natalie	c09c0b351f	nir_opt_dead_cf: Remove dead ifs An if that looks like: if (x) { } else { } That has no phis following it is dead. Currently these are only removed by peephole select, but that means that 'x' is considered used until that pass is run, which can make it difficult to apply sane lowering in the case where loading 'x' requires complex or expensive transformations, but 'x' is really unused. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>	2022-01-07 05:15:48 +00:00
Alyssa Rosenzweig	24ea7cbb06	nir: Extend store_combined_output_pan Extend store_combined_output_pan to take a dual source blend input in addition to colour, depth, and stencil inputs. Use the last source for this, and represent the type with the DEST_TYPE index. This is a hack but there is no SRC2_TYPE and NIR doesn't seem to mind as long as we know what we mean. This allows the backend to emit a combined "blend render target #0" instruction taking two sources. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>	2022-01-02 01:12:05 +00:00
Alyssa Rosenzweig	5c168f09eb	nir: Eliminate store_combined_output_pan BASE It's meaningless for this intrinsic and is just adding noise to the lowering pass. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>	2022-01-02 01:12:05 +00:00
Daniel Schürmann	17ecd0b31a	nir/opt_algebraic: lower fneg_hi/lo to fmul This pattern, found in the FSR upscaling shader, helps the vectorization efforts by keeping the chain of vectorized instructions intact. Radeon can optimize it to per-component fneg modifiers. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>	2021-12-21 13:23:37 +01:00
Caio Oliveira	729df14e45	nir: Handle volatile semantics for loading HelperInvocation builtin SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation operation to turn an invocation into a helper invocation, but the value of HelperInvocation (a builtin from Input storage class) couldn't be modified dynamically without breaking compatibility. For the extension the operation OpIsHelperInvocation was added to get the dynamic value. For SPIR-V 1.6, the demote operation was promoted, but now to get the dynamic value the shader must issue a load to HelperInvocation with Volatile memory access semantics. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>	2021-12-17 16:37:14 -08:00
Rhys Perry	a65285f54b	nir/opt_access: infer CAN_REORDER for global access fossil-db (Sienna Cichlid): Totals from 352 (0.26% of 134621) affected shaders: VGPRs: 17240 -> 17272 (+0.19%) CodeSize: 1753640 -> 1755744 (+0.12%); split: -0.04%, +0.16% Instrs: 323190 -> 323801 (+0.19%); split: -0.03%, +0.22% Latency: 3241205 -> 3241293 (+0.00%); split: -0.10%, +0.10% InvThroughput: 568927 -> 568067 (-0.15%); split: -0.16%, +0.00% SClause: 12109 -> 10444 (-13.75%); split: -13.76%, +0.01% Copies: 27802 -> 27717 (-0.31%); split: -0.56%, +0.26% PreSGPRs: 14699 -> 14690 (-0.06%) PreVGPRs: 15793 -> 15799 (+0.04%) fossil-db (Polaris10): Totals from 348 (0.26% of 135668) affected shaders: SGPRs: 21446 -> 21574 (+0.60%); split: -0.15%, +0.75% VGPRs: 17004 -> 16996 (-0.05%); split: -0.09%, +0.05% CodeSize: 1782796 -> 1783060 (+0.01%); split: -0.03%, +0.05% Instrs: 337828 -> 337921 (+0.03%); split: -0.03%, +0.06% Latency: 3726328 -> 3726721 (+0.01%); split: -0.09%, +0.10% InvThroughput: 1307917 -> 1299841 (-0.62%); split: -0.62%, +0.00% VClause: 4327 -> 4337 (+0.23%); split: -0.09%, +0.32% SClause: 12178 -> 10529 (-13.54%); split: -13.55%, +0.01% Copies: 40227 -> 40244 (+0.04%); split: -0.19%, +0.24% PreSGPRs: 14946 -> 14937 (-0.06%) PreVGPRs: 15637 -> 15643 (+0.04%) fossil-db (Pitcairn): Totals from 351 (0.26% of 135668) affected shaders: SGPRs: 20382 -> 20619 (+1.16%); split: -0.79%, +1.95% CodeSize: 1789732 -> 1789836 (+0.01%); split: -0.04%, +0.04% MaxWaves: 1947 -> 1949 (+0.10%) Instrs: 352274 -> 352318 (+0.01%); split: -0.04%, +0.06% Latency: 4057829 -> 4058226 (+0.01%); split: -0.08%, +0.09% InvThroughput: 1332245 -> 1317578 (-1.10%); split: -1.11%, +0.01% VClause: 8581 -> 8583 (+0.02%); split: -0.13%, +0.15% SClause: 12187 -> 10552 (-13.42%); split: -13.43%, +0.02% Copies: 44906 -> 44915 (+0.02%); split: -0.24%, +0.26% PreSGPRs: 16571 -> 16562 (-0.05%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: 
<https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Rhys Perry	403ae3b48e	nir/algebraic: optimize more 64-bit imul with constant source Two 64-bit shifts and an addition are usually faster than the several multiplications nir_lower_int64 creates. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Rhys Perry	c56cf157c5	nir/opt_load_store_vectorize: improve ssbo/global alias analysis If either the global access or the ssbo access is restrict, they shouldn't alias. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Jason Ekstrand	deec7a590b	anv,nir: Use sample_pos_or_center in lower_wpos_center Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	e8acc5a7ea	nir: Add a new sample_pos_or_center system value Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Marcin Ślusarz	504e5cb4e8	nir/print: print const value near each use of const ssa variable Without/with NIR_DEBUG=print,print_const: -vec4 32 ssa_60 = fadd ssa_59, ssa_58 +vec4 32 ssa_60 = fadd ssa_59 /*(0xbf800000, 0x3e800000, 0x00000000, 0x3f800000) = (-1.000000, 0.250000, 0.000000, 1.000000)*/, ssa_58 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>	2021-12-17 10:04:50 +00:00
Marcin Ślusarz	23f8f836e0	nir/print: group hex and float vectors together Vectors are much easier to follow in this format, because developer cares either about hex or float values, never both. Before/after: -vec4 32 ssa_222 = load_const (0x00000000 /* 0.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */, 0x3f800000 /* 1.000000 */) +vec4 32 ssa_222 = load_const (0x00000000, 0x00000000, 0x3f800000, 0x3f800000) = (0.000000, 0.000000, 1.000000, 1.000000) -vec1 32 ssa_174 = load_const (0xbf800000 /* -1.000000 */) +vec1 32 ssa_174 = load_const (0xbf800000 = -1.000000) Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>	2021-12-17 10:04:50 +00:00
Marcin Ślusarz	d2b4051ea9	nir/print: move print_load_const_instr up ... to avoid forward declarations in future commit Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13880>	2021-12-17 10:04:50 +00:00
Marcin Ślusarz	f7e63ec5d8	nir/print: compact printing of intrinsic indices Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>	2021-12-16 09:43:13 +00:00
Marcin Ślusarz	d8fa625bb3	nir/print: expand printing of io semantics.gs_streams gs_streams can be set for at least 2 other intrinsics. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>	2021-12-16 09:43:13 +00:00
Marcin Ślusarz	be25db9f0f	nir/print: simplify printing of IO semantics Some of the tested flags are set for other intrinsics and they are printed only when set, so there's no point in checking exact intrinsic name or shader stage. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>	2021-12-16 09:43:13 +00:00
Caio Oliveira	b1156f23a2	Revert "nir: disable a NIR test due to undebuggable & locally unreproducible CI failures" This reverts commit `6eb3fe2d4f`. The root cause was a bug in Meson when using the new gtest protocol and a test failed before producing the XML file expected by it. This was fixed in later versions of Meson, so we've bumped the required meson version to use that feature. The failure should now be properly identified, so re-enabling the NIR test. Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Daniel Stone <daniels@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14204>	2021-12-15 23:28:09 +00:00
Caio Oliveira	dcc7b19cae	nir: Initialize nir_register::divergent Fixes: `c7fc44f9eb` ("nir/from_ssa: Respect and populate divergence information") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14205>	2021-12-15 22:39:06 +00:00
Juan A. Suarez Romero	b8f6685bb5	nir: use call_once() to init debug variable For data-race safety, let's use this function to ensure NIR debug is initialized only once. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>	2021-12-14 08:01:17 +00:00
Juan A. Suarez Romero	18c039b2e1	tgsi-to-nir: initialize NIR_DEBUG envvar This envvar is initialized when creating a NIR shader, but it needs to be used before. So initialize it here. v2 (Juan): - Use static variable for first initialization. Fixes: `f77ccdfb4a` ("nir: add NIR_DEBUG envvar") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>	2021-12-14 08:01:17 +00:00
Jordan Justen	211e0606c7	nir/lower_tex: Add filter for tex offset lowering Rework: * Add callback_data (s-b Jason) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>	2021-12-13 16:56:23 -08:00
Samuel Pitoiset	be53b3d1bf	nir/lower_tex: add lower_lod_zero_width On AMD, the hardware will return 0 for the raw LOD if the sum of the absolute values of derivatives is 0 but Vulkan expects the value to be in the [-inf, -22.0f] range. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14147>	2021-12-13 10:00:07 +00:00
Marcin Ślusarz	87f03b1662	nir: limit lower_clip_cull_distance_arrays input to traditional stages Compute, task, mesh & raytracing stages don't support ClipDistance/CullDistance as input. This change is not needed for correctness. Just something I stumbled on. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14149>	2021-12-13 08:32:23 +00:00
Marek Olšák	2785141c16	nir: add nir_has_divergent_loop function Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>	2021-12-11 20:07:35 +00:00
Marek Olšák	26b522eae5	nir: serialize divergent fields Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>	2021-12-11 20:07:35 +00:00
Marek Olšák	6eb3fe2d4f	nir: disable a NIR test due to undebuggable & locally unreproducible CI failures debian-vulkan but not any other CI pipeline consistently fails with: FileNotFoundError: [Errno 2] No such file or directory: 'nir_tests.xml' I have to assume that either debian-vulkan is broken, or the NIR test infrastructure is broken. That's not all. I got the same failure when I wanted to add a new test, which means the CI is preventing us from adding new NIR tests, which is a very serious problem with the CI or NIR tests. The python error doesn't imply that it's a test failure, so something else is broken. If you don't want such commits to happen again, print better error messages. See also the discussion in the MR. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>	2021-12-11 20:07:35 +00:00
Marek Olšák	2ab310b78b	nir: handle more intrinsics in divergence analysis Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>	2021-12-11 20:07:35 +00:00
Emma Anholt	d199d65c3a	nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr. We weren't doing much motion in nir-to-tgsi because we considered all our lowered load-ubos as unmovable. softpipe shader-db: total temps in shared programs: 563942 -> 563136 (-0.14%) temps in affected programs: 9833 -> 9027 (-8.20%) r300 shader-db: instructions in affected programs: 22858 -> 23575 (3.14%) temps in affected programs: 2039 -> 1813 (-11.08%) (NIR had given r300 -19% instrs for +40% temps, so this feels like a worthwhile trade back). Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14138>	2021-12-11 02:12:27 +00:00
Emma Anholt	de33205f88	nir/algebraic: Move all the individual transforms to a common table. Cuts 28% of the remaining relocations in libvulkan_intel.so, shrinks binary size by 290kb. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	a29b54f014	nir/algebraic: Mark the automaton's filter tables as const. Moves it to .rodata instead of .data. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	45a8d11b6e	nir/algebraic: Pack various bitfields in the nir_search_value_union. This gets our union's size down to 22 bytes (now smaller than any of the union's types were before we made the union!). Cuts another 48kb off of the drivers. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	53f49b7066	nir/algebraic: Move relocations for variable conds to a table. This helps concentrate the dirty pages from the relocations, reduces how many relocations there are, and reduces the size of each variable (assuming variables mostly don't have conditions or the conditions are mostly reused). Reduces libvulkan_intel.so size by 49kb. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	8485a78977	nir/algebraic: Move relocations for expression conds to a table. This helps concentrate the dirty pages from the relocations, reduces how many relocations there are, and reduces the size of each expression (assuming expressions mostly don't have conditions or the conditions are mostly reused). Reduces libvulkan_intel.so size by 8.7kb. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	7635379dc7	nir/algebraic: Remove array-of-cond code You can't have an array of them after removing many-comm-expr, there's no space in the struct. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	5d82c61a30	nir/algebraic: Replace relocations for nir_search values with a table. Even with packing all 3 types into a 40-byte union (nir_search_constant being 24 bytes and nir_search_expression having formerly been 32), and having a single array of them, this cuts 1.7MB from each of libvulkan_intel.so and libgallium_dri.so. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:09:00 +00:00
Emma Anholt	e7d8717375	nir/algebraic: Drop the check for cache == None. The cache is always set. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:08:59 +00:00
Emma Anholt	a263474d3b	nir/algebraic: Move some generated-code algebraic opt args into a struct. I'm going to be adding some more tables to reduce relocations in the generated code, so move the current tables to a struct for arg-passing sanity. Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>	2021-12-07 07:08:59 +00:00
Ian Romanick	b88202b0e4	nir/constant_folding: Optimize txb with bias of constant zero to tex v2: Fail gracefully when bias_idx < 0. See comment in the code for the rationale. See also issue #5722. All Haswell and newer Intel GPUs had similar results. (Ice Lake shown) total instructions in shared programs: 19757733 -> 19753431 (-0.02%) instructions in affected programs: 277248 -> 272946 (-1.55%) helped: 1644 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.62 x̃: 2 helped stats (rel) min: 0.05% max: 11.11% x̄: 2.11% x̃: 1.61% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 0.35% x̄: 0.35% x̃: 0.35% 95% mean confidence interval for instructions value: -2.72 -2.51 95% mean confidence interval for instructions %-change: -2.19% -2.03% Instructions are helped. total cycles in shared programs: 938517439 -> 938384079 (-0.01%) cycles in affected programs: 19548849 -> 19415489 (-0.68%) helped: 1358 HURT: 269 helped stats (abs) min: 1 max: 2328 x̄: 133.01 x̃: 16 helped stats (rel) min: <.01% max: 41.12% x̄: 1.40% x̃: 0.48% HURT stats (abs) min: 1 max: 1302 x̄: 175.70 x̃: 30 HURT stats (rel) min: <.01% max: 69.03% x̄: 6.24% x̃: 1.04% 95% mean confidence interval for cycles value: -99.14 -64.79 95% mean confidence interval for cycles %-change: -0.47% 0.19% Inconclusive result (%-change mean confidence interval includes 0). LOST: 21 GAINED: 32 All Ivy Bridge and older Intel GPUs had similar results. (Ivy Bridge shown) total instructions in shared programs: 15302017 -> 15301485 (<.01%) instructions in affected programs: 22565 -> 22033 (-2.36%) helped: 168 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 3.17 x̃: 3 helped stats (rel) min: 0.04% max: 4.39% x̄: 3.05% x̃: 3.27% 95% mean confidence interval for instructions value: -3.45 -2.89 95% mean confidence interval for instructions %-change: -3.19% -2.91% Instructions are helped. 
total cycles in shared programs: 550119761 -> 549989147 (-0.02%) cycles in affected programs: 12834251 -> 12703637 (-1.02%) helped: 164 HURT: 0 helped stats (abs) min: 20 max: 4547 x̄: 796.43 x̃: 294 helped stats (rel) min: 0.23% max: 53.84% x̄: 2.05% x̃: 0.37% 95% mean confidence interval for cycles value: -942.62 -650.24 95% mean confidence interval for cycles %-change: -3.17% -0.94% Cycles are helped. fossil-db results: Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown) Instructions in all programs: 142073649 -> 141307526 (-0.5%) SENDs in all programs: 6876848 -> 6876778 (-0.0%) Loops in all programs: 38283 -> 38283 (+0.0%) Cycles in all programs: 8410049681 -> 8402902960 (-0.1%) Spills in all programs: 190623 -> 190599 (-0.0%) Fills in all programs: 297780 -> 297756 (-0.0%) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14025>	2021-12-06 19:50:42 +00:00
Lionel Landwerlin	0cbcc15afe	nir: add a ray query optimization pass Just remove queries that are never used or proceeded with. The latter case leading to undefined values. v2: Don't use nir_shader_instructions_pass() to find variables (Caio) Simplify replacement (Caio) v3: Don't track all the queries intrinsic effects (Caio) Rename things to represent only read queries (Caio) Use set instead of hash_table (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Lionel Landwerlin	5a9cdab170	nir: track variables representing ray queries v2: Fix missing ray_query variable check (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Lionel Landwerlin	0d6f050b46	nir: add intrinsics for ray queries Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Lionel Landwerlin	0800ec2c77	nir: add a new access flag to allow access in helper invocations v2: Add nir_print support Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Lionel Landwerlin	54489b3c09	nir/print: printout ACCESS_STREAM_CACHE_POLICY Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Lionel Landwerlin	f98984ad13	nir/lower_io: include the variable access in the lowered intrinsic Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>	2021-12-04 20:46:35 +00:00
Marcin Ślusarz	b717872e08	intel/compiler: Get mesh_global_addr from the Inline Parameter for Task/Mesh Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>	2021-12-04 00:41:46 +00:00
Timur Kristóf	f28adc711f	nir: Print task and mesh shader I/O variable names. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14007>	2021-12-03 21:34:45 +00:00
Timur Kristóf	7e66da89f8	nir: Fix sorting per-primitive outputs. Fixes: `59860d4873` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14006>	2021-12-03 17:06:47 +00:00
Rhys Perry	a2d8c5b26d	nir/algebraic: optimize a*#b & -4 fossil-db (Sienna Cichlid): Totals from 611 (0.47% of 128647) affected shaders: CodeSize: 3096680 -> 3090976 (-0.18%) Instrs: 570494 -> 569249 (-0.22%) Latency: 5765865 -> 5759619 (-0.11%) InvThroughput: 969840 -> 967608 (-0.23%) VClause: 9690 -> 9688 (-0.02%) Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03% PreVGPRs: 28290 -> 28288 (-0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>	2021-12-03 13:41:07 +00:00
Rhys Perry	2368c36427	nir/opt_offsets: remove need to loop try_extract_const_addition fossil-db (Sienna Cichlid): Totals from 1 (0.00% of 134572) affected shaders: no stat changes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14009>	2021-12-03 11:51:49 +00:00
Rhys Perry	5c0fe11072	nir/opt_offsets: fix try_extract_const_addition recursion This initially looks like a miscompilation bug, but I don't think it's actually possible for it to create incorrect code. fossil-db (Sienna Cichlid): Totals from 32 (0.02% of 134572) affected shaders: VGPRs: 1336 -> 1320 (-1.20%) CodeSize: 90552 -> 89468 (-1.20%) Instrs: 17007 -> 16852 (-0.91%); split: -0.92%, +0.01% Latency: 429040 -> 428136 (-0.21%); split: -0.21%, +0.00% InvThroughput: 84966 -> 84572 (-0.46%); split: -0.47%, +0.00% Copies: 1458 -> 1468 (+0.69%); split: -0.07%, +0.75% Branches: 382 -> 384 (+0.52%) PreSGPRs: 970 -> 968 (-0.21%) PreVGPRs: 1029 -> 1011 (-1.75%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14009>	2021-12-03 11:51:49 +00:00
Juan A. Suarez Romero	f77ccdfb4a	nir: add NIR_DEBUG envvar Move all the NIR related debug environment variables into a single NIR_DEBUG one. Use NIR_DEBUG=help to print all the available options. v2: - Use a macro to simplify (Marcin, Jason) - Remove wrong changes (Marcin) v3 (Marcin): - Remove redundant NIR mentioning in option descriptions. - Unwrap option descriptions. - Ensure the constant is unsigned. - Use extern array to remove switch. v4: - Add missing kernel shader (Jason). - Add unlikely() (Marcin). Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13840>	2021-12-03 11:15:29 +00:00
Emma Anholt	06fe04b4d7	nir: Make nir_build_alu() variants per 1-4 arg count. This saves a bunch of generated code to pack up the extra NULLs to get to 4 args, and saves executing the conditions in nir_build_alu() to then skip those NULLs. Saves another 27kb on disk. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>	2021-12-01 22:12:19 +00:00
Emma Anholt	e770ec1182	nir: Uninline a bunch of nir.h functions. I aimed for "things that look like big switch statements, or cases where the compiler is unlikely to be able to constant-propagate an argument into something useful." Saves another 80kb on disk. No perf difference on iris shader-db, n=23. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>	2021-12-01 22:12:19 +00:00
Lionel Landwerlin	8e568d3f00	nir/opt_deref: don't try to cast empty structures Found while running valgrind : ==3583454== Invalid read of size 4 ==3583454== at 0xF48336: glsl_get_struct_field_offset (nir_types.cpp:84) ==3583454== by 0xC7CD0D: opt_replace_struct_wrapper_cast (nir_deref.c:1068) ==3583454== by 0xC7CDD9: opt_deref_cast (nir_deref.c:1087) ==3583454== by 0xC7DD8E: nir_opt_deref_impl (nir_deref.c:1369) ==3583454== by 0xC7DF4E: nir_opt_deref (nir_deref.c:1428) ==3583454== by 0xA63F3C: brw_kernel_from_spirv (brw_kernel.c:325) ==3583454== by 0xA3BC2C: main (intel_clc.c:481) ==3583454== Address 0xe4f7e88 is 24 bytes after a block of size 48 in arena "client" Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13952>	2021-12-01 08:24:39 +00:00
Mykhailo Skorokhodov	391569e911	nir: Fix read depth for predecessors In some non-trivial cases (see the amber script file in the merge request description) a phi instruction has more than 32 elements in its predecessor tree, and that is not recursion, just a large tree. In that case, the phis are not fully converted into a register or mov, but are still removed. The fix removes the counter and adds a container of visited blocks. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3690 Cc: mesa-stable Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13710>	2021-11-30 00:12:48 +00:00
Rhys Perry	32a8b391e3	nir/tests: add DCE test for loops following a jump Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>	2021-11-29 22:22:24 +00:00
Rhys Perry	cc5dd15417	nir/cf: fix insertion of loops/ifs after jumps Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>	2021-11-29 22:22:24 +00:00
Rhys Perry	2fe13aa2ad	nir/dce: fix DCE of loops with a halt or return instruction in the pre-header If there is a halt or return instruction right before a loop with a single continue, we would have taken the fast path intended for loops without continues. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `71a985d80b` ("nir/dce: perform DCE for unlooped instructions in a single pass") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>	2021-11-29 22:22:24 +00:00
Ilia Mirkin	b7f423006a	nir/lower_clip: support clipdist array + no vars This runs after the "to io" lowering on freedreno. Support this case. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>	2021-11-28 04:44:56 +00:00
Ilia Mirkin	7efb1c4b29	nir/lower_clip: increment num_inputs/outputs by appropriate amount The inputs/outputs are meant to be in vec4 units. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>	2021-11-28 04:44:56 +00:00
Ilia Mirkin	3bf47700e2	nir/lower_clip: location offset goes into offset, not base Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>	2021-11-28 04:44:56 +00:00
Ilia Mirkin	a8930e6302	nir/lower_clip: replace bogus comment about gl_ClipDistance reading in GL gl_ClipDistance most definitely can be read in fragment shaders since GLSL 1.30. This is also accessible in ES with EXT_clip_cull_distance. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>	2021-11-28 04:44:56 +00:00
Marek Olšák	e54264c84f	nir: add shader_info::source_sha1, its initialization and printing Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13869>	2021-11-26 11:58:27 +00:00
Rhys Perry	34510ce3cc	nir/lower_subgroups: fix left shift of -1 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5365 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12901>	2021-11-24 16:45:05 +00:00
Rhys Perry	811a7a2d31	nir/lower_tex: don't calculate texture_mask for texture_index>=32 With Vulkan, texture_index can be 32 or larger, which creates a shift exponent larger than 31 (undefined behaviour). Since we don't use texture_mask with Vulkan, just initialize it to 0. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5365 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12901>	2021-11-24 16:45:04 +00:00
Rhys Perry	b425100781	spirv: run nir_copy_prop before nir_rematerialize_derefs_in_use_blocks_impl spirv_to_nir sometimes wraps derefs in vec2 or mov instructions as part of its texture handling. These get in the way of nir_rematerialize_derefs_in_use_blocks_impl. Running copy propagation should get rid of the extra move instructions and get us back to intact deref chains for everything except variable pointer use-cases. fossil-db (Sienna Cichlid): Totals from 6 (0.00% of 134572) affected shaders: CodeSize: 92656 -> 93088 (+0.47%) Instrs: 17060 -> 17138 (+0.46%) Latency: 224408 -> 227539 (+1.40%) InvThroughput: 37402 -> 37924 (+1.40%) VClause: 408 -> 402 (-1.47%) Copies: 1065 -> 1107 (+3.94%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5668 Fixes: `14a12b771d` ("spirv: Rework our handling of images and samplers") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13924>	2021-11-24 15:43:51 +00:00
Danylo Piliaiev	99388f0c27	freedreno/ir3: handle global atomics Only for a6xx since we don't know the instructions for global atomics on previous gens. Per Qualcomm's docs in OpenCL atomics are only supported since a5xx together with Generic memory space. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>	2021-11-23 18:26:37 +00:00
Emma Anholt	7603187aec	nir: Un-inline more of nir_builder.h. Cuts another 470KB of libnir.a in my release build. Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>	2021-11-22 20:40:47 +00:00
Emma Anholt	d9bfcf5f5b	nir: Un-inline nir_builder_alu_instr_finish_and_insert() This function is big and I don't think it will get meaningfully constant-propagated during inlining without LTO. Move it to a .c file so we just have one copy, saving 2.8MB from libnir.a on an amd64 release build. text data bss total filename before: 18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so 9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so 28687772 13310765 1168952 43167489 (TOTALS) after: 15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so 6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so 22288716 13309549 1168952 36767217 (TOTALS) No statistically significant performance difference on iris shader-db, n=8. Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>	2021-11-22 20:40:47 +00:00
Ilia Mirkin	3b5b4b5d45	nir: apply interpolated input intrinsics setting when lowering clipdist For drivers that use this in fragment shaders, load_input is going to produce incorrect results (flat-shaded values). Fixes clipping tests on a4xx. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>	2021-11-22 20:11:19 +00:00
Ilia Mirkin	df934873e1	nir: always keep the clip distance array size updated Drivers expect to know the number of clip distances irrespective of whether compact arrays are used or not. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>	2021-11-22 20:11:19 +00:00
Connor Abbott	508f917d8c	util/dag: Make edge data a uintptr_t Nobody was actually using it as a pointer, and I'm going to introduce a shared function which relies on it not being a pointer so let's fix this once and for all. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13722>	2021-11-17 13:41:47 +00:00
Samuel Pitoiset	011ea32585	nir: fix constant expression of ibitfield_extract This fixes dEQP-VK.graphicsfuzz.cov-condition-bitfield-extract-integer. For example, nir_ibitfield_extract(3, 1, 2) should return 1. Cc: 21.3 mesa-stable Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13791>	2021-11-16 17:32:21 +00:00
Timur Kristóf	59860d4873	nir: Group per-primitive outputs at the end for driver location assign. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	f23f7ef316	nir: Don't compact per-vertex and per-primitive outputs together. Prevent nir_compact_varyings from putting per-vertex and per-primitive output components in the same slot. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	e1e461d11c	nir: Lower cull and clip distance arrays for mesh shaders. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	6a502a0a2c	nir: Add new option to lower invocation ID from invocation index. Add this as an option to nir_lower_compute_system_values_options instead of just relying on the shader's options. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	7562e34463	nir, spirv: Don't mark NV_mesh_shader primitive indices as per-primitive. They are not per-primitive in NV_mesh_shader, but a flat array. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	d79d9a7a06	nir: Fix nir_lower_io with per primitive outputs. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	9cf4124be0	nir: Print Mesh Shader specific info. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Timur Kristóf	5aa39253cb	nir: Rename nir_get_io_vertex_index_src and include per-primitive I/O. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>	2021-11-16 07:46:55 +00:00
Ilia Mirkin	185826a400	nir: remove double-validation of src component counts The nir_tex_instr_src_size helper already sorts this out correctly, no need to do it twice, and validate_src takes care of it. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13781>	2021-11-16 01:23:41 +00:00
Daniel Schürmann	1e4c6e059e	nir/fold_16bit_sampler_conversions: skip sparse residency tex instructions The residency return value mismatches between NIR and Radeon. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13592>	2021-11-15 18:28:20 +00:00
Rhys Perry	719b48f85d	nir/lower_system_values: replace local_invocation_id components with zero fossil-db (Sienna Cichlid): Totals from 360 (0.28% of 128647) affected shaders: VGPRs: 7912 -> 7272 (-8.09%); split: -8.59%, +0.51% CodeSize: 542456 -> 544688 (+0.41%); split: -0.32%, +0.73% MaxWaves: 10866 -> 10952 (+0.79%) Instrs: 95973 -> 96010 (+0.04%); split: -0.34%, +0.38% Latency: 4366023 -> 4344664 (-0.49%); split: -0.90%, +0.41% InvThroughput: 19656659 -> 18297185 (-6.92%); split: -6.92%, +0.00% VClause: 3242 -> 3116 (-3.89%); split: -4.04%, +0.15% SClause: 3422 -> 3504 (+2.40%); split: -0.20%, +2.60% Copies: 8854 -> 9376 (+5.90%); split: -0.89%, +6.79% Branches: 2329 -> 2326 (-0.13%); split: -0.39%, +0.26% PreSGPRs: 7620 -> 7841 (+2.90%); split: -0.43%, +3.33% PreVGPRs: 5765 -> 5504 (-4.53%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel-schuermann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13757>	2021-11-12 18:59:51 +00:00
Alyssa Rosenzweig	e257344a82	nir/lower_pntc_ytransform: Support PointCoordIsSysval Pattern match the point coord sysval and support lowering it as well. This is required to handle flipped framebuffers on Bifrost. However, what this pass normalizes to is the opposite of the hardware mode we used on Bifrost before, so we need to swap modes at the same time to prevent regressions. Fixes Piglit glsl-fs-pointcoord and glsl-fs-pointcoord_gles2 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13073>	2021-11-12 12:34:14 +00:00
Marek Olšák	33b4eb149e	nir: add new SSA instruction scheduler grouping loads into indirection groups Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13604>	2021-11-08 21:20:11 +00:00
Filip Gawin	f32dcb6fe1	nir: assert that variables in optimize_atomic are initialized Looking at parse_atomic_op in context, the index into the array (data_src) can be left uninitialized. In my opinion this approach is cleaner than handling it inside parse_atomic_op. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12995>	2021-11-08 15:10:07 +00:00
Rhys Perry	12294026d5	nir/algebraic: optimize Cyberpunk 2077's open-coded bitfieldReverse() fossil-db (Sienna Cichlid): Totals from 9 (0.01% of 128647) affected shaders: CodeSize: 29900 -> 28640 (-4.21%) Instrs: 5677 -> 5443 (-4.12%) Latency: 96561 -> 95025 (-1.59%) Copies: 571 -> 544 (-4.73%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13673>	2021-11-05 09:31:04 +00:00
Mike Blumenkrantz	16f838576c	nir/lower_io_to_scalar: add support for bo and shared io Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13485>	2021-10-27 16:46:01 +00:00
Alyssa Rosenzweig	d8b1afdc85	nir/lower_blend: Use correct clamp for SNORM nir_lower_blend was written against the OpenGL ES 3.2 specification, which does not support blending SNORM render targets. The ES spec says that non-floating point buffers get clamped to [0, 1] before blending. The story is not so simple: SNORM buffers are blendable in OpenGL and must be clamped to [-1, 1] rather than [0, 1]. Handle this case. NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but it is only implemented in Panfrost, and this pass is in common code. Open code it instead. Panfrost optimizes the open coded version, so this is good enough. Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>	2021-10-26 19:16:36 +00:00
Danylo Piliaiev	b7c7abded7	nir/serialize: Make more space for intrinsic_op allowing 1024 ops We are close to the limit of 512 intrinsics, make more space to be able to support up to 1024 intrinsics. Take one bit from packed_const_indices, they shouldn't suffer in a common case. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13456>	2021-10-25 16:17:09 +00:00
Danylo Piliaiev	1eee1fda11	nir/lower_amul: do not lower 64bit amul to imul24 Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13300>	2021-10-21 18:59:57 +00:00
Caio Marcelo de Oliveira Filho	662fbc0120	nir: Use a single binary for gtests Less artifacts and less time running linker. The load_store_vectorizer test is still split since we need to update gitlab-ci scripts to skip certain tests in certain builds. Added a TODO with the concrete suggestion. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13414>	2021-10-20 18:26:31 +00:00
Jason Ekstrand	b62b2fa4b9	compiler/types: Add a wrap_in_arrays helper This has been copied+pasted 3 times now. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>	2021-10-16 05:49:34 +00:00
Jason Ekstrand	5818d47ae6	spirv: Use texture types for sampled images Instead of using gsamplerND types for sampled images, use the new gtextureND types for sampled images and reserve gsamplerND for combined image+samplers. Combined image+sampler bindings still get a gsamplerND type. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>	2021-10-16 05:49:34 +00:00
Jason Ekstrand	b8a0bf2343	nir/deref: Also optimize samplerND -> textureND casts Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>	2021-10-16 05:49:34 +00:00
Jason Ekstrand	2ab5546a96	nir: Allow texture types Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>	2021-10-16 05:49:34 +00:00
Jason Ekstrand	3ace6b968b	compiler/types: Add a texture type This is separate from images and samplers. It's a texture (not a storage image) without a sampler. We also add C-visible helpers to convert between sampler and image types. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>	2021-10-16 05:49:34 +00:00
Jason Ekstrand	d343aef942	nir/serialize: Pack deref modes better With nir_var_image, we've now run out of bits in our packed blob for deref instructions. We could revert to an unpacked blob or we could be a bit more clever about how we encode deref modes and pack them into 5 bits. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>	2021-10-16 03:47:10 +00:00
Jason Ekstrand	9272a952c9	nir: Re-arrange the variable modes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>	2021-10-16 03:47:10 +00:00
Jason Ekstrand	956199e870	nir: s/nir_var_mem_image/nir_var_image/g We typically use nir_var_mem_* for stuff that has an explicit byte-based memory layout. Images are opaque. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>	2021-10-16 03:47:10 +00:00
Dylan Baker	e73096bd6d	meson: use gtest protocol for gtest based tests when possible With the `gtest` protocol meson will add some extra arguments to the test to generate better junit results, which may be useful. This protocol is only available in meson 0.55.0+, so keep using the default `exitcode` protocol for meson older than that. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>	2021-10-16 03:22:24 +00:00
Jason Ekstrand	58f605e4d4	nir: Drop our attempt at type-based image mode validation This is broken for bindless images declared as local variables. It turns out nir_variable::data::bindless is only used for uniforms and we already assume anything in nir_var_function_temp or similar is bindless. We could try to make a tricky assert, but now that we have everything else passing and everyone converted, the extra validation probably isn't necessary. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13384>	2021-10-15 22:35:59 +00:00
Jason Ekstrand	4c5a88d735	nir: Validate image variable modes We can also significantly simplify the foreach_image_variable helper. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>	2021-10-15 14:58:56 +00:00
Jason Ekstrand	6818811fc4	nir/lower_readonly_images_to_tex: Also rewrite variable modes Storage images will start using nir_var_mem_image but sampled images still use nir_var_uniform. If we're going to rewrite types, we need to rewrite the modes as well. Otherwise, nir_validate will get grumpy and drivers might get confused. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>	2021-10-15 14:58:56 +00:00
Jason Ekstrand	2a53c33fbe	nir: Add a nir_foreach_image_variable() iterator Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>	2021-10-15 14:58:55 +00:00
Caio Marcelo de Oliveira Filho	de3705edb0	nir: Add nir_var_mem_image Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>	2021-10-15 14:58:55 +00:00
Caio Marcelo de Oliveira Filho	872750bb96	nir/schedule: Handle nir_intrinsic_scoped_barrier Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>	2021-10-15 14:58:55 +00:00
Mike Blumenkrantz	f769f34680	nir/print: print bindless info as applicable this is useful to know Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13204>	2021-10-14 15:11:38 +00:00
Ian Romanick	ae99ea6f4d	nir/loop_unroll: Always unroll loops that iterate at most once Two carchase compute shaders (shader-db) and two Fallout 4 fragment shaders (fossil-db) were helped. Based on the NIR of the shaders, all four had structures like for (i = 0; i < 1; i++) { ... for (...) { ... } } All HSW+ platforms had similar results. (Ice Lake shown) total loops in shared programs: 6033 -> 6031 (-0.03%) loops in affected programs: 4 -> 2 (-50.00%) helped: 2 HURT: 0 All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 143692018 -> 143692006 (-0.0%) SENDs in all programs: 6947154 -> 6947154 (+0.0%) Loops in all programs: 38285 -> 38283 (-0.0%) Cycles in all programs: 8434822225 -> 8434476815 (-0.0%) Spills in all programs: 191665 -> 191665 (+0.0%) Fills in all programs: 298822 -> 298822 (+0.0%) In the presence of loop unrolling like this, the change in cycles is not accurate. v2: Rearrange the logic in the if-condition to read a little better. Suggested by Tim. Closes: #5089 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13323>	2021-10-13 20:11:13 -07:00
Qiang Yu	50c0451424	nir/linker: rename replace_constant_input to replace_varying_input_by_constant_load To align with replace_varying_input_by_uniform_load and better describe what it does. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>	2021-10-13 04:45:15 +00:00
Qiang Yu	2604625043	nir/linker: support uniform when optimizing varying Varying assigned from uniform won't change after interpolation, so move uniform load to fragment shader to eliminate the varying. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>	2021-10-13 04:45:15 +00:00
Filip Gawin	28a6e45a0f	nir: avoid reading uninitialized memory when using nir_dest_copy Deeper in the chain of calls, the function "src_has_indirect" is used (which reads "is_ssa" and "reg.indirect"). Fixes: `d1eae6f36b` ("nir: Properly clean up nir_src/dest indirects") Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13317>	2021-10-13 02:21:20 +00:00
Connor Abbott	b516208a55	nir/lower_ubo_vec4: Fix align_mul=8 special case In order for the load to never straddle the load can't extend past 8 bytes, not 16. For example a vec2 load with align_mul = 8 and align_offset = 4 can straddle. Fixes assertion failures when we stop pushing UBOs in the preamble on a6xx. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>	2021-10-12 11:30:52 +00:00
Jason Ekstrand	878d8d96c7	nir/lower_discard_or_demote: Fix metadata Passes generally shouldn't use nir_metadata_all unless they don't change the program in any significant way. Some of these passes insert new instructions so they should definitely not be preserving most of it. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13261>	2021-10-08 23:24:49 +00:00
Chia-I Wu	8cce6281e6	util/vector: make util_vector_init harder to misuse Make u_vector_init a wrapper to u_vector_init_pot. Let both take (element_count, element_size) as parameters. Motivated by `eed0fc4caf` ("vulkan/wsi/wayland: fix an invalid u_vector_init call") v2: rename u_vector_init_pot to u_vector_init_pow2 Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Simon Ser <contact@emersion.fr> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13201>	2021-10-08 00:15:11 +00:00
Boris Brezillon	56251f924d	nir: Add a nir_sysvals_to_varyings() helper Allow backends to turn some sysvals into input varyings so the frontend (in our case spirv_to_nir()) doesn't have to bother selecting which one is expected. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>	2021-10-07 19:45:35 +00:00
Jason Ekstrand	b71bdc3404	nir/algebraic: Add some opts for comparisons of comparisons Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	7abf3955ca	nir/algebraic: Add some boolean optimizations Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	c8b2be0b95	nir/algebraic: Lower fisfinite Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Rhys Perry	f3723822a4	nir/lower_tex: add lower_to_fragment_fetch_amd Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Rhys Perry	225fe37c14	nir: add _amd suffix to fragment_mask_fetch and fragment_fetch texops Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>	2021-10-07 15:36:39 +00:00
Marcin Ślusarz	3a18963b08	nir/print: pad 64-bit constants with zeroes ... just like other-size constants are. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13223>	2021-10-07 10:49:15 +00:00
Emma Anholt	7dde279db5	nir-to-tgsi: Avoid emitting TXL just for lod 0 on non-vertex shaders. Prompted by comparing virgl fails and finding that it has issues with immediate args to TXL/TXB, at least. Acked-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>	2021-10-06 03:44:17 +00:00
Ian Romanick	cb28361642	nir/algebraic: Small optimizations for SpvOpFOrdNotEqual and SpvOpFUnordEqual No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 144380118 -> 143692823 (-0.5%) SENDs in all programs: 6920822 -> 6920822 (+0.0%) Loops in all programs: 38299 -> 38299 (+0.0%) Cycles in all programs: 8434782176 -> 8423078994 (-0.1%) Spills in all programs: 206830 -> 204469 (-1.1%) Fills in all programs: 318737 -> 313660 (-1.6%) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>	2021-10-06 01:53:47 +00:00
Alyssa Rosenzweig	3e8f540753	nir: Add Mali-specific derivative opcodes Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying: fabs(fdd_must_abs_mali(v)) = fabs(fdd(v)) The sign of their result is undefined. On Bifrost and Valhall, these unsigned derivatives can be implemented more efficiently than the correctly-signed counterparts, since the sign fixup requires extra ALU instructions. On backends where this is the case, it is useful to optimize fabs(fdd(v)) to fabs(fdd_must_abs_mali(v)). This pattern comes up with the GLSL builtin `fwidth`. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>	2021-10-06 00:40:57 +00:00
Lionel Landwerlin	d0a3a11258	nir/lower_io: preserve all metadata when no progress Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13168>	2021-10-05 11:23:23 +00:00
Marcin Ślusarz	e26328582a	nir: preserve all metadata when nir_opt_vectorize doesn't make progress Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>	2021-10-05 10:02:54 +00:00
Marcin Ślusarz	54df09c8d4	nir: preserve all metadata when nir_propagate_invariant doesn't make progress Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>	2021-10-05 10:02:54 +00:00
Marcin Ślusarz	804c56f1a2	nir: preserve all metadata when nir_lower_int_to_float doesn't make progress Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>	2021-10-05 10:02:54 +00:00
Boris Brezillon	7cd402c9c8	nir/lower_blend: Shrink blended result if needed Make sure the new and old sources have the same number of components, otherwise the NIR validation pass complains. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>	2021-09-30 16:54:42 +02:00
Boris Brezillon	3e07b8d4f8	nir/lower_blend: Make sure we're not passed scaled formats SCALED formats are interpreted as floats, but not in the usual [0, 1] or [-1, 1] range, meaning that the blend lowering logic can't directly apply to those. Assert that the format being passed is not a scaled format. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>	2021-09-30 16:54:42 +02:00
Boris Brezillon	15b4cab4d5	nir/lower_blend: Don't lower RTs whose format is set to NONE The caller doesn't necessarily want to lower blend operations on all render targets since some of them might be supported natively (panvk will be in that case). Let's just skip RTs that have a format set to PIPE_FORMAT_NONE to allow that. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>	2021-09-30 16:54:42 +02:00
Boris Brezillon	637cd5ac00	nir/lower_blend: Pad src to a 4-component vector nir_ssa_for_src() is not supposed to pad the src vector if dst->num_components > src->num_components. Let's pad things explicitly with nir_pad_vector(). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>	2021-09-30 16:54:42 +02:00
Boris Brezillon	641bed3103	nir: Make sure src->num_components < dst->num_components in nir_ssa_for_src() The NIR validation complains if the swizzle accesses a component that's not present in the source. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>	2021-09-30 16:54:42 +02:00
Lionel Landwerlin	daa8a81d99	nir: fix opt_memcpy src/dst mixup Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `f6667cb0ce` ("nir: Add a memcpy optimization pass") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13079>	2021-09-28 16:36:08 +00:00
Rhys Perry	e43007af56	nir/opt_if: add opt_if_rewrite_uniform_uses Turns: if (a == (b=readfirstlane(a))) use(a) into: if (a == (b=readfirstlane(a))) use(b) Improves divergence analysis and lets us scalarize use(a). Improves Cyberpunk 2077 performance. fossil-db (Sienna Cichlid, Cyberpunk 2077): Totals from 57 (10.56% of 540) affected shaders: VGPRs: 4904 -> 4040 (-17.62%) CodeSize: 624360 -> 626828 (+0.40%); split: -0.06%, +0.46% MaxWaves: 656 -> 824 (+25.61%) Instrs: 119770 -> 119447 (-0.27%); split: -0.49%, +0.22% Latency: 1950256 -> 1633110 (-16.26%); split: -16.26%, +0.00% InvThroughput: 364852 -> 292089 (-19.94%) VClause: 1512 -> 1008 (-33.33%) SClause: 2693 -> 3196 (+18.68%) Copies: 10050 -> 9955 (-0.95%); split: -3.34%, +2.40% Branches: 3476 -> 3547 (+2.04%) PreSGPRs: 4003 -> 5076 (+26.80%) PreVGPRs: 4709 -> 3810 (-19.09%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12472>	2021-09-24 18:41:18 +00:00
Rhys Perry	69f9a96af1	nir: add nir_src_components_read() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12472>	2021-09-24 18:41:18 +00:00
Caio Marcelo de Oliveira Filho	240e60ba76	nir/lower_io_to_vector: Allow Task/Mesh to load from outputs Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12951>	2021-09-24 14:35:15 +00:00
Bas Nieuwenhuizen	0d8bd8518d	nir: Support ray launch size in divergence analysis. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen	56b06c09b4	nir: Add AMD rt intrinsics. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Bas Nieuwenhuizen	b6be96a2bd	radv: Modify load_sbt_amd intrinsic to get the descriptor. That way we can get the address to the entry, which is needed for some nir builtins because extra data in the entry can be used as shader input. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>	2021-09-21 01:53:39 +00:00
Timur Kristóf	872d21820f	nir: Exclude non-generic patch variables from get_variable_io_mask. These are I/O variables which are not going to be removed anyway. However, get_variable_io_mask handles their location incorrectly. Found using the GCC undefined behavior sanitizer. Fixes the following error: runtime error: shift exponent 4294967258 is too large for 64-bit type 'long unsigned int' Closes: #5319 Fixes: `cf5f8f55c3` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>	2021-09-20 18:08:16 +00:00
Ian Romanick	d7ba52cce9	nir/edgeflags: Add a flag to indicate the edge flag input is needed Most modern hardware needs the edge flag added as a hidden vertex input and needs code added to the vertex shader to copy the input to an output. Intel hardware is a little different. Gfx4 and Gfx5 hardware works in the previously described manner. Gfx6+ hardware needs the edge flag as a specific vertex shader input, and that input is magically processed by fixed-function hardware without need for extra shader code. This flag signals only that the vertex shader input is needed. It would be nice if we could decouple adding the vertex shader input from generating the copy-to-output code, but that has proven to be challenging. Not having that code causes other passes to want to eliminate that shader input. v2: Convert conditional to assertion. This pass is only called for vertex shaders. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>	2021-09-17 16:36:08 -07:00
Rhys Perry	a1af902531	nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants This allows for more MAD/FMA instructions to be created. fossil-db (Sienna Cichlid): Totals from 50134 (33.46% of 149839) affected shaders: VGPRs: 2436536 -> 2436000 (-0.02%); split: -0.05%, +0.03% SpillSGPRs: 13136 -> 13135 (-0.01%); split: -0.02%, +0.02% CodeSize: 206621424 -> 206278292 (-0.17%); split: -0.23%, +0.07% MaxWaves: 1116804 -> 1117448 (+0.06%); split: +0.07%, -0.01% Instrs: 38977460 -> 38862886 (-0.29%); split: -0.33%, +0.04% Latency: 832425389 -> 827432260 (-0.60%); split: -0.63%, +0.03% InvThroughput: 184193457 -> 183563350 (-0.34%); split: -0.37%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7458>	2021-09-17 17:28:26 +00:00
Jason Ekstrand	6c7d23e6ca	nir: Stop sweeping indirects They're no longer ralloc'd. Fixes: `879a569884` "nir: Switch from ralloc to malloc for NIR instructions." Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>	2021-09-16 11:28:36 +00:00
Jason Ekstrand	d1eae6f36b	nir: Properly clean up nir_src/dest indirects Now that they're no longer ralloc'd, we have to be much more careful about indirects. We have to make sure every time a source or destination is overwritten, its indirect (if any) is freed. We also have to choose a memory ownership convention for the rewrite functions. Assuming that they will be called with the source from some other instruction, we choose to always make a copy of the indirect (if any). It's the responsibility of the caller to ensure its copy of the indirect is freed. Unfortunately, all this extra logic is going to make nir_instr_rewrite/move_src/dest more expensive because they now have all the logic of nir_src/dest_copy instead of a simple struct assignment. Fortunately, the vast majority of rewrite calls are done by nir_ssa_def_rewrite_uses which is an SSA-only fast-path. Fixes: `879a569884` "nir: Switch from ralloc to malloc for NIR instructions." Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>	2021-09-16 11:28:36 +00:00
Emma Anholt	aed4c0b5a9	nir: Drop the unused instr arg for src/dest copy functions. Now that we don't use ralloc, we don't need this arg to get at the right ralloc ctx. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	879a569884	nir: Switch from ralloc to malloc for NIR instructions. By replacing the 48-byte ralloc header with our exec_node gc_node (16 bytes), runtime of shader-db on my system across this series drops -4.21738% +/- 1.47757% (n=5). Inspired by discussion on #5034. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	feee5e6974	nir/tests: Fix transmuting an SSA dest to be non-SSA With the de-ralloc changes, having the register dest not have its .reg properly initialized caused crashes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	1edff520e2	nir/lower_phis_to_scalar: Use nir_instr_free() to free instrs. Preparation for de-rallocing instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	d1a2870f78	nir: Add all allocated instructions to a GC list. Right now we're using ralloc to GC our NIR instructions, but ralloc has significant overhead for its recursive nature so it would be nice to use a simpler mechanism for GCing instructions. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:06 +00:00
Emma Anholt	22788d68eb	nir: Consistently pass the instr to nir_src_copy(). The arg says it's supposed to be the instr, not the shader. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	5e37cfb7fe	nir: Consistently pass the shader to the shader arg of instr creation. We were using the ralloc parent in some places, which should work out to be the shader I think, but to de-ralloc the instrs we should just pass the existing shader pointer in. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	7a4bbe60c1	nir/from_ssa: Use nir_instr_free() to free instrs instead of ralloc. This code was being tricky with passing a mem_ctx instead of the shader, then freeing the mem_ctx when the pass was done and all the parallel copies had been removed from the shader. Use the right type for instr creation and do a bit of manual list management to prepare the way for non-ralloc NIR instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	b99efb8af0	nir: Pull the instr list free function out to a helper. With the de-rallocing, we're going to have some more places that free a list of instrs. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Emma Anholt	36d9bdca0b	nir: Add a nir_instr_free() to replace ralloc_free(instr). This will gain another step shortly. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>	2021-09-14 17:53:05 +00:00
Ian Romanick	7956a701d8	nir/lower_gs_intrinsics: Make nir_lower_gs_intrinsics be idempotent Calling this lower pass twice in a row would cause spurious set_vertex_and_primitive_count(0, undef) intrinsics after the proper set_vertex_and_primitive_count intrinsic. This pretty much turns any geometry shader into garbage. Fix this by treating nir_intrinsic_emit_vertex_with_counter and nir_intrinsic_end_primitive_with_counter just like the non-_with_counter versions. If no blocks would need set_vertex_and_primitive_count intrinsics added, exit the pass before doing any work. This prevents the need for DCE to do extra clean up later. Since this pass is potentially called multiple times via multiple invocations of a finalize_nir callback, it is (hypothetically?) possible that control flow could be changed to add new blocks that need this intrinsic. The check implemented in this commit should be robust against that possibility. v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by Timur. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>	2021-09-14 09:13:07 -07:00
Ian Romanick	edf357b233	nir/lower_gs_intrinsics: Return progress if append_set_vertex_and_primitive_count makes progress Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Fixes: `542d40d698` ("nir: Add new GS intrinsics that maintain a count of emitted vertices.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>	2021-09-14 09:12:47 -07:00
Bas Nieuwenhuizen	b05cd10b8e	nir: Avoid visiting instructions multiple times in nir_instr_free_and_dce. Sadly need to poke a bit in the src internals to avoid using yet another heap allocated datastructure. Fixes: `5251548572` ("nir: Add a nir_instr_remove that recursively removes dead code.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5323 Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12726>	2021-09-09 21:35:03 +00:00
Rhys Perry	c1f724b2b9	nir: fix serialization of loop/if control Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `e76ae39ae2` ("nir: add support for user defined select control") Fixes: `b56451f82c` ("nir: add support for user defined loop control") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12778>	2021-09-09 10:32:30 +00:00
Qiang Yu	7054c1b7fd	nir/linker: pack varyings with different interpolation qualifier Drivers like radeonsi load varyings in a scalar manner, so they prefer to pack varyings with different interpolation qualifiers into the same slot to save space. But drivers like panfrost/bifrost can load varyings in a vector manner, so they prefer to pack varyings with the same interpolation qualifier. A driver can add the interpolation qualifiers that may be packed into the same varying slot to the pack_varying_options nir option. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>	2021-09-09 06:00:58 +00:00
Qiang Yu	5a24aed1ac	nir/lower_io_to_vector: check centroid & sample when merge variable These qualifiers should be respected for different varying load code generation. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>	2021-09-09 06:00:58 +00:00
Rob Clark	b8b475ad4e	nir/lower_amul: Fix usage of nir_foreach_src() nir_foreach_src() bails after cb returns false for any src. Which isn't the behavior we were looking for. Move progress flag to state struct instead, so we don't skip visiting some sources. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>	2021-09-06 15:58:05 +00:00
Rob Clark	5800fde1bb	nir/lower_amul: Handle load/store_global These need more than 24b. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>	2021-09-06 15:58:05 +00:00
Enrico Galli	9461fe5cf1	nir: Add CAN_REORDER to load_ubo_dxil Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12707>	2021-09-03 16:21:03 +00:00
Rhys Perry	41ecef7855	nir: add sdot_2x16 and udot_2x16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Rhys Perry	ae00f5af61	nir: separate lower_add_sat Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Timur Kristóf	33630090a2	nir: Add comment to explain the sad_u8x4 opcode. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12649>	2021-09-01 08:42:03 +00:00
Emma Anholt	33182c555f	nir/nir_lower_uniforms_to_ubo: Set the explicit stride of the UBO 0 uniform. Normal UBOs have explicit strides on them, make our lowered one behave the same. Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>	2021-08-31 20:12:16 +00:00
Emma Anholt	01759d3fb2	nir: Set .driver_location for GLSL UBO/SSBOs when we lower to block indices. Without this, there's no way to match the UBO nir_variable declarations to the load_ubo intrinsics referencing their data. Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>	2021-08-31 20:12:16 +00:00
Timur Kristóf	548b383310	nir: Fix local_invocation_index upper bound for non-compute-like stages. The lowered LS and NGG stages use local_invocation_index and they can benefit from the unsigned upper bound because they can emit a less expensive integer multiplication instruction. This was working in the past, but accidentally borked by a refactor. Fossil DB changes on Sienna Cichlid: Totals from 956 (0.74% of 128647) affected shaders: CodeSize: 2354172 -> 2344712 (-0.40%) Instrs: 434359 -> 434327 (-0.01%) Latency: 1883949 -> 1876814 (-0.38%) InvThroughput: 762638 -> 757405 (-0.69%) Fossil DB changes on Sienna Cichlid (with NGGC enabled): Totals from 57873 (44.99% of 128647) affected shaders: CodeSize: 155844192 -> 155607064 (-0.15%) Instrs: 29799184 -> 29799152 (-0.00%) Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00% InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00% Fixes: `8af6766062` Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>	2021-08-30 14:05:33 +00:00
Timur Kristóf	a25fd1787a	nir: Add unsigned upper bound for extract opcodes. This helps with some cases of extract, such as: - Emitting more optimal integer multiplications - Better address calculation - Possibly others Fossil DB results on Sienna Cichlid: Totals from 4064 (3.16% of 128647) affected shaders: VGPRs: 262040 -> 262032 (-0.00%) CodeSize: 28856648 -> 28811892 (-0.16%); split: -0.18%, +0.02% Instrs: 5370279 -> 5367827 (-0.05%); split: -0.08%, +0.04% Latency: 74230112 -> 74016671 (-0.29%); split: -0.29%, +0.01% InvThroughput: 12082532 -> 12036365 (-0.38%); split: -0.39%, +0.01% VClause: 108506 -> 108721 (+0.20%); split: -0.03%, +0.22% SClause: 217731 -> 216602 (-0.52%); split: -0.67%, +0.15% Copies: 265689 -> 270811 (+1.93%); split: -0.26%, +2.19% PreSGPRs: 201982 -> 204907 (+1.45%); split: -0.01%, +1.46% PreVGPRs: 236099 -> 236079 (-0.01%) Fossil DB results on Sienna Cichlid with NGGC enabled: Totals from 60375 (46.93% of 128647) affected shaders: VGPRs: 2212576 -> 2212568 (-0.00%) CodeSize: 180870420 -> 179684816 (-0.66%); split: -0.66%, +0.00% Instrs: 34386715 -> 34213682 (-0.50%); split: -0.51%, +0.01% Latency: 199676290 -> 198987998 (-0.34%); split: -0.35%, +0.00% InvThroughput: 32288299 -> 31736433 (-1.71%); split: -1.71%, +0.00% VClause: 621521 -> 621743 (+0.04%); split: -0.00%, +0.04% SClause: 900447 -> 899392 (-0.12%); split: -0.16%, +0.04% Copies: 3439529 -> 3445305 (+0.17%); split: -0.02%, +0.19% PreSGPRs: 2216297 -> 2219220 (+0.13%); split: -0.00%, +0.13% PreVGPRs: 1842887 -> 1842867 (-0.00%) Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>	2021-08-30 14:05:33 +00:00
Caio Marcelo de Oliveira Filho	10a03e30cf	nir: Allow Task/Mesh to lower compute system values Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho	4f52681a2d	nir: Don't lower Task/Mesh I/O to temporaries These won't work since a workgroup can span more than one thread, and the temporaries are not shared memory. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:43 +00:00
Caio Marcelo de Oliveira Filho	27697d5eb8	nir/divergence_analysis: Handle Task/Mesh shaders Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	bf5f6add01	nir/lower_io: Identify Mesh output as arrayed Mesh shader outputs are either: - non-array builtins - array builtins that are either per-primitive or per-vertex - user-defined outputs that must be either per-primitive or per-vertex So we can identify any array output as "arrayed" for the purposes of I/O lowering. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	cd394017c8	nir: Add per-primitive I/O intrinsics Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	f95daad3a2	nir: Add a way to identify per-primitive variables Per-primitive is similar to per-vertex attributes, but applies to all fragments of the primitive without any interpolation involved. Because they are regular input and outputs, keep track in shader_info of which I/O is per-primitive so we can distinguish them after deref lowering. These fields can be used combined with the regular `inputs_read`, `outputs_written` and `outputs_read`. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho	927584fa67	nir: Update documentation for location to mention Task/Mesh Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>	2021-08-28 03:56:42 +00:00
Filip Gawin	46f3582c6f	nir: fix ifind_msb_rev by using appropriate type As you can see, the comparison "x < 0" doesn't make sense if x is unsigned. Fixes: `a5747f8a` ("nir: add opcodes for *find_msb_rev and lowering ") Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12548>	2021-08-26 18:35:31 +00:00
Filip Gawin	9083e9a483	nir: fix shadowed variable in nir_lower_bit_size.c Fixes: `6d79298992` ("nir/lower_bit_size: fix lowering of {imul,umul}_high") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12527>	2021-08-26 18:04:22 +00:00
Lionel Landwerlin	a13e79843e	nir: prevent peephole from generating invalid NIR We can't append instructions following a return/halt instruction because the control flow helpers will modify the successor of the block containing the return/halt. And the NIR validator enforces that the return/halt must have the end of the function as successor. This tends to happen following lower_shader_calls lowering which inserts halts. This probably doesn't prevent the optimization, it'll just happen in one of the return shaders after the halt has been removed. v2: Move prev block ending check earlier in the function (Daniel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>	2021-08-25 11:38:21 +00:00
Samuel Pitoiset	cff106c4b6	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a) and fmin(-fmax(b, a)) to fmin(-fabs(b), -a). fossils-db (Sienna Cichlid): Totals from 34 (0.02% of 150170) affected shaders: CodeSize: 388540 -> 387748 (-0.20%) Instrs: 74621 -> 74423 (-0.27%) Latency: 1039407 -> 1039011 (-0.04%) InvThroughput: 208364 -> 208150 (-0.10%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>	2021-08-25 07:18:24 +02:00
Ian Romanick	a6db40605e	nir/algebraic: Add some extract optimizations These help quite a bit when vectored versions of SpvOpSDotKHR and friends are emitted as packed versions and then lowered. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	839495efc6	nir/algebraic: Add lowering for dot_4x8 instructions v2: Fix copy-and-paste bugs in lowering patterns. v3: Add has_sudot_4x8 flag. Requested by Rhys. v4: Since the names of the opcodes changed from dp4 to dot_4x8, also change the names of the lowering helpers. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	806cd2341c	nir/algebraic: Basic patterns for dot_4x8 v2: Add and modify patterns to let constant folding do better. v3: Remove '(is_not_zero)' from the patterns that try to combine addends. I honestly don't know why I had it there in the first place, and nothing in my deep git logs could help clue me in. Noticed by Alyssa. Remove patterns that detect open-coded udot_4x8. Suggested by Alyssa and Jason. Add missing sudot_4x8 patterns. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	6c18a3b497	nir/opcodes: Add integer dot-product opcodes Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd, sdot_4x8_iadd_sat, udot_4x8_uadd_sat, and sudot_4x8_iadd_sat. These represent the combinations of integer dot-product and add that operate on packed source vectors. That is, the four 8-bit values for each vector are stored in a single 32-bit integer. Some hardware may prefer to operate on unpacked byte vectors. When such hardware comes to Mesa, we'll have to figure out how to name things. v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes are not 2-source commutative. v3: Rename all opcodes to be more like some existing 4x8 opcodes. Suggested by Timur. Change type of packed vector sources to uint32, change types of constant folding variables to have explicit size, and delete some extra casts. All suggested by Jason. v4: Fix typo previously noticed by Alyssa but missed in v2. v5: Add has_sudot_4x8 flag. Requested by Rhys. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	7d8bf7c167	nir/lower_bit_size: Support add_sat and sub_sat Without this, lowered saturating ALU instructions would only clamp to the range of the new type instead of the range of the old type. v2: Use nir_iclamp. Suggested by Jason. Use new u_{int,uint}N_{min,max}() helpers. Fixes: `090e282407` ("nir: Add a saturated unsigned integer add opcode") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Rhys Perry	3d228b6926	nir/gcm: pin some instructions which require uniform sources fossil-db (Sienna Cichlid, GCM enabled): Totals from 6192 (4.12% of 150170) affected shaders: VGPRs: 548392 -> 542040 (-1.16%) SpillSGPRs: 3702 -> 3990 (+7.78%); split: -0.54%, +8.32% CodeSize: 62418488 -> 62481516 (+0.10%); split: -0.07%, +0.17% MaxWaves: 70582 -> 71718 (+1.61%) Instrs: 11768497 -> 11795079 (+0.23%); split: -0.07%, +0.30% Latency: 445891848 -> 523561297 (+17.42%); split: -0.07%, +17.49% InvThroughput: 115675481 -> 121494913 (+5.03%); split: -0.09%, +5.12% VClause: 164914 -> 164934 (+0.01%); split: -0.05%, +0.06% SClause: 405991 -> 395302 (-2.63%); split: -2.64%, +0.00% Copies: 907216 -> 926429 (+2.12%); split: -1.11%, +3.23% Branches: 456373 -> 457478 (+0.24%); split: -0.13%, +0.38% PreSGPRs: 648030 -> 642953 (-0.78%); split: -0.88%, +0.10% PreVGPRs: 522425 -> 516355 (-1.16%); split: -1.16%, +0.00% Seems to affect Detroit: Become Human and Cyberpunk 2077. The Cyberpunk 2077 changes look like a fixed bug. At least some of the Detroit: Become Human changes could probably be removed with better divergence analysis. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>	2021-08-24 16:52:31 +00:00
Rhys Perry	884ac52eaa	nir: consider push constant loads as always dynamically uniform Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12444>	2021-08-24 16:52:31 +00:00
Daniel Schürmann	2cf164feb9	nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3) VGPRs: 13880 -> 13856 (-0.17%) CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10% MaxWaves: 3184 -> 3192 (+0.25%) Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08% Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19% InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02% SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03% Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55% Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21% PreVGPRs: 12507 -> 12427 (-0.64%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Daniel Schürmann	89a842b2b6	nir/loop_analyze: consider instruction cost of nir_op_flrp Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Rhys Perry	aeb1b4c30c	nir/lower_io: use nir_vector_insert_imm() This creates a single nir_op_vecn instead of a nir_op_vecn and several copies. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>	2021-08-24 10:35:19 +00:00
Samuel Pitoiset	f4b858e746	Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)" This is wrong for negative values. This reverts commit `07cd30ca29`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>	2021-08-24 08:58:38 +00:00
Samuel Pitoiset	07cd30ca29	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a) Found with Cyberpunk 2077. fossils-db (GFX10.3): Totals from 128 (2.34% of 5465) affected shaders: CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00% Instrs: 145748 -> 145229 (-0.36%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>	2021-08-23 17:53:38 +00:00
Daniel Schürmann	59f2c85845	nir: return false for loops in contains_other_jump() Allows to unwrap more loops. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12473>	2021-08-19 13:51:17 +00:00
Qiang Yu	e6790d4a31	nir/inline_uniforms: support loop Be able to inline uniforms in loop for unrolling it. Nested loop/if is also supported. Some example: for (i = 0; i < count; i++) ... uniform "count" will be inlined. But note this does not make sure the loop will be unrolled (ie. count = 1000). for (i = 0; i < count; i++) for (j = init; j < 10; j++) if (type == 2) ... uniform "count", "init" and "type" will be inlined. It is intentional to not be too aggressive to add uniforms to avoid false positive case while be able to support most common usage. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Qiang Yu	3c93ebbae5	nir/loop_analyze: skip unsupported induction variable early Instead of fail in trip count calculation, just don't mark such kind of variable as induction from the beginning. Don't bother inline uniform to deal with such kind of variable either. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Qiang Yu	0b9639c35d	nir/loop_analyze: record induction variables for each loop For being used by uniform inline lowering pass. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Qiang Yu	c86ec09d11	nir/loop_analyze: move nir_is_supported_terminator_condition() to header To be shared with uniform inline. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Qiang Yu	a406fff78a	nir/inline_uniforms: support vector uniform Collect per vector component dependency and lower vector uniform load to scalar if any component need to be inlined. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Qiang Yu	9d796b21ac	nir/inline_uniforms: add uniforms in condition atomically Unless all uniforms in the condition can be inlined, we can't lower the if/loop. So we roll back the added uniforms when one of the uniforms in an if condition fails to be added. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>	2021-08-19 02:17:35 +00:00
Ian Romanick	f0a8a9816a	nir: intel/compiler: Add and use nir_op_pack_32_4x8_split A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results in a lot of shifts and MOVs. When that pattern can be recognized, the individual 8-bit components can be packed much more efficiently. v2: Rebase on `b4369de27f` ("nir/lower_packing: use shader_instructions_pass") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	89f639c0ca	nir/algebraic: Remove spurious conversions from inside logic ops Not only does this eliminate a bunch of unnecessary type converting MOVs, but it can also enable some SWAR. The dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does something about like: c = a.x ^ b.x; d = a.y ^ b.y; e = a.z ^ b.z; After this change, it looks more like: uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b); c = extract_u8(t, 0); d = extract_u8(t, 1); e = extract_u8(t, 2); On Ice Lake, this results in: SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	a147717a93	nir/algebraic: Optimize some extract forms resulting from 8-bit lowering This eliminates some spurious, size-converting moves. For example, on Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag: SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends v2: Condition two of the patterns on !options->lower_extract_byte. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Mike Blumenkrantz	649251ad4e	nir/lower_vectorize_tess_levels: set num_components for vectorized loads this otherwise explodes when rewriting e.g., a single array component load to a vec4 Fixes: `f5adf27fb9` ("nir,radv: add and use nir_vectorize_tess_levels()") fixes zmike/mesa#94 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12419>	2021-08-18 12:18:15 +00:00
Timothy Arceri	edfcc4f022	nir: fix GCM when GVN enabled Enabling GVN uncovered a bug where we would crash if the pass was thinking about pushing something into a loop. Fixes: `6538b3e566` ("nir: add heuristic for instructions in loops with GCM") Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>	2021-08-17 03:15:49 +00:00
Rhys Perry	cfc4433015	nir,glsl_to_nir: use nir_fdot() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	28acc4120f	nir: lower fdot to ffma if lower_ffma=false fossil-db (GFX10.3): Totals from 57689 (39.44% of 146267) affected shaders: VGPRs: 2873712 -> 2873432 (-0.01%); split: -0.01%, +0.00% CodeSize: 227661100 -> 227583572 (-0.03%); split: -0.08%, +0.04% MaxWaves: 1289562 -> 1289598 (+0.00%); split: +0.01%, -0.00% Instrs: 43115433 -> 43083308 (-0.07%); split: -0.12%, +0.05% Latency: 869947191 -> 870279826 (+0.04%); split: -0.06%, +0.10% InvThroughput: 199425811 -> 199434448 (+0.00%); split: -0.04%, +0.05% fossil-db (GFX10): Totals from 2 (0.00% of 146267) affected shaders: Latency: 8123 -> 8107 (-0.20%) fossil-db (GFX9): Totals from 2 (0.00% of 146401) affected shaders: (no stat changes) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	174a4f36f9	nir: create ffma from builders more often We will not be able to combine instructions into ffma later if they are exact, so create them from the start. They can be lowered later if they are unwanted. fossil-db (GFX10.3): Totals from 16589 (11.34% of 146267) affected shaders: VGPRs: 938872 -> 938704 (-0.02%) SpillSGPRs: 11334 -> 10785 (-4.84%) CodeSize: 96551964 -> 96498040 (-0.06%); split: -0.08%, +0.02% MaxWaves: 338760 -> 338772 (+0.00%) Instrs: 18356857 -> 18350486 (-0.03%); split: -0.06%, +0.02% Latency: 561563310 -> 561414360 (-0.03%); split: -0.08%, +0.05% InvThroughput: 145629673 -> 145594740 (-0.02%); split: -0.04%, +0.01% fossil-db (GFX10): Totals from 16252 (11.11% of 146267) affected shaders: VGPRs: 893820 -> 893744 (-0.01%) SpillSGPRs: 11334 -> 10785 (-4.84%) CodeSize: 95890244 -> 95839124 (-0.05%); split: -0.08%, +0.02% MaxWaves: 367704 -> 367734 (+0.01%) Instrs: 18199741 -> 18194437 (-0.03%); split: -0.06%, +0.03% Latency: 560912971 -> 560854179 (-0.01%); split: -0.07%, +0.06% InvThroughput: 142899814 -> 142877939 (-0.02%); split: -0.03%, +0.02% fossil-db (GFX9): Totals from 16287 (11.12% of 146401) affected shaders: SGPRs: 1312784 -> 1312768 (-0.00%); split: -0.05%, +0.05% VGPRs: 931440 -> 931444 (+0.00%); split: -0.00%, +0.00% SpillSGPRs: 14623 -> 14597 (-0.18%) CodeSize: 94428788 -> 94344404 (-0.09%); split: -0.10%, +0.01% MaxWaves: 90105 -> 90109 (+0.00%) Instrs: 18486905 -> 18473434 (-0.07%); split: -0.08%, +0.01% Latency: 720947295 -> 720818323 (-0.02%); split: -0.07%, +0.05% InvThroughput: 365240104 -> 365224659 (-0.00%); split: -0.02%, +0.01% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	ed70b256ce	nir: add ffma creation helpers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	4ec4d862c2	nir/algebraic: add is_used_once to dot product reassociation optimization This improves register usage. fossil-db (Sienna Cichlid, on top of !9805): Totals from 4317 (2.88% of 149839) affected shaders: VGPRs: 352592 -> 351704 (-0.25%); split: -1.48%, +1.23% SpillSGPRs: 182 -> 248 (+36.26%) CodeSize: 31601192 -> 31587624 (-0.04%); split: -0.09%, +0.04% MaxWaves: 56964 -> 57298 (+0.59%); split: +2.48%, -1.90% Instrs: 5973557 -> 5974122 (+0.01%); split: -0.05%, +0.06% Latency: 72088175 -> 72253033 (+0.23%); split: -0.36%, +0.59% InvThroughput: 14978160 -> 14798919 (-1.20%); split: -1.29%, +0.09% VClause: 100994 -> 98645 (-2.33%); split: -3.05%, +0.73% SClause: 278206 -> 276820 (-0.50%); split: -0.54%, +0.04% Copies: 200264 -> 199556 (-0.35%); split: -1.17%, +0.82% Branches: 86410 -> 85930 (-0.56%); split: -0.56%, +0.01% PreSGPRs: 207355 -> 207759 (+0.19%); split: -0.00%, +0.20% PreVGPRs: 314646 -> 310911 (-1.19%); split: -1.35%, +0.17% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	f95a16be72	nir/algebraic: reassociate add chains for more MAD/FMA-friendly code fossil-db (GFX10.3): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1625456 -> 1644936 (+1.20%); split: -0.05%, +1.24% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 161604460 -> 161458052 (-0.09%); split: -0.11%, +0.02% MaxWaves: 454842 -> 452160 (-0.59%); split: +0.04%, -0.63% Instrs: 30652596 -> 30456446 (-0.64%); split: -0.65%, +0.01% Latency: 723098749 -> 722084247 (-0.14%); split: -0.21%, +0.07% InvThroughput: 166023468 -> 165506875 (-0.31%); split: -0.36%, +0.05% fossil-db (GFX10): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1593576 -> 1611976 (+1.15%); split: -0.09%, +1.25% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 162294468 -> 162154456 (-0.09%); split: -0.11%, +0.02% MaxWaves: 477448 -> 474166 (-0.69%); split: +0.10%, -0.79% Instrs: 30820164 -> 30625805 (-0.63%); split: -0.65%, +0.02% Latency: 723190249 -> 722273445 (-0.13%); split: -0.20%, +0.08% InvThroughput: 163114872 -> 162582966 (-0.33%); split: -0.37%, +0.04% fossil-db (GFX9): Totals from 25866 (17.67% of 146401) affected shaders: SGPRs: 2167808 -> 2169920 (+0.10%); split: -0.09%, +0.19% VGPRs: 1649404 -> 1667592 (+1.10%); split: -0.43%, +1.53% CodeSize: 161273556 -> 161281996 (+0.01%); split: -0.07%, +0.08% MaxWaves: 114910 -> 113519 (-1.21%); split: +0.10%, -1.31% Instrs: 31557180 -> 31403708 (-0.49%); split: -0.50%, +0.02% Latency: 899594793 -> 898786283 (-0.09%); split: -0.19%, +0.10% InvThroughput: 412265691 -> 411551698 (-0.17%); split: -0.28%, +0.11% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	110bcb4919	nir/algebraic: add various ffma optimizations fossil-db (GFX10.3): Totals from 7532 (5.15% of 146267) affected shaders: VGPRs: 414696 -> 414304 (-0.09%); split: -0.18%, +0.08% CodeSize: 33393444 -> 33375908 (-0.05%); split: -0.13%, +0.08% MaxWaves: 149854 -> 150094 (+0.16%); split: +0.27%, -0.11% Instrs: 6279823 -> 6271364 (-0.13%); split: -0.18%, +0.05% Latency: 60308898 -> 60296025 (-0.02%); split: -0.13%, +0.11% InvThroughput: 13770542 -> 13745192 (-0.18%); split: -0.24%, +0.06% fossil-db (GFX10): Totals from 7532 (5.15% of 146267) affected shaders: VGPRs: 406664 -> 405564 (-0.27%); split: -0.39%, +0.12% CodeSize: 33544656 -> 33527568 (-0.05%); split: -0.13%, +0.08% MaxWaves: 158584 -> 158858 (+0.17%); split: +0.30%, -0.13% Instrs: 6316242 -> 6307913 (-0.13%); split: -0.18%, +0.05% Latency: 60243290 -> 60232844 (-0.02%); split: -0.13%, +0.11% InvThroughput: 13643345 -> 13620171 (-0.17%); split: -0.24%, +0.07% fossil-db (GFX9): Totals from 7543 (5.15% of 146401) affected shaders: SGPRs: 546384 -> 547472 (+0.20%); split: -0.08%, +0.28% VGPRs: 412636 -> 411896 (-0.18%); split: -0.27%, +0.09% CodeSize: 33216196 -> 33210564 (-0.02%); split: -0.12%, +0.11% MaxWaves: 38771 -> 38789 (+0.05%); split: +0.17%, -0.12% Instrs: 6419878 -> 6414891 (-0.08%); split: -0.18%, +0.11% Latency: 70972327 -> 70922754 (-0.07%); split: -0.15%, +0.08% InvThroughput: 33949039 -> 33909258 (-0.12%); split: -0.20%, +0.08% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	82d0600ba2	nir: swap fadd operands in nir_atan() This shouldn't do anything but will make testing a later patch easier. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:44 +00:00
Eric Engestrom	4d9acfa533	python: drop explicit output_encoding='utf-8' in mako templates Python 3 handles unicode strings by default, so we can drop all that. Suggested-by: Dylan Baker <dylan@pnwbakers.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>	2021-08-14 21:44:32 +00:00
Eric Engestrom	93cb3aca03	Revert "python: Explicitly add the 'L' suffix on Python 3" This reverts commit `ad363913e6`. This code was added to be able to compare the output file while porting the script from python2 to python3, but this has long been finished and the extra complexity is not needed anymore. Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>	2021-08-14 21:44:32 +00:00
Eric Engestrom	f1eae2f8bb	python: drop python2 support Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>	2021-08-14 21:44:32 +00:00
Caio Marcelo de Oliveira Filho	0092edfec0	nir/dead_cf: Do not remove loops with loads that can't be reordered If a loop is followed by a barrier, the ordering between a load inside the loop and other memory operations after the barrier may have to be preserved depending on the type of memory involved. This is relevant when the memory is writeable by other invocations. In such case, it is not valid to completely eliminate the loop. This commit doesn't attempt to precisely catch the barrier case, as analysis could become too complex. It simply assumes it can't drop the loops that contain certain types of loads unless those are known to be safe to reorder (via the access flag). Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475 Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938>	2021-08-14 01:48:03 +00:00
Bas Nieuwenhuizen	aa8179e33f	nir/inline_functions: Handle halting functions. Without this stitch_blocks complains about ending in a jump with a non-empty block after the inserted body. I hit this with CTS raytracing tests where we tried to inline a function that basically ended up being something like { ignore_ray_intersection halt } I kept the nop path when possible as that does not leave a mess for the optimization loop to optimize. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12163>	2021-08-13 21:18:13 +00:00
Bas Nieuwenhuizen	fa6cd6e00d	nir/lower_scratch: Ensure we don't lower vars with unsupported usage. Need to avoid lowering temps when they are used by other instructions, like the rt instructions (some of the shader call parameters get converted to temp variables and we will lower them later with the explicit io lowering pass as we need to guarantee they will end up in scratch). Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12162>	2021-08-13 20:56:30 +00:00
Rhys Perry	04bd2a1245	nir: remove src/compiler/nir/nir_control_flow Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12357>	2021-08-13 17:51:42 +01:00
Emma Anholt	673cc9323a	nir: Move phi src setup to a helper. Cleans up the ralloc/list push code all over the tree. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11772>	2021-08-13 16:11:57 +00:00
Vinson Lee	8d679f4f4e	nir: Initialize evaluate_cube_face_index_amd dst.x. Fix defect reported by Coverity Scan. Uninitialized scalar variable (UNINIT) uninit_use: Using uninitialized value dst.x. Fixes: `a1a2a8dfda` ("nir: add AMD_gcn_shader extended instructions") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12290>	2021-08-12 23:13:52 -07:00
Lionel Landwerlin	01b0935d31	nir/lower_shader_calls: remove empty phis This is confusing opt_cse. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `8dfb240b1f` ("nir: Add raytracing shader call lowering pass.") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11953>	2021-08-11 15:10:07 +03:00
Marcin Ślusarz	e1b325f587	nir/builder: invalidate metadata per function Fixes: `a62098fff2` ("nir: Add a helper for general instruction-modifying passes.") Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12324>	2021-08-11 11:23:30 +00:00
Pierre-Eric Pelloux-Prayer	7684d57a05	nir: add a pass to optimize "gl_FragDepth = gl_FragCoord.z" away gl_FragDepth default value is gl_FragCoord.z so if a shader does: gl_FragDepth = gl_FragCoord.z we can drop this assignment. v2: use nir_ssa_scalar_resolved and don't do this if gl_FragDepth is written multiple times (Jason) v3: - move to its own pass (Jason) - handle var = NULL (Rhys) v4: refactoring (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>	2021-08-11 11:00:11 +02:00
Ian Romanick	84d2e53789	Revert "nir/algebraic: Convert some f2u to f2i" Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666, the assumption fundamental to this optimization is false. Section 2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation. The wording of the section is somewhat confusing (because it doesn't clearly delineate between signed and unsigned integers), but the last two rows of the table make it clear that F->UD conversion clamps negative float values to 0. All other hardware mentioned in that thread seems to behave the same way. The real problem is that, with hardware that behaves this way, converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern that would be produced from 0x80000000 to 0x7fffffff. This reverts commit `ad05920258`. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>	2021-08-10 22:16:13 +00:00
Ian Romanick	3ba66ebbc8	nir/opcodes: Use u_intN_(min|max) uadd_sat was updated using sed, so I didn't even notice the surrounding opcodes. Oops. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>	2021-08-10 22:16:13 +00:00
Alyssa Rosenzweig	9b57a81815	nir/lower_mediump: Fix metadata in all passes Fixes: `fb29cef8dd` ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>	2021-08-10 20:55:33 +00:00
Alyssa Rosenzweig	03c18f7efc	nir/lower_mediump_io: Don't remap base unless needed Otherwise drivers that don't use 16-bit slots for varyings will get confused and have their driver_locations scribbled over. This has caused multiple problems for both Panfrost and Asahi this week. Given the only other user of the pass for varyings is radeonsi, which needs both together, I think this is the least controversial fix. Fixes: `fb29cef8dd` ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>	2021-08-10 20:55:33 +00:00
Mike Blumenkrantz	ec66c58138	nir: add imm_vec3 to round these out Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12253>	2021-08-09 14:45:30 +00:00
Rhys Perry	d764de6460	nir/tests: add tests for umod/imod/irem optimizations Both nir_opt_algebraic and nir_opt_idiv_const have optimizations for umod/imod/irem by constants. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	e008eb1224	nir: fix signed overflow for iadd constant folding Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	b627b9fcec	nir/idiv_const: optimize imod/irem fossil-db changes (Sienna Cichlid): Totals from 223 (0.15% of 150170) affected shaders: CodeSize: 384564 -> 370824 (-3.57%) Instrs: 74518 -> 71961 (-3.43%) Latency: 351620 -> 344640 (-1.99%) InvThroughput: 80122 -> 74846 (-6.58%) VClause: 919 -> 920 (+0.11%) SClause: 2879 -> 2877 (-0.07%); split: -0.10%, +0.03% Copies: 3099 -> 3103 (+0.13%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	96168301f9	nir/idiv_const: improve idiv(n, INT_MIN) This lowering is smaller and -INT64_MIN is probably UB (signed overflow). No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	4e2b94331b	nir/algebraic: improve irem by power-of-two optimization Requires one less instruction. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	2bb49e4587	nir/search: don't consider INT_MIN a negative power-of-two ineg(INT_MIN)/iabs(INT_MIN) won't work as expected. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	b009467b81	nir/algebraic: add optimizations for imul(a, INT_MIN) is_pos_power_of_two would catch this, but nir_op_imul has signed sources, so is_neg_power_of_two catches it instead, which creates a useless nir_op_ineg. fossil-db (Sienna Cichlid): Totals from 1014 (0.68% of 150170) affected shaders: CodeSize: 3592296 -> 3592288 (-0.00%); split: -0.00%, +0.00% Instrs: 671211 -> 670426 (-0.12%) Latency: 5268917 -> 5268479 (-0.01%); split: -0.01%, +0.00% InvThroughput: 2187349 -> 2187343 (-0.00%); split: -0.00%, +0.00% VClause: 8634 -> 8636 (+0.02%) Copies: 97585 -> 97604 (+0.02%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	65cd5a0f22	nir/algebraic: don't optimize umod/imod/irem if lower_bitops=true Match the udiv/idiv/imul by power-of-two optimizations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	ec4b425f59	nir/algebraic: fix imod by negative power-of-two If "a" is a multiple of "b", then the result would have been "b" instead of 0. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `0ef5f3552f` ("nir: add strength reduction pattern for imod/irem with pow2 divisor.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Dave Airlie	ad92c2b253	nir: add fisnormal lowering just lower the 32-bit version for now. Thanks to alyssa for this suggested lowering. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>	2021-08-06 14:27:48 +10:00
Dave Airlie	330e28155f	nir: add 32-bit bool of fisfinite Add the bool lowering as well. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>	2021-08-06 12:06:21 +10:00
Connor Abbott	8115cde3ba	tu, freedreno/a6xx, ir3: Rewrite tess PrimID handling The previous handling conflated RelPatchID and PrimID, which would result in incorrect gl_PrimitiveID when doing draw splitting and didn't work with PrimID passthrough which fills the VPC slot with the "correct" PrimID value from the tess factor BO which we left 0. Replace PrimID in the tess lowering pass with a new RelPatchID sysval, and replace PrimID with RelPatchID in the VS input code in turnip/freedreno at the same time so that there is no net change in the tess lowering code. However, now we have to add new mechanisms for getting the user-level PrimID: - In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS. This means we have to add another register to our VS->TCS ABI. I decided to put PrimID in r0.z, after the TCS header and RelPatchID, because it might not be read in the TCS. - If any stage after the TCS uses PrimID, the TCS stores it in the first dword of the tess factor BO, and it is read by the fixed-function tessellator and accessed in the TES via the newly-uncovered DSPRIMID field. If we have tess and GS, the TES passes this value through to the GS in the same way as the VS does. PrimID passthrough for reading it in the FS when there's tess but no GS also "just works" once we start storing it in the TCS. In particular this fixes dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>	2021-08-05 16:35:41 +00:00
Jason Ekstrand	0ddac113f8	nir: Removing uses of SSA defs destroys SSA liveness The liveness information will be a superset of real liveness so it's unlikely something will explode if it tries to use it. However, it is out-of-date and should be re-run if someone really wants it. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>	2021-08-03 21:36:53 +00:00
Ian Romanick	72259a870f	util: Add and use functions to calculate min and max int for a size Many places need to know the maximum or minimum possible value for a given size integer... so everyone just open-codes their favorite version. There is some potential to hit either undefined or implementation-defined behavior, so having one version that Just Works seems beneficial. v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in u_intmin. Noticed by CI. Lol. Rename functions `s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some unit tests that would have caught the copy-and-paste bug before wasting CI time. Change the implementation of u_intN_min to use the same pattern as stdint.h. This avoids the integer division. Noticed by Jason. v3: Add changes to convert_clear_color (src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>	2021-08-03 12:55:02 -07:00
Timothy Arceri	6538b3e566	nir: add heuristic for instructions in loops with GCM Moving instructions out of large loops tends to cause excessive spilling. This appears to be a good limit. In future it might make sense to make this a NIR options so other drivers can set their own limits. Tiger Lake total instructions in shared programs: 20930180 -> 20926952 (-0.02%) instructions in affected programs: 280768 -> 277540 (-1.15%) helped: 734 HURT: 192 helped stats (abs) min: 1 max: 61 x̄: 5.16 x̃: 4 helped stats (rel) min: 0.04% max: 10.64% x̄: 3.23% x̃: 3.14% HURT stats (abs) min: 1 max: 52 x̄: 2.90 x̃: 1 HURT stats (rel) min: 0.03% max: 9.76% x̄: 1.13% x̃: 0.61% 95% mean confidence interval for instructions value: -3.89 -3.08 95% mean confidence interval for instructions %-change: -2.49% -2.16% Instructions are helped. total cycles in shared programs: 841825217 -> 838817552 (-0.36%) cycles in affected programs: 122088078 -> 119080413 (-2.46%) helped: 941 HURT: 100 helped stats (abs) min: 1 max: 160080 x̄: 3274.31 x̃: 2660 helped stats (rel) min: <.01% max: 41.64% x̄: 5.50% x̃: 4.80% HURT stats (abs) min: 1 max: 41856 x̄: 734.62 x̃: 26 HURT stats (rel) min: <.01% max: 7.29% x̄: 0.44% x̃: 0.27% 95% mean confidence interval for cycles value: -3236.56 -2541.85 95% mean confidence interval for cycles %-change: -5.26% -4.60% Cycles are helped. total sends in shared programs: 977905 -> 977782 (-0.01%) sends in affected programs: 2279 -> 2156 (-5.40%) helped: 119 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.60% max: 14.29% x̄: 6.93% x̃: 6.67% 95% mean confidence interval for sends value: -1.09 -0.98 95% mean confidence interval for sends %-change: -7.42% -6.45% Sends are helped. 
LOST: 2 GAINED: 0 Ice Lake total instructions in shared programs: 19865361 -> 19861747 (-0.02%) instructions in affected programs: 185789 -> 182175 (-1.95%) helped: 593 HURT: 47 helped stats (abs) min: 1 max: 27 x̄: 6.17 x̃: 4 helped stats (rel) min: 0.19% max: 8.65% x̄: 4.53% x̃: 4.60% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.03% max: 0.23% x̄: 0.11% x̃: 0.04% 95% mean confidence interval for instructions value: -5.93 -5.37 95% mean confidence interval for instructions %-change: -4.32% -4.06% Instructions are helped. total loops in shared programs: 6120 -> 6117 (-0.05%) loops in affected programs: 6 -> 3 (-50.00%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% total cycles in shared programs: 961777176 -> 959404350 (-0.25%) cycles in affected programs: 172224180 -> 169851354 (-1.38%) helped: 936 HURT: 80 helped stats (abs) min: 1 max: 9566 x̄: 2621.08 x̃: 2550 helped stats (rel) min: <.01% max: 41.77% x̄: 4.22% x̃: 3.84% HURT stats (abs) min: 1 max: 59146 x̄: 1006.34 x̃: 24 HURT stats (rel) min: <.01% max: 3.78% x̄: 0.44% x̃: 0.25% 95% mean confidence interval for cycles value: -2513.72 -2157.20 95% mean confidence interval for cycles %-change: -4.13% -3.57% Cycles are helped. total sends in shared programs: 1019995 -> 1019872 (-0.01%) sends in affected programs: 2283 -> 2160 (-5.39%) helped: 119 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.60% max: 14.29% x̄: 6.91% x̃: 6.67% 95% mean confidence interval for sends value: -1.09 -0.98 95% mean confidence interval for sends %-change: -7.39% -6.42% Sends are helped. 
LOST: 4 GAINED: 0 Skylake total instructions in shared programs: 17994337 -> 17993846 (<.01%) instructions in affected programs: 146294 -> 145803 (-0.34%) helped: 190 HURT: 47 helped stats (abs) min: 1 max: 12 x̄: 2.83 x̃: 3 helped stats (rel) min: 0.14% max: 4.29% x̄: 1.08% x̃: 0.90% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.03% max: 0.22% x̄: 0.11% x̃: 0.04% 95% mean confidence interval for instructions value: -2.30 -1.84 95% mean confidence interval for instructions %-change: -0.95% -0.74% Instructions are helped. total loops in shared programs: 6029 -> 6023 (-0.10%) loops in affected programs: 12 -> 6 (-50.00%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -50.00% -50.00% Loops are helped. total cycles in shared programs: 939062940 -> 938023548 (-0.11%) cycles in affected programs: 169671482 -> 168632090 (-0.61%) helped: 980 HURT: 134 helped stats (abs) min: 1 max: 25000 x̄: 1075.57 x̃: 1052 helped stats (rel) min: <.01% max: 42.75% x̄: 2.51% x̃: 1.32% HURT stats (abs) min: 1 max: 837 x̄: 109.45 x̃: 20 HURT stats (rel) min: <.01% max: 5.71% x̄: 0.73% x̃: 0.21% 95% mean confidence interval for cycles value: -1005.89 -860.17 95% mean confidence interval for cycles %-change: -2.39% -1.84% Cycles are helped. total sends in shared programs: 1026848 -> 1026724 (-0.01%) sends in affected programs: 2302 -> 2178 (-5.39%) helped: 120 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.60% max: 14.29% x̄: 6.91% x̃: 6.67% 95% mean confidence interval for sends value: -1.09 -0.98 95% mean confidence interval for sends %-change: -7.40% -6.43% Sends are helped. 
LOST: 1 GAINED: 1 Broadwell total instructions in shared programs: 17605621 -> 17605154 (<.01%) instructions in affected programs: 145691 -> 145224 (-0.32%) helped: 184 HURT: 48 helped stats (abs) min: 1 max: 12 x̄: 2.83 x̃: 3 helped stats (rel) min: 0.13% max: 4.29% x̄: 1.09% x̃: 0.93% HURT stats (abs) min: 1 max: 7 x̄: 1.12 x̃: 1 HURT stats (rel) min: 0.03% max: 0.48% x̄: 0.12% x̃: 0.04% 95% mean confidence interval for instructions value: -2.26 -1.77 95% mean confidence interval for instructions %-change: -0.95% -0.73% Instructions are helped. total loops in shared programs: 5968 -> 5963 (-0.08%) loops in affected programs: 10 -> 5 (-50.00%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -50.00% -50.00% Loops are helped. total cycles in shared programs: 1000679489 -> 998592756 (-0.21%) cycles in affected programs: 173421234 -> 171334501 (-1.20%) helped: 993 HURT: 153 helped stats (abs) min: 1 max: 766608 x̄: 2118.49 x̃: 1080 helped stats (rel) min: <.01% max: 54.61% x̄: 2.61% x̃: 1.73% HURT stats (abs) min: 1 max: 2200 x̄: 110.61 x̃: 11 HURT stats (rel) min: <.01% max: 5.68% x̄: 0.63% x̃: 0.06% 95% mean confidence interval for cycles value: -3191.23 -450.54 95% mean confidence interval for cycles %-change: -2.47% -1.89% Cycles are helped. total sends in shared programs: 996341 -> 996222 (-0.01%) sends in affected programs: 2151 -> 2032 (-5.53%) helped: 115 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.60% max: 14.29% x̄: 7.07% x̃: 6.67% 95% mean confidence interval for sends value: -1.09 -0.98 95% mean confidence interval for sends %-change: -7.55% -6.58% Sends are helped. 
Haswell total instructions in shared programs: 16038375 -> 16038121 (<.01%) instructions in affected programs: 216797 -> 216543 (-0.12%) helped: 185 HURT: 217 helped stats (abs) min: 1 max: 12 x̄: 2.84 x̃: 3 helped stats (rel) min: 0.13% max: 4.23% x̄: 1.30% x̃: 1.20% HURT stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.40% 95% mean confidence interval for instructions value: -0.85 -0.41 95% mean confidence interval for instructions %-change: -0.40% -0.14% Instructions are helped. total loops in shared programs: 5947 -> 5942 (-0.08%) loops in affected programs: 10 -> 5 (-50.00%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -50.00% -50.00% Loops are helped. total cycles in shared programs: 967655093 -> 965746713 (-0.20%) cycles in affected programs: 197288924 -> 195380544 (-0.97%) helped: 950 HURT: 195 helped stats (abs) min: 1 max: 782820 x̄: 2274.79 x̃: 1260 helped stats (rel) min: <.01% max: 54.26% x̄: 3.02% x̃: 1.71% HURT stats (abs) min: 1 max: 15790 x̄: 1295.73 x̃: 21 HURT stats (rel) min: <.01% max: 119.85% x̄: 7.76% x̃: 0.11% 95% mean confidence interval for cycles value: -3014.22 -319.19 95% mean confidence interval for cycles %-change: -1.83% -0.55% Cycles are helped. total sends in shared programs: 934894 -> 934765 (-0.01%) sends in affected programs: 2192 -> 2063 (-5.89%) helped: 115 HURT: 2 helped stats (abs) min: 1 max: 4 x̄: 1.14 x̃: 1 helped stats (rel) min: 0.60% max: 28.57% x̄: 7.68% x̃: 6.67% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67% 95% mean confidence interval for sends value: -1.23 -0.98 95% mean confidence interval for sends %-change: -8.28% -6.24% Sends are helped. 
LOST: 1 GAINED: 18 Ivy Bridge total instructions in shared programs: 15269357 -> 15269398 (<.01%) instructions in affected programs: 190484 -> 190525 (0.02%) helped: 77 HURT: 206 helped stats (abs) min: 1 max: 6 x̄: 2.47 x̃: 3 helped stats (rel) min: 0.14% max: 5.31% x̄: 1.46% x̃: 1.65% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 0.03% max: 2.38% x̄: 0.42% x̃: 0.40% 95% mean confidence interval for instructions value: -0.06 0.35 95% mean confidence interval for instructions %-change: -0.21% 0.03% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4001 -> 3996 (-0.12%) loops in affected programs: 10 -> 5 (-50.00%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -50.00% -50.00% Loops are helped. total cycles in shared programs: 562045564 -> 561063543 (-0.17%) cycles in affected programs: 200924872 -> 199942851 (-0.49%) helped: 748 HURT: 160 helped stats (abs) min: 2 max: 14926 x̄: 1692.94 x̃: 1620 helped stats (rel) min: <.01% max: 53.29% x̄: 3.17% x̃: 1.87% HURT stats (abs) min: 2 max: 15726 x̄: 1776.86 x̃: 36 HURT stats (rel) min: <.01% max: 114.43% x̄: 10.66% x̃: 0.21% 95% mean confidence interval for cycles value: -1237.33 -925.71 95% mean confidence interval for cycles %-change: -1.54% 0.08% Inconclusive result (%-change mean confidence interval includes 0). total sends in shared programs: 893348 -> 893330 (<.01%) sends in affected programs: 187 -> 169 (-9.63%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1 helped stats (rel) min: 4.08% max: 22.22% x̄: 11.70% x̃: 10.10% 95% mean confidence interval for sends value: -1.56 -1.02 95% mean confidence interval for sends %-change: -14.92% -8.48% Sends are helped. 
LOST: 1 GAINED: 19 Sandy Bridge total instructions in shared programs: 11785227 -> 11785774 (<.01%) instructions in affected programs: 78403 -> 78950 (0.70%) helped: 65 HURT: 505 helped stats (abs) min: 1 max: 4 x̄: 2.22 x̃: 3 helped stats (rel) min: 0.14% max: 4.17% x̄: 1.19% x̃: 1.38% HURT stats (abs) min: 1 max: 5 x̄: 1.37 x̃: 1 HURT stats (rel) min: 0.24% max: 3.33% x̄: 1.57% x̃: 1.72% 95% mean confidence interval for instructions value: 0.85 1.07 95% mean confidence interval for instructions %-change: 1.16% 1.36% Instructions are HURT. total loops in shared programs: 2441 -> 2437 (-0.16%) loops in affected programs: 8 -> 4 (-50.00%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -50.00% -50.00% Loops are helped. total cycles in shared programs: 497178796 -> 496669298 (-0.10%) cycles in affected programs: 51483322 -> 50973824 (-0.99%) helped: 476 HURT: 137 helped stats (abs) min: 2 max: 7502 x̄: 1079.36 x̃: 1260 helped stats (rel) min: <.01% max: 42.50% x̄: 2.31% x̃: 0.86% HURT stats (abs) min: 2 max: 754 x̄: 31.23 x̃: 18 HURT stats (rel) min: <.01% max: 3.01% x̄: 0.09% x̃: 0.02% 95% mean confidence interval for cycles value: -901.99 -760.32 95% mean confidence interval for cycles %-change: -2.20% -1.36% Cycles are helped. total sends in shared programs: 642919 -> 642915 (<.01%) sends in affected programs: 32 -> 28 (-12.50%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 11.11% max: 14.29% x̄: 12.70% x̃: 12.70% 95% mean confidence interval for sends value: -1.00 -1.00 95% mean confidence interval for sends %-change: -15.61% -9.78% Sends are helped. 
Iron Lake total instructions in shared programs: 8180061 -> 8180248 (<.01%) instructions in affected programs: 65004 -> 65191 (0.29%) helped: 59 HURT: 253 helped stats (abs) min: 1 max: 4 x̄: 2.24 x̃: 3 helped stats (rel) min: 0.16% max: 2.23% x̄: 1.04% x̃: 1.29% HURT stats (abs) min: 1 max: 5 x̄: 1.26 x̃: 1 HURT stats (rel) min: 0.21% max: 3.85% x̄: 0.93% x̃: 0.60% 95% mean confidence interval for instructions value: 0.43 0.77 95% mean confidence interval for instructions %-change: 0.45% 0.68% Instructions are HURT. total loops in shared programs: 863 -> 861 (-0.23%) loops in affected programs: 4 -> 2 (-50.00%) helped: 2 HURT: 0 total cycles in shared programs: 239357490 -> 238907668 (-0.19%) cycles in affected programs: 17314006 -> 16864184 (-2.60%) helped: 176 HURT: 34 helped stats (abs) min: 4 max: 13400 x̄: 2558.05 x̃: 2920 helped stats (rel) min: 0.01% max: 35.58% x̄: 3.76% x̃: 2.69% HURT stats (abs) min: 2 max: 14 x̄: 11.59 x̃: 14 HURT stats (rel) min: <.01% max: 0.06% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -2440.68 -1843.34 95% mean confidence interval for cycles %-change: -3.78% -2.51% Cycles are helped. GM45 total instructions in shared programs: 4985293 -> 4985401 (<.01%) instructions in affected programs: 58807 -> 58915 (0.18%) helped: 57 HURT: 202 helped stats (abs) min: 1 max: 4 x̄: 2.26 x̃: 3 helped stats (rel) min: 0.15% max: 2.23% x̄: 1.06% x̃: 1.29% HURT stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1 HURT stats (rel) min: 0.21% max: 3.85% x̄: 0.76% x̃: 0.48% 95% mean confidence interval for instructions value: 0.22 0.61 95% mean confidence interval for instructions %-change: 0.24% 0.48% Instructions are HURT. 
total loops in shared programs: 639 -> 638 (-0.16%) loops in affected programs: 2 -> 1 (-50.00%) helped: 1 HURT: 0 total cycles in shared programs: 153794236 -> 153546274 (-0.16%) cycles in affected programs: 9947778 -> 9699816 (-2.49%) helped: 110 HURT: 31 helped stats (abs) min: 4 max: 13400 x̄: 2257.51 x̃: 1796 helped stats (rel) min: 0.01% max: 35.58% x̄: 4.33% x̃: 2.45% HURT stats (abs) min: 2 max: 14 x̄: 11.74 x̃: 14 HURT stats (rel) min: <.01% max: 0.06% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -2113.77 -1403.42 95% mean confidence interval for cycles %-change: -4.27% -2.47% Cycles are helped. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2899 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>	2021-08-03 10:54:50 +00:00
Timothy Arceri	a7f2e683de	nir: move nir_block_ends_in_break() to nir.h Will be used in a following commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>	2021-08-03 10:54:50 +00:00
Timothy Arceri	a9ed4538ab	nir: add indirect loop unrolling to compiler options This is where it should be rather than having to pass it into the optimisation pass every time. It also allows us to call the loop analysis pass without having to duplicate these options which we will do later in this series. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>	2021-08-03 10:54:50 +00:00
Timur Kristóf	da9f4b2e67	nir, aco: Remove vertex and primitive count overwrite intrinsic. It's no longer needed. No Fossil DB changes. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>	2021-08-02 11:38:25 +00:00
Timur Kristóf	1bbea90f50	aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags. Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags at once of every vertex. This takes fewer VALU instructions than previously. Fossil DB results on Sienna Cichlid (with NGGC on): Totals from 56917 (44.24% of 128647) affected shaders: CodeSize: 161028288 -> 158751628 (-1.41%) Instrs: 30917985 -> 30519571 (-1.29%) Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01% InvThroughput: 21280238 -> 20927401 (-1.66%) Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00% No Fossil DB changed with NGGC off. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>	2021-08-02 11:38:25 +00:00
Emma Anholt	9ffd00bcf1	nir_to_tgsi: Pack our tex coords into vec4 nir_tex_src_backend[12]. For TGSI, we need the coordinate, comparator, bias, and LOD all together in the first two vec4 args, and by doing it in the backend we were generating extra MOVs. softpipe shader-db results: total instructions in shared programs: 2985416 -> 2953625 (-1.06%) instructions in affected programs: 499937 -> 468146 (-6.36%) total temps in shared programs: 544769 -> 565869 (3.87%) temps in affected programs: 105469 -> 126569 (20.01%) i915g shader-db: total instructions in shared programs: 371625 -> 369594 (-0.55%) instructions in affected programs: 24903 -> 22872 (-8.16%) total tex_indirect in shared programs: 11381 -> 11365 (-0.14%) tex_indirect in affected programs: 43 -> 27 (-37.21%) LOST: 7 GAINED: 16 The temps increase is the pre-existing issue that we never release temps for NIR regs, which doesn't matter much for softpipe (just memory/cache footprint) but does for i915g as seen by shaders that no longer compile (though overall we seem to win). Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>	2021-07-29 09:05:05 -07:00
Enrico Galli	16ef26ffcb	nir_lower_readonly_images_to_tex: Fix typeo on image arrays Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12119>	2021-07-29 01:44:45 +00:00
Lionel Landwerlin	7e3bad0f8e	nir/lower_shader_calls: adding missing stack offset alignment Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `8dfb240b1f` ("nir: Add raytracing shader call lowering pass.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12112>	2021-07-28 23:04:21 +00:00
Daniel Schürmann	bc500da67d	nir/shrink_vectors: shrink vecN properly This patch allows to shrink vecN instructions where one or more components at any position are unused. Stat changes for softpipe: total instructions in shared programs: 2986101 -> 2985416 (-0.02%) instructions in affected programs: 51216 -> 50531 (-1.34%) Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Daniel Schürmann	36fe7398c0	nir/shrink_vectors: shrink ALU properly ALU instructions of which not all components are read, can be shrunk to the number of read components. Previously, this would only remove trailing components. This patch enables to remove components from any position. Stat changes for softpipe: total instructions in shared programs: 3001291 -> 2984698 (-0.55%) instructions in affected programs: 225585 -> 208992 (-7.36%) total loops in shared programs: 1389 -> 1358 (-2.23%) loops in affected programs: 36 -> 5 (-86.11%) Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Daniel Schürmann	8317fe314c	nir/opt_shrink_vectors: reverse iteration order This pass should be backwards in order to reach the fixed point in linear time. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Daniel Schürmann	d27417b597	nir: consider write_mask in nir_ssa_def_components_read() Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Daniel Schürmann	73905c4d01	nir/opt_shrink_vectors: don't shrink vectors used by intrinsics Store intrinsics shrink the sources by creating a new vecN. Other intrinsics cannot shrink their sources. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Daniel Schürmann	ece99eb69f	nir/lower_alu_to_scalar: don't skip gaps in write_mask Otherwise, this may lead to segmentation faults. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>	2021-07-26 09:24:37 +00:00
Jason Ekstrand	1431f6c765	nir: Validate newly documented texture restrictions Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Acked-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>	2021-07-23 15:53:57 +00:00
Mike Blumenkrantz	499cc7a9ec	nir/validate: refactor validate_assert to have a return value Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>	2021-07-23 15:53:57 +00:00
Jason Ekstrand	74ec2b12be	nir/lower_tex: Rework invalid implicit LOD lowering Only fragment and some compute shaders support implicit derivatives. They're totally meaningless without helper invocations and some understanding of the dispatch pattern. We've got code to lower nir_texop_tex in these shader stages to use an explicit derivative of 0 but it was pretty badly broken: 1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod. 2. It didn't take min_lod into account 3. It was conflated with adding a missing LOD parameter to opcodes which expect one such as nir_texop_txf. While not really a bug, this does make it way harder to reason about the code. 4. Unless you set a flag (which most drivers don't), it left the opcode nir_texop_tex instead of nir_texop_txl which it should have been. This reworks it to go through roughly the same path as other LOD lowering only with a constant lod of 0 instead of calling out to nir_texop_lod. We also get rid of the lower_tex_without_implicit_lod flag because most drivers set it and those that don't are probably subtly broken. If someone really wants to get nir_texop_tex in their vertex shaders, they can write a new patch to add the flag back in. Fixes: `e382890e25` "nir: set default lod to texture opcodes that..." Fixes: `d5ac5d6e83` "nir: Add option to lower tex to txl when..." Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>	2021-07-23 15:53:57 +00:00
Jason Ekstrand	fa717a202c	docs,nir: Document NIR texture instructions Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>	2021-07-23 15:53:57 +00:00
Jason Ekstrand	4465ca296d	nir: Suffix all the MCS texture stuff _intel It's intel-specific, used to get at MSAA compression information. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>	2021-07-23 15:53:57 +00:00
Jason Ekstrand	60b5faf572	nir/lower_tex: Add a lower_txs_cube_array option Several bits of hardware require the division by 6 to happen in the shader. May as well have common lowering for it. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>	2021-07-22 14:22:35 -05:00
Jason Ekstrand	c6102dda0a	nir/lower_image: Handle index and bindless image_size Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>	2021-07-22 14:22:35 -05:00

3759 Commits