KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Ian Romanick	317a88b920	nir/algebraic: Additional D3D Boolean optimization I observed this pattern in several shaders in Hand of Fate 2 while investigating bugzilla #111490. This also led to the related bugzilla #111578. The shaders from HoF2 are not in shader-db. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Skylake and Ice Lake had similar results. (Ice Lake shown) total instructions in shared programs: 16222621 -> 16205419 (-0.11%) instructions in affected programs: 798418 -> 781216 (-2.15%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35 helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09% 95% mean confidence interval for instructions value: -33.22 -29.56 95% mean confidence interval for instructions %-change: -3.11% -2.56% Instructions are helped. total cycles in shared programs: 364676209 -> 363345763 (-0.36%) cycles in affected programs: 112810504 -> 111480058 (-1.18%) helped: 546 HURT: 7 helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340 helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08% HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43 HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35% 95% mean confidence interval for cycles value: -2884.33 -1927.41 95% mean confidence interval for cycles %-change: -1.59% -1.21% Cycles are helped. total spills in shared programs: 8870 -> 8514 (-4.01%) spills in affected programs: 1230 -> 874 (-28.94%) helped: 161 HURT: 0 total fills in shared programs: 21901 -> 21348 (-2.52%) fills in affected programs: 2120 -> 1567 (-26.08%) helped: 155 HURT: 5 Broadwell and Haswell had similar results. 
(Broadwell shown) total instructions in shared programs: 14994910 -> 14975495 (-0.13%) instructions in affected programs: 839033 -> 819618 (-2.31%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49 helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22% 95% mean confidence interval for instructions value: -37.46 -33.40 95% mean confidence interval for instructions %-change: -3.12% -2.70% Instructions are helped. total cycles in shared programs: 386032453 -> 384450722 (-0.41%) cycles in affected programs: 117807357 -> 116225626 (-1.34%) helped: 547 HURT: 6 helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926 helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31% HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29 HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65% 95% mean confidence interval for cycles value: -3060.28 -2660.27 95% mean confidence interval for cycles %-change: -1.59% -1.37% Cycles are helped. total spills in shared programs: 23372 -> 21869 (-6.43%) spills in affected programs: 11730 -> 10227 (-12.81%) helped: 352 HURT: 0 total fills in shared programs: 34747 -> 35351 (1.74%) fills in affected programs: 11013 -> 11617 (5.48%) helped: 3 HURT: 347 Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956420 -> 11956126 (<.01%) instructions in affected programs: 14898 -> 14604 (-1.97%) helped: 98 HURT: 0 helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00% 95% mean confidence interval for instructions value: -3.00 -3.00 95% mean confidence interval for instructions %-change: -2.18% -1.98% Instructions are helped. 
total cycles in shared programs: 178791217 -> 178790792 (<.01%) cycles in affected programs: 149763 -> 149338 (-0.28%) helped: 91 HURT: 7 helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16 helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18% HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322 HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41% 95% mean confidence interval for cycles value: -18.94 10.27 95% mean confidence interval for cycles %-change: -1.28% 0.49% Inconclusive result (value mean confidence interval includes 0).	2019-09-19 14:22:22 -07:00
Ian Romanick	92f70df8c3	nir/algebraic: Do not apply late DPH optimization in vertex processing stages Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages. In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490 Fixes: `09705747d7` ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16194726 -> 16344745 (0.93%) instructions in affected programs: 2855172 -> 3005191 (5.25%) helped: 6 HURT: 20279 helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44% HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7 HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56% 95% mean confidence interval for instructions value: 7.34 7.45 95% mean confidence interval for instructions %-change: 8.48% 8.67% Instructions are HURT. total cycles in shared programs: 364471296 -> 365014683 (0.15%) cycles in affected programs: 32421530 -> 32964917 (1.68%) helped: 2925 HURT: 16144 helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5 helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15% HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15 HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87% 95% mean confidence interval for cycles value: 21.58 35.41 95% mean confidence interval for cycles %-change: 4.36% 4.52% Cycles are HURT.	2019-09-19 14:21:31 -07:00
Andres Gomez	3f782cdd25	nir/algebraic: mark float optimizations returning one parameter as inexact With the arrival of VK_KHR_shader_float_controls, algebraic optimizations for float types of the form (('fop', a, b), a) become inexact depending on the execution mode. For example, if SHADER_DENORM_FLUSH_TO_ZERO is active and the "a" parameter holds a denorm value, we cannot return it unchanged as a denorm; it needs to be flushed to zero. Therefore, we now mark all those operations as inexact. Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-17 23:39:18 +03:00
Ian Romanick	e07248d2a8	nir/algebraic: Clean up value range analysis-based optimizations Fix the a / b ordering in some compares. Delete duplicate patterns. Add a table explaining things. While I was cleaning this up, I managed to confuse myself. The table helped sort that out. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-29 13:15:52 -07:00
Ian Romanick	ccb236d1bc	nir/algebraic: Mark some value range analysis-based optimizations imprecise This didn't fix bug #111308, but it was found while trying to find the actual cause of that bug. Fixes piglit tests (new in piglit!110): - fs-fract-of-NaN.shader_test - fs-lt-nan-tautology.shader_test - fs-ge-nan-tautology.shader_test No shader-db changes on any Intel platform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308 Fixes: `b77070e293` ("nir/algebraic: Use value range analysis to eliminate tautological compares") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-29 13:15:52 -07:00
Ian Romanick	d3fd1c761a	nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled This caused a problem on Sandybridge where an open-coded bitfieldReverse() function could be optimized to a nir_op_bitfield_reverse that would generate an unsupported BFREV instruction in the backend. This was encountered in some Unreal4 tech demos in shader-db. The bug was not previously noticed because we don't actually try to run those demos on Sandybridge. The fixes tag is a bit of a lie. The actual bug was introduced about 26,000 commits earlier in `371c4b3c48` ("nir: Recognize open-coded bitfield_reverse."). Without the NIR lowering pass, the flag needed to avoid the optimization does not exist. Hopefully nobody will care to fix this on an earlier Mesa release. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `7afa26d4e3` ("nir: Add lowering for nir_op_bitfield_reverse.")	2019-08-28 11:38:51 -07:00
Daniel Schürmann	7fa1740035	nir/algebraic: some subtraction optimizations Changes with RADV/ACO: Totals from affected shaders: SGPRS: 444087 -> 455543 (2.58 %) VGPRS: 436468 -> 436768 (0.07 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 13448928 -> 13353520 (-0.71 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 68060 -> 67979 (-0.12 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-21 08:51:49 +00:00
Rhys Perry	0a790c3019	nir/algebraic: add a few masking-before-unpack optimizations Helps some Dawn of War 3 and F1 2017 shaders with ACO: Totals from affected shaders: SGPRS: 2136 -> 2128 (-0.37 %) VGPRS: 1624 -> 1628 (0.25 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 168068 -> 164332 (-2.22 %) bytes LDS: 44 -> 44 (0.00 %) blocks Max Waves: 222 -> 221 (-0.45 %) Wait states: 0 -> 0 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-16 12:13:01 +01:00
Ian Romanick	0e6581b87d	nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correctly account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. 
(Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-14 11:15:37 -07:00
Ian Romanick	73aaeac0a3	nir/algebraic: Reassociate add-and-shift to be shift-and-add A common thing in many shaders: uniform vs { vec4 bones[...]; }; ... x = some_calculation(bones[i + 0]); y = some_calculation(bones[i + 1]); z = some_calculation(bones[i + 2]); This turns into stuff like vec1 32 ssa_12 = iadd ssa_11, ssa_0 vec1 32 ssa_13 = ishl ssa_12, ssa_3 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_15 = iadd ssa_11, ssa_1 vec1 32 ssa_16 = ishl ssa_15, ssa_3 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_18 = iadd ssa_11, ssa_2 vec1 32 ssa_19 = ishl ssa_18, ssa_3 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) By reassociating the shift and the add, we can reduce this to vec1 32 ssa_12 = ishl ssa_11, ssa_3 vec1 32 ssa_13 = iadd ssa_12, ssa_0 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_16 = iadd ssa_12, ssa_1 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_19 = iadd ssa_12, ssa_2 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) v2: Add some commentary from Rhys Perry's nearly identical patch. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16277758 -> 16250704 (-0.17%) instructions in affected programs: 1440284 -> 1413230 (-1.88%) helped: 4920 HURT: 6 helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4 helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79% HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3 HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55% 95% mean confidence interval for instructions value: -5.67 -5.31 95% mean confidence interval for instructions %-change: -2.26% -2.16% Instructions are helped. 
total cycles in shared programs: 367118526 -> 365895358 (-0.33%) cycles in affected programs: 93504145 -> 92280977 (-1.31%) helped: 2754 HURT: 1269 helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16 helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12% HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9 HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75% 95% mean confidence interval for cycles value: -387.31 -220.78 95% mean confidence interval for cycles %-change: -2.11% -1.68% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-14 11:15:32 -07:00
Ian Romanick	5544b2cbbd	nir/algebraic: Use value range analysis to eliminate useless unary ops Sandy Bridge is the big winner because it lies at something of a crossroads. It supports a fairly high OpenGL version, and it still has the old style math box. The high OpenGL version means a lot more shaders can run on it. The old style math box means extra moves are necessary to resolve source modifiers on operands to complex math instructions like COS, SQRT, and RCP. v2: Remove a couple patterns that are now redundant. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16282006 -> 16278207 (-0.02%) instructions in affected programs: 174555 -> 170756 (-2.18%) helped: 661 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3 helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94% 95% mean confidence interval for instructions value: -6.16 -5.34 95% mean confidence interval for instructions %-change: -3.02% -2.60% Instructions are helped. total cycles in shared programs: 367168597 -> 367134284 (<.01%) cycles in affected programs: 1105276 -> 1070963 (-3.10%) helped: 460 HURT: 150 helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82 helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27% HURT stats (abs) min: 1 max: 901 x̄: 67.49 x̃: 39 HURT stats (rel) min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22% 95% mean confidence interval for cycles value: -65.68 -46.82 95% mean confidence interval for cycles %-change: -5.59% -4.05% Cycles are helped. Sandy Bridge total instructions in shared programs: 10824272 -> 10802557 (-0.20%) instructions in affected programs: 1237988 -> 1216273 (-1.75%) helped: 8199 HURT: 0 helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2 helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73% 95% mean confidence interval for instructions value: -2.70 -2.59 95% mean confidence interval for instructions %-change: -2.07% -2.00% Instructions are helped. 
total cycles in shared programs: 154009894 -> 153843598 (-0.11%) cycles in affected programs: 10650486 -> 10484190 (-1.56%) helped: 4973 HURT: 1533 helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20 helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67% HURT stats (abs) min: 1 max: 453 x̄: 21.94 x̃: 8 HURT stats (rel) min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58% 95% mean confidence interval for cycles value: -28.02 -23.10 95% mean confidence interval for cycles %-change: -1.74% -1.56% Cycles are helped. LOST: 0 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8135196 -> 8134888 (<.01%) instructions in affected programs: 31920 -> 31612 (-0.96%) helped: 169 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2 helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16% 95% mean confidence interval for instructions value: -2.01 -1.64 95% mean confidence interval for instructions %-change: -1.32% -1.15% Instructions are helped. total cycles in shared programs: 188575724 -> 188574092 (<.01%) cycles in affected programs: 406840 -> 405208 (-0.40%) helped: 169 HURT: 0 helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10 helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47% 95% mean confidence interval for cycles value: -10.72 -8.59 95% mean confidence interval for cycles %-change: -0.63% -0.50% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:14 -07:00
Ian Romanick	8d14380971	nir/algebraic: Use value range analysis to convert fmin to fsat All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16297320 -> 16282006 (-0.09%) instructions in affected programs: 2434498 -> 2419184 (-0.63%) helped: 8091 HURT: 1 helped stats (abs) min: 1 max: 51 x̄: 1.89 x̃: 2 helped stats (rel) min: 0.04% max: 14.29% x̄: 0.98% x̃: 0.95% HURT stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for instructions value: -1.94 -1.85 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 367221624 -> 367168597 (-0.01%) cycles in affected programs: 126409635 -> 126356608 (-0.04%) helped: 5612 HURT: 1023 helped stats (abs) min: 1 max: 2332 x̄: 31.11 x̃: 16 helped stats (rel) min: <.01% max: 30.31% x̄: 1.69% x̃: 1.42% HURT stats (abs) min: 1 max: 2372 x̄: 118.84 x̃: 16 HURT stats (rel) min: <.01% max: 46.98% x̄: 1.46% x̃: 0.35% 95% mean confidence interval for cycles value: -11.52 -4.46 95% mean confidence interval for cycles %-change: -1.26% -1.14% Cycles are helped. total spills in shared programs: 8868 -> 8870 (0.02%) spills in affected programs: 28 -> 30 (7.14%) helped: 0 HURT: 1 total fills in shared programs: 21903 -> 21904 (<.01%) fills in affected programs: 42 -> 43 (2.38%) helped: 0 HURT: 1 Haswell total instructions in shared programs: 13353925 -> 13338728 (-0.11%) instructions in affected programs: 2265850 -> 2250653 (-0.67%) helped: 8127 HURT: 5 helped stats (abs) min: 1 max: 51 x̄: 1.88 x̃: 2 helped stats (rel) min: 0.04% max: 20.00% x̄: 1.13% x̃: 1.07% HURT stats (abs) min: 5 max: 16 x̄: 9.00 x̃: 6 HURT stats (rel) min: 0.19% max: 0.52% x̄: 0.35% x̃: 0.28% 95% mean confidence interval for instructions value: -1.91 -1.83 95% mean confidence interval for instructions %-change: -1.15% -1.11% Instructions are helped. 
total cycles in shared programs: 375535444 -> 375536343 (<.01%) cycles in affected programs: 131206582 -> 131207481 (<.01%) helped: 5590 HURT: 1055 helped stats (abs) min: 1 max: 2844 x̄: 34.15 x̃: 16 helped stats (rel) min: <.01% max: 21.57% x̄: 2.08% x̃: 1.60% HURT stats (abs) min: 1 max: 2487 x̄: 181.78 x̃: 21 HURT stats (rel) min: <.01% max: 40.66% x̄: 1.96% x̃: 0.37% 95% mean confidence interval for cycles value: -4.74 5.01 95% mean confidence interval for cycles %-change: -1.51% -1.37% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 23401 -> 23407 (0.03%) spills in affected programs: 248 -> 254 (2.42%) helped: 2 HURT: 5 total fills in shared programs: 34850 -> 34845 (-0.01%) fills in affected programs: 383 -> 378 (-1.31%) helped: 2 HURT: 5 Ivy Bridge total instructions in shared programs: 11975423 -> 11968117 (-0.06%) instructions in affected programs: 845703 -> 838397 (-0.86%) helped: 4071 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.79 x̃: 1 helped stats (rel) min: 0.08% max: 8.21% x̄: 1.04% x̃: 0.93% 95% mean confidence interval for instructions value: -1.87 -1.71 95% mean confidence interval for instructions %-change: -1.06% -1.02% Instructions are helped. total cycles in shared programs: 179674318 -> 179635552 (-0.02%) cycles in affected programs: 5100065 -> 5061299 (-0.76%) helped: 2650 HURT: 611 helped stats (abs) min: 1 max: 900 x̄: 21.85 x̃: 16 helped stats (rel) min: <.01% max: 21.55% x̄: 2.39% x̃: 1.40% HURT stats (abs) min: 1 max: 1841 x̄: 31.33 x̃: 6 HURT stats (rel) min: <.01% max: 58.71% x̄: 1.64% x̃: 0.37% 95% mean confidence interval for cycles value: -14.14 -9.64 95% mean confidence interval for cycles %-change: -1.75% -1.52% Cycles are helped. 
LOST: 3 GAINED: 7 Sandy Bridge total instructions in shared programs: 10828844 -> 10824272 (-0.04%) instructions in affected programs: 525678 -> 521106 (-0.87%) helped: 2386 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.92 x̃: 2 helped stats (rel) min: 0.11% max: 7.96% x̄: 1.05% x̃: 0.94% 95% mean confidence interval for instructions value: -2.04 -1.80 95% mean confidence interval for instructions %-change: -1.08% -1.03% Instructions are helped. total cycles in shared programs: 154024591 -> 154009894 (<.01%) cycles in affected programs: 4005766 -> 3991069 (-0.37%) helped: 1245 HURT: 506 helped stats (abs) min: 1 max: 585 x̄: 21.07 x̃: 16 helped stats (rel) min: 0.02% max: 11.57% x̄: 1.98% x̃: 0.83% HURT stats (abs) min: 1 max: 639 x̄: 22.81 x̃: 6 HURT stats (rel) min: 0.01% max: 26.21% x̄: 1.07% x̃: 0.26% 95% mean confidence interval for cycles value: -10.57 -6.21 95% mean confidence interval for cycles %-change: -1.23% -0.97% Cycles are helped. GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8137248 -> 8135196 (-0.03%) instructions in affected programs: 148322 -> 146270 (-1.38%) helped: 992 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.07 x̃: 2 helped stats (rel) min: 0.41% max: 9.73% x̄: 1.74% x̃: 1.51% 95% mean confidence interval for instructions value: -2.16 -1.98 95% mean confidence interval for instructions %-change: -1.80% -1.67% Instructions are helped. total cycles in shared programs: 188583424 -> 188575724 (<.01%) cycles in affected programs: 4409620 -> 4401920 (-0.17%) helped: 956 HURT: 6 helped stats (abs) min: 2 max: 168 x̄: 8.09 x̃: 8 helped stats (rel) min: 0.04% max: 6.76% x̄: 0.27% x̃: 0.18% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: -8.41 -7.60 95% mean confidence interval for cycles %-change: -0.29% -0.25% Cycles are helped. 
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:14 -07:00
Ian Romanick	b77070e293	nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. 
total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	96fcb3f95b	nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instruction scheduler to create some very long live ranges resulting in unnecessary spilling. This causes some shaders to fall off a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. 
total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	d24edb4b8c	nir/algebraic: Simplify some comparisons like a+constant < constant v2: Remove unsafe integer versions of the optimizations. This change had no effect on shader-db results. Suggested by Caio. All Gen6+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333713 -> 16332631 (<.01%) instructions in affected programs: 258112 -> 257030 (-0.42%) helped: 1275 HURT: 407 helped stats (abs) min: 1 max: 7 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.20% max: 8.33% x̄: 1.33% x̃: 0.86% HURT stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 2.94% x̄: 0.98% x̃: 0.98% 95% mean confidence interval for instructions value: -0.70 -0.59 95% mean confidence interval for instructions %-change: -0.84% -0.70% Instructions are helped. total cycles in shared programs: 367596791 -> 367601268 (<.01%) cycles in affected programs: 3420062 -> 3424539 (0.13%) helped: 1553 HURT: 783 helped stats (abs) min: 1 max: 742 x̄: 24.36 x̃: 6 helped stats (rel) min: 0.05% max: 21.12% x̄: 1.47% x̃: 0.65% HURT stats (abs) min: 1 max: 557 x̄: 54.04 x̃: 14 HURT stats (rel) min: 0.01% max: 33.66% x̄: 3.36% x̃: 1.43% 95% mean confidence interval for cycles value: -1.60 5.43 95% mean confidence interval for cycles %-change: -0.03% 0.33% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Iron Lake total instructions in shared programs: 8137992 -> 8137874 (<.01%) instructions in affected programs: 17501 -> 17383 (-0.67%) helped: 104 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.63% x̄: 0.87% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.22 -1.00 95% mean confidence interval for instructions %-change: -0.94% -0.76% Instructions are helped. 
total cycles in shared programs: 188540038 -> 188539650 (<.01%) cycles in affected programs: 704574 -> 704186 (-0.06%) helped: 125 HURT: 84 helped stats (abs) min: 2 max: 96 x̄: 6.45 x̃: 4 helped stats (rel) min: <.01% max: 3.47% x̄: 0.42% x̃: 0.25% HURT stats (abs) min: 2 max: 58 x̄: 4.98 x̃: 4 HURT stats (rel) min: 0.01% max: 2.75% x̄: 0.36% x̃: 0.33% 95% mean confidence interval for cycles value: -3.20 -0.52 95% mean confidence interval for cycles %-change: -0.19% -0.03% Cycles are helped. GM45 total instructions in shared programs: 5008889 -> 5008830 (<.01%) instructions in affected programs: 8824 -> 8765 (-0.67%) helped: 52 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.38% x̄: 0.86% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.27 -0.95 95% mean confidence interval for instructions %-change: -0.96% -0.71% Instructions are helped. total cycles in shared programs: 128969426 -> 128969128 (<.01%) cycles in affected programs: 399798 -> 399500 (-0.07%) helped: 74 HURT: 30 helped stats (abs) min: 2 max: 22 x̄: 6.76 x̃: 6 helped stats (rel) min: <.01% max: 1.83% x̄: 0.46% x̃: 0.29% HURT stats (abs) min: 2 max: 58 x̄: 6.73 x̃: 6 HURT stats (rel) min: 0.06% max: 2.75% x̄: 0.42% x̃: 0.21% 95% mean confidence interval for cycles value: -4.60 -1.14 95% mean confidence interval for cycles %-change: -0.32% -0.08% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
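The v2 note for d24edb4b8c says the integer versions of the `a + constant < constant` simplification were removed as unsafe. The commit message does not say why; a plausible reason, and an assumption here, is two's-complement wraparound. The sketch below is plain Python, not Mesa code, and `wrap_i32`, `original`, and `rewritten` are hypothetical helper names that model 32-bit signed arithmetic.

```python
INT32_MAX = 2**31 - 1


def wrap_i32(x):
    """Wrap a Python int into the two's-complement 32-bit signed range."""
    return (x + 2**31) % 2**32 - 2**31


def original(a, b, c):
    # a + b < c, with the addition wrapping as 32-bit hardware would.
    return wrap_i32(a + b) < c


def rewritten(a, b, c):
    # The tempting simplification: a < c - b.
    return a < wrap_i32(c - b)


# For small values the two forms agree ...
small_agrees = original(-5, 1, 1) == rewritten(-5, 1, 1)

# ... but near the top of the range the addition wraps and they differ.
overflow_original = original(INT32_MAX, 1, 1)
overflow_rewritten = rewritten(INT32_MAX, 1, 1)
```

With `a = INT32_MAX` the wrapped sum becomes negative, so the original comparison holds while the rewritten one does not.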
Ian Romanick	7c64cbf49d	nir/algebraic: Recognize (a < 0 || 0 < b) as min(a, -b) < 0 Similar to commit 97e6c1b9 and f5cf74d8ba. First apply 0 < b => -b < 0 to get (a < 0 || -b < 0), then apply some pre-existing rules to get min(a, -b) < 0. v2: Substantially update the comment explaining the use of is_used_once and the duplication of patterns. Suggested by Caio. Also, while flt and fge are not commutative, ior and iand are. Half of the original patterns were redundant, so delete them. As alternate justification for deleting them, fmin(a, -b) < 0 <=> 0 < fmax(-a, b). Proof left as an exercise for the reader. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333789 -> 16333713 (<.01%) instructions in affected programs: 11424 -> 11348 (-0.67%) helped: 32 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.20% max: 1.67% x̄: 0.76% x̃: 0.69% 95% mean confidence interval for instructions value: -3.03 -1.72 95% mean confidence interval for instructions %-change: -0.89% -0.62% Instructions are helped. total cycles in shared programs: 367598295 -> 367596791 (<.01%) cycles in affected programs: 141414 -> 139910 (-1.06%) helped: 23 HURT: 6 helped stats (abs) min: 3 max: 386 x̄: 72.52 x̃: 20 helped stats (rel) min: 0.15% max: 4.86% x̄: 1.01% x̃: 0.76% HURT stats (abs) min: 4 max: 88 x̄: 27.33 x̃: 12 HURT stats (rel) min: 0.22% max: 3.95% x̄: 1.08% x̃: 0.59% 95% mean confidence interval for cycles value: -93.51 -10.21 95% mean confidence interval for cycles %-change: -1.10% -0.05% Cycles are helped. 
total instructions in shared programs: 10830836 -> 10830779 (<.01%) instructions in affected programs: 6895 -> 6838 (-0.83%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.75 x̃: 1 helped stats (rel) min: 0.14% max: 1.61% x̄: 0.65% x̃: 0.33% 95% mean confidence interval for instructions value: -8.46 -1.04 95% mean confidence interval for instructions %-change: -1.03% -0.27% Instructions are helped. total cycles in shared programs: 154028477 -> 154032740 (<.01%) cycles in affected programs: 178433 -> 182696 (2.39%) helped: 3 HURT: 9 helped stats (abs) min: 3 max: 20 x̄: 11.00 x̃: 10 helped stats (rel) min: 0.07% max: 0.20% x̄: 0.12% x̃: 0.09% HURT stats (abs) min: 27 max: 1415 x̄: 477.33 x̃: 262 HURT stats (rel) min: 0.22% max: 6.45% x̄: 2.49% x̃: 1.76% 95% mean confidence interval for cycles value: 28.68 681.82 95% mean confidence interval for cycles %-change: 0.37% 3.30% Cycles are HURT. Iron Lake total instructions in shared programs: 8137966 -> 8137992 (<.01%) instructions in affected programs: 3281 -> 3307 (0.79%) helped: 0 HURT: 6 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.01% x̄: 0.76% x̃: 0.64% 95% mean confidence interval for instructions value: 2.17 6.50 95% mean confidence interval for instructions %-change: 0.56% 0.96% Instructions are HURT. total cycles in shared programs: 188539386 -> 188540038 (<.01%) cycles in affected programs: 103826 -> 104478 (0.63%) helped: 0 HURT: 7 HURT stats (abs) min: 16 max: 218 x̄: 93.14 x̃: 80 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.53% x̃: 0.46% 95% mean confidence interval for cycles value: 10.26 176.02 95% mean confidence interval for cycles %-change: 0.24% 0.81% Cycles are HURT. 
GM45 total instructions in shared programs: 5008876 -> 5008889 (<.01%) instructions in affected programs: 1645 -> 1658 (0.79%) helped: 0 HURT: 3 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.00% x̄: 0.76% x̃: 0.63% total cycles in shared programs: 128968950 -> 128969426 (<.01%) cycles in affected programs: 64854 -> 65330 (0.73%) helped: 0 HURT: 4 HURT stats (abs) min: 18 max: 218 x̄: 119.00 x̃: 120 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.60% x̃: 0.66% 95% mean confidence interval for cycles value: -62.92 300.92 95% mean confidence interval for cycles %-change: -0.05% 1.26% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
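The identity behind 7c64cbf49d, and the alternate form that commit gives as justification for dropping the redundant patterns, can be sanity-checked outside the compiler. The sketch below is not taken from nir_opt_algebraic.py. It is plain Python with hypothetical helper names (`original`, `rewritten`, `alternate`), it models fmin/fmax with Python's `min`/`max`, and it assumes finite inputs only, so NaN handling is not covered.

```python
import itertools


def original(a, b):
    # The open-coded form seen in shaders: a < 0 || 0 < b
    return a < 0.0 or 0.0 < b


def rewritten(a, b):
    # The form the commit produces: min(a, -b) < 0
    return min(a, -b) < 0.0


def alternate(a, b):
    # The equivalent form cited in the commit message: 0 < max(-a, b)
    return 0.0 < max(-a, b)


# A small grid of finite values, including both signed zeros.
samples = [-2.5, -1.0, -0.0, 0.0, 0.5, 3.0]

mismatches = [
    (a, b)
    for a, b in itertools.product(samples, repeat=2)
    if not (original(a, b) == rewritten(a, b) == alternate(a, b))
]
```

An empty `mismatches` list on this grid is evidence for the equivalence, not a proof.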
Ian Romanick	92b75c126b	nir/algebraic: Replace checks that a value is between (or not) [0, 1] v2: Add an extra line to one of the proofs. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16329772 -> 16329427 (<.01%) instructions in affected programs: 41980 -> 41635 (-0.82%) helped: 110 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.14 x̃: 2 helped stats (rel) min: 0.19% max: 5.56% x̄: 1.12% x̃: 0.94% 95% mean confidence interval for instructions value: -4.10 -2.17 95% mean confidence interval for instructions %-change: -1.28% -0.96% Instructions are helped. total cycles in shared programs: 367551273 -> 367549979 (<.01%) cycles in affected programs: 492462 -> 491168 (-0.26%) helped: 76 HURT: 25 helped stats (abs) min: 1 max: 400 x̄: 42.86 x̃: 12 helped stats (rel) min: 0.06% max: 10.72% x̄: 1.23% x̃: 0.75% HURT stats (abs) min: 2 max: 730 x̄: 78.52 x̃: 16 HURT stats (rel) min: 0.17% max: 6.89% x̄: 2.08% x̃: 1.23% 95% mean confidence interval for cycles value: -37.79 12.16 95% mean confidence interval for cycles %-change: -0.90% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10831115 -> 10830836 (<.01%) instructions in affected programs: 37830 -> 37551 (-0.74%) helped: 70 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.99 x̃: 2 helped stats (rel) min: 0.33% max: 7.14% x̄: 1.21% x̃: 0.97% 95% mean confidence interval for instructions value: -5.47 -2.50 95% mean confidence interval for instructions %-change: -1.49% -0.92% Instructions are helped. 
total cycles in shared programs: 154029323 -> 154028477 (<.01%) cycles in affected programs: 247909 -> 247063 (-0.34%) helped: 52 HURT: 6 helped stats (abs) min: 2 max: 254 x̄: 25.81 x̃: 4 helped stats (rel) min: 0.07% max: 4.39% x̄: 0.81% x̃: 0.19% HURT stats (abs) min: 4 max: 403 x̄: 82.67 x̃: 8 HURT stats (rel) min: 0.18% max: 1.60% x̄: 0.71% x̃: 0.53% 95% mean confidence interval for cycles value: -34.83 5.65 95% mean confidence interval for cycles %-change: -0.98% -0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake total instructions in shared programs: 8138007 -> 8137966 (<.01%) instructions in affected programs: 4060 -> 4019 (-1.01%) helped: 31 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.32 x̃: 1 helped stats (rel) min: 0.68% max: 8.33% x̄: 1.45% x̃: 0.90% 95% mean confidence interval for instructions value: -1.50 -1.15 95% mean confidence interval for instructions %-change: -2.11% -0.79% Instructions are helped. total cycles in shared programs: 188539492 -> 188539386 (<.01%) cycles in affected programs: 26280 -> 26174 (-0.40%) helped: 25 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.54% x̃: 0.50% 95% mean confidence interval for cycles value: -5.08 -3.40 95% mean confidence interval for cycles %-change: -0.70% -0.37% Cycles are helped. GM45 total instructions in shared programs: 5008897 -> 5008876 (<.01%) instructions in affected programs: 2096 -> 2075 (-1.00%) helped: 16 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1 helped stats (rel) min: 0.68% max: 7.69% x̄: 1.41% x̃: 0.89% 95% mean confidence interval for instructions value: -1.57 -1.06 95% mean confidence interval for instructions %-change: -2.32% -0.49% Instructions are helped. 
total cycles in shared programs: 128969020 -> 128968950 (<.01%) cycles in affected programs: 18490 -> 18420 (-0.38%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.67 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.51% x̃: 0.48% 95% mean confidence interval for cycles value: -6.03 -3.30 95% mean confidence interval for cycles %-change: -0.78% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Erico Nunes	b3676a6548	nir/algebraic: rename lower_bitshift to lower_bitops Optimizations that insert bitshift or bitwise operations should not be applied on GPUs that don't support integer operations. The .lower_bitshift could be used to control the bitshift related ones, but there was also one bitwise optimization uncovered. Since only lima and freedreno use this option and the use case is that no bit operations are wanted, let's rename it to .lower_bitops and use it to control all bitops related optimizations. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-07-31 23:06:04 +02:00
Erico Nunes	4a407df682	nir/algebraic: add new fsum ops and fdot lowering The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can be used to optimize fdot lowering. fsum2 is not implemented and can be further lowered to an add with the vector components. Currently lima ppir handles this lowering internally, however this happens in a very late stage and requires a big chunk of code compared to a nir_opt_algebraic lowering. By having fsum in nir, we can reduce ppir complexity and enable the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-31 21:35:58 +02:00
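The lowering described in 4a407df682 amounts to a component-wise multiply followed by a horizontal sum. The following is a minimal Python model of that idea, not the rule added to nir_opt_algebraic. The helpers `fmul`, `fsum`, and `fdot_lowered` are hypothetical stand-ins for the NIR ops, and vectors are modelled as plain lists.

```python
def fmul(a, b):
    """Component-wise multiply of two equally sized vectors."""
    return [x * y for x, y in zip(a, b)]


def fsum(v):
    """Horizontal add of all components (stands in for fsum3 / fsum4)."""
    total = 0.0
    for component in v:
        total += component
    return total


def fdot_lowered(a, b):
    """Dot product expressed as fsum(fmul(a, b))."""
    return fsum(fmul(a, b))


dot3 = fdot_lowered([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
dot4 = fdot_lowered([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

For two-component vectors the commit notes that fsum2 is not implemented on the hardware and is lowered further to an add of the two components.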
Jonathan Marek	97c8314c5f	nir/algebraic: add scmp algebraic optimizations When 'x' is the result of a scmp op: x != 0.0 or x == 1.0: passthrough x == 0.0 or x != 1.0: invert Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
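The scmp rules in 97c8314c5f rely on set-on-compare ops producing only 0.0 or 1.0. One reading of the commit message is that "passthrough" means the comparison can be replaced by `x` itself and "invert" means it can be replaced by the inverse of `x`. The Python sketch below illustrates that reading. `slt` is a hypothetical model of a set-on-less-than op, and `check` is an illustrative helper, not Mesa code.

```python
def slt(a, b):
    """Model of a set-on-less-than op: yields 1.0 or 0.0, never a bool."""
    return 1.0 if a < b else 0.0


def check(x):
    """Verify the four comparison identities for an scmp result x."""
    truth = bool(x)  # 1.0 -> True, 0.0 -> False
    passthrough = (x != 0.0, x == 1.0)
    inverted = (x == 0.0, x != 1.0)
    return all(p == truth for p in passthrough) and all(
        i == (not truth) for i in inverted
    )


results = [check(slt(a, b)) for a, b in [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]]
```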
Jonathan Marek	9be902097c	nir/algebraic: add option to lower fall_equalN/fany_nequalN Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal for vec4 backends that don't have any special instructions for them, as long as they support saturate. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	397375d3f3	nir/algebraic: add fdot2 optimizations Add simple fdot2 optimizations that are missing. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	1e089d0575	nir/algebraic: add option to lower fdph For backends that don't have an 'fdph' instruction. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
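A homogeneous dot product (fdph) is the three-component dot product of its operands plus the w component of the second operand. One way to lower it for a backend that only has a four-component dot product is to replace the first operand's w with 1.0. The sketch below shows that idea in plain Python. It is an illustration, not necessarily the exact rule this commit added, and `fdot4`, `fdph_reference`, and `fdph_lowered` are hypothetical names.

```python
def fdot4(a, b):
    """Four-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def fdph_reference(a, b):
    """Homogeneous dot product: dot(a.xyz, b.xyz) + b.w."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3]


def fdph_lowered(a, b):
    """Lowering sketch: force a.w to 1.0, then use a plain fdot4."""
    return fdot4([a[0], a[1], a[2], 1.0], b)


a = [1.0, 2.0, 3.0, 99.0]  # a.w is ignored by fdph
b = [4.0, 5.0, 6.0, 7.0]
reference = fdph_reference(a, b)
lowered = fdph_lowered(a, b)
```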
Jonathan Marek	bc3b6168ba	nir: replace lower_sincos with algebraic opt This version has less ops for the same precision. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	5a4e71c082	nir/algebraic: allow swizzle in nir_algebraic replace expression This is to allow optimizations in nir_opt_algebraic not otherwise possible Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Rhys Perry	e8644122ed	nir/algebraic: mark a few comparison simplifications as precise No vkpipeline-db changes found. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Rhys Perry	79801b9d7d	nir/algebraic: optimize contradictory iand operands Some of these were found in a few GTAV, Rise of the Tomb Raider and Shadow of the Tomb Raider shaders. Results from vkpipeline-db run with ACO: Totals from affected shaders: SGPRS: 376 -> 376 (0.00 %) VGPRS: 220 -> 220 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 13492 -> 11560 (-14.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 69 -> 69 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: use False instead of 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Jason Ekstrand	812b341578	nir/algebraic: Optimize comparisons and up-casts These seem like obvious enough optimizations in the world of multiple integer bit sizes. The only known thing which hits these at the moment is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast and check for equality. However, it's something that's bound to come up as we start seeing more integers in shaders. The optimizations of comparisons of casted values with constants are something which we would ideally do with range analysis. However, lacking that, we can do it in opt_algebraic as long as one side is a constant. In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along with the previous commit, reduces the number of instructions emitted on Skylake from 55328 to 44546, a reduction of 20%. Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	e8505e982a	nir/algebraic: Optimize comparing unpacked values We could, in theory, add the same optimization for 64-bit unpack operations but that's likely to fight with 64-bit integer lowering on platforms which require it so it will require more infrastructure before that will be a good idea. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Ian Romanick	ef7b4fdf3f	nir/algebraic: Recognize open-coded flrp(a, b, a) No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. All Gen6-Gen9 platforms had similar results. (Skylake shown) total instructions in shared programs: 15041996 -> 15041184 (<.01%) instructions in affected programs: 71776 -> 70964 (-1.13%) helped: 312 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3 helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28% 95% mean confidence interval for instructions value: -2.66 -2.55 95% mean confidence interval for instructions %-change: -1.89% -1.61% Instructions are helped. total cycles in shared programs: 354303333 -> 354301807 (<.01%) cycles in affected programs: 433742 -> 432216 (-0.35%) helped: 206 HURT: 78 helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8 helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82% HURT stats (abs) min: 1 max: 220 x̄: 35.95 x̃: 10 HURT stats (rel) min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56% 95% mean confidence interval for cycles value: -10.68 -0.06 95% mean confidence interval for cycles %-change: -0.99% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	0c2b3a7fc0	nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Convert the pattern directly to flrp. There were negligible improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe the problem is this optimization conflicts with the (1-x)*y => ffma(-x, y, y) optimization on Gen11. Skylake total instructions in shared programs: 15046487 -> 15041996 (-0.03%) instructions in affected programs: 194681 -> 190190 (-2.31%) helped: 880 HURT: 20 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17% 95% mean confidence interval for instructions value: -5.25 -4.73 95% mean confidence interval for instructions %-change: -5.11% -4.36% Instructions are helped. total cycles in shared programs: 354340839 -> 354303333 (-0.01%) cycles in affected programs: 1753622 -> 1716116 (-2.14%) helped: 786 HURT: 182 helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22 helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84% HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9 HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32% 95% mean confidence interval for cycles value: -45.90 -31.59 95% mean confidence interval for cycles %-change: -3.09% -2.50% Cycles are helped. All Gen6-Gen8 platforms had similar results. 
(Broadwell shown) total instructions in shared programs: 15055907 -> 15051466 (-0.03%) instructions in affected programs: 196370 -> 191929 (-2.26%) helped: 871 HURT: 26 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12% 95% mean confidence interval for instructions value: -5.21 -4.69 95% mean confidence interval for instructions %-change: -4.99% -4.24% Instructions are helped. total cycles in shared programs: 387729170 -> 387699745 (<.01%) cycles in affected programs: 1816409 -> 1786984 (-1.62%) helped: 788 HURT: 172 helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22 helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76% HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14 HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43% 95% mean confidence interval for cycles value: -35.69 -25.61 95% mean confidence interval for cycles %-change: -2.88% -2.40% Cycles are helped. total fills in shared programs: 34712 -> 34710 (<.01%) fills in affected programs: 7 -> 5 (-28.57%) helped: 1 HURT: 0 LOST: 0 GAINED: 2 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	09705747d7	nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. 
(Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. 
total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. 
total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	ff9f526de3	nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a) v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. v3: Add a couple more patterns that just move the negation around. No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. Skylake total instructions in shared programs: 15279687 -> 15256058 (-0.15%) instructions in affected programs: 4344440 -> 4320811 (-0.54%) helped: 23455 HURT: 18 helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34% 95% mean confidence interval for instructions value: -1.01 -1.00 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 355593755 -> 355339981 (-0.07%) cycles in affected programs: 162089552 -> 161835778 (-0.16%) helped: 20467 HURT: 7158 helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6 helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58% HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11 HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98% 95% mean confidence interval for cycles value: -10.39 -7.98 95% mean confidence interval for cycles %-change: -0.57% -0.47% Cycles are helped. 
total spills in shared programs: 8843 -> 8835 (-0.09%) spills in affected programs: 190 -> 182 (-4.21%) helped: 2 HURT: 0 total fills in shared programs: 21738 -> 21738 (0.00%) fills in affected programs: 372 -> 372 (0.00%) helped: 1 HURT: 1 LOST: 12 GAINED: 22 Broadwell total instructions in shared programs: 15290523 -> 15266818 (-0.16%) instructions in affected programs: 4314738 -> 4291033 (-0.55%) helped: 23391 HURT: 11 helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1 HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50% 95% mean confidence interval for instructions value: -1.04 -0.99 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 388911660 -> 388830827 (-0.02%) cycles in affected programs: 172903324 -> 172822491 (-0.05%) helped: 15601 HURT: 13269 helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6 helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55% HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6 HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60% 95% mean confidence interval for cycles value: -4.20 -1.40 95% mean confidence interval for cycles %-change: -0.17% -0.08% Cycles are helped. 
total spills in shared programs: 23110 -> 23069 (-0.18%) spills in affected programs: 656 -> 615 (-6.25%) helped: 3 HURT: 1 total fills in shared programs: 34399 -> 34398 (<.01%) fills in affected programs: 905 -> 904 (-0.11%) helped: 3 HURT: 1 LOST: 6 GAINED: 23 Haswell total instructions in shared programs: 13465303 -> 13441142 (-0.18%) instructions in affected programs: 3726999 -> 3702838 (-0.65%) helped: 22139 HURT: 347 helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12% 95% mean confidence interval for instructions value: -1.08 -1.07 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 376271308 -> 376273090 (<.01%) cycles in affected programs: 167496811 -> 167498593 (<.01%) helped: 13206 HURT: 13281 helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8 helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80% HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8 HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61% 95% mean confidence interval for cycles value: -1.33 1.47 95% mean confidence interval for cycles %-change: 0.22% 0.36% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 23158 -> 23134 (-0.10%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 34580 -> 34550 (-0.09%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 23 GAINED: 13 Ivy Bridge total instructions in shared programs: 12034154 -> 12014301 (-0.16%) instructions in affected programs: 3636209 -> 3616356 (-0.55%) helped: 18771 HURT: 459 helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11% 95% mean confidence interval for instructions value: -1.04 -1.02 95% mean confidence interval for instructions %-change: -0.86% -0.84% Instructions are helped. total cycles in shared programs: 180186960 -> 180175147 (<.01%) cycles in affected programs: 44652745 -> 44640932 (-0.03%) helped: 12979 HURT: 11033 helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6 helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74% HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9 HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69% 95% mean confidence interval for cycles value: -2.29 1.31 95% mean confidence interval for cycles %-change: 0.11% 0.26% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 3623 -> 3599 (-0.66%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 4061 -> 4031 (-0.74%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 17 GAINED: 18 Sandy Bridge total instructions in shared programs: 10853968 -> 10834932 (-0.18%) instructions in affected programs: 3769957 -> 3750921 (-0.50%) helped: 17944 HURT: 204 helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93% 95% mean confidence interval for instructions value: -1.05 -1.04 95% mean confidence interval for instructions %-change: -0.81% -0.78% Instructions are helped. total cycles in shared programs: 153894864 -> 153885988 (<.01%) cycles in affected programs: 50643925 -> 50635049 (-0.02%) helped: 9361 HURT: 10534 helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4 helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22% HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5 HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27% 95% mean confidence interval for cycles value: -1.27 0.38 95% mean confidence interval for cycles %-change: -0.03% 0.04% Inconclusive result (value mean confidence interval includes 0). LOST: 6 GAINED: 24 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Sagar Ghuge	80117117bd	nir: Add optimization to use ROR/ROL instructions v2: 1) Add more optimization rules for ROL/ROR (Matt Turner) 2) Add lowering rules for ROL/ROR (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Ian Romanick	8d6b35fffd	nir/algebraic: Fail build when too many commutative expressions are used Search patterns that are expected to have too many (e.g., the giant bitfield_reverse pattern) can be added to a white list. This would have saved me a few hours debugging. :( v2: Implement the expected-failure annotation as a property of the search-replace pattern instead of as a property of the whole list of patterns. Suggested by Connor. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 18:56:19 -07:00
Eric Anholt	8fd8964302	nir: Fix lowering of bitfield_insert to shifts. The bfi/bfm behavior change replaced the bfi/bfm usage in lower_bitfield_insert_to_shifts with actual shifts like the name says, but it failed to handle the offset=0, bits==32 case in the new lowering. v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits, 31) -> false optimization. Fixes regressions in dEQP-GLES31.bitfield_insert on freedreno. Fixes: `165b7f3a44` ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-06-28 16:38:23 -07:00
Caio Marcelo de Oliveira Filho	085c0f1f13	nir/algebraic: Add helpers and a rule involving wrapping The helpers are needed so we can use the syntax `instr(cond)` in the algebraic rules. Add simple rule for dropping a pair of mul-div of the same value when wrapping is guaranteed to not happen. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-26 14:13:02 -07:00
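The mul-div rule in `085c0f1f13` only holds when the multiply cannot wrap. A minimal Python sketch of why, assuming 32-bit unsigned arithmetic; the helper names `umul32`, `mul_then_div` and `wraps` are illustrative and do not come from NIR:

```python
MASK32 = 0xFFFFFFFF

def umul32(a, b):
    """32-bit unsigned multiply that wraps on overflow."""
    return (a * b) & MASK32

def mul_then_div(a, b):
    """Compute (a * b) / b in 32-bit unsigned arithmetic (b must be non-zero)."""
    return umul32(a, b) // b

def wraps(a, b):
    """True when the exact product does not fit in 32 bits."""
    return a * b > MASK32

# No wrapping: the mul-div pair cancels and a is recovered.
print(wraps(1000, 7), mul_then_div(1000, 7))

# Wrapping: the high bit is lost, so dropping the pair would change the result.
print(wraps(0x80000000, 2), mul_then_div(0x80000000, 2))
```

The second case shows why the rule has to be guarded by a no-wrap condition rather than applied unconditionally.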
Jonathan Marek	a70ff70158	nir: remove fnot/fxor/fand/for opcodes There doesn't seem to be any reason to keep these opcodes around: * fnot/fxor are not used at all. * fand/for are only used in lower_alu_to_scalar, but easily replaced Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-26 15:26:10 -04:00
Daniel Schürmann	a8b0b6e52b	nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select. bitfield_select is defined as: bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert) matching the behavior of AMD's BFI instruction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
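The bitfield_select definition from `a8b0b6e52b` can be modeled directly. This is a sketch in Python, assuming 32-bit operands; the final mask only keeps Python's unbounded integers within 32 bits:

```python
MASK32 = 0xFFFFFFFF

def bitfield_select(mask, base, insert):
    """(mask & base) | (~mask & insert), truncated to 32 bits."""
    return ((mask & base) | (~mask & insert)) & MASK32

# Bits 8..15 are taken from base, every other bit from insert.
result = bitfield_select(0x0000FF00, 0xAAAAAAAA, 0x55555555)
print(hex(result))  # 0x5555aa55
```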
Daniel Schürmann	1403c3a7bf	nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract This lets us use the optimization pattern (('ult', 31, ('iand', b, 31)), False) to remove the bcsel instruction for code originating in D3D shaders. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	4eeb49ea71	nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources The [iu]bfe and bfm instructions are defined to only use the five least significant bits. This optimizes a common pattern from D3D -> SPIR-V translation. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	165b7f3a44	nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	a74f256c58	nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends. These optimizations are based on the fact that 'and(a,b) <= umin(a,b)'. For AMD, this series moves the optimization from LLVM to NIR, so currently no vkpipeline-db changes here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-24 18:42:20 +02:00
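The inequality behind `a74f256c58`, 'and(a,b) <= umin(a,b)', can be checked by brute force. The sketch below is Python and covers 8-bit values only, on the assumption that the argument carries over to wider integers because clearing bits can never increase an unsigned value; `ult` and `check` are illustrative names:

```python
def ult(x, y):
    """Unsigned less-than on non-negative Python integers."""
    return x < y

def check(bits=8):
    """Exhaustively verify the inequality and the folded comparison."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            assert (a & b) <= min(a, b)
            # Consequence used by the pattern: a < (b & a) is never true.
            assert not ult(a, b & a)
    return True

print(check())
```

The pattern (('ult', 31, ('iand', b, 31)), False) quoted in `1403c3a7bf` is the same fact with a fixed at 31.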
Eduardo Lima Mitev	fb2169040a	nir/opt_algebraic: Fix rules for imadsh_mix16 The rules added in patch `3addd7c` are inverted: It should be: (al * bh) << 16 + c instead of: (ah * bl) << 16 + c Fixes a number of regressions under dEQP-GLES31.functional.draw_indirect.compute_interop.large.* on Freedreno. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-10 22:27:46 +02:00
Eduardo Lima Mitev	3addd7c8d9	nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16 For umul_low (al * bl), zero is returned if the low 16-bit word of either source is zero. For imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl' is zero. A couple of nir_search_helpers are added: is_upper_half_zero() returns true if the highest word of all components of an integer NIR alu src are zero. is_lower_half_zero() returns true if the lowest word of all components of an integer NIR alu src are zero. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Kenneth Graunke	c7d1b52a2c	nir: Combine lower_fmod16/32 back into a single lower_fmod. We originally had a single lower_fmod option. In commit `2ab2d2e5`, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit `ca31df6f`. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	edd45af9ba	nir: Drop lower_fmod64 option. nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Alyssa Rosenzweig	d2d3cc66cf	nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a) This pattern was noticed in glmark's jellyfish scene. v2: Add inexact qualifier due to NaN behaviour. Minimal shader-db changes (slightly helped). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-06-04 19:57:19 +00:00
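One way to see why `d2d3cc66cf` needed the inexact qualifier is to evaluate both sides with a NaN input. This Python sketch assumes an fmax that ignores a NaN operand (IEEE 754 maxNum style); that choice is an assumption made here for illustration, not something the commit message states:

```python
import math

def fmax(x, y):
    """Assumed semantics: a NaN operand is ignored, as in IEEE 754 maxNum."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)

def original(a):
    """max(abs(a), 0.0) before the simplification."""
    return fmax(math.fabs(a), 0.0)

def simplified(a):
    """abs(a) after the simplification."""
    return math.fabs(a)

for value in (-3.5, -0.0, 0.0, 2.0, math.inf):
    print(value, original(value) == simplified(value))

# With NaN the two sides disagree under the assumed fmax.
print(original(math.nan), simplified(math.nan))
```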
Jonathan Marek	f889180ee1	nir: add lower_bitshift option Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Alyssa Rosenzweig	46494c3dc1	nir/algebraic: Remove problematic "optimization" This line is no longer relevant now that booleans are 1-bit, and in fact causes issues (infinite progress loop between algebraic optimizations and copy prop) with constant vector masks. No shader-db changes on Intel platforms (Jason). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2019-05-16 02:08:37 +00:00
Ian Romanick	32d259713b	nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction, making the MOV unnecessary. Values generated in a different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
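The rewrite in `32d259713b` depends on 1-fsat(a) and fsat(1-a) producing the same value. A small Python sketch of that identity, assuming fsat clamps to [0, 1] and looking only at non-NaN inputs; the function names are illustrative:

```python
def fsat(x):
    """Clamp x to the range [0.0, 1.0]."""
    return min(max(x, 0.0), 1.0)

def one_minus_fsat(a):
    """Original form: saturate first, then subtract from 1."""
    return 1.0 - fsat(a)

def fsat_one_minus(a):
    """Commuted form: subtract first, then saturate."""
    return fsat(1.0 - a)

for a in (-2.0, -0.0, 0.0, 0.25, 0.75, 1.0, 3.5):
    print(a, one_minus_fsat(a), fsat_one_minus(a))
```

In the commuted form the saturate sits on the result of the subtraction, which is what lets it be folded into the ADD as the message describes.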
Ian Romanick	a7f0c57673	nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1) v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224337 -> 17224269 (<.01%) instructions in affected programs: 13578 -> 13510 (-0.50%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.05% -0.63% Instructions are helped. total cycles in shared programs: 360826090 -> 360825137 (<.01%) cycles in affected programs: 94867 -> 93914 (-1.00%) helped: 58 HURT: 1 helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18 helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22% HURT stats (abs) min: 76 max: 76 x̄: 76.00 x̃: 76 HURT stats (rel) min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86% 95% mean confidence interval for cycles value: -19.53 -12.78 95% mean confidence interval for cycles %-change: -1.56% -1.08% Cycles are helped. No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	281f20e26d	nir/algebraic: Strip double negatives from comparison sources All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224623 -> 17224337 (<.01%) instructions in affected programs: 32648 -> 32362 (-0.88%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2 helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08% 95% mean confidence interval for instructions value: -1.97 -1.89 95% mean confidence interval for instructions %-change: -1.15% -1.00% Instructions are helped. total cycles in shared programs: 360828714 -> 360826090 (<.01%) cycles in affected programs: 347416 -> 344792 (-0.76%) helped: 148 HURT: 26 helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18 helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41% HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6 HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27% 95% mean confidence interval for cycles value: -23.78 -6.38 95% mean confidence interval for cycles %-change: -1.59% -0.79% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. 
(Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	d2a9ba03e3	Revert "nir: add late opt to turn inot/b2f combos back to bcsel" This reverts commit `7acc865226`. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization was made basically irrelevant by `7725d60938` "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3cb091f8b4	nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	9725e45b3d	nir/algebraic: Simplify fsqrt domain guard All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228376 -> 17225365 (-0.02%) instructions in affected programs: 280732 -> 277721 (-1.07%) helped: 1072 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2 helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07% 95% mean confidence interval for instructions value: -2.92 -2.70 95% mean confidence interval for instructions %-change: -1.48% -1.37% Instructions are helped. total cycles in shared programs: 360935690 -> 360842788 (-0.03%) cycles in affected programs: 7838017 -> 7745115 (-1.19%) helped: 1569 HURT: 69 helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20 helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12% HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47 HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31% 95% mean confidence interval for cycles value: -63.55 -49.89 95% mean confidence interval for cycles %-change: -3.33% -2.96% Cycles are helped. No changes on any other platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	5116646a76	nir/algebraic: Recognize open-coded fsat with modifiers This change also enables a later change (nir/algebraic: Replace 1-fsat(a) with fsat(1-a)) to affect more shaders. Almost all of the affected shaders are in Bioshock Infinite, and all of those shaders all require GLSL 4.10. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228584 -> 17228376 (<.01%) instructions in affected programs: 31438 -> 31230 (-0.66%) helped: 105 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1 helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70% 95% mean confidence interval for instructions value: -2.20 -1.76 95% mean confidence interval for instructions %-change: -0.80% -0.67% Instructions are helped. total cycles in shared programs: 360936431 -> 360935690 (<.01%) cycles in affected programs: 420100 -> 419359 (-0.18%) helped: 71 HURT: 21 helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10 helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48% HURT stats (abs) min: 1 max: 198 x̄: 29.90 x̃: 10 HURT stats (rel) min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90% 95% mean confidence interval for cycles value: -16.77 0.66 95% mean confidence interval for cycles %-change: -0.85% -0.06% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	c769641c8e	nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. 
total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3b74790941	nir/algebraic: Recognize open-coded flrp(a, b, fsat(c)) All Gen6+ GPUs had similar results. (Skylake shown) total instructions in shared programs: 15336712 -> 15336622 (<.01%) instructions in affected programs: 3952 -> 3862 (-2.28%) helped: 24 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4 helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46% 95% mean confidence interval for instructions value: -4.06 -3.44 95% mean confidence interval for instructions %-change: -2.47% -2.22% Instructions are helped. total cycles in shared programs: 355722052 -> 355721235 (<.01%) cycles in affected programs: 27326 -> 26509 (-2.99%) helped: 20 HURT: 4 helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14 helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23% HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6 HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55% 95% mean confidence interval for cycles value: -61.61 -6.47 95% mean confidence interval for cycles %-change: -5.59% -0.39% Cycles are helped. No changes on Ice Lake, Iron Lake, or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:21 -07:00
Ian Romanick	a7724b1cbb	nir/algebraic: Add missing ffma(-1, a, b) pattern All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17229439 -> 17229377 (<.01%) instructions in affected programs: 9859 -> 9797 (-0.63%) helped: 41 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67% 95% mean confidence interval for instructions value: -1.88 -1.14 95% mean confidence interval for instructions %-change: -2.48% -0.81% Instructions are helped. total cycles in shared programs: 360944145 -> 360942989 (<.01%) cycles in affected programs: 178167 -> 177011 (-0.65%) helped: 36 HURT: 19 helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5 helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45% HURT stats (abs) min: 1 max: 34 x̄: 11.21 x̃: 6 HURT stats (rel) min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50% 95% mean confidence interval for cycles value: -36.01 -6.02 95% mean confidence interval for cycles %-change: -4.18% -0.57% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:03 -07:00
Ian Romanick	7b4ff6a1af	nir: Mark ffma as 2src_commutative This doesn't make any real difference now, but future work (not in this series) will add a LOT of ffma patterns. Having to duplicate all of them for ffma(a, b, c) and ffma(b, a, c) is just terrible. No shader-db changes on any Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	ab86926156	nir/algebraic: Reassociate open-coded flrp(1, b, c) In a previous version of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0. However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc. The lowering happens after all constant folding, so we would literally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it. This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations. A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW. I tried a similar thing for open-coded flrp(-1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles were about even. No changes on any other Intel platforms. Iron Lake and GM45 had similar results. 
(Iron Lake shown) total instructions in shared programs: 8172020 -> 8164367 (-0.09%) instructions in affected programs: 1089851 -> 1082198 (-0.70%) helped: 3285 HURT: 64 helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2 helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38% 95% mean confidence interval for instructions value: -2.32 -2.25 95% mean confidence interval for instructions %-change: -1.16% -1.09% Instructions are helped. total cycles in shared programs: 188758338 -> 188719974 (-0.02%) cycles in affected programs: 20004922 -> 19966558 (-0.19%) helped: 3012 HURT: 477 helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12 helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24% HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4 HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11% 95% mean confidence interval for cycles value: -11.38 -10.62 95% mean confidence interval for cycles %-change: -0.46% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
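The rounding difference described in the commit above can be reproduced outside the compiler. The following is a minimal sketch, assuming the open-coded form is a+c(b-a) with a = 1 and the reassociated form is a(1-c)+bc, as the message describes. The helper names `f32`, `open_coded`, and `reassociated` are hypothetical and only emulate 32-bit float rounding in plain Python; they are not part of NIR.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def open_coded(b, c):
    # a + c*(b - a) with a = 1, rounding to 32 bits after every operation.
    return f32(1.0 + f32(c * f32(b - 1.0)))

def reassociated(b, c):
    # a*(1 - c) + b*c with a = 1, rounding to 32 bits after every operation.
    return f32(f32(1.0 - c) + f32(b * c))

b = 16777220.0  # 2**24 + 4, so fabs(b) >= 2**24
c = 0.375       # fabs(c) < 0.5
print(open_coded(b, c))    # 6291458.5
print(reassociated(b, c))  # 6291458.0
```

For small operands (for example b = 2.0, c = 0.5) both forms give the same value; the two only diverge once b - 1 is no longer exactly representable in 32 bits.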
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	dc566a033c	nir/algebraic: Pull common multiplication out of flrp arguments All Intel platforms had similar results. (Skylake shown) total instructions in shared programs: 15342485 -> 15337495 (-0.03%) instructions in affected programs: 217456 -> 212466 (-2.29%) helped: 1539 HURT: 1 helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3 helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56% 95% mean confidence interval for instructions value: -3.39 -3.09 95% mean confidence interval for instructions %-change: -3.24% -2.96% Instructions are helped. total cycles in shared programs: 355734320 -> 355728237 (<.01%) cycles in affected programs: 1851555 -> 1845472 (-0.33%) helped: 835 HURT: 575 helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14 helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81% HURT stats (abs) min: 1 max: 322 x̄: 48.40 x̃: 14 HURT stats (rel) min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43% 95% mean confidence interval for cycles value: -8.50 -0.13 95% mean confidence interval for cycles %-change: 0.48% 1.62% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	a83a6e9690	nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. 
total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Jason Ekstrand	00d4e78ea9	nir/algebraic: Optimize integer cast-of-cast These have been popping up more and more with the OpenCL work and other bits causing extra conversions to/from 64-bit. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-26 04:26:08 -05:00
Erico Nunes	4577eb7b7c	nir/algebraic: add lowering for fsign The mali utgard pp doesn't support a sign instruction. In the ARM offline shader compiler, the sign function is implemented using sub(gt(0.0, a), lt(0.0, a)). This is a generic optimization, so implement it in the nir level when lower_fsign is set, alongside the lowering for isign. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-19 15:42:23 +00:00
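As a rough illustration of the fsign lowering described above, the sign can be built from two comparisons and a subtraction. This sketch uses the conventional form (a > 0) - (a < 0); the operand order of the ARM compiler's sub/gt/lt instructions is not specified in the commit message, so treat the mapping as an assumption. `fsign_lowered` is a hypothetical name.

```python
def fsign_lowered(a):
    # float(True) == 1.0 and float(False) == 0.0, standing in for a
    # Boolean-to-float conversion of each comparison result.
    return float(a > 0.0) - float(a < 0.0)

for value in (3.5, -2.0, 0.0):
    print(value, fsign_lowered(value))
```

Both comparisons are false for NaN, so this sketch returns 0.0 for a NaN input.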
Ian Romanick	6b97fa9a99	nir/algebraic: Strength reduce some compares of x and -x Converting the x vs -x comparison to an x vs 0 comparison enables cmod propagation to help. This seems to be a win everywhere except Gen7. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15566733 -> 15566014 (<.01%) instructions in affected programs: 72617 -> 71898 (-0.99%) helped: 302 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98% 95% mean confidence interval for instructions value: -2.55 -2.21 95% mean confidence interval for instructions %-change: -1.40% -1.16% Instructions are helped. total cycles in shared programs: 413014786 -> 413015475 (<.01%) cycles in affected programs: 707594 -> 708283 (0.10%) helped: 227 HURT: 101 helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20 helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49% HURT stats (abs) min: 2 max: 334 x̄: 87.90 x̃: 45 HURT stats (rel) min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36% 95% mean confidence interval for cycles value: -8.12 12.32 95% mean confidence interval for cycles %-change: -0.67% 0.34% Inconclusive result (value mean confidence interval includes 0). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13828220 -> 13827881 (<.01%) instructions in affected programs: 60887 -> 60548 (-0.56%) helped: 253 HURT: 6 helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1 helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27% 95% mean confidence interval for instructions value: -1.39 -1.23 95% mean confidence interval for instructions %-change: -0.85% -0.70% Instructions are helped. 
total cycles in shared programs: 386870095 -> 386894412 (<.01%) cycles in affected programs: 1537307 -> 1561624 (1.58%) helped: 127 HURT: 188 helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4 helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33% HURT stats (abs) min: 2 max: 5585 x̄: 141.43 x̃: 14 HURT stats (rel) min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06% 95% mean confidence interval for cycles value: 21.95 132.45 95% mean confidence interval for cycles %-change: 0.32% 0.85% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10896339 -> 10896276 (<.01%) instructions in affected programs: 10757 -> 10694 (-0.59%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1 helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89% 95% mean confidence interval for instructions value: -1.42 -1.15 95% mean confidence interval for instructions %-change: -1.03% -0.72% Instructions are helped. total cycles in shared programs: 155091003 -> 155090480 (<.01%) cycles in affected programs: 102761 -> 102238 (-0.51%) helped: 51 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4 helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36% 95% mean confidence interval for cycles value: -12.98 -7.53 95% mean confidence interval for cycles %-change: -0.97% -0.56% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8234667 -> 8234652 (<.01%) instructions in affected programs: 2063 -> 2048 (-0.73%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.97% -0.67% Instructions are helped. 
total cycles in shared programs: 188700906 -> 188700598 (<.01%) cycles in affected programs: 283480 -> 283172 (-0.11%) helped: 83 HURT: 3 helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4 helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04% 95% mean confidence interval for cycles value: -3.87 -3.29 95% mean confidence interval for cycles %-change: -0.16% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
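The idea behind the x vs -x strength reduction above can be checked numerically for floating-point values. This is only a sketch of the kind of equivalence involved; which exact comparisons the commit rewrites is not stated in the message, so the two shown here (less-than and greater-or-equal) are assumptions chosen for illustration.

```python
import math

samples = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]

for x in samples:
    # x < -x holds exactly when x is negative.
    assert (x < -x) == (x < 0.0)
    # x >= -x holds exactly when x is non-negative (false for NaN).
    assert (x >= -x) == (x >= 0.0)

print("x vs -x comparisons agree with x vs 0 on all samples")
```

Comparing against a constant zero leaves a single live source value, which is the shape the commit message says lets cmod propagation help.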
Ian Romanick	f3d6df719c	nir/algebraic: Fix some 1-bit Boolean weirdness Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total cycles in shared programs: 372594532 -> 372594460 (<.01%) cycles in affected programs: 46854 -> 46782 (-0.15%) helped: 9 HURT: 0 helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09% 95% mean confidence interval for cycles value: -14.34 -1.66 95% mean confidence interval for cycles %-change: -0.28% -0.04% Cycles are helped. Ivy Bridge total instructions in shared programs: 12038379 -> 12038373 (<.01%) instructions in affected programs: 1278 -> 1272 (-0.47%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55% total cycles in shared programs: 180889027 -> 180888997 (<.01%) cycles in affected programs: 29979 -> 29949 (-0.10%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5 helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07% 95% mean confidence interval for cycles value: -13.40 1.40 95% mean confidence interval for cycles %-change: -0.27% 0.05% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total cycles in shared programs: 155091021 -> 155091003 (<.01%) cycles in affected programs: 8842 -> 8824 (-0.20%) helped: 2 HURT: 0 No changes on Iron Lake or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	403aac7500	nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel All of the affected shaders are in Mad Max. I noticed this while looking at some other things. I tried a couple similar patterns, but the effect on cycles was generally negative. It may be worth revisiting this later. v2: Rebase on 1-bit Boolean changes. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282073 -> 15282053 (<.01%) instructions in affected programs: 1192 -> 1172 (-1.68%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1 helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39% 95% mean confidence interval for instructions value: -1.73 -1.13 95% mean confidence interval for instructions %-change: -1.91% -1.38% Instructions are helped. total cycles in shared programs: 372595954 -> 372594532 (<.01%) cycles in affected programs: 11477 -> 10055 (-12.39%) helped: 14 HURT: 0 helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104 helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78% 95% mean confidence interval for cycles value: -111.05 -92.09 95% mean confidence interval for cycles %-change: -14.90% -10.98% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	25bfba3335	nir/algebraic: Recognize open-coded copysign(1.0, a) All of the affected shaders are in Mad Max. The inner part of the pattern is itself an open-coded sign(a). I tried using that as a pattern, but the results were not good. A bunch of shaders were helped for instructions, but overall cycles, spill, and fills were hurt. v2: Rebase on 1-bit Boolean changes. v3: Fix order of copysign() parameters in comments and commit message. Noticed by Matt. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282141 -> 15282073 (<.01%) instructions in affected programs: 6106 -> 6038 (-1.11%) helped: 17 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06% 95% mean confidence interval for instructions value: -4.00 -4.00 95% mean confidence interval for instructions %-change: -1.30% -1.00% Instructions are helped. total cycles in shared programs: 372597886 -> 372595954 (<.01%) cycles in affected programs: 32701 -> 30769 (-5.91%) helped: 17 HURT: 0 helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118 helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83% 95% mean confidence interval for cycles value: -152.84 -74.45 95% mean confidence interval for cycles %-change: -8.89% -3.51% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Christian Gmeiner	b6bed115a5	nir: add lower_ftrunc Port TGSI TRUNC lowering to nir Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 17:54:48 +00:00
Caio Marcelo de Oliveira Filho	d08a74d2bf	nir/algebraic: Lower CS derivatives to zero when no group defined In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:32 -07:00
Jason Ekstrand	ad8c145658	nir/algebraic: Add some logical OR and AND patterns The new OR pattern has been seen in the wild and can end up being generated by GLSLang. Not sure about the other two new patterns but we may as well throw them in for completeness. While we're here, we can drop the '@bool' specifier from the one pattern because specifying True already implies 1-bit which basically implies boolean. Shader-db results on Kaby Lake: total instructions in shared programs: 15321227 -> 15321129 (<.01%) instructions in affected programs: 3594 -> 3496 (-2.73%) helped: 6 HURT: 0 total cycles in shared programs: 357481321 -> 357479725 (<.01%) cycles in affected programs: 44109 -> 42513 (-3.62%) helped: 6 HURT: 0 VkPipeline-DB results on Kaby Lake: total instructions in shared programs: 3770504 -> 3769734 (-0.02%) instructions in affected programs: 19058 -> 18288 (-4.04%) helped: 163 HURT: 0 total cycles in shared programs: 1417583701 -> 1417569727 (<.01%) cycles in affected programs: 750958 -> 736984 (-1.86%) helped: 158 HURT: 1 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:06 -05:00
Jason Ekstrand	03a72d96d8	nir/algebraic: Drop some @bool specifiers Now that we have one-bit booleans, we don't need to rely on looking at parent instructions in order to figure out if a value is a Boolean most of the time. We can drop these specifiers and now the optimizations will apply more generally. Shader-DB results on Kaby Lake: total instructions in shared programs: 15321168 -> 15321227 (<.01%) instructions in affected programs: 8836 -> 8895 (0.67%) helped: 1 HURT: 31 total cycles in shared programs: 357481781 -> 357481321 (<.01%) cycles in affected programs: 146524 -> 146064 (-0.31%) helped: 22 HURT: 10 total spills in shared programs: 23675 -> 23673 (<.01%) spills in affected programs: 11 -> 9 (-18.18%) helped: 1 HURT: 0 total fills in shared programs: 32040 -> 32036 (-0.01%) fills in affected programs: 27 -> 23 (-14.81%) helped: 1 HURT: 0 No change in VkPipeline-DB Looking at the instructions hurt, a bunch of them seem to be a case where doing exactly the right thing in NIR ends up doing the wrong-ish thing in the back-end because flags are dumb. In particular, there's a case where we have a MUL followed by a CMP followed by a SEL and when we turn that SEL into an OR, it uses the GRF result of the CMP rather than the flag result so the CMP can't be merged with the MUL. Those shaders appear to schedule better according to the cycle estimates so I guess it's a win? Also it helps spilling in one Car Chase compute shader. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:00 -05:00
Ian Romanick	ae21b52e1d	nir/algebraic: Add missing 16-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Ian Romanick	cbad201c2b	nir/algebraic: Add missing 64-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Ian Romanick	bc17f5a2a3	nir/algebraic: Remove redundant extract_[iu]8 patterns No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-28 15:35:52 -07:00
Ian Romanick	c152672e68	nir/algebraic: Fix up extract_[iu]8 after loop unrolling Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256837 (<.01%) instructions in affected programs: 4713 -> 4710 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06% total cycles in shared programs: 372286583 -> 372286583 (0.00%) cycles in affected programs: 198516 -> 198516 (0.00%) helped: 1 HURT: 1 helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01% No changes on any other Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Iago Toral Quiroga	763c8aabed	compiler/nir: add lowering for 16-bit ldexp v2 (Topi): - Make bit-size handling order be 16-bit, 32-bit, 64-bit - Clamp lower exponent range at -28 instead of -30. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Iago Toral Quiroga	3766334923	compiler/nir: add lowering for 16-bit flrp And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Iago Toral Quiroga	ca31df6f1f	compiler/nir: add lowering option for 16-bit fmod And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Jason Ekstrand	2b76de9b5d	nir/algebraic: Add a couple optimizations for iabs and ishr Shader-db results on Kaby Lake: total instructions in shared programs: 15225213 -> 15222365 (-0.02%) instructions in affected programs: 43524 -> 40676 (-6.54%) helped: 203 HURT: 0 Lots of shaders in Shadow Warrior had this pattern along with Deus Ex, Civ, Shadow of Mordor, and several others. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-15 01:02:19 +00:00
Kenneth Graunke	da51e3f1b0	Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers) This broke piles of image load store tests (179 failures on CI, mesa_master build #15546, previous build right before this landed was green). I'd rather not leave the tree on fire over the weekend, so let's revert for now, and we can figure out what happened next week.	2019-03-09 01:42:16 -08:00
Ian Romanick	18e4bf65de	nir/algebraic: Add missing 16-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	55c1ac4b75	nir/algebraic: Add missing 64-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	9aaaac6080	nir/algebraic: Remove redundant extract_[iu]8 patterns No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	37ee462e03	nir/algebraic: Fix up extract_[iu]8 after loop unrolling Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256837 (<.01%) instructions in affected programs: 4713 -> 4710 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06% total cycles in shared programs: 372286583 -> 372286583 (0.00%) cycles in affected programs: 198516 -> 198516 (0.00%) helped: 1 HURT: 1 helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01% No changes on any other Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Sagar Ghuge	47ec9bdc60	nir/algebraic: Optimize low 32 bit extraction Optimize a situation where we only need lower 32 bits from 64 bit result. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, the "mul" instruction supports a 64-bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also, instead of emitting two mul instructions, we can emit a single mul instruction and extract the low/high 32 bits from the 64-bit result for [i/u]mulExtended. v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
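The [i/u]mulExtended idea in the commit above, one 32x32 multiply with a 64-bit result followed by extracting the two halves, can be sketched with plain integers. The function names below are hypothetical stand-ins chosen for this illustration and only cover the unsigned case; they are not the NIR opcodes themselves.

```python
MASK32 = 0xFFFFFFFF

def umul_2x32_64(a, b):
    # Multiply two unsigned 32-bit values; the product always fits in 64 bits.
    return (a & MASK32) * (b & MASK32)

def umul_extended(a, b):
    # A single wide multiply, then split the result into low and high words.
    product = umul_2x32_64(a, b)
    return product & MASK32, product >> 32  # (lsb, msb)

print(umul_extended(0xFFFFFFFF, 0xFFFFFFFF))  # (1, 4294967294)
```

The alternative the message refers to would compute the low and high words with two separate multiplies, which is what the single wide multiply avoids.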
Ian Romanick	bae0c36751	nir/algebraic: Optimize away an fsat of a b2f The b2f can only produce 0.0 or 1.0, so the fsat does nothing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-02 13:58:56 -08:00
Ian Romanick	ecc9ffa778	nir/algebraic: Replace a-fract(a) with floor(a) I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 version would be hurt. In many of those cases, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to differently schedule a SEND that occurs somewhere early in the shader. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and an ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycles regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped. 
total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-03-01 12:43:25 -08:00
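The identity behind the a-fract(a) replacement above follows from the usual definition fract(a) = a - floor(a). A small sketch, assuming that definition; `fract` and `floor_via_fract` are hypothetical helper names used only for this check.

```python
import math

def fract(a):
    # Fractional part as commonly defined in shading languages.
    return a - math.floor(a)

def floor_via_fract(a):
    # The open-coded expression that the optimization recognizes.
    return a - fract(a)

for a in (2.75, -2.75, 0.5, 3.0):
    print(a, floor_via_fract(a), math.floor(a))
```

When a shader needs both floor(a) and fract(a), the rewrite removes the subtraction, which matches the commit's remark that the optimization eliminates the add.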
Ian Romanick	d40640efe8	nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b)) I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	eae19f5f19	nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. 
(Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Daniel Schürmann	0bd45f96b9	nir: Use SM5 properties to optimize shift(a@32, iand(31, b)) This is a common pattern from HLSL->SPIRV translation and supported in HW by all current NIR backends. vkpipeline-db results anv (SKL): total instructions in shared programs: 6403130 -> 6402380 (-0.01%) instructions in affected programs: 204084 -> 203334 (-0.37%) helped: 208 HURT: 0 total cycles in shared programs: 1915629582 -> 1918198408 (0.13%) cycles in affected programs: 1158892682 -> 1161461508 (0.22%) helped: 107 HURT: 86 shader-db results on i965 (KBL): total instructions in shared programs: 15284592 -> 15284568 (<.01%) instructions in affected programs: 81683 -> 81659 (-0.03%) helped: 24 HURT: 0 total cycles in shared programs: 375013622 -> 375013932 (<.01%) cycles in affected programs: 40169618 -> 40169928 (<.01%) helped: 13 HURT: 9 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:44 -06:00
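The rewrite above relies on 32-bit shifts that only consume the low five bits of the shift count, which makes an explicit `iand(31, b)` redundant. A minimal Python sketch of that property follows; `shl32` and `ushr32` are hypothetical helpers that model such shift semantics, not NIR opcodes or Mesa functions.

```python
MASK32 = 0xFFFFFFFF

def shl32(a, count):
    """32-bit left shift that ignores all but the low 5 bits of count (assumed SM5-style)."""
    return (a << (count & 31)) & MASK32

def ushr32(a, count):
    """32-bit logical right shift with the same count masking."""
    return (a & MASK32) >> (count & 31)

def explicit_mask_is_redundant(a, count):
    # shift(a, iand(31, count)) yields the same value as shift(a, count)
    return (shl32(a, count & 31) == shl32(a, count)
            and ushr32(a, count & 31) == ushr32(a, count))
```

On a target whose shifts do not mask the count this equivalence would not hold, which is presumably why the commit message notes hardware support in all current NIR backends.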
Kenneth Graunke	535251487b	nir: Don't reassociate add/mul chains containing only constants The idea here is to reassociate a * (b * c) into (a * c) * b, when b is a non-constant value, but a and c are constants, allowing them to be combined. But nothing was enforcing that 'b' must be non-constant, which meant that running opt_algebraic in a loop would never terminate if the IR contained non-folded constant expressions like 256 * 0.5 * 2. Normally, we call constant folding in such a loop too, but IMO it's better for nir_opt_algebraic to be robust and not rely on that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581 Fixes: `32e266a9a5` i965: Compile fp64 funcs only if we do not have 64-bit hardware support Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-16 23:36:14 -08:00
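The non-termination described above can be reproduced with a toy rewriter. The sketch below is not Mesa code: expressions are plain tuples, `is_const`, `rewrite_once` and `run` are hypothetical names, and commutative matching of `mul` is modelled by trying both operand orders.

```python
def is_const(e):
    """In this toy model, numbers are constants and strings are non-constant values."""
    return isinstance(e, (int, float))

def rewrite_once(expr, require_nonconst_b):
    """Apply a * (b * c) -> (a * c) * b once, where a and c are constants."""
    if not (isinstance(expr, tuple) and expr[0] == 'mul'):
        return None
    # mul is commutative, so try both operand orders at each level
    for a, inner in ((expr[1], expr[2]), (expr[2], expr[1])):
        if is_const(a) and isinstance(inner, tuple) and inner[0] == 'mul':
            for b, c in ((inner[1], inner[2]), (inner[2], inner[1])):
                if is_const(c) and not (require_nonconst_b and is_const(b)):
                    return ('mul', ('mul', a, c), b)
    return None

def run(expr, require_nonconst_b, limit=10):
    """Rewrite until no rule fires or the step limit is reached."""
    steps = 0
    while steps < limit:
        new = rewrite_once(expr, require_nonconst_b)
        if new is None:
            break
        expr, steps = new, steps + 1
    return expr, steps

all_const = ('mul', 256, ('mul', 0.5, 2))   # 256 * 0.5 * 2, not yet folded
with_var = ('mul', 256, ('mul', 'x', 2))
```

Without the guard, `run(all_const, False)` keeps rewriting until it hits the step limit; with the guard the all-constant expression is left alone, while `with_var` is still reassociated exactly once.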
Ian Romanick	979b43b347	nir/algebraic: Simplify comparison with sequential integers starting with 0 All of the affected shaders are Unreal4 demos. All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437170 -> 15437001 (<.01%) instructions in affected programs: 21536 -> 21367 (-0.78%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4 helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -4.07 -3.79 95% mean confidence interval for instructions %-change: -0.83% -0.77% Instructions are helped. total cycles in shared programs: 383007896 -> 383007378 (<.01%) cycles in affected programs: 158640 -> 158122 (-0.33%) helped: 38 HURT: 4 helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6 helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19% HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -16.90 -7.77 95% mean confidence interval for cycles %-change: -0.39% -0.19% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8213746 -> 8213745 (<.01%) instructions in affected programs: 127 -> 126 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 187734146 -> 187734144 (<.01%) cycles in affected programs: 2132 -> 2130 (-0.09%) helped: 1 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Ian Romanick	ad05920258	nir/algebraic: Convert some f2u to f2i Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says: It is undefined to convert a negative floating-point value to an uint. Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed. This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different". v2: Replicate part of the commit message as a comment in the code. Suggested by Jason. shader-db results comparing before "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit: Skylake total cycles in shared programs: 383007996 -> 383007896 (<.01%) cycles in affected programs: 85208 -> 85108 (-0.12%) helped: 13 HURT: 8 helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6 helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14% HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3 HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07% 95% mean confidence interval for cycles value: -9.31 -0.21 95% mean confidence interval for cycles %-change: -0.24% <.01% Cycles are helped. Broadwell total cycles in shared programs: 415251194 -> 415251370 (<.01%) cycles in affected programs: 83750 -> 83926 (0.21%) helped: 7 HURT: 13 helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12 helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30% HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22 HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47% 95% mean confidence interval for cycles value: 0.76 16.84 95% mean confidence interval for cycles %-change: <.01% 0.37% Inconclusive result (%-change mean confidence interval includes 0). 
Haswell total instructions in shared programs: 13823885 -> 13823886 (<.01%) instructions in affected programs: 2249 -> 2250 (0.04%) helped: 0 HURT: 1 total cycles in shared programs: 390094243 -> 390094001 (<.01%) cycles in affected programs: 85640 -> 85398 (-0.28%) helped: 15 HURT: 6 helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18 helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42% HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04% 95% mean confidence interval for cycles value: -17.36 -5.69 95% mean confidence interval for cycles %-change: -0.44% -0.14% Cycles are helped. Ivy Bridge total cycles in shared programs: 180986448 -> 180986552 (<.01%) cycles in affected programs: 34835 -> 34939 (0.30%) helped: 0 HURT: 10 HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10 HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30% 95% mean confidence interval for cycles value: 4.67 16.13 95% mean confidence interval for cycles %-change: 0.20% 0.35% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154603969 -> 154603970 (<.01%) cycles in affected programs: 171514 -> 171515 (<.01%) helped: 25 HURT: 14 helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04% HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3 HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11% 95% mean confidence interval for cycles value: -0.91 0.96 95% mean confidence interval for cycles %-change: -0.02% 0.04% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Eric Anholt	42d2cae907	nir: Move panfrost's isign lowering to nir_opt_algebraic. I wanted to reuse this from v3d. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-14 00:32:30 +00:00
Ian Romanick	96c4b135e3	nir/algebraic: Don't put quotes around floating point literals The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches. v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a variable is entirely alphabetic (suggested by Jason). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Fixes: `6bcd2af086` ("nir/algebraic: Add some optimizations for D3D-style Booleans") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075	2018-12-18 23:28:31 -08:00
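The effect of the stray quotation marks can be illustrated with a toy pattern matcher. This is only a sketch under the assumption that strings in a pattern act as variables and bare numbers act as literal constants; `matches` is a hypothetical helper, not the real nir_algebraic generator.

```python
def matches(pattern, value, bindings):
    """Toy matcher: strings are variables, tuples are (opcode, sources...), numbers are literals."""
    if isinstance(pattern, str):
        # A variable binds to anything, but must bind consistently
        if pattern in bindings:
            return bindings[pattern] == value
        bindings[pattern] = value
        return True
    if isinstance(pattern, tuple):
        return (isinstance(value, tuple)
                and len(value) == len(pattern)
                and value[0] == pattern[0]          # opcode names must agree
                and all(matches(p, v, bindings)
                        for p, v in zip(pattern[1:], value[1:])))
    # Anything else is a literal constant and must compare equal
    return pattern == value

quoted = ('iand', 'a', '1.0')    # '1.0' is a string, so it behaves as a variable
unquoted = ('iand', 'a', 1.0)    # 1.0 is a float, so it only matches that constant
```

With the quoted form, `matches(quoted, ('iand', 'x', 7.5), {})` succeeds even though 7.5 is not 1.0, which mirrors the over-eager matching the commit message describes.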
Jason Ekstrand	6bcd2af086	nir/algebraic: Add some optimizations for D3D-style Booleans D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans. Shader-db results on Kaby Lake: total instructions in shared programs: 15136811 -> 14967944 (-1.12%) instructions in affected programs: 2457021 -> 2288154 (-6.87%) helped: 8318 HURT: 10 total cycles in shared programs: 373544524 -> 359701825 (-3.71%) cycles in affected programs: 151029683 -> 137186984 (-9.17%) helped: 7749 HURT: 682 total loops in shared programs: 4431 -> 4399 (-0.72%) loops in affected programs: 32 -> 0 helped: 21 HURT: 0 total spills in shared programs: 10290 -> 10051 (-2.32%) spills in affected programs: 2532 -> 2293 (-9.44%) helped: 18 HURT: 18 total fills in shared programs: 22203 -> 21732 (-2.12%) fills in affected programs: 3319 -> 2848 (-14.19%) helped: 18 HURT: 18 Note that a large chunk of the improvement is fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3b30814791	nir/algebraic: Optimize 1-bit Booleans Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	615cc26b97	nir/algebraic: Generalize an optimization This just makes it nicely scale across bit sizes. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	b569093566	nir/algebraic: Make an optimization more specific Later in this series, bool is not going to imply 32-bit. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	517099809a	nir: Drop support for lower_b2f This was originally added for the out-of-tree Mali driver but I think we've all agreed it's easy enough for them to just do in their back-end. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	4bb1a34727	nir/algebraic: Optimize x2b(xneg(a)) -> a Shader-db results on Kaby Lake: total instructions in shared programs: 15072525 -> 15072525 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This helps prevent regressions in later commits. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Jason Ekstrand	be98b1db38	nir/opt_algebraic: Add 32-bit specifiers to a bunch of booleans Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:03 -06:00
Jason Ekstrand	2715080d65	nir/opt_algebraic: Drop bit-size suffixes from conversions Suffixes are dropped from a bunch of conversion opcodes when it makes sense to do so. Others are kept if we really do want the bit-size restriction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:01 -06:00
Jason Ekstrand	ff8e3d3b7b	nir/opt_algebraic: Simplify an optimization using the new search ops Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:58 -06:00
Jonathan Marek	3e7186d472	nir: add fceil lowering lowers ceil(x) as -floor(-x) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
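The identity behind the fceil lowering can be checked directly in Python. `lowered_ceil` is a hypothetical name used only for this sketch; it mirrors the `-floor(-x)` form quoted in the commit message.

```python
import math

def lowered_ceil(x):
    """ceil(x) expressed with floor only: -floor(-x)."""
    return -math.floor(-x)

samples = (-2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5)
agree = all(lowered_ceil(x) == math.ceil(x) for x in samples)
```

The check covers finite values only; how a backend treats NaN, infinities, or the sign of zero is outside what this sketch shows.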
Christian Gmeiner	c6aaafa3a1	nir: add lowering for ffloor Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 21:57:25 +01:00
Jason Ekstrand	6068be543b	nir/algebraic: Generalize an optimization There's nothing boolean about (a | ~a) ~> -1 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
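The generalization holds because OR-ing any integer with its bitwise complement sets every bit, regardless of width. A small Python sketch, with `or_with_complement` as a hypothetical helper that models fixed-width integers by masking:

```python
def or_with_complement(a, bits):
    """Compute a | ~a on a 'bits'-wide unsigned integer."""
    mask = (1 << bits) - 1
    a &= mask
    return a | (~a & mask)

def all_ones(bits):
    """The all-ones pattern, which is -1 when read as a two's complement value."""
    return (1 << bits) - 1
```

The result is the all-ones pattern for every input and every bit size, which is why the rule need not be restricted to Boolean values.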
Jason Ekstrand	d7e0d47b9d	nir: Add a bunch of b2[if] optimizations The b2f and b2i conversions always produce zero or one which are both representable in every type and size. Since b2i and b2f support all bit sizes, we can just get rid of the conversion opcode. total instructions in shared programs: 15089335 -> 15084368 (-0.03%) instructions in affected programs: 212564 -> 207597 (-2.34%) helped: 896 HURT: 0 total cycles in shared programs: 369831123 -> 369826267 (<.01%) cycles in affected programs: 2008647 -> 2003791 (-0.24%) helped: 693 HURT: 216 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Ian Romanick	a68dd47b91	nir/algebraic: Simplify fsat of fsign This allows us to not support fsign.sat in the Intel compiler backend, and that will simplify some later changes. No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	1546204cdd	nir/algebraic: sign(x)*x*x is abs(x)*x shader-db results: All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15106023 -> 15105981 (<.01%) instructions in affected programs: 300 -> 258 (-14.00%) helped: 6 HURT: 0 helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00% 95% mean confidence interval for instructions value: -7.00 -7.00 95% mean confidence interval for instructions %-change: -14.00% -14.00% Instructions are helped. total cycles in shared programs: 566050327 -> 566050075 (<.01%) cycles in affected programs: 2826 -> 2574 (-8.92%) helped: 6 HURT: 0 helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42 helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92% 95% mean confidence interval for cycles value: -44.30 -39.70 95% mean confidence interval for cycles %-change: -8.95% -8.88% Cycles are helped. No changes on Gen6 or earlier. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
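The identity in this commit, reading the subject as sign(x) * x * x = abs(x) * x, follows from sign(x) * x being abs(x). A quick Python check on finite values; `sign`, `original` and `simplified` are names chosen for this sketch only.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def original(x):
    return sign(x) * x * x

def simplified(x):
    return abs(x) * x

samples = (-3.5, -1.0, 0.0, 0.25, 7.0, 1e10)
agree = all(original(x) == simplified(x) for x in samples)
```

NaN inputs are deliberately left out of the samples, since this sketch says nothing about how the real rule treats them.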
Jason Ekstrand	d448fa3ae3	nir/algebraic: Add some max/min optimizations Found by inspection. This doesn't help much now but we'll see this pattern with images if you load UNORM and then store UNORM. Shader-db results on Kaby Lake: total instructions in shared programs: 15166916 -> 15166910 (<.01%) instructions in affected programs: 761 -> 755 (-0.79%) helped: 6 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	4dd5263663	nir/algebraic: Add more extract_[iu](8|16) optimizations This adds the "(a << N) >> M" family of mask or sign-extensions. Not a huge win right now but this pattern will soon be generated by NIR format lowering code. Shader-db results on Kaby Lake: total instructions in shared programs: 15166918 -> 15166916 (<.01%) instructions in affected programs: 36 -> 34 (-5.56%) helped: 2 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
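One instance of the shift-pair family can be sketched in Python for 32-bit values: shifting a byte up to the top and back down with a logical shift zero-extends it, while an arithmetic shift sign-extends it. The helpers below are hypothetical models, and the exact set of (N, M) pairs the commit handles is not stated in the log.

```python
MASK32 = 0xFFFFFFFF

def ishl(a, n):
    return (a << n) & MASK32

def ushr(a, n):
    """Logical right shift on a 32-bit value."""
    return (a & MASK32) >> n

def ishr(a, n):
    """Arithmetic right shift on a 32-bit value, result kept as an unsigned bit pattern."""
    a &= MASK32
    if a & 0x80000000:
        a -= 1 << 32
    return (a >> n) & MASK32

def extract_u8(a, byte):
    """Zero-extended byte number 'byte' (0 is the least significant)."""
    return (a >> (8 * byte)) & 0xFF

def extract_i8(a, byte):
    """Sign-extended byte, returned as a 32-bit bit pattern."""
    v = extract_u8(a, byte)
    return (v - 0x100 if v & 0x80 else v) & MASK32

def shift_pair_matches_extract(a):
    return all(ushr(ishl(a, 24 - 8 * b), 24) == extract_u8(a, b)
               and ishr(ishl(a, 24 - 8 * b), 24) == extract_i8(a, b)
               for b in range(4))
```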
Jason Ekstrand	116b47fe3c	nir/algebraic: Be more careful converting ushr to extract_u8/16 If it's not the right bit-size, it may not actually be the correct extraction. For now, we'll only worry about 32-bit versions. Fixes: `905ff86198` "nir: Recognize open-coded extract_u16" Fixes: `76289fbfa8` "nir: Recognize open-coded extract_u8" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Mathieu Bridon	d9ca4a172e	python: Use the right function for the job The code was just reimplementing itertools.combinations_with_replacement in a less efficient way. This does change the order of the results slightly, but it should be ok. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:18 -07:00
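For reference, `itertools.combinations_with_replacement` yields every unordered selection with repeats allowed. The hand-rolled loop below is a hypothetical stand-in for the code that was replaced (the original is not shown in this log), so only the set of results is compared, not their order.

```python
import itertools

def hand_rolled_pairs(items):
    """Hypothetical manual version: every unordered pair, repeats allowed."""
    result = []
    for i in range(len(items)):
        for j in range(i, len(items)):
            result.append((items[i], items[j]))
    return result

def library_pairs(items):
    """The same selections via the standard library."""
    return list(itertools.combinations_with_replacement(items, 2))

sizes = [8, 16, 32, 64]
same_results = sorted(hand_rolled_pairs(sizes)) == sorted(library_pairs(sizes))
```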
Ian Romanick	3b07d28f81	nir: Transform expressions of b2f(a) and b2f(b) to a == b All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276886 -> 14276838 (<.01%) instructions in affected programs: 312 -> 264 (-15.38%) helped: 2 HURT: 0 total cycles in shared programs: 532578395 -> 532570985 (<.01%) cycles in affected programs: 682562 -> 675152 (-1.09%) helped: 374 HURT: 4 helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18 helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28% HURT stats (abs) min: 2 max: 114 x̄: 53.50 x̃: 49 HURT stats (rel) min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15% 95% mean confidence interval for cycles value: -21.30 -17.91 95% mean confidence interval for cycles %-change: -1.30% -1.06% Cycles are helped. Sandy Bridge total instructions in shared programs: 10488123 -> 10488075 (<.01%) instructions in affected programs: 336 -> 288 (-14.29%) helped: 2 HURT: 0 total cycles in shared programs: 150260379 -> 150260439 (<.01%) cycles in affected programs: 4726 -> 4786 (1.27%) helped: 0 HURT: 2 No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	c658b6c4c8	nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276892 -> 14276886 (<.01%) instructions in affected programs: 484 -> 478 (-1.24%) helped: 2 HURT: 0 total cycles in shared programs: 532578397 -> 532578395 (<.01%) cycles in affected programs: 3522 -> 3520 (-0.06%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	3aca80aabc	nir: Transform expressions of b2f(a) and b2f(b) to !(a && b) All Gen platforms had pretty similar results. (Skylake shown) total cycles in shared programs: 532578400 -> 532578397 (<.01%) cycles in affected programs: 2784 -> 2781 (-0.11%) helped: 1 HURT: 1 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	1713c97181	nir: Transform expressions of b2f(a) and b2f(b) to a && b No changes on any Gen platform. v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	4425f4786a	nir: Transform expressions of b2f(a) and b2f(b) to !(a || b) All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). 
total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	6b3670ae80	nir: Transform -fabs(a) >= 0 to a == 0 All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276964 -> 14276961 (<.01%) instructions in affected programs: 411 -> 408 (-0.73%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68% total cycles in shared programs: 532577062 -> 532577068 (<.01%) cycles in affected programs: 1093 -> 1099 (0.55%) helped: 1 HURT: 1 helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77% HURT stats (abs) min: 22 max: 22 x̄: 22.00 x̃: 22 HURT stats (rel) min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	46e7c340d4	nir: Transform expressions of b2f(a) and b2f(b) to a || b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. 
total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
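The b2f commits in this series replace float arithmetic on converted Booleans with plain Boolean logic. The log does not quote the exact patterns, so the float-side expressions below are illustrative guesses that merely agree with the fmin/fmax hints in the v2 notes. `b2f` and `check_all` are hypothetical helpers that walk the full truth table.

```python
from itertools import product

def b2f(b):
    """Boolean to float: True -> 1.0, False -> 0.0."""
    return 1.0 if b else 0.0

def check_all():
    """Compare each assumed float expression with its Boolean counterpart."""
    for a, b in product((False, True), repeat=2):
        fa, fb = b2f(a), b2f(b)
        assert (max(fa, fb) != 0.0) == (a or b)          # a || b
        assert (min(fa, fb) != 0.0) == (a and b)         # a && b
        assert (max(fa, fb) == 0.0) == (not (a or b))    # !(a || b)
        assert (min(fa, fb) == 0.0) == (not (a and b))   # !(a && b)
        assert (fa != fb) == (a ^ b)                     # a ^^ b
        assert (fa == fb) == (a == b)                    # a == b
    return True
```

Because b2f only produces 0.0 or 1.0, four input combinations are enough to cover every case.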
Ian Romanick	be7d3ba34a	nir: Transform -fabs(a) < 0 to a != 0 Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if a is NaN. All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277216 -> 14277184 (<.01%) instructions in affected programs: 2300 -> 2268 (-1.39%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3 helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01% 95% mean confidence interval for instructions value: -6.45 -1.55 95% mean confidence interval for instructions %-change: -9.96% 1.13% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 532577848 -> 532577990 (<.01%) cycles in affected programs: 17486 -> 17628 (0.81%) helped: 2 HURT: 5 helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93% HURT stats (abs) min: 6 max: 50 x̄: 30.00 x̃: 26 HURT stats (rel) min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02% 95% mean confidence interval for cycles value: -1.06 41.63 95% mean confidence interval for cycles %-change: -0.58% 1.74% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
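The NaN caveat can be seen with ordinary IEEE 754 comparisons, which Python floats follow. The function names below are made up for this sketch; each one evaluates one side of a rewrite.

```python
def neg_abs_lt_zero(a):
    return -abs(a) < 0.0

def ne_zero(a):
    return a != 0.0

def neg_abs_ge_zero(a):
    return -abs(a) >= 0.0

def eq_zero(a):
    return a == 0.0

nan = float('nan')
finite_samples = (-2.0, -0.0, 0.0, 3.5)
```

For finite inputs both rewrites agree with their originals. For NaN, `-abs(a) < 0` is false while `a != 0` is true, so that rewrite changes behavior, whereas `-abs(a) >= 0` and `a == 0` are both false and stay consistent.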
Ian Romanick	d49eab2757	nir: Rearrange bcsel with two bcsel sources All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277220 -> 14277216 (<.01%) instructions in affected programs: 422 -> 418 (-0.95%) helped: 2 HURT: 0 total cycles in shared programs: 532577908 -> 532577848 (<.01%) cycles in affected programs: 2800 -> 2740 (-2.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	b92fded6eb	nir: Collapse more repeated bcsels on the same argument All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277230 -> 14277220 (<.01%) instructions in affected programs: 751 -> 741 (-1.33%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32% 95% mean confidence interval for instructions value: -3.42 -1.58 95% mean confidence interval for instructions %-change: -1.47% -1.17% Instructions are helped. total cycles in shared programs: 532577947 -> 532577908 (<.01%) cycles in affected programs: 10641 -> 10602 (-0.37%) helped: 4 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7 helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60% HURT stats (abs) min: 2 max: 8 x̄: 5.33 x̃: 6 HURT stats (rel) min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23% 95% mean confidence interval for cycles value: -20.69 9.55 95% mean confidence interval for cycles %-change: -1.63% 0.63% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-04 01:12:03 -07:00
Ian Romanick	408330ed48	nir: Don't compare i2f or u2i with zero Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277620 -> 14277230 (<.01%) instructions in affected programs: 36905 -> 36515 (-1.06%) helped: 101 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6 helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51% HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47% 95% mean confidence interval for instructions value: -4.59 -2.70 95% mean confidence interval for instructions %-change: -1.90% -1.41% Instructions are helped. total cycles in shared programs: 532580716 -> 532577947 (<.01%) cycles in affected programs: 940575 -> 937806 (-0.29%) helped: 92 HURT: 12 helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62 helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41% HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63 HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20% 95% mean confidence interval for cycles value: -50.66 -2.59 95% mean confidence interval for cycles %-change: -2.09% -0.73% Cycles are helped. total spills in shared programs: 8116 -> 8124 (0.10%) spills in affected programs: 200 -> 208 (4.00%) helped: 0 HURT: 2 total fills in shared programs: 11086 -> 11094 (0.07%) fills in affected programs: 436 -> 444 (1.83%) helped: 0 HURT: 2 Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12979054 -> 12978067 (<.01%) instructions in affected programs: 33633 -> 32646 (-2.93%) helped: 120 HURT: 2 helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13 helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17% HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00% 95% mean confidence interval for instructions value: -9.19 -6.99 95% mean confidence interval for instructions %-change: -5.27% -3.62% Instructions are helped. 
total cycles in shared programs: 411212880 -> 411199636 (<.01%) cycles in affected programs: 696441 -> 683197 (-1.90%) helped: 107 HURT: 5 helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146 helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88% HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22 HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25% 95% mean confidence interval for cycles value: -136.96 -99.54 95% mean confidence interval for cycles %-change: -9.75% -6.53% Cycles are helped. total spills in shared programs: 78623 -> 78631 (0.01%) spills in affected programs: 66 -> 74 (12.12%) helped: 0 HURT: 2 total fills in shared programs: 80104 -> 80108 (<.01%) fills in affected programs: 133 -> 137 (3.01%) helped: 0 HURT: 2 No changes on Sandy Bridge, Iron Lake, or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	a3845616a2	nir: Remove f2i(i2f(x)) conversions Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. 
total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	ea6c276436	nir: Mark the 0.0 < abs(a) transformation as imprecise Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if the source is NaN. No shader-db changes on any platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
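The NaN difference described in ea6c276436 can be reproduced with ordinary IEEE floats. This is a minimal sketch, assuming the rewrites are `0.0 < abs(a)` to `a != 0.0` and `-abs(a) >= 0.0` to `a == 0.0`; the commit message names only the left-hand sides, so the replacement expressions and all function names here are assumptions.

```python
import math

def lt_original(a):
    # Left-hand side named in the commit: 0.0 < abs(a)
    return 0.0 < abs(a)

def lt_rewritten(a):
    # Assumed replacement: a != 0.0
    return a != 0.0

def ge_original(a):
    # The older pattern named in the commit: -abs(a) >= 0.0
    return -abs(a) >= 0.0

def ge_rewritten(a):
    # Assumed replacement: a == 0.0
    return a == 0.0

# Ordinary values, signed zeros and infinities agree for both rewrites.
samples = [0.0, -0.0, 1.5, -2.0, math.inf, -math.inf]
agree_lt = all(lt_original(a) == lt_rewritten(a) for a in samples)
agree_ge = all(ge_original(a) == ge_rewritten(a) for a in samples)

# NaN is where the two rewrites differ in precision.
nan_lt = (lt_original(math.nan), lt_rewritten(math.nan))
nan_ge = (ge_original(math.nan), ge_rewritten(math.nan))
```

Under these assumptions only the `0.0 < abs(a)` rewrite changes its result for a NaN input, which matches the commit's reason for marking it imprecise.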
Jason Ekstrand	b3b170ade9	nir: Add a couple of iand/ior optimizations Spotted in a shader in Batman: Arkham City. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-24 20:39:43 -07:00
Jason Ekstrand	e4d346c86d	nir: Add a couple trivial abs optimizations Spotted in a shader in Batman: Arkham City. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-07-23 10:48:21 -07:00
Timothy Arceri	e105b0ca30	nir: add a couple of ior opts to nir_opt_algebraic One of these was seen in a Deus Ex shader. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-18 09:53:27 +10:00
Mathieu Bridon	0f7b18fa0d	python: Use the print function In Python 2, `print` was a statement, but it became a function in Python 3. Using print functions everywhere makes the script compatible with Python versions >= 2.6, including Python 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2018-07-06 10:04:22 -07:00
Mathieu Bridon	fe8a153648	python: Stabilize some script outputs In Python, dictionaries and sets are unordered, and as a result there is no guarantee that running this script twice will produce the same output. Using ordered dicts and explicitly sorting items makes the build more reproducible, and will make it possible to verify that we're not breaking anything when we move the build scripts to Python 3. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-07-05 12:52:12 +01:00
Eric Anholt	6a0db5f08f	nir: Add lowering for find_lsb. There is a fairly simple relation to turn this into ufind_msb. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-06 13:44:28 -07:00
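The "fairly simple relation" in 6a0db5f08f is not spelled out in the message. One relation that fits is `find_lsb(x) = ufind_msb(x & -x)`, since `x & -x` isolates the lowest set bit. The sketch below checks that identity on 32-bit values; treating it as the relation the commit uses is an assumption, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def find_lsb_reference(x):
    """Index of the lowest set bit of a 32-bit value, or -1 if x == 0."""
    for i in range(32):
        if x & (1 << i):
            return i
    return -1

def ufind_msb(x):
    """Index of the highest set bit of a 32-bit value, or -1 if x == 0."""
    return x.bit_length() - 1

def find_lsb_lowered(x):
    # Negate in 32-bit two's complement, then AND to isolate the lowest set bit.
    isolated = x & (-x & MASK32)
    return ufind_msb(isolated)

test_values = list(range(0, 1024)) + [0x80000000, 0xFFFFFFFF, 0x00010000, 0xDEADBEEF]
mismatches = [x for x in test_values
              if find_lsb_reference(x) != find_lsb_lowered(x)]
```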
Eric Anholt	d4c7c3c225	nir: Add lowering for ifind_msb to ufind_msb. ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb to that. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-06 13:44:28 -07:00
Eric Anholt	af88acf4c4	nir: Add lowering from ibitfield_extract/ubitfield_extract to shifts. V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to glsl/lower_instructions.cpp. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-06 13:44:28 -07:00
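A shift-based bitfield extract of the kind af88acf4c4 describes can be sketched as a left shift that discards the bits above the field followed by a right shift that discards the bits below it. The exact expressions in the commit are not shown in the message, so the formulas and names below are assumptions; the sketch also assumes 32-bit values and `offset + bits <= 32`.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def ubfe_reference(value, offset, bits):
    # Mask-based definition: zero-extended field.
    return (value >> offset) & ((1 << bits) - 1)

def ibfe_reference(value, offset, bits):
    # Mask-based definition: sign-extended field.
    field = ubfe_reference(value, offset, bits)
    if bits and field & (1 << (bits - 1)):
        field -= 1 << bits
    return field

def ubfe_shifts(value, offset, bits):
    if bits == 0:
        return 0  # a shift by 32 would be out of range, so handle it separately
    left = (value << (32 - bits - offset)) & MASK32
    return left >> (32 - bits)  # logical shift right

def ibfe_shifts(value, offset, bits):
    if bits == 0:
        return 0
    left = (value << (32 - bits - offset)) & MASK32
    return to_signed32(left) >> (32 - bits)  # arithmetic shift right

values = [0, 1, 0xF0, 0x12345678, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF]
cases = [(v, o, b) for v in values for o in range(32) for b in range(33 - o)]
u_mismatches = [c for c in cases if ubfe_reference(*c) != ubfe_shifts(*c)]
i_mismatches = [c for c in cases if ibfe_reference(*c) != ibfe_shifts(*c)]
```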
Eric Anholt	74618ccbca	nir: Add lowering for bitfieldInsert without using bfi. If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes things harder than keeping the "bits" argument around. This still uses bfm, but I've added the obvious lowering of bfm if you need it. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-06 13:44:28 -07:00
Ian Romanick	f00fcfb7a2	nir: Lower !f2b(x) to x == 0.0 Some trivial help now, but it also prevents ~40 regressions caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. All Gen4+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14369557 -> 14369555 (<.01%) instructions in affected programs: 442 -> 440 (-0.45%) helped: 2 HURT: 0 total cycles in shared programs: 532425772 -> 532425743 (<.01%) cycles in affected programs: 6086 -> 6057 (-0.48%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-06-01 10:14:53 -07:00
Ian Romanick	619c51722b	nir: Add some missing "optimization undo" patterns `d8d18516b0` and `03fb13f646` added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimizations cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions in Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-06-01 10:13:16 -07:00
Samuel Pitoiset	70f9e2589e	nir: optimize iand(ieq(a, 0), ieq(b, 0)) to ieq(ior(a, b), 0) Totals from affected shaders: SGPRS: 80 -> 80 (0.00 %) VGPRS: 48 -> 48 (0.00 %) Code Size: 2120 -> 2096 (-1.13 %) bytes Max Waves: 16 -> 16 (0.00 %) Only two Rise of Tomb Raider shaders are affected on my side. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-05-31 10:57:16 +02:00
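The rewrite in 70f9e2589e relies on the fact that a bitwise OR is zero only when both operands are zero. A small exhaustive check over 8-bit values, with hypothetical function names, illustrates the equivalence:

```python
def original(a, b):
    # iand(ieq(a, 0), ieq(b, 0))
    return (a == 0) and (b == 0)

def rewritten(a, b):
    # ieq(ior(a, b), 0)
    return (a | b) == 0

# Exhaustive over all pairs of 8-bit values; the argument is the same at 32 bits.
mismatches = [(a, b) for a in range(256) for b in range(256)
              if original(a, b) != rewritten(a, b)]
```

The rewritten form needs one comparison instead of two, which is consistent with the small code-size reduction reported in the commit.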
Timothy Arceri	e8b368ad1c	nir: add unsigned comparison simplifications This avoids loop unrolling regressions in Wolfenstein II on DXVK with an upcoming optimisation series from Samuel. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-30 22:48:37 +10:00
Alyssa Rosenzweig	5d85a0a55b	nir: Implement optional b2f->iand lowering This pass is required by the Midgard compiler; our instruction set uses NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction. Normally, this lowering pass would be implemented in a backend-specific algebraic pass, but this conflicts with the existing iand->b2f pass in nir_opt_algebraic.py, hanging the compiler. This patch thus makes the existing pass optional (default on -- all other backends should remain unaffected), adding an optional pass for lowering the opposite direction. v2: Defer lowering until late algebraic optimisations to allow optimising the b2f instruction itself. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-05-18 22:44:09 +02:00
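With NIR-style booleans (all bits set for true, zero for false), a b2f can be expressed as a bitwise AND with the bit pattern of 1.0f. The sketch below assumes that is the form of the b2f->iand lowering in 5d85a0a55b; the constant names and helpers are hypothetical.

```python
import struct

NIR_TRUE = 0xFFFFFFFF      # ~0, as described in the commit message
NIR_FALSE = 0x00000000
ONE_F32_BITS = 0x3F800000  # IEEE 754 single-precision encoding of 1.0

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def b2f_via_iand(b):
    # true & bits(1.0) == bits(1.0); false & bits(1.0) == bits(0.0)
    return bits_to_float(b & ONE_F32_BITS)

true_as_float = b2f_via_iand(NIR_TRUE)
false_as_float = b2f_via_iand(NIR_FALSE)
```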
Ian Romanick	2c643fd978	nir: Don't condition 'a-b < 0' -> 'a < b' on is_not_used_by_conditional Now that i965 recognizes that a-b generates the same conditions as 'a < b', there is no reason to condition this transformation on 'is not used by conditional.' Since this was the only user of the is_not_used_by_conditional function, delete it. All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14400775 -> 14400595 (<.01%) instructions in affected programs: 36712 -> 36532 (-0.49%) helped: 182 HURT: 26 helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.15% max: 1.82% x̄: 0.70% x̃: 0.62% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 1.02% x̄: 0.82% x̃: 0.90% 95% mean confidence interval for instructions value: -0.97 -0.76 95% mean confidence interval for instructions %-change: -0.59% -0.43% Instructions are helped. total cycles in shared programs: 532929592 -> 532926345 (<.01%) cycles in affected programs: 478660 -> 475413 (-0.68%) helped: 187 HURT: 22 helped stats (abs) min: 2 max: 200 x̄: 20.99 x̃: 18 helped stats (rel) min: 0.23% max: 24.10% x̄: 1.48% x̃: 1.03% HURT stats (abs) min: 1 max: 214 x̄: 30.86 x̃: 11 HURT stats (rel) min: 0.01% max: 23.06% x̄: 3.12% x̃: 0.86% 95% mean confidence interval for cycles value: -19.50 -11.57 95% mean confidence interval for cycles %-change: -1.42% -0.58% Cycles are helped. GM45 and Iron Lake had similar results. 
(Iron Lake shown) total cycles in shared programs: 177851578 -> 177851810 (<.01%) cycles in affected programs: 24408 -> 24640 (0.95%) helped: 2 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.42% max: 0.47% x̄: 0.44% x̃: 0.44% HURT stats (abs) min: 24 max: 108 x̄: 60.00 x̃: 54 HURT stats (rel) min: 0.52% max: 1.62% x̄: 1.04% x̃: 1.02% 95% mean confidence interval for cycles value: -7.75 85.08 95% mean confidence interval for cycles %-change: -0.39% 1.49% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Ian Romanick	6aeaa7d363	nir: Don't compare b2f or b2i with zero All of the shaders that had loops changed were in Tomb Raider. The one shader that lost SIMD16 is one of those. Skylake total instructions in shared programs: 14391653 -> 14390468 (<.01%) instructions in affected programs: 111891 -> 110706 (-1.06%) helped: 501 HURT: 0 helped stats (abs) min: 1 max: 155 x̄: 2.37 x̃: 1 helped stats (rel) min: 0.05% max: 21.54% x̄: 1.61% x̃: 1.01% 95% mean confidence interval for instructions value: -3.23 -1.50 95% mean confidence interval for instructions %-change: -1.77% -1.45% Instructions are helped. total cycles in shared programs: 532793024 -> 532776598 (<.01%) cycles in affected programs: 987682 -> 971256 (-1.66%) helped: 348 HURT: 41 helped stats (abs) min: 1 max: 3074 x̄: 54.91 x̃: 18 helped stats (rel) min: 0.05% max: 32.24% x̄: 3.36% x̃: 1.68% HURT stats (abs) min: 1 max: 422 x̄: 65.39 x̃: 24 HURT stats (rel) min: 0.09% max: 39.29% x̄: 9.50% x̃: 2.02% 95% mean confidence interval for cycles value: -64.08 -20.38 95% mean confidence interval for cycles %-change: -2.78% -1.23% Cycles are helped. total loops in shared programs: 4854 -> 4829 (-0.52%) loops in affected programs: 27 -> 2 (-92.59%) helped: 18 HURT: 0 LOST: 1 GAINED: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-19 13:52:35 -07:00
Ian Romanick	6878c9aabc	nir: Don't i2b a value that is already Boolean A bunch of shaders have sequences like: i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0)))) Other optimizations (and NIR's typeless nature) reduce this to i2b(x == y) which is silly. Skylake total instructions in shared programs: 14498698 -> 14497948 (<.01%) instructions in affected programs: 74480 -> 73730 (-1.01%) helped: 277 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68% 95% mean confidence interval for instructions value: -3.35 -2.06 95% mean confidence interval for instructions %-change: -1.74% -1.16% Instructions are helped. total cycles in shared programs: 532015500 -> 531999238 (<.01%) cycles in affected programs: 5943878 -> 5927616 (-0.27%) helped: 251 HURT: 74 helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14 helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53% HURT stats (abs) min: 1 max: 4550 x̄: 214.04 x̃: 15 HURT stats (rel) min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33% 95% mean confidence interval for cycles value: -158.51 58.43 95% mean confidence interval for cycles %-change: -1.07% -0.04% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4753 -> 4735 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. Haswell and Broadwell had similar results. 
(Broadwell shown) total instructions in shared programs: 14791877 -> 14791127 (<.01%) instructions in affected programs: 77326 -> 76576 (-0.97%) helped: 278 HURT: 1 helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49% 95% mean confidence interval for instructions value: -3.33 -2.05 95% mean confidence interval for instructions %-change: -1.70% -1.13% Instructions are helped. total cycles in shared programs: 558250067 -> 558252872 (<.01%) cycles in affected programs: 5806328 -> 5809133 (0.05%) helped: 235 HURT: 83 helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16 helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51% HURT stats (abs) min: 1 max: 10590 x̄: 265.19 x̃: 20 HURT stats (rel) min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54% 95% mean confidence interval for cycles value: -89.87 107.51 95% mean confidence interval for cycles %-change: -1.06% -0.32% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4735 -> 4717 (-0.38%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. 
total fills in shared programs: 83111 -> 83110 (<.01%) fills in affected programs: 28 -> 27 (-3.57%) helped: 1 HURT: 0 Ivy Bridge total instructions in shared programs: 11774173 -> 11773436 (<.01%) instructions in affected programs: 70819 -> 70082 (-1.04%) helped: 267 HURT: 0 helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2 helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63% 95% mean confidence interval for instructions value: -3.51 -2.01 95% mean confidence interval for instructions %-change: -1.94% -1.21% Instructions are helped. total cycles in shared programs: 257153833 -> 257148932 (<.01%) cycles in affected programs: 585341 -> 580440 (-0.84%) helped: 167 HURT: 100 helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16 helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88% HURT stats (abs) min: 1 max: 200 x̄: 25.95 x̃: 16 HURT stats (rel) min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65% 95% mean confidence interval for cycles value: -33.25 -3.46 95% mean confidence interval for cycles %-change: -1.47% -0.54% Cycles are helped. total loops in shared programs: 3416 -> 3398 (-0.53%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 2 GAINED: 0 Sandy Bridge total instructions in shared programs: 10499306 -> 10499094 (<.01%) instructions in affected programs: 6051 -> 5839 (-3.50%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2 helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45% 95% mean confidence interval for instructions value: -7.66 -2.20 95% mean confidence interval for instructions %-change: -5.47% -3.12% Instructions are helped. 
total cycles in shared programs: 145862568 -> 145861370 (<.01%) cycles in affected programs: 61733 -> 60535 (-1.94%) helped: 36 HURT: 2 helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35 helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81% HURT stats (abs) min: 18 max: 102 x̄: 60.00 x̃: 60 HURT stats (rel) min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48% 95% mean confidence interval for cycles value: -41.28 -21.77 95% mean confidence interval for cycles %-change: -6.16% -3.00% Cycles are helped. total loops in shared programs: 1803 -> 1785 (-1.00%) loops in affected programs: 18 -> 0 helped: 18 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for loops value: -1.00 -1.00 95% mean confidence interval for loops %-change: -100.00% -100.00% Loops are helped. LOST: 4 GAINED: 0 No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-08 15:26:26 -08:00
Ian Romanick	54e8d2268d	nir: Narrow some dot product operations On vector platforms, this helps elide some constant loads. v2: Reorder the transformations. No changes on Broadwell or Skylake. Haswell total instructions in shared programs: 13093793 -> 13060163 (-0.26%) instructions in affected programs: 1277532 -> 1243902 (-2.63%) helped: 13216 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.57 -2.49 95% mean confidence interval for instructions %-change: -3.65% -3.54% Instructions are helped. total cycles in shared programs: 409580819 -> 409268463 (-0.08%) cycles in affected programs: 71730652 -> 71418296 (-0.44%) helped: 9898 HURT: 2352 helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16 helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50% HURT stats (abs) min: 2 max: 276 x̄: 23.25 x̃: 6 HURT stats (rel) min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97% 95% mean confidence interval for cycles value: -33.19 -17.80 95% mean confidence interval for cycles %-change: -4.50% -4.26% Cycles are helped. total fills in shared programs: 82059 -> 82052 (<.01%) fills in affected programs: 21 -> 14 (-33.33%) helped: 7 HURT: 0 Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown) total instructions in shared programs: 11811851 -> 11780605 (-0.26%) instructions in affected programs: 1155007 -> 1123761 (-2.71%) helped: 12304 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.56 -2.48 95% mean confidence interval for instructions %-change: -3.71% -3.59% Instructions are helped. 
total cycles in shared programs: 257618409 -> 257316805 (-0.12%) cycles in affected programs: 71999580 -> 71697976 (-0.42%) helped: 9155 HURT: 2380 helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16 helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62% HURT stats (abs) min: 2 max: 290 x̄: 21.14 x̃: 4 HURT stats (rel) min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33% 95% mean confidence interval for cycles value: -34.32 -17.97 95% mean confidence interval for cycles %-change: -4.55% -4.29% Cycles are helped. GM45 and Iron Lake had nearly identical results (Iron Lake shown) total instructions in shared programs: 7886750 -> 7879944 (-0.09%) instructions in affected programs: 373781 -> 366975 (-1.82%) helped: 3715 HURT: 47 helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1 helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06% HURT stats (abs) min: 1 max: 6 x̄: 2.55 x̃: 2 HURT stats (rel) min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35% 95% mean confidence interval for instructions value: -1.85 -1.77 95% mean confidence interval for instructions %-change: -2.91% -2.73% Instructions are helped. total cycles in shared programs: 178114636 -> 178095452 (-0.01%) cycles in affected programs: 7227666 -> 7208482 (-0.27%) helped: 3349 HURT: 301 helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4 helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63% HURT stats (abs) min: 2 max: 42 x̄: 9.13 x̃: 10 HURT stats (rel) min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50% 95% mean confidence interval for cycles value: -5.52 -4.99 95% mean confidence interval for cycles %-change: -0.81% -0.73% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]	2018-03-08 15:26:26 -08:00
Ian Romanick	e3ea166a2c	nir: Simplify some comparisons like a+b < a All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14514555 -> 14514547 (<.01%) instructions in affected programs: 1972 -> 1964 (-0.41%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.39% max: 0.42% x̄: 0.41% x̃: 0.41% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.41% -0.40% Instructions are helped. total cycles in shared programs: 533141444 -> 533136780 (<.01%) cycles in affected programs: 164728 -> 160064 (-2.83%) helped: 181 HURT: 3 helped stats (abs) min: 2 max: 94 x̄: 26.17 x̃: 30 helped stats (rel) min: 0.12% max: 5.33% x̄: 3.42% x̃: 3.80% HURT stats (abs) min: 4 max: 54 x̄: 24.00 x̃: 14 HURT stats (rel) min: 0.20% max: 2.39% x̄: 1.09% x̃: 0.68% 95% mean confidence interval for cycles value: -27.12 -23.58 95% mean confidence interval for cycles %-change: -3.54% -3.16% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533667 -> 10533539 (<.01%) instructions in affected programs: 10148 -> 10020 (-1.26%) helped: 124 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.39% max: 4.35% x̄: 2.20% x̃: 2.04% 95% mean confidence interval for instructions value: -1.06 -1.00 95% mean confidence interval for instructions %-change: -2.46% -1.95% Instructions are helped. total cycles in shared programs: 146136887 -> 146132122 (<.01%) cycles in affected programs: 206382 -> 201617 (-2.31%) helped: 171 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 27.87 x̃: 30 helped stats (rel) min: 0.08% max: 5.73% x̄: 2.98% x̃: 2.67% 95% mean confidence interval for cycles value: -29.19 -26.54 95% mean confidence interval for cycles %-change: -3.20% -2.76% Cycles are helped. 
Iron Lake total instructions in shared programs: 7886515 -> 7886507 (<.01%) instructions in affected programs: 3016 -> 3008 (-0.27%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.28% x̄: 0.27% x̃: 0.27% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.27% -0.26% Instructions are helped. total cycles in shared programs: 178100396 -> 178100388 (<.01%) cycles in affected programs: 156128 -> 156120 (<.01%) helped: 4 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -3.68 1.68 95% mean confidence interval for cycles %-change: -0.03% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857872 -> 4857868 (<.01%) instructions in affected programs: 1544 -> 1540 (-0.26%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.27% x̄: 0.26% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.28% -0.24% Instructions are helped. total cycles in shared programs: 122167654 -> 122167662 (<.01%) cycles in affected programs: 96248 -> 96256 (<.01%) helped: 0 HURT: 4 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: <.01% 0.02% Cycles are HURT. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-06 11:17:30 -08:00
Ian Romanick	d1ed4ffe0b	nir: Use De Morgan's Law on logic compounded comparisons The replacement of the comparison operators must happen during this step. If it does not, the next pass of nir_opt_algebraic will reapply De Morgan's Law in the "opposite direction" before performing dead code elimination. The resulting infinite loop will eventually get OOM killed. Haswell, Broadwell, and Skylake had similar results. (Broadwell shown) total instructions in shared programs: 14808185 -> 14808036 (<.01%) instructions in affected programs: 13758 -> 13609 (-1.08%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3 helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01% 95% mean confidence interval for instructions value: -4.67 -2.97 95% mean confidence interval for instructions %-change: -1.09% -0.88% Instructions are helped. total cycles in shared programs: 559438333 -> 559435832 (<.01%) cycles in affected programs: 199160 -> 196659 (-1.26%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51 helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40% HURT stats (abs) min: 2 max: 40 x̄: 27.33 x̃: 40 HURT stats (rel) min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74% 95% mean confidence interval for cycles value: -71.47 -39.69 95% mean confidence interval for cycles %-change: -1.64% -0.93% Cycles are helped. Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11811776 -> 11811553 (<.01%) instructions in affected programs: 15201 -> 14978 (-1.47%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6 helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26% 95% mean confidence interval for instructions value: -7.21 -4.23 95% mean confidence interval for instructions %-change: -1.48% -1.12% Instructions are helped. 
total cycles in shared programs: 257617270 -> 257614589 (<.01%) cycles in affected programs: 212107 -> 209426 (-1.26%) helped: 45 HURT: 0 helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54 helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32% 95% mean confidence interval for cycles value: -74.02 -45.14 95% mean confidence interval for cycles %-change: -1.59% -1.01% Cycles are helped. Iron Lake total instructions in shared programs: 7886648 -> 7886515 (<.01%) instructions in affected programs: 14106 -> 13973 (-0.94%) helped: 29 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4 helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81% 95% mean confidence interval for instructions value: -5.65 -3.52 95% mean confidence interval for instructions %-change: -1.03% -0.76% Instructions are helped. total cycles in shared programs: 178100812 -> 178100396 (<.01%) cycles in affected programs: 67970 -> 67554 (-0.61%) helped: 29 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54% 95% mean confidence interval for cycles value: -18.30 -10.39 95% mean confidence interval for cycles %-change: -0.71% -0.45% Cycles are helped. GM45 total instructions in shared programs: 4857939 -> 4857872 (<.01%) instructions in affected programs: 7426 -> 7359 (-0.90%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4 helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77% 95% mean confidence interval for instructions value: -6.06 -2.87 95% mean confidence interval for instructions %-change: -1.06% -0.67% Instructions are helped. 
total cycles in shared programs: 122167930 -> 122167654 (<.01%) cycles in affected programs: 43118 -> 42842 (-0.64%) helped: 15 HURT: 0 helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54% 95% mean confidence interval for cycles value: -25.03 -11.77 95% mean confidence interval for cycles %-change: -0.82% -0.41% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-06 11:17:29 -08:00
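The rewrite described in the commit above rests on De Morgan's Law applied to a negated pair of comparisons. The following is a minimal sketch of that identity in plain Python, not the NIR pattern itself; the function names `before` and `after` are hypothetical. It also shows why the comparison operators have to be replaced with care for floats: a NaN operand breaks the assumption that `>=` is the complement of `<`.

```python
from itertools import product

def before(a, b, c, d):
    # Negation applied to an "and" of two comparisons.
    return not (a < b and c < d)

def after(a, b, c, d):
    # De Morgan: negate each comparison and switch "and" to "or".
    return (a >= b) or (c >= d)

# Exhaustive check over a small integer range.
values = range(-2, 3)
de_morgan_holds = all(before(*v) == after(*v) for v in product(values, repeat=4))

# With a NaN operand, ">=" is no longer the complement of "<".
nan = float("nan")
nan_breaks_identity = before(nan, 1.0, 0.0, 1.0) != after(nan, 1.0, 0.0, 1.0)

print(de_morgan_holds, nan_breaks_identity)
```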
Ian Romanick	52607658ff	nir: Replace fmin(b2f(a), b) with a bcsel All of the affected shaders are HDR mappers from Serious Sam 3. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14516285 -> 14516273 (<.01%) instructions in affected programs: 348 -> 336 (-3.45%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.08% max: 6.67% x̄: 4.31% x̃: 4.17% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -5.55% -3.06% Instructions are helped. total cycles in shared programs: 533163876 -> 533163808 (<.01%) cycles in affected programs: 1144 -> 1076 (-5.94%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 5.80% max: 6.08% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for cycles value: -18.84 -15.16 95% mean confidence interval for cycles %-change: -6.20% -5.68% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533321 -> 10533309 (<.01%) instructions in affected programs: 372 -> 360 (-3.23%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.00% max: 5.88% x̄: 3.91% x̃: 3.85% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -4.96% -2.86% Instructions are helped. total cycles in shared programs: 146136632 -> 146136428 (<.01%) cycles in affected programs: 11668 -> 11464 (-1.75%) helped: 12 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 0.99% max: 3.44% x̄: 2.20% x̃: 2.29% 95% mean confidence interval for cycles value: -17.66 -16.34 95% mean confidence interval for cycles %-change: -2.82% -1.58% Cycles are helped. 
Iron Lake total instructions in shared programs: 7886301 -> 7886277 (<.01%) instructions in affected programs: 576 -> 552 (-4.17%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 6.06% x̄: 4.51% x̃: 4.65% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.30% -3.72% Instructions are helped. total cycles in shared programs: 178113176 -> 178113176 (0.00%) cycles in affected programs: 2116 -> 2116 (0.00%) helped: 2 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.14% max: 1.14% x̄: 1.14% x̃: 1.14% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.50% max: 0.65% x̄: 0.58% x̃: 0.58% 95% mean confidence interval for cycles value: -3.25 3.25 95% mean confidence interval for cycles %-change: -0.93% 0.94% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857756 -> 4857744 (<.01%) instructions in affected programs: 294 -> 282 (-4.08%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 5.71% x̄: 4.40% x̃: 4.55% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.71% -3.09% Instructions are helped. total cycles in shared programs: 122178730 -> 122178722 (<.01%) cycles in affected programs: 700 -> 692 (-1.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-06 11:17:29 -08:00
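To illustrate the idea named in the commit title above: `b2f(a)` can only be 0.0 or 1.0, so a minimum against it can be expressed as a select between two minima against constants. This is a sketch under that assumption, written in plain Python; `b2f`, `with_fmin`, and `with_bcsel` are hypothetical helper names, and the exact replacement expression used in Mesa may differ.

```python
def b2f(flag):
    # Boolean-to-float conversion: True -> 1.0, False -> 0.0.
    return 1.0 if flag else 0.0

def with_fmin(flag, b):
    # Original shape: a minimum taken against a converted Boolean.
    return min(b2f(flag), b)

def with_bcsel(flag, b):
    # Rewritten shape: select between two minima against constants.
    return min(b, 1.0) if flag else min(b, 0.0)

samples = [-2.5, 0.0, 0.25, 1.0, 7.0]
rewrite_matches = all(
    with_fmin(flag, b) == with_bcsel(flag, b)
    for flag in (False, True)
    for b in samples
)
print(rewrite_matches)
```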
Ian Romanick	b974dfee11	nir: Pull b2f out of bcsel All platforms had similar results. (Skylake shown) total instructions in shared programs: 14516592 -> 14516586 (<.01%) instructions in affected programs: 500 -> 494 (-1.20%) helped: 2 HURT: 0 total cycles in shared programs: 533167044 -> 533166998 (<.01%) cycles in affected programs: 6988 -> 6942 (-0.66%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-06 11:17:29 -08:00
Ian Romanick	f50400cc80	nir: Replace an odd comparison involving fmin of -b2f I noticed the fge version while looking at a shader for an unrelated reason. The feq version prevents a regression in a later change that performs strength reduction of some compares. Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14514808 -> 14514796 (<.01%) instructions in affected programs: 750 -> 738 (-1.60%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.83% max: 1.96% x̄: 1.40% x̃: 1.40% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.43% -0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533144939 -> 533144853 (<.01%) cycles in affected programs: 8911 -> 8825 (-0.97%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 32 x̄: 21.50 x̃: 19 helped stats (rel) min: 0.60% max: 1.89% x̄: 1.28% x̃: 1.31% 95% mean confidence interval for cycles value: -32.94 -10.06 95% mean confidence interval for cycles %-change: -2.30% -0.26% Cycles are helped. Haswell total instructions in shared programs: 13093785 -> 13093775 (<.01%) instructions in affected programs: 924 -> 914 (-1.08%) helped: 4 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.82% max: 1.95% x̄: 1.39% x̃: 1.39% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.53 1.20 95% mean confidence interval for instructions %-change: -2.02% 0.97% Inconclusive result (value mean confidence interval includes 0). 
total cycles in shared programs: 409580553 -> 409580118 (<.01%) cycles in affected programs: 10909 -> 10474 (-3.99%) helped: 5 HURT: 1 helped stats (abs) min: 6 max: 222 x̄: 89.60 x̃: 18 helped stats (rel) min: 0.16% max: 24.72% x̄: 9.54% x̃: 1.78% HURT stats (abs) min: 13 max: 13 x̄: 13.00 x̃: 13 HURT stats (rel) min: 0.39% max: 0.39% x̄: 0.39% x̃: 0.39% 95% mean confidence interval for cycles value: -180.68 35.68 95% mean confidence interval for cycles %-change: -19.55% 3.79% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 11811851 -> 11811840 (<.01%) instructions in affected programs: 1032 -> 1021 (-1.07%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 1 helped stats (rel) min: 0.63% max: 1.95% x̄: 1.13% x̃: 0.97% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.17 0.51 95% mean confidence interval for instructions %-change: -1.86% 0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 257618403 -> 257618168 (<.01%) cycles in affected programs: 10784 -> 10549 (-2.18%) helped: 4 HURT: 2 helped stats (abs) min: 4 max: 220 x̄: 64.50 x̃: 17 helped stats (rel) min: 0.50% max: 24.34% x̄: 7.07% x̃: 1.72% HURT stats (abs) min: 9 max: 14 x̄: 11.50 x̃: 11 HURT stats (rel) min: 0.24% max: 0.42% x̄: 0.33% x̃: 0.33% 95% mean confidence interval for cycles value: -133.11 54.78 95% mean confidence interval for cycles %-change: -14.79% 5.59% Inconclusive result (value mean confidence interval includes 0). GM45, Iron Lake, and Sandy Bridge had similar results. 
(Sandy Bridge shown) total instructions in shared programs: 10533871 -> 10533859 (<.01%) instructions in affected programs: 865 -> 853 (-1.39%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.63% max: 1.83% x̄: 1.22% x̃: 1.21% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.16% -0.29% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 146139904 -> 146139852 (<.01%) cycles in affected programs: 15213 -> 15161 (-0.34%) helped: 4 HURT: 0 helped stats (abs) min: 3 max: 18 x̄: 13.00 x̃: 15 helped stats (rel) min: 0.15% max: 0.84% x̄: 0.39% x̃: 0.29% 95% mean confidence interval for cycles value: -23.79 -2.21 95% mean confidence interval for cycles %-change: -0.88% 0.09% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-03-06 11:17:29 -08:00
Ian Romanick	380136e998	nir: Mark bcsel-to-fmin (or fmax) transformations as inexact These transformations are inexact because section 4.7.1 (Range and Precision) of the GLSL specification says: "Operations and built-in functions that operate on a NaN are not required to return a NaN as the result." The fmin or fmax might not return NaN in cases where the original expression would be required to return NaN. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-06 11:17:14 -08:00
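The NaN concern in the commit above can be seen with a small sketch. It assumes an fmin that prefers the non-NaN operand, which is one behaviour a "not required to return a NaN" rule permits; `open_coded_min` and `fmin_number_preferring` are hypothetical names, not Mesa or GLSL functions.

```python
import math

def open_coded_min(a, b):
    # Select form: compare, then pick one operand unchanged.
    return a if a < b else b

def fmin_number_preferring(a, b):
    # Assumed fmin behaviour: when one operand is NaN, return the other.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

nan = float("nan")
select_result = open_coded_min(1.0, nan)        # comparison is false, so b (NaN) is picked
fmin_result = fmin_number_preferring(1.0, nan)  # the number is returned instead

print(select_result, fmin_result)
```

The two forms agree on ordinary numbers and differ only when a NaN is involved, which is the case the "inexact" marking is meant to cover.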
Ian Romanick	4addd34b04	nir: Recognize some more open-coded fmin / fmax This transformation is inexact because section 4.7.1 (Range and Precision) of the GLSL specification says: "Operations and built-in functions that operate on a NaN are not required to return a NaN as the result." The fmin or fmax might not return NaN in cases where the original expression would be required to return NaN. v2: Reorder operands and mark as inexact. The latter suggested by Jason. shader-db results: Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14514817 -> 14514808 (<.01%) instructions in affected programs: 229 -> 220 (-3.93%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4 helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12% total cycles in shared programs: 533145211 -> 533144939 (<.01%) cycles in affected programs: 37268 -> 36996 (-0.73%) helped: 8 HURT: 0 helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2 helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05% Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown) total cycles in shared programs: 257618409 -> 257618403 (<.01%) cycles in affected programs: 12582 -> 12576 (-0.05%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-03-06 11:17:14 -08:00
Timothy Arceri	a050ea60ee	nir: add lower_ldexp to nir compiler options Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-28 09:23:49 +11:00
Samuel Pitoiset	63fb30c674	nir: lower fexp2(fmul(flog2(a), 2)) to fmul(a, a) Similar for the 4 case. Suggested by Bas. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-22 20:40:45 +01:00
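The lowering named in the commit above follows from exp2(2 * log2(a)) = a * a for positive a. Below is a numerical sketch of that identity in plain Python, assuming positive inputs only; `via_exp_log` is a hypothetical helper, and the fourth-power form is written as two squarings purely for illustration.

```python
import math

def via_exp_log(a, k):
    # exp2(log2(a) * k), the shape being replaced (defined here for a > 0 only).
    return 2.0 ** (math.log2(a) * k)

samples = [0.125, 0.5, 1.0, 3.0, 10.0]

# k = 2 reduces to a single multiply.
square_matches = all(
    math.isclose(via_exp_log(a, 2), a * a, rel_tol=1e-12) for a in samples
)

# k = 4 reduces to squaring twice.
fourth_matches = all(
    math.isclose(via_exp_log(a, 4), (a * a) * (a * a), rel_tol=1e-12) for a in samples
)

print(square_matches, fourth_matches)
```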
Samuel Pitoiset	b18997876f	nir: add is_used_once for fmul(fexp2(a), fexp2(b)) to fexp2(fadd(a, b)) Otherwise the code size increases because the original fexp2() instructions can't be deleted. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-22 20:40:43 +01:00
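The code-size argument in the commit above can be made concrete with a toy instruction count. This is only an illustrative model, not how NIR tracks uses; `instruction_counts` and its parameters are hypothetical.

```python
def instruction_counts(exp_a_other_uses, exp_b_other_uses):
    """Toy count for fmul(fexp2(a), fexp2(b)) -> fexp2(fadd(a, b))."""
    before = 3  # fexp2(a), fexp2(b), fmul
    after = 2   # fadd(a, b), fexp2(sum)
    # An original fexp2 that still feeds another instruction cannot be deleted.
    if exp_a_other_uses > 0:
        after += 1
    if exp_b_other_uses > 0:
        after += 1
    return before, after

print(instruction_counts(0, 0))  # both fexp2 results are used only by the fmul
print(instruction_counts(1, 1))  # both fexp2 results are also used elsewhere
```

In this model the rewrite only pays off when the original fexp2 results have no other users, which is the situation an is_used_once restriction selects for.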
Ian Romanick	ee63933a73	nir: Distribute binary operations with constants into bcsel This was specifically designed to simplify 1+mix(0, a-1, condition) to mix(1, a, condition) by pushing the 1+ inside. Skylake, Broadwell, and Haswell had similar results. Skylake shown. total instructions in shared programs: 14521753 -> 14521716 (<.01%) instructions in affected programs: 10619 -> 10582 (-0.35%) helped: 51 HURT: 14 helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1 helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95% HURT stats (abs) min: 1 max: 11 x̄: 2.57 x̃: 1 HURT stats (rel) min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32% 95% mean confidence interval for instructions value: -1.31 0.17 95% mean confidence interval for instructions %-change: -0.80% -0.27% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533000205 -> 533003533 (<.01%) cycles in affected programs: 110610 -> 113938 (3.01%) helped: 43 HURT: 28 helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16 helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67% HURT stats (abs) min: 2 max: 3066 x̄: 160.50 x̃: 14 HURT stats (rel) min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62% 95% mean confidence interval for cycles value: -43.81 137.56 95% mean confidence interval for cycles %-change: -1.47% 3.60% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10018840 -> 10018713 (<.01%) instructions in affected programs: 9431 -> 9304 (-1.35%) helped: 51 HURT: 3 helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1 helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81% HURT stats (abs) min: 1 max: 12 x̄: 4.67 x̃: 1 HURT stats (rel) min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22% 95% mean confidence interval for instructions value: -5.36 0.66 95% mean confidence interval for instructions %-change: -1.66% -0.46% Inconclusive result (value mean confidence interval includes 0). 
total cycles in shared programs: 87571944 -> 87572785 (<.01%) cycles in affected programs: 117234 -> 118075 (0.72%) helped: 42 HURT: 23 helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30 helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74% HURT stats (abs) min: 1 max: 2341 x̄: 131.35 x̃: 10 HURT stats (rel) min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61% 95% mean confidence interval for cycles value: -61.05 86.93 95% mean confidence interval for cycles %-change: -3.47% -0.33% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10542933 -> 10542844 (<.01%) instructions in affected programs: 11487 -> 11398 (-0.77%) helped: 52 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72% HURT stats (abs) min: 1 max: 11 x̄: 4.33 x̃: 1 HURT stats (rel) min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22% 95% mean confidence interval for instructions value: -3.17 -0.07 95% mean confidence interval for instructions %-change: -1.13% -0.52% Instructions are helped. total cycles in shared programs: 146098397 -> 146097094 (<.01%) cycles in affected programs: 128140 -> 126837 (-1.02%) helped: 47 HURT: 8 helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18 helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95% HURT stats (abs) min: 1 max: 16 x̄: 8.75 x̃: 9 HURT stats (rel) min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34% 95% mean confidence interval for cycles value: -37.49 -9.90 95% mean confidence interval for cycles %-change: -1.22% -0.71% Cycles are helped. 
Iron Lake total instructions in shared programs: 7886711 -> 7886509 (<.01%) instructions in affected programs: 10425 -> 10223 (-1.94%) helped: 50 HURT: 2 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89% 95% mean confidence interval for instructions value: -8.05 0.28 95% mean confidence interval for instructions %-change: -1.83% -0.26% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 178115324 -> 178114612 (<.01%) cycles in affected programs: 765726 -> 765014 (-0.09%) helped: 39 HURT: 1 helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8 helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -32.07 -3.53 95% mean confidence interval for cycles %-change: -0.86% 0.10% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 4857762 -> 4857661 (<.01%) instructions in affected programs: 5523 -> 5422 (-1.83%) helped: 25 HURT: 1 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86% 95% mean confidence interval for instructions value: -9.99 2.22 95% mean confidence interval for instructions %-change: -2.01% 0.08% Inconclusive result (value mean confidence interval includes 0). 
total cycles in shared programs: 122179674 -> 122179194 (<.01%) cycles in affected programs: 530162 -> 529682 (-0.09%) helped: 22 HURT: 1 helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7 helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -46.56 4.82 95% mean confidence interval for cycles %-change: -1.20% 0.36% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:15 -08:00
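The motivating example in the commit above, 1+mix(0, a-1, condition) becoming mix(1, a, condition), can be checked with a short sketch. It uses integers so the arithmetic is exact, and a Boolean-selector `mix` that returns its second argument when the condition is true; `before` and `after` are hypothetical names.

```python
def mix(x, y, cond):
    # Boolean-selector mix: y when cond is true, x otherwise.
    return y if cond else x

def before(a, cond):
    # The addition sits outside the select.
    return 1 + mix(0, a - 1, cond)

def after(a, cond):
    # The addition is pushed into both arms and folded with the constants.
    return mix(1, a, cond)

distribution_holds = all(
    before(a, cond) == after(a, cond)
    for a in range(-3, 4)
    for cond in (False, True)
)
print(distribution_holds)
```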
Ian Romanick	03fb13f646	nir: Rearrange logic op-compounded integer compares Skylake and Broadwell had similar results. Skylake shown. total instructions in shared programs: 14521769 -> 14521753 (<.01%) instructions in affected programs: 8782 -> 8766 (-0.18%) helped: 16 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.12% max: 0.40% x̄: 0.20% x̃: 0.18% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.23% -0.16% Instructions are helped. total cycles in shared programs: 533000376 -> 533000205 (<.01%) cycles in affected programs: 447035 -> 446864 (-0.04%) helped: 9 HURT: 9 helped stats (abs) min: 2 max: 40 x̄: 35.78 x̃: 40 helped stats (rel) min: 0.02% max: 0.18% x̄: 0.10% x̃: 0.09% HURT stats (abs) min: 1 max: 52 x̄: 16.78 x̃: 10 HURT stats (rel) min: <.01% max: 1.11% x̄: 0.29% x̃: 0.12% 95% mean confidence interval for cycles value: -25.07 6.07 95% mean confidence interval for cycles %-change: -0.08% 0.27% Inconclusive result (value mean confidence interval includes 0). No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
Ian Romanick	053be9f020	nir: Rearrange and-compounded float compares If both comparisons are used as sources for instructions other than the iand, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. It is interesting to me that on Iron Lake only 81 shaders have instruction counts changed, but 726 shaders have cycle counts changed. shader-db results: Skylake total instructions in shared programs: 14525728 -> 14521017 (-0.03%) instructions in affected programs: 1164726 -> 1160015 (-0.40%) helped: 1692 HURT: 5 helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2 helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 12 x̄: 3.20 x̃: 1 HURT stats (rel) min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86% 95% mean confidence interval for instructions value: -3.52 -2.03 95% mean confidence interval for instructions %-change: -0.86% -0.74% Instructions are helped. total cycles in shared programs: 533115449 -> 532991404 (-0.02%) cycles in affected programs: 119401803 -> 119277758 (-0.10%) helped: 1145 HURT: 467 helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18 helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42% HURT stats (abs) min: 1 max: 1590 x̄: 92.15 x̃: 15 HURT stats (rel) min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39% 95% mean confidence interval for cycles value: -122.16 -31.74 95% mean confidence interval for cycles %-change: -0.94% -0.57% Cycles are helped. 
total spills in shared programs: 9597 -> 9534 (-0.66%) spills in affected programs: 403 -> 340 (-15.63%) helped: 1 HURT: 1 total fills in shared programs: 13904 -> 13790 (-0.82%) fills in affected programs: 1627 -> 1513 (-7.01%) helped: 2 HURT: 1 LOST: 0 GAINED: 2 Broadwell total instructions in shared programs: 14816966 -> 14812590 (-0.03%) instructions in affected programs: 1499885 -> 1495509 (-0.29%) helped: 1672 HURT: 15 helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 21 x̄: 9.20 x̃: 8 HURT stats (rel) min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53% 95% mean confidence interval for instructions value: -3.14 -2.05 95% mean confidence interval for instructions %-change: -0.85% -0.73% Instructions are helped. total cycles in shared programs: 559353622 -> 559345595 (<.01%) cycles in affected programs: 139893703 -> 139885676 (<.01%) helped: 921 HURT: 697 helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18 helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87% HURT stats (abs) min: 1 max: 2370 x̄: 178.03 x̃: 38 HURT stats (rel) min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14% 95% mean confidence interval for cycles value: -59.64 49.72 95% mean confidence interval for cycles %-change: -1.02% -0.66% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78902 -> 78861 (-0.05%) spills in affected programs: 2418 -> 2377 (-1.70%) helped: 1 HURT: 11 total fills in shared programs: 83782 -> 83678 (-0.12%) fills in affected programs: 3515 -> 3411 (-2.96%) helped: 2 HURT: 11 LOST: 0 GAINED: 5 Haswell and Ivy Bridge had similar results. Haswell shown. 
total instructions in shared programs: 9033898 -> 9032010 (-0.02%) instructions in affected programs: 308064 -> 306176 (-0.61%) helped: 921 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1 helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23% 95% mean confidence interval for instructions value: -2.21 -1.87 95% mean confidence interval for instructions %-change: -0.88% -0.68% Instructions are helped. total cycles in shared programs: 84628949 -> 84620520 (<.01%) cycles in affected programs: 2164913 -> 2156484 (-0.39%) helped: 518 HURT: 359 helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20 helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01% HURT stats (abs) min: 1 max: 586 x̄: 36.43 x̃: 8 HURT stats (rel) min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40% 95% mean confidence interval for cycles value: -15.17 -4.05 95% mean confidence interval for cycles %-change: -0.77% -0.32% Cycles are helped. LOST: 0 GAINED: 4 Sandy Bridge total instructions in shared programs: 10544860 -> 10542933 (-0.02%) instructions in affected programs: 360019 -> 358092 (-0.54%) helped: 931 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33% 95% mean confidence interval for instructions value: -2.23 -1.89 95% mean confidence interval for instructions %-change: -0.76% -0.58% Instructions are helped. 
total cycles in shared programs: 146106820 -> 146098397 (<.01%) cycles in affected programs: 3435047 -> 3426624 (-0.25%) helped: 572 HURT: 329 helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15 helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33% HURT stats (abs) min: 1 max: 1714 x̄: 30.93 x̃: 6 HURT stats (rel) min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19% 95% mean confidence interval for cycles value: -16.85 -1.85 95% mean confidence interval for cycles %-change: -0.39% -0.01% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 7886925 -> 7886711 (<.01%) instructions in affected programs: 25763 -> 25549 (-0.83%) helped: 75 HURT: 6 helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1 helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53% HURT stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86% 95% mean confidence interval for instructions value: -3.69 -1.60 95% mean confidence interval for instructions %-change: -2.54% -0.57% Instructions are helped. total cycles in shared programs: 178116888 -> 178115324 (<.01%) cycles in affected programs: 5858790 -> 5857226 (-0.03%) helped: 484 HURT: 242 helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 4.07 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03% 95% mean confidence interval for cycles value: -2.76 -1.55 95% mean confidence interval for cycles %-change: -0.12% 0.01% Inconclusive result (%-change mean confidence interval includes 0). 
GM45 total instructions in shared programs: 4857870 -> 4857762 (<.01%) instructions in affected programs: 13994 -> 13886 (-0.77%) helped: 39 HURT: 5 helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48% HURT stats (abs) min: 1 max: 16 x̄: 4.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86% 95% mean confidence interval for instructions value: -3.86 -1.05 95% mean confidence interval for instructions %-change: -2.61% 0.04% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 122180744 -> 122179674 (<.01%) cycles in affected programs: 3686646 -> 3685576 (-0.03%) helped: 273 HURT: 141 helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 3.66 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02% 95% mean confidence interval for cycles value: -3.42 -1.75 95% mean confidence interval for cycles %-change: -0.15% 0.03% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
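As a rough illustration of the rearrangement discussed in the commit above: two compares that share one operand and are joined by a logical and can be folded into one compare against a minimum. The sketch below assumes no NaN inputs and uses hypothetical function names; the actual NIR patterns may be shaped differently.

```python
from itertools import product

def two_compares(a, b, c):
    # Original shape: two compares sharing the operand a, joined by "and".
    return (a < b) and (a < c)

def one_compare(a, b, c):
    # Rearranged shape: a single compare against the smaller bound.
    return a < min(b, c)

values = [-1.5, -1.0, 0.0, 0.5, 2.0]
rearrangement_holds = all(
    two_compares(a, b, c) == one_compare(a, b, c)
    for a, b, c in product(values, repeat=3)
)
print(rearrangement_holds)
```

When b and c are both constants, `min(b, c)` is itself a constant, which matches the commit's remark that the fmin or fmax gets constant-folded away in that case.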
Ian Romanick	821e7a4d32	nir: Separate a weird compare with zero to two compares with zero min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0). No shader-db changes, but it does prevent 6 to 12 instruction regressions in the next patch on all measured Intel platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
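The equivalence stated in the commit above, min(a+b, c+d) >= 0 versus (a+b >= 0 && c+d >= 0), can be verified over a small integer range. This is a plain-Python sketch with hypothetical function names, ignoring NaN.

```python
from itertools import product

def compare_of_min(a, b, c, d):
    # One compare applied to the minimum of two sums.
    return min(a + b, c + d) >= 0

def two_compares_with_zero(a, b, c, d):
    # Each sum compared with zero separately.
    return (a + b >= 0) and (c + d >= 0)

values = range(-2, 3)
split_holds = all(
    compare_of_min(*v) == two_compares_with_zero(*v)
    for v in product(values, repeat=4)
)
print(split_holds)
```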
Ian Romanick	68420d8322	nir: Simplify min and max of b2f v2: Rebase on almost 2 years. Require that one of the arguments to fmin or fmax be used only once. This prevents some regressions. shader-db results: Skylake and Broadwell had similar results. Skylake shown. total instructions in shared programs: 14526021 -> 14525913 (<.01%) instructions in affected programs: 4613 -> 4505 (-2.34%) helped: 31 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4 helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42% total cycles in shared programs: 533118710 -> 533118403 (<.01%) cycles in affected programs: 34334 -> 34027 (-0.89%) helped: 24 HURT: 0 helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14 helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03% No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
Ian Romanick	d8d18516b0	nir: Undo possible damage caused by rearranging or-compounded float compares shader-db results: Skylake and Broadwell had similar results (Skylake shown) total instructions in shared programs: 14525898 -> 14525836 (<.01%) instructions in affected programs: 1964 -> 1902 (-3.16%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1 helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86% 95% mean confidence interval for instructions value: -9.46 0.60 95% mean confidence interval for instructions %-change: -3.97% -0.24% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533119892 -> 533115756 (<.01%) cycles in affected programs: 96061 -> 91925 (-4.31%) helped: 13 HURT: 1 helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300 helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46% 95% mean confidence interval for cycles value: -379.43 -211.43 95% mean confidence interval for cycles %-change: -4.84% -3.01% Cycles are helped. Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown). total instructions in shared programs: 9033948 -> 9033898 (<.01%) instructions in affected programs: 535 -> 485 (-9.35%) helped: 2 HURT: 0 total cycles in shared programs: 84631402 -> 84628949 (<.01%) cycles in affected programs: 63197 -> 60744 (-3.88%) helped: 13 HURT: 2 helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140 helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01% HURT stats (abs) min: 4 max: 8 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for cycles value: -253.40 -73.67 95% mean confidence interval for cycles %-change: -4.24% -2.25% Cycles are helped. No changes on GM45 or Iron Lake. v2: Add a couple more tautological compares. Suggested by Elie. 
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
Ian Romanick	3941cba0f7	nir: Be more conservative about rearranging or-compounded compares If both comparisons are used as sources for instructions other than the ior, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. shader-db results: Skylake total instructions in shared programs: 14526147 -> 14525898 (<.01%) instructions in affected programs: 70239 -> 69990 (-0.35%) helped: 102 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.86 -2.02 95% mean confidence interval for instructions %-change: -0.46% -0.31% Instructions are helped. total cycles in shared programs: 533120531 -> 533119892 (<.01%) cycles in affected programs: 994875 -> 994236 (-0.06%) helped: 76 HURT: 26 helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13 helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18% HURT stats (abs) min: 1 max: 167 x̄: 54.62 x̃: 26 HURT stats (rel) min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39% 95% mean confidence interval for cycles value: -19.44 6.91 95% mean confidence interval for cycles %-change: -0.30% 0.15% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 14816005 -> 14815787 (<.01%) instructions in affected programs: 64658 -> 64440 (-0.34%) helped: 97 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.62 -1.87 95% mean confidence interval for instructions %-change: -0.45% -0.30% Instructions are helped. 
total cycles in shared programs: 559340386 -> 559339907 (<.01%) cycles in affected programs: 1090491 -> 1090012 (-0.04%) helped: 66 HURT: 28 helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16 helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27% HURT stats (abs) min: 2 max: 226 x̄: 39.07 x̃: 11 HURT stats (rel) min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20% 95% mean confidence interval for cycles value: -15.94 5.75 95% mean confidence interval for cycles %-change: -0.35% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 Haswell total instructions in shared programs: 9034106 -> 9033948 (<.01%) instructions in affected programs: 24096 -> 23938 (-0.66%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4 helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64% 95% mean confidence interval for instructions value: -4.71 -3.60 95% mean confidence interval for instructions %-change: -0.84% -0.58% Instructions are helped. total cycles in shared programs: 84631628 -> 84631402 (<.01%) cycles in affected programs: 148674 -> 148448 (-0.15%) helped: 14 HURT: 14 helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12 helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21% HURT stats (abs) min: 1 max: 10 x̄: 6.00 x̃: 5 HURT stats (rel) min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11% 95% mean confidence interval for cycles value: -19.42 3.28 95% mean confidence interval for cycles %-change: -0.59% 0.05% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10015456 -> 10015293 (<.01%) instructions in affected programs: 27701 -> 27538 (-0.59%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4 helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52% 95% mean confidence interval for instructions value: -4.87 -3.71 95% mean confidence interval for instructions %-change: -0.82% -0.51% Instructions are helped. 
total cycles in shared programs: 87524771 -> 87524569 (<.01%) cycles in affected programs: 112324 -> 112122 (-0.18%) helped: 6 HURT: 12 helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20 helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26% HURT stats (abs) min: 1 max: 16 x̄: 5.50 x̃: 5 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -29.14 6.69 95% mean confidence interval for cycles %-change: -0.93% 0.08% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10545655 -> 10545465 (<.01%) instructions in affected programs: 37198 -> 37008 (-0.51%) helped: 42 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4 helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49% 95% mean confidence interval for instructions value: -5.14 -3.91 95% mean confidence interval for instructions %-change: -0.68% -0.47% Instructions are helped. total cycles in shared programs: 146113059 -> 146112427 (<.01%) cycles in affected programs: 423514 -> 422882 (-0.15%) helped: 32 HURT: 10 helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12 helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11% HURT stats (abs) min: 12 max: 19 x̄: 14.70 x̃: 14 HURT stats (rel) min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14% 95% mean confidence interval for cycles value: -26.03 -4.07 95% mean confidence interval for cycles %-change: -0.43% -0.05% Cycles are helped. Iron Lake total instructions in shared programs: 7886959 -> 7886925 (<.01%) instructions in affected programs: 1340 -> 1306 (-2.54%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8 helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43% 95% mean confidence interval for instructions value: -20.44 3.44 95% mean confidence interval for instructions %-change: -5.78% 0.89% Inconclusive result (value mean confidence interval includes 0). 
total cycles in shared programs: 178116996 -> 178116888 (<.01%) cycles in affected programs: 6262 -> 6154 (-1.72%) helped: 2 HURT: 2 helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61 helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62% HURT stats (abs) min: 6 max: 8 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -93.27 39.27 95% mean confidence interval for cycles %-change: -5.38% 2.27% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857887 -> 4857870 (<.01%) instructions in affected programs: 674 -> 657 (-2.52%) helped: 2 HURT: 0 total cycles in shared programs: 122180816 -> 122180744 (<.01%) cycles in affected programs: 3764 -> 3692 (-1.91%) helped: 1 HURT: 1 helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78 helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
Ian Romanick	cfc0d34802	nir: See through an fneg to apply existing optimizations Doing the same for the existing feq and fne transformations didn't help anything in shader-db. shader-db results: Broadwell and Skylake (Skylake shown) total instructions in shared programs: 14529463 -> 14526147 (-0.02%) instructions in affected programs: 402420 -> 399104 (-0.82%) helped: 2136 HURT: 131 helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1 helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57% 95% mean confidence interval for instructions value: -1.51 -1.41 95% mean confidence interval for instructions %-change: -3.06% -2.78% Instructions are helped. total cycles in shared programs: 533146915 -> 533120531 (<.01%) cycles in affected programs: 10356261 -> 10329877 (-0.25%) helped: 1933 HURT: 844 helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16 helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88% HURT stats (abs) min: 1 max: 423 x̄: 36.17 x̃: 12 HURT stats (rel) min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59% 95% mean confidence interval for cycles value: -11.78 -7.22 95% mean confidence interval for cycles %-change: -1.98% -1.65% Cycles are helped. Haswell total instructions in shared programs: 9037416 -> 9034106 (-0.04%) instructions in affected programs: 389831 -> 386521 (-0.85%) helped: 2184 HURT: 120 helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57% 95% mean confidence interval for instructions value: -1.49 -1.39 95% mean confidence interval for instructions %-change: -2.68% -2.41% Instructions are helped. 
total cycles in shared programs: 84636243 -> 84631628 (<.01%) cycles in affected programs: 4745058 -> 4740443 (-0.10%) helped: 1904 HURT: 960 helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18 helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38% HURT stats (abs) min: 1 max: 1080 x̄: 55.11 x̃: 14 HURT stats (rel) min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81% 95% mean confidence interval for cycles value: -4.51 1.29 95% mean confidence interval for cycles %-change: -1.64% -1.25% Inconclusive result (value mean confidence interval includes 0). LOST: 1 GAINED: 0 Sandy Bridge and Ivy Bridge (Ivy Bridge shown) total instructions in shared programs: 10018873 -> 10015456 (-0.03%) instructions in affected programs: 512820 -> 509403 (-0.67%) helped: 2268 HURT: 162 helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88% HURT stats (abs) min: 1 max: 4 x̄: 1.59 x̃: 1 HURT stats (rel) min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50% 95% mean confidence interval for instructions value: -1.46 -1.35 95% mean confidence interval for instructions %-change: -2.38% -2.12% Instructions are helped. total cycles in shared programs: 87538223 -> 87524771 (-0.02%) cycles in affected programs: 5435520 -> 5422068 (-0.25%) helped: 1916 HURT: 946 helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18 helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97% HURT stats (abs) min: 1 max: 633 x̄: 45.41 x̃: 11 HURT stats (rel) min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62% 95% mean confidence interval for cycles value: -7.34 -2.06 95% mean confidence interval for cycles %-change: -1.62% -1.26% Cycles are helped. 
LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 7888446 -> 7886959 (-0.02%) instructions in affected programs: 331581 -> 330094 (-0.45%) helped: 1160 HURT: 97 helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1 helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25% 95% mean confidence interval for instructions value: -1.25 -1.12 95% mean confidence interval for instructions %-change: -0.91% -0.75% Instructions are helped. total cycles in shared programs: 178130766 -> 178116996 (<.01%) cycles in affected programs: 12534564 -> 12520794 (-0.11%) helped: 1856 HURT: 187 helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4 helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11% HURT stats (abs) min: 2 max: 26 x̄: 3.55 x̃: 2 HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02% 95% mean confidence interval for cycles value: -7.41 -6.07 95% mean confidence interval for cycles %-change: -0.28% -0.22% Cycles are helped. GM45 total instructions in shared programs: 4858912 -> 4857887 (-0.02%) instructions in affected programs: 237565 -> 236540 (-0.43%) helped: 867 HURT: 57 helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1 helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: -1.18 -1.04 95% mean confidence interval for instructions %-change: -0.88% -0.71% Instructions are helped. 
total cycles in shared programs: 122189118 -> 122180816 (<.01%) cycles in affected programs: 8776418 -> 8768116 (-0.09%) helped: 1213 HURT: 166 helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4 helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11% HURT stats (abs) min: 2 max: 26 x̄: 3.35 x̃: 2 HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02% 95% mean confidence interval for cycles value: -6.78 -5.26 95% mean confidence interval for cycles %-change: -0.24% -0.18% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-01-30 15:40:14 -08:00
Connor Abbott	de91461575	nir: fix algebraic optimizations The optimizations are only valid for 32-bit integers. They were mistakenly firing for 64-bit integers as well. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-08-01 12:20:49 -07:00
Matt Turner	aff108f2fd	nir: Optimize find_lsb/imsb/umsb error checks Two of the ARB_shader_ballot piglit tests hit the find_lsb case; removing some of the noise allowed me to better debug the test when it was failing. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2017-07-20 16:56:50 -07:00
Timothy Arceri	7a7ee40c2d	nir/i965: add before ffma algebraic opts This shuffles constants down in the reverse of what the previous patch does and applies some simplifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Timothy Arceri	fb2269fed1	nir: shuffle constants to the top V2: mark float opts as inexact If one of the inputs to an mul/add is the result of another mul/add there is a chance that we can reuse the result of that mul/add in other calls if we do the multiplication in the right order. Also by attempting to move all constants to the top we increase the chance of constant folding. For example it is a fairly common pattern for shaders to do something similar to this: const float a = 0.5; in vec4 b; in float c; ... b.x = b.x * c; b.y = b.y * c; ... b.x = b.x * a + a; b.y = b.y * a + a; So by simply detecting that constant a is part of the multiplication in ffma and switching it with previous fmul that updates b we end up with: ... c = a * c; ... b.x = b.x * c + a; b.y = b.y * c + a; Shader-db results BDW: total instructions in shared programs: 13011050 -> 12967888 (-0.33%) instructions in affected programs: 4118366 -> 4075204 (-1.05%) helped: 17739 HURT: 1343 total cycles in shared programs: 246717952 -> 246410716 (-0.12%) cycles in affected programs: 166870802 -> 166563566 (-0.18%) helped: 18493 HURT: 7965 total spills in shared programs: 14937 -> 14560 (-2.52%) spills in affected programs: 9331 -> 8954 (-4.04%) helped: 284 HURT: 33 total fills in shared programs: 20211 -> 19671 (-2.67%) fills in affected programs: 12586 -> 12046 (-4.29%) helped: 286 HURT: 33 LOST: 39 GAINED: 33 Some of the hurt will go away when we shuffle things back down to the bottom in the following patch. It's also noteworthy that almost all of the spill changes are in Deus Ex both hurt and helped. Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Timothy Arceri	83f7fdf83a	nir: add flt comparison simplification Didn't turn out as useful as I'd hoped, but it will help a lot more on i965 by reducing regressions when we drop brw_do_channel_expressions() and brw_do_vector_splitting(). I'm not sure how much sense 'is_not_used_by_conditional' makes on platforms other than i965 but since this is a new opt it at least won't do any harm. shader-db BDW: total instructions in shared programs: 13029581 -> 13029415 (-0.00%) instructions in affected programs: 15268 -> 15102 (-1.09%) helped: 86 HURT: 0 total cycles in shared programs: 247038346 -> 247036198 (-0.00%) cycles in affected programs: 692634 -> 690486 (-0.31%) helped: 183 HURT: 27 Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-04-24 12:08:14 +10:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Emil Velikov	e4c7911150	nir: remove shebang from python scripts Analogous to earlier commit(s). Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-03-10 14:12:47 +00:00
Jason Ekstrand	70e86a3f2d	nir/algebraic: Optimize 64bit pack/unpack This reduces the instruction count in some fp64 and int64 piglit tests Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-02-16 17:28:03 -08:00
Jason Ekstrand	161d3e81be	nir: Combine the int and double [un]pack opcodes NIR is a typeless IR and the two opcodes, when considered bitwise, do exactly the same thing. There's no reason to have two versions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-02-16 17:28:03 -08:00
Ian Romanick	fda33e09d8	nir: Shift count for shift opcodes is always 32-bits Previously both sources were unsized. This caused problems when the thing being shifted was 64-bit but the shift count was 32-bit. The expectation in NIR is that all unsized sources (and destination) will ultimately have the same size. The changes in nir_opt_algebraic.py are to prevent errors like: Failed to parse transformation: 03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte') 03:12:25 Traceback (most recent call last): 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__ 03:12:25 xform = SearchAndReplace(xform) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__ 03:12:25 BitSizeValidator(varset).validate(self.search, self.replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate 03:12:25 validate_dst_class = self._validate_bit_class_up(replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up 03:12:25 src_class = self._validate_bit_class_up(val.sources[i]) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up 03:12:25 assert src_class == src_type_bits 03:12:25 AssertionError Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net>	2017-01-20 15:41:23 -08:00
Elie TOURNIER	9fdaeb7776	nir: add min/max optimisation Add the following optimisations: min(x, -x) = -abs(x) min(x, -abs(x)) = -abs(x) min(x, abs(x)) = x max(x, -abs(x)) = x max(x, abs(x)) = abs(x) max(x, -x) = abs(x) shader-db: total instructions in shared programs: 13067779 -> 13067775 (-0.00%) instructions in affected programs: 249 -> 245 (-1.61%) helped: 4 HURT: 0 total cycles in shared programs: 252054838 -> 252054806 (-0.00%) cycles in affected programs: 504 -> 472 (-6.35%) helped: 2 HURT: 0 Signed-off-by: Elie Tournier <tournier.elie@gmail.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-19 21:44:28 -08:00
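The six identities listed in the commit above can be spot-checked numerically. The following is a minimal Python sketch, not the NIR rules themselves: the function name `check_minmax_identities` and the sample values are hypothetical, and NaN inputs are deliberately left out because Python's built-in `min`/`max` do not model fmin/fmax NaN behaviour.

```python
def check_minmax_identities(x):
    """Return True if all six min/max identities from the commit hold for x."""
    return (
        min(x, -x) == -abs(x)
        and min(x, -abs(x)) == -abs(x)
        and min(x, abs(x)) == x
        and max(x, -abs(x)) == x
        and max(x, abs(x)) == abs(x)
        and max(x, -x) == abs(x)
    )


# Hypothetical sample inputs covering negative, zero, and positive values.
samples = [-3.5, -1.0, 0.0, 0.25, 2.0, 1e30]
results = [check_minmax_identities(x) for x in samples]
```

Each entry of `results` corresponds to one sample; a `False` entry would point at an identity that does not hold for that input.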
Timothy Arceri	772cd31048	nir: optimise min/max fadd combos shader-db results BDW: total instructions in shared programs: 13060410 -> 13060313 (-0.00%) instructions in affected programs: 24533 -> 24436 (-0.40%) helped: 88 HURT: 0 total cycles in shared programs: 256585692 -> 256586698 (0.00%) cycles in affected programs: 647290 -> 648296 (0.16%) helped: 35 HURT: 30 Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-01-14 23:26:22 +11:00
Timothy Arceri	de8b03f5fb	nir: don't turn ieq/ine into inot if used by an if Otherwise we will end up with an extra instruction to compare the result of the inot. On BDW: total instructions in shared programs: 13060620 -> 13060481 (-0.00%) instructions in affected programs: 103379 -> 103240 (-0.13%) helped: 127 HURT: 0 total cycles in shared programs: 256590950 -> 256587408 (-0.00%) cycles in affected programs: 11324730 -> 11321188 (-0.03%) helped: 114 HURT: 21 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-12 09:47:29 +11:00
Timothy Arceri	7acc865226	nir: add late opt to turn inot/b2f combos back to bcsel We turn these from bcsel into inot/b2f combos in order for other optimisation passes to get further. Once we have finished, turn the ones that remain and are used in more than a single expression back into a bcsel. On BDW: total instructions in shared programs: 13060965 -> 13060297 (-0.01%) instructions in affected programs: 835701 -> 835033 (-0.08%) helped: 670 HURT: 2 total cycles in shared programs: 256599536 -> 256598006 (-0.00%) cycles in affected programs: 114655488 -> 114653958 (-0.00%) helped: 419 HURT: 240 LOST: 0 GAINED: 1 The 2 HURT is because inserting bcsel creates the only use of const 1.0 in two shaders from tri-of-friendship-and-madness. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-12 09:47:29 +11:00
Timothy Arceri	8f37fc7066	nir: add imprecise flrp optimisation On BDW: total instructions in shared programs: 13061890 -> 13061877 (-0.00%) instructions in affected programs: 2441 -> 2428 (-0.53%) helped: 13 HURT: 0 total cycles in shared programs: 256612254 -> 256611784 (-0.00%) cycles in affected programs: 16418 -> 15948 (-2.86%) helped: 10 HURT: 2 V2: don't use ffma directly Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-12 09:47:29 +11:00
Timothy Arceri	1130f82a88	nir: add another comparison simplification On BDW: total instructions in shared programs: 13061877 -> 13060965 (-0.01%) instructions in affected programs: 133569 -> 132657 (-0.68%) helped: 566 HURT: 0 total cycles in shared programs: 256611784 -> 256599536 (-0.00%) cycles in affected programs: 861016 -> 848768 (-1.42%) helped: 379 HURT: 73 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-09 12:32:16 +11:00
Kenneth Graunke	3371de38f2	nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences. On BDW: total instructions in shared programs: 13074882 -> 13068703 (-0.05%) instructions in affected programs: 1823116 -> 1816937 (-0.34%) helped: 4187 HURT: 537 total cycles in shared programs: 256622718 -> 256425382 (-0.08%) cycles in affected programs: 123790120 -> 123592784 (-0.16%) helped: 3823 HURT: 2037 total spills in shared programs: 15276 -> 14929 (-2.27%) spills in affected programs: 9446 -> 9099 (-3.67%) helped: 352 HURT: 1 total fills in shared programs: 20496 -> 20144 (-1.72%) fills in affected programs: 13040 -> 12688 (-2.70%) helped: 352 HURT: 1 LOST: 2 GAINED: 21 v2: Rely on 'a' being a well-formed boolean (Connor, Eric). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-01-09 12:32:16 +11:00
Kenneth Graunke	1c50d31c26	nir: Convert ineg(b2i(a)) to a if it's a boolean. On BDW: total instructions in shared programs: 13071119 -> 13070371 (-0.01%) instructions in affected programs: 83424 -> 82676 (-0.90%) helped: 505 HURT: 45 (all TCS, all hurt by a single instruction) total cycles in shared programs: 256601322 -> 256588932 (-0.00%) cycles in affected programs: 819410 -> 807020 (-1.51%) helped: 450 HURT: 57 total loops in shared programs: 2950 -> 2942 (-0.27%) loops in affected programs: 8 -> 0 helped: 7 HURT: 0 v2: Drop unnecessary 'a@bool' annotation (Connor, Eric). Add a comment explaining the rule (Ian). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1] Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-01-09 12:32:16 +11:00
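Why `ineg(b2i(a))` can collapse to `a` is easiest to see with explicit bit patterns. The sketch below assumes that a well-formed 32-bit boolean is either 0 (false) or all ones (true); that encoding, the constant names `FALSE32`/`TRUE32`, and the Python helpers are illustrative assumptions, not code taken from the commit.

```python
MASK32 = 0xFFFFFFFF

# Assumed encoding of a well-formed 32-bit boolean: 0 or all ones.
FALSE32 = 0x00000000
TRUE32 = 0xFFFFFFFF


def b2i(a):
    """Boolean to integer: any true value becomes 1, false becomes 0."""
    return 1 if a != 0 else 0


def ineg(v):
    """32-bit two's-complement negation."""
    return (-v) & MASK32


# Under the assumed encoding, negating b2i(a) reproduces a itself.
roundtrip_true = ineg(b2i(TRUE32))
roundtrip_false = ineg(b2i(FALSE32))
```

Under this assumption, negating 1 gives the all-ones pattern and negating 0 gives 0, so the two-instruction sequence returns its input unchanged.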
Jason Ekstrand	d55835b8bd	nir/algebraic: Add optimizations for "a == a && a CMP b" This sequence shows up in The Talos Principle, at least under Vulkan, and prevents loop analysis from properly computing trip counts in a few loops. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-12-22 16:27:19 -08:00
Matt Turner	ac6646129f	nir: Move fsat outside of fmin/fmax if second arg is 0 to 1. instructions in affected programs: 550 -> 544 (-1.09%) helped: 6 cycles in affected programs: 6952 -> 6850 (-1.47%) helped: 6 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-12-12 12:39:27 -08:00
Ian Romanick	4d35683d91	nir: Optimize integer division and modulus with 1 The previous power-of-two rules didn't catch idiv (because i965 doesn't set lower_idiv) and imod cases. The udiv and umod cases should have been caught, but I included them for orthogonality. This fixes silly code observed from compute shaders with local_size_[xy] = 1. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-10-19 14:25:10 -07:00
Kenneth Graunke	7d0554f341	nir: Rely on the fact that bcsel takes a well formed boolean. According to Connor, it's safe to assume that the first operand of bcsel, as well as the operand of b2f and b2i, must be well formed booleans. https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html With the previous improvements to a@bool handling, this now has no change in shader-db instruction counts on Broadwell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-19 02:05:23 -07:00
Ian Romanick	cceb50e14e	nir/algebraic: Optimize common array indexing sequence Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking at that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 
-> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-17 10:52:38 +01:00
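The arithmetic behind the array-indexing rearrangement in the commit above is that `((i * 3) + k) << 4` equals `i * 48 + k * 16`. A small Python sketch of that equivalence follows; the helper names are hypothetical, and Python's unbounded integers stand in for the 32-bit values a shader would use.

```python
def index_before(i, k):
    """Index as first generated: x = i * 3, add element offset k, then shift by 4."""
    return ((i * 3) + k) << 4


def index_after(i, k):
    """Index after rearranging: constants folded so that 3 << 4 becomes 48."""
    return i * 48 + k * 16


# Compare both forms for a range of hypothetical indices and the three offsets.
matches = all(
    index_before(i, k) == index_after(i, k)
    for i in range(256)
    for k in range(3)
)
```

In the rearranged form the three indices share the single product `i * 48` and differ only by constant additions, which is what lets CSE remove the repeated work.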
Ian Romanick	0b626d7524	nir/algebraic: Optimize fabs(u2f(x)) I noticed this when I tried to do frexp(float(some_unsigned)) in the ir_unop_find_lsb lowering pass. The code generated for frexp() uses fabs, and this resulted in an extra instruction. Ultimately I ended up not using frexp. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-19 12:19:30 -07:00
Eric Anholt	c93f6938d5	nir: Add optimization for (a || True == True) This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering producing some constants that were getting compared. total instructions in shared programs: 112276 -> 112198 (-0.07%) instructions in affected programs: 2239 -> 2161 (-3.48%) total estimated cycles in shared programs: 283102 -> 283038 (-0.02%) estimated cycles in affected programs: 2365 -> 2301 (-2.71%) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-12 15:46:09 -07:00
Jason Ekstrand	68e308d853	nir/algebraic: Remove imprecise flog2 optimizations While mathematically correct, these two optimizations result in an expression with substantially lower precision than the original. For any positive finite floating-point value, log2(x) is well-defined and finite. More precisely, it is in the range [-150, 150] so any sum of logarithms log2(a) + log2(b) is also well-defined and finite as long as a and b are both positive and finite. However, if a and b are either very small or very large, their product may get flushed to infinity or zero causing log2(a * b) to be nowhere close to log2(a) + log2(b). This imprecision was causing incorrect rendering in The Talos Principle because part of its HDR rendering process involves doing 8 texture operations, clamping the result to [0, 65000], taking a dot-product with a constant, and then taking the log2. This is done 6 or 8 times and summed to produce the final result which is written to a red texture. In cases where you have a region of the screen that is very dark, it can end up getting a result value of -inf which is not what is intended. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425 Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>	2016-06-20 11:56:57 -07:00
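The precision problem described in the commit above can be reproduced in miniature. The sketch below uses Python double-precision floats rather than the 32-bit floats a shader would use, so the magnitudes differ from the [-150, 150] range quoted in the message; the helper `log2_or_neg_inf` is hypothetical and assumes that the logarithm of zero evaluates to negative infinity.

```python
import math


def log2_or_neg_inf(x):
    """log2 that yields -inf for zero (assumed behaviour) instead of raising."""
    return math.log2(x) if x > 0.0 else float("-inf")


# Two small but positive, finite inputs (hypothetical values).
a = b = 1e-200

# Original form: each logarithm is finite, and so is their sum.
sum_of_logs = math.log2(a) + math.log2(b)

# "Optimized" form: the product underflows to zero before the logarithm is taken.
log_of_product = log2_or_neg_inf(a * b)
```

Here `sum_of_logs` stays finite while `log_of_product` is negative infinity, mirroring the `-inf` result the commit message reports for very dark regions.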
Rob Clark	dfbae7d64f	nir/algebraic: support for power-of-two optimizations Some optimizations, like converting integer multiply/divide into left/ right shifts, have additional constraints on the search expression. Like requiring that a variable is a constant power of two. Support these cases by allowing a fxn name to be appended to the search var expression (ie. "a#32(is_power_of_two)"). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-03 16:05:03 -04:00
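As a rough illustration of the kind of constraint the commit above describes, the following Python sketch rewrites a multiply as a left shift only when the constant operand is a power of two. It models the arithmetic only: the Python `is_power_of_two` shares its name with the condition in the commit's example, but its body and the `mul_as_shift` helper are assumptions written for this illustration.

```python
def is_power_of_two(value):
    """True when value is a positive integer with exactly one bit set."""
    return value > 0 and (value & (value - 1)) == 0


def mul_as_shift(x, c):
    """Compute x * c, using a left shift when c is a power of two."""
    if is_power_of_two(c):
        # For a power of two, bit_length() - 1 is the shift amount log2(c).
        return x << (c.bit_length() - 1)
    # Constraint not met: fall back to the plain multiply.
    return x * c


shifted = mul_as_shift(7, 8)   # takes the shift path
plain = mul_as_shift(7, 6)     # takes the multiply path
```

Both paths produce the same value as `x * c`; the condition only decides whether the cheaper shift form is legal.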
Jason Ekstrand	1b72c31e1f	nir/algebraic: Separate ffma lowering from fusing The i965 driver has its own pass for fusing mul+add combinations that's much smarter than what nir_opt_algebraic can do so we don't want to get the nir_opt_algebraic one just because we didn't set lower_ffma. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-11 11:44:35 -07:00
Samuel Iglesias Gonsálvez	2ab2d2e588	nir: Separate 32 and 64-bit fmod lowering Split 32-bit and 64-bit fmod lowering as the drivers might need to lower them separately inside NIR depending on the HW support. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-05-04 08:07:49 +02:00
Jason Ekstrand	6d4a426745	nir/algebraic: Support lowering for both 64 and 32-bit ldexp Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-04-28 21:36:52 -07:00
Jason Ekstrand	f0af5b87ec	nir/opcodes: Make ldexp take an explicitly 32-bit int There is no sense in having the double version of ldexp take a 64-bit integer. Instead, let's just take a 32-bit int all the time. This also matches what GLSL does where both variants of ldexp take a regular integer for the exponent argument. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-04-28 21:36:52 -07:00
Samuel Iglesias Gonsálvez	db07b46f2c	nir: Add lrp lowering for doubles in opt_algebraic Some hardware (i965 on Broadwell generation, for example) does not natively support executing the lrp instruction with double arguments. Add 'lower_flrp64' flag to lower this instruction in that case. v2: - Rename lower_flrp_double to lower_flrp64 (Jason) - Fix typo (Jason) - Adapt the code to define bit_size information in the opcodes. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 12:01:40 +02:00
Samuel Iglesias Gonsálvez	443600d51e	nir: rename lower_flrp to lower_flrp32 A later patch will add lower_flrp64 option to NIR. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 12:01:40 +02:00
Jason Ekstrand	8a3e344180	nir/opt_algebraic: Fix some expressions with ambiguous bit sizes Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-04-27 11:21:06 -07:00
Jason Ekstrand	7e0ee3a38b	nir/search: Respect the bit_size parameter on nir_search_value Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-04-27 11:21:06 -07:00
Jason Ekstrand	fcc1c8a437	nir/algebraic: Add a mechanism for specifying the bit size of a value Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-04-27 11:21:06 -07:00
Jason Ekstrand	4455bfa9a0	nir/algebraic: Add lowering for ldexp The algorithm used is different from both the naive suggestion from the GLSL spec and the one used in GLSL IR today. Unfortunately, the GLSL IR implementation that we have today doesn't handle denormals (for those that care) or the case where the float source is +-inf. Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-04-13 15:44:19 -07:00
Jason Ekstrand	745b3d295e	nir: Add more modulus opcodes These are all needed for SPIR-V Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-04-13 15:44:00 -07:00
Markus Wick	18c8b927e2	nir: Merge redundant integer clamping. Dolphin uses them a lot. Range tracking would be better in the long term, but these two lines work fine for now. Signed-off-by: Markus Wick <markus@selfnet.de> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-11 18:48:50 -07:00
Kenneth Graunke	5886cd79a0	nir: Do basic constant reassociation. Many shaders contain expression trees of the form: const_1 * (value * const_2) Reorganizing these to (const_1 * const_2) * value will allow constant folding to combine the constants. Sometimes, these constants are 2 and 0.5, so we can remove a multiply altogether. Other times, it can create more immediate constants, which can actually hurt. Finding a good balance here is tricky. While much more could be done, this simple patch seems to have a lot of positive benefit while having a low downside. shader-db results on Broadwell: total instructions in shared programs: 8963768 -> 8961369 (-0.03%) instructions in affected programs: 438318 -> 435919 (-0.55%) helped: 1502 HURT: 245 total cycles in shared programs: 71527354 -> 71421516 (-0.15%) cycles in affected programs: 11541788 -> 11435950 (-0.92%) helped: 3445 HURT: 1224 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-04-11 18:43:55 -07:00
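The reassociation described in this commit can be illustrated on small tuple expressions. This is a sketch, not the actual pass: `reassociate` and `is_const` are hypothetical names, and folding the constants inline (including dropping a multiply by 1.0) stands in for what a separate constant-folding step would do.

```python
def is_const(v):
    """Treat plain Python numbers as constants and strings as variables."""
    return isinstance(v, (int, float))

def reassociate(expr):
    """Rewrite ('fmul', c1, ('fmul', value, c2)) into ('fmul', c1 * c2, value)."""
    if isinstance(expr, tuple) and expr[0] == 'fmul':
        _, lhs, rhs = expr
        if is_const(lhs) and isinstance(rhs, tuple) and rhs[0] == 'fmul':
            _, inner_value, inner_const = rhs
            if is_const(inner_const) and not is_const(inner_value):
                folded = lhs * inner_const
                if folded == 1.0:
                    return inner_value  # e.g. 2 * (x * 0.5): no multiply left
                return ('fmul', folded, inner_value)
    return expr  # pattern did not match

print(reassociate(('fmul', 2.0, ('fmul', 'x', 0.5))))
print(reassociate(('fmul', 3.0, ('fmul', 'x', 4.0))))
```

The second case shows the downside the message mentions: the rewrite can introduce a new immediate constant (12.0) that did not exist before.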
Ian Romanick	08ff5f4d1f	nir: Simplify a bcsel to logical-or Oddly, this did not affect the shader where I first noticed the pattern. That particular shader doesn't get its if-statement converted to a bcsel because there are two assignments in the else-statement. This led to me submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747. shader-db results: Sandy Bridge total instructions in shared programs: 8467384 -> 8467069 (-0.00%) instructions in affected programs: 36594 -> 36279 (-0.86%) helped: 46 HURT: 0 total cycles in shared programs: 117573448 -> 117568518 (-0.00%) cycles in affected programs: 339114 -> 334184 (-1.45%) helped: 46 HURT: 0 Ivy Bridge / Haswell / Broadwell / Skylake: total instructions in shared programs: 7774258 -> 7773999 (-0.00%) instructions in affected programs: 30874 -> 30615 (-0.84%) helped: 46 HURT: 0 total cycles in shared programs: 65739190 -> 65734530 (-0.01%) cycles in affected programs: 180380 -> 175720 (-2.58%) helped: 45 HURT: 1 No change on G45 or Ironlake. I also tried these expressions, but none of them affected any shaders in shader-db: (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)), (('bcsel', a, 'b@bool', False), ('iand', a, b)), (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)), Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-03-31 14:59:36 -07:00
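The three extra expressions listed at the end of this commit message can be checked by brute force over Boolean inputs. This sketch models `bcsel` as a Python conditional; it only confirms the logical equivalences and says nothing about which rule the commit itself added.

```python
from itertools import product

def bcsel(cond, if_true, if_false):
    """Boolean select: if_true when cond holds, else if_false."""
    return if_true if cond else if_false

# Each entry pairs a label with a check of the equivalence for one (a, b).
candidates = {
    "bcsel(a, a, b) == ior(a, b)": lambda a, b: bcsel(a, a, b) == (a or b),
    "bcsel(a, b, False) == iand(a, b)": lambda a, b: bcsel(a, b, False) == (a and b),
    "bcsel(a, b, a) == iand(a, b)": lambda a, b: bcsel(a, b, a) == (a and b),
}

results = {
    label: all(check(a, b) for a, b in product([False, True], repeat=2))
    for label, check in candidates.items()
}

for label, holds in results.items():
    print(label, holds)
```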
Matt Turner	05ee6627d6	nir: Fix typo from commit `6702f1acde`.	2016-03-30 19:18:35 -07:00
Matt Turner	6702f1acde	nir: Propagate negates up multiplication chains. total instructions in shared programs: 7112159 -> 7088092 (-0.34%) instructions in affected programs: 1374915 -> 1350848 (-1.75%) helped: 7392 HURT: 621 GAINED: 2 LOST: 2	2016-03-30 13:12:34 -07:00
Jason Ekstrand	0dbda153aa	nir/algebraic: Flag inexact optimizations Many of our optimizations, while great for cutting shaders down to size, aren't really precision-safe. This commit tries to flag all of the inexact floating-point optimizations so they don't get run on values that are flagged "exact". It's a bit conservative and maybe flags some safe optimizations as unsafe but that's better than missing one. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-03-23 16:28:02 -07:00
Jason Ekstrand	ed3a029e80	nir/algebraic: Fix fmin detection to match the spec The previous transformation got the arguments to fmin backwards. When NaNs are involved, the GLSL min/max aren't commutative so it matters. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-03-23 16:28:00 -07:00
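The order dependence noted in this commit follows from the GLSL definition of min ("returns y if y < x, otherwise it returns x"). The sketch below transcribes that definition into Python; `glsl_min` is a hypothetical helper name, not a NIR function.

```python
import math

def glsl_min(x, y):
    """GLSL-style min: returns y if y < x, otherwise x."""
    return y if y < x else x

nan = float('nan')

# Every comparison with NaN is false, so the first argument always wins.
print(glsl_min(nan, 1.0))  # NaN is returned
print(glsl_min(1.0, nan))  # 1.0 is returned
```

Because swapping the arguments changes the result, a select-based pattern only matches min for one argument order.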
Jason Ekstrand	89545b1314	nir/algebraic: Get rid of an invalid fxor optimization The fxor opcode is required to return 1.0f or 0.0f but the input variable may not be 1.0f or 0.0f. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-03-23 16:27:58 -07:00
Jason Ekstrand	3a7cb6534c	nir/algebraic: Allow for flagging operations as being inexact Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-03-23 16:27:55 -07:00
Ian Romanick	d7a25a9def	nir: Don't abs slt and friends No shader-db changes, but this is symmetric with the previous commit. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:48:02 -07:00
Ian Romanick	2bb006af68	nir: Don't abs the result of b2f or b2i In the results below, 2 SIMD16 shaders in Trine are lost. G4X total instructions in shared programs: 4012279 -> 4011108 (-0.03%) instructions in affected programs: 116776 -> 115605 (-1.00%) helped: 339 HURT: 0 total cycles in shared programs: 84315862 -> 84313584 (-0.00%) cycles in affected programs: 1767232 -> 1764954 (-0.13%) helped: 274 HURT: 81 Ironlake total instructions in shared programs: 6399073 -> 6396998 (-0.03%) instructions in affected programs: 218050 -> 215975 (-0.95%) helped: 600 HURT: 0 total cycles in shared programs: 128892088 -> 128888810 (-0.00%) cycles in affected programs: 2867452 -> 2864174 (-0.11%) helped: 422 HURT: 137 Sandy Bridge total instructions in shared programs: 8462174 -> 8460759 (-0.02%) instructions in affected programs: 178529 -> 177114 (-0.79%) helped: 596 HURT: 0 total cycles in shared programs: 117542276 -> 117534098 (-0.01%) cycles in affected programs: 1239166 -> 1230988 (-0.66%) helped: 369 HURT: 150 Ivy Bridge total instructions in shared programs: 7775131 -> 7773410 (-0.02%) instructions in affected programs: 162903 -> 161182 (-1.06%) helped: 590 HURT: 0 total cycles in shared programs: 65759882 -> 65747268 (-0.02%) cycles in affected programs: 1004354 -> 991740 (-1.26%) helped: 467 HURT: 141 Haswell total instructions in shared programs: 7107786 -> 7106327 (-0.02%) instructions in affected programs: 140954 -> 139495 (-1.04%) helped: 590 HURT: 0 total cycles in shared programs: 64668028 -> 64655322 (-0.02%) cycles in affected programs: 967080 -> 954374 (-1.31%) helped: 452 HURT: 149 LOST: 2 GAINED: 0 Broadwell total instructions in shared programs: 8980029 -> 8978287 (-0.02%) instructions in affected programs: 197232 -> 195490 (-0.88%) helped: 715 HURT: 0 total cycles in shared programs: 70070448 -> 70055970 (-0.02%) cycles in affected programs: 975724 -> 961246 (-1.48%) helped: 471 HURT: 111 LOST: 2 GAINED: 0 Skylake total instructions in shared programs: 9115178 -> 9113436 (-0.02%) instructions in affected programs: 203012 -> 201270 (-0.86%) helped: 715 HURT: 0 total cycles in shared programs: 68848660 -> 68834004 (-0.02%) cycles in affected programs: 993888 -> 979232 (-1.47%) helped: 473 HURT: 116 LOST: 2 GAINED: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:48:02 -07:00
Ian Romanick	348e5a71d8	nir: Simplify 0 < fabs(a) Sandy Bridge / Ivy Bridge / Haswell total instructions in shared programs: 8462180 -> 8462174 (-0.00%) instructions in affected programs: 564 -> 558 (-1.06%) helped: 6 HURT: 0 total cycles in shared programs: 117542462 -> 117542276 (-0.00%) cycles in affected programs: 9768 -> 9582 (-1.90%) helped: 12 HURT: 0 Broadwell / Skylake total instructions in shared programs: 8980833 -> 8980826 (-0.00%) instructions in affected programs: 626 -> 619 (-1.12%) helped: 7 HURT: 0 total cycles in shared programs: 70077900 -> 70077714 (-0.00%) cycles in affected programs: 9378 -> 9192 (-1.98%) helped: 12 HURT: 0 G45 and Ironlake showed no change. v2: Modify the comments to look more like a proof. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:47:56 -07:00
Ian Romanick	564a8b8a26	nir: Simplify 0 >= b2f(a) This also prevented some regressions with other patches in my local tree. Broadwell / Skylake total instructions in shared programs: 8980835 -> 8980833 (-0.00%) instructions in affected programs: 45 -> 43 (-4.44%) helped: 1 HURT: 0 total cycles in shared programs: 70077904 -> 70077900 (-0.00%) cycles in affected programs: 122 -> 118 (-3.28%) helped: 1 HURT: 0 No changes on earlier platforms. v2: Modify the comments to look more like a proof. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:44:57 -07:00
Ian Romanick	bf0d60aa11	nir: Simplify i2b with negated or abs operand This enables removing ssa_201 and ssa_202 in sequences like: vec1 ssa_200 = flt ssa_199, ssa_194 vec1 ssa_201 = b2i ssa_200 vec1 ssa_202 = i2b -ssa_201 shader-db results: Sandy Bridge total instructions in shared programs: 8462257 -> 8462180 (-0.00%) instructions in affected programs: 3846 -> 3769 (-2.00%) helped: 35 HURT: 0 total cycles in shared programs: 117542934 -> 117542462 (-0.00%) cycles in affected programs: 20072 -> 19600 (-2.35%) helped: 20 HURT: 1 Ivy Bridge total instructions in shared programs: 7775252 -> 7775137 (-0.00%) instructions in affected programs: 3645 -> 3530 (-3.16%) helped: 35 HURT: 0 total cycles in shared programs: 65760522 -> 65760068 (-0.00%) cycles in affected programs: 21082 -> 20628 (-2.15%) helped: 25 HURT: 2 Haswell total instructions in shared programs: 7108666 -> 7108589 (-0.00%) instructions in affected programs: 3253 -> 3176 (-2.37%) helped: 35 HURT: 0 total cycles in shared programs: 64675726 -> 64675272 (-0.00%) cycles in affected programs: 21034 -> 20580 (-2.16%) helped: 26 HURT: 1 Broadwell / Skylake total instructions in shared programs: 8980912 -> 8980835 (-0.00%) instructions in affected programs: 3223 -> 3146 (-2.39%) helped: 35 HURT: 0 total cycles in shared programs: 70077926 -> 70077904 (-0.00%) cycles in affected programs: 21886 -> 21864 (-0.10%) helped: 21 HURT: 6 G45 and Ironlake showed no change. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:43:28 -07:00
Ian Romanick	a4079f1cb2	nir: Lower flrp with Boolean interpolator to bcsel On Intel platforms that don't set lower_flrp, using bcsel instead of flrp seems to be a small amount worse. On those platforms, the use of flrp, bcsel, and multiply of b2f is still an active area of research. In review, Matt suggested this is because bcsel turns into CMP+SEL, and because of the flag register we can't schedule instructions well. shader-db results: G4X / Ironlake total instructions in shared programs: 4016538 -> 4012279 (-0.11%) instructions in affected programs: 161556 -> 157297 (-2.64%) helped: 1077 HURT: 1 total cycles in shared programs: 84328296 -> 84315862 (-0.01%) cycles in affected programs: 4174570 -> 4162136 (-0.30%) helped: 926 HURT: 53 Unsurprisingly, no changes on later platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-03-22 14:42:42 -07:00
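Why a linear interpolation with a Boolean interpolator reduces to a select can be seen numerically. This sketch assumes the usual definition flrp(a, b, t) = a * (1 - t) + b * t and finite inputs; with infinities or NaNs the arithmetic form can differ (for example inf * 0 is NaN). The function names are Python stand-ins, not NIR code.

```python
def flrp(a, b, t):
    """Linear interpolation between a (t = 0) and b (t = 1)."""
    return a * (1.0 - t) + b * t

def b2f(c):
    """Boolean to float: 1.0 for True, 0.0 for False."""
    return 1.0 if c else 0.0

def bcsel(c, if_true, if_false):
    """Boolean select."""
    return if_true if c else if_false

a, b = 2.5, -7.0
for c in (False, True):
    # With t restricted to 0.0 or 1.0 the interpolation just picks a or b.
    print(c, flrp(a, b, b2f(c)), bcsel(c, b, a))
```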
Matt Turner	905ff86198	nir: Recognize open-coded extract_u16. No shader-db changes, but does recognize some extract_u16 which enables the next patch to optimize some code. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-03-04 11:52:34 -08:00
Matt Turner	76289fbfa8	nir: Recognize open-coded extract_u8. Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack three bytes from an integer and convert each into a float: float((val >> 16u) & 0xffu) float((val >> 8u) & 0xffu) float((val >> 0u) & 0xffu) Instead of shifting, masking, and type converting like this: shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD mov(8) g17<1>F g16<8,8,1>UD shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD mov(8) g20<1>F g19<8,8,1>UD and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD mov(8) g22<1>F g21<8,8,1>UD i965 can simply extract a byte and convert to float in a single instruction: mov(8) g17<1>F g25.2<32,8,4>UB mov(8) g20<1>F g25.1<32,8,4>UB mov(8) g22<1>F g25.0<32,8,4>UB This patch implements the first step: recognizing byte extraction. A later patch will optimize out the conversion to float. instructions in affected programs: 28568 -> 27450 (-3.91%) helped: 7 cycles in affected programs: 210076 -> 203144 (-3.30%) helped: 7 This patch decreases the number of instructions in the two Unigine programs by: #1721: 4520 -> 4374 instructions (-3.23%) #1706: 3752 -> 3582 instructions (-4.53%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-03-04 11:52:34 -08:00
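The equivalence this commit relies on, between the shift-and-mask form from the shader and a direct byte extraction, can be checked in a few lines. The helper names are illustrative; `extract_u8` here is a Python model of "take byte number `index` of a 32-bit value", not the NIR opcode implementation.

```python
def open_coded_byte(val, shift):
    """The shader's form: (val >> shift) & 0xff."""
    return (val >> shift) & 0xff

def extract_u8(val, index):
    """Byte `index` of a 32-bit value, counting from the least significant byte."""
    return val.to_bytes(4, 'little')[index]

val = 0x11223344
for shift in (16, 8, 0):
    # A shift of 8 * n followed by a 0xff mask selects byte n.
    print(hex(open_coded_byte(val, shift)), hex(extract_u8(val, shift // 8)))
```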
Matt Turner	371c4b3c48	nir: Recognize open-coded bitfield_reverse. Helps 11 shaders in UnrealEngine4 demos. I seriously hope they would have given us bitfieldReverse() if we exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that anyway?). instructions in affected programs: 4875 -> 4633 (-4.96%) cycles in affected programs: 270516 -> 244516 (-9.61%) I suspect there's a lot of room to improve nir_search/opt_algebraic's handling of this. We'd actually like to match, e.g., step2 by matching step1 once and then doing a pointer comparison for the second instance of step1, but unfortunately we generate an enormous tuple instead. The .text size increases by 6.5% and the .data by 17.5%. text data bss dec hex filename 22957 45224 0 68181 10a55 nir_libnir_la-nir_opt_algebraic.o 24461 53160 0 77621 12f35 nir_libnir_la-nir_opt_algebraic.o I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is in a GL 4.0 context once we expose GL 4.0. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-02-08 21:20:58 -08:00
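The commit does not show the shader code it matches, so the following is an assumption: the classic five-step swap sequence for reversing a 32-bit word, compared against a simple string-based reference. Both function names are hypothetical.

```python
def bitfield_reverse_open_coded(x):
    """Reverse the bits of a 32-bit word with the classic swap steps."""
    x &= 0xffffffff
    x = ((x & 0x55555555) << 1) | ((x >> 1) & 0x55555555)   # swap adjacent bits
    x = ((x & 0x33333333) << 2) | ((x >> 2) & 0x33333333)   # swap bit pairs
    x = ((x & 0x0f0f0f0f) << 4) | ((x >> 4) & 0x0f0f0f0f)   # swap nibbles
    x = ((x & 0x00ff00ff) << 8) | ((x >> 8) & 0x00ff00ff)   # swap bytes
    x = ((x << 16) | (x >> 16)) & 0xffffffff                # swap half-words
    return x

def bitfield_reverse_reference(x):
    """Reference: reverse the 32-character binary string."""
    return int(format(x & 0xffffffff, '032b')[::-1], 2)

for value in (0, 1, 0x80000000, 0x12345678, 0xffffffff):
    print(hex(bitfield_reverse_open_coded(value)),
          hex(bitfield_reverse_reference(value)))
```

Each step reads the previous step's result twice. Written as a single nested expression, the pattern therefore doubles in size at every step, which is consistent with the "enormous tuple" the message complains about.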
Matt Turner	a8f0960816	nir: Recognize product of open-coded pow()s. Prevents regressions in the next commit. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-02-08 20:38:17 -08:00
Matt Turner	9f02e3ab03	nir: Add opt_algebraic rules for xor with zero. instructions in affected programs: 668 -> 664 (-0.60%) helped: 4 Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-02-08 20:38:17 -08:00
Matt Turner	955d052058	nir: Add lowering support for unpacking opcodes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	9b8786eba9	nir: Add lowering support for packing opcodes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	68f8c5730b	nir: Add opcodes to extract bytes or words. The uint versions zero extend while the int versions sign extend. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
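The difference between the zero-extending and sign-extending variants can be sketched as follows. The names mirror the opcode names for readability, but these are Python models written for illustration, assuming a 32-bit source and an index counted from the least significant byte or word.

```python
def extract_u8(val, index):
    """Byte `index` of val, zero-extended."""
    return (val >> (8 * index)) & 0xff

def extract_i8(val, index):
    """Byte `index` of val, sign-extended."""
    byte = extract_u8(val, index)
    return byte - 0x100 if byte & 0x80 else byte

def extract_u16(val, index):
    """16-bit word `index` of val, zero-extended."""
    return (val >> (16 * index)) & 0xffff

def extract_i16(val, index):
    """16-bit word `index` of val, sign-extended."""
    word = extract_u16(val, index)
    return word - 0x10000 if word & 0x8000 else word

val = 0x000080ff
print(extract_u8(val, 0), extract_i8(val, 0))  # same byte, different extension
print(extract_u8(val, 1), extract_i8(val, 1))
```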
Emil Velikov	a39a8fbbaa	nir: move to compiler/ Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-26 16:08:30 +00:00

433 Commits