Samuel Pitoiset
59427b6d1d
nir/opt_algebraic: lower 64-bit fmin3/fmax3/fmed3
...
This unconditionally lowers 64-bit fmin3/fmax3/fmed3 because
AMD hardware doesn't have native instructions, and no drivers
except RADV uses these instructions.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.*.f64.*
with ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4570 >
2020-04-20 06:59:47 +00:00
Ian Romanick
b097e326b8
nir/algebraic: Remove a redundant fabs pattern
...
Made redundant by 5544b2cbbd
("nir/algebraic: Use value range analysis
to eliminate useless unary ops").
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00
Ian Romanick
af1bc7e0c7
nir/algebraic: Use value range analysis to convert fmax to fsat
...
This is conceptually similar to the 1-fsat(a) <=> fsat(1-a) rearragement
done in:
3b74790941
("nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))")
2d259713b7 ("nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all
non-fmul instructions").
Note: this helps the Aztex Ruins shader that was hurt for spills and
fills on Braodwell in the previous commit, but it does not fix the
spills or fills. :(
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14528985 -> 14526116 (-0.02%)
instructions in affected programs: 477300 -> 474431 (-0.60%)
helped: 2332
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.23 x̃: 1
helped stats (rel) min: 0.07% max: 8.89% x̄: 0.88% x̃: 0.64%
95% mean confidence interval for instructions value: -1.27 -1.19
95% mean confidence interval for instructions %-change: -0.92% -0.85%
Instructions are helped.
total cycles in shared programs: 203723684 -> 203692984 (-0.02%)
cycles in affected programs: 4878847 -> 4848147 (-0.63%)
helped: 1764
HURT: 324
helped stats (abs) min: 1 max: 706 x̄: 22.94 x̃: 17
helped stats (rel) min: <.01% max: 17.75% x̄: 1.94% x̃: 1.66%
HURT stats (abs) min: 1 max: 400 x̄: 30.15 x̃: 10
HURT stats (rel) min: <.01% max: 17.76% x̄: 1.91% x̃: 0.69%
95% mean confidence interval for cycles value: -16.55 -12.86
95% mean confidence interval for cycles %-change: -1.44% -1.24%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00
Ian Romanick
62795475e8
nir/algebraic: Distribute source modifiers into instructions
...
There are three main classes of cases that are helped by this change:
1. When the negation is applied to a value being type converted (e.g.,
float(-x)). This could possibly also be handled with more clever
code generation.
2. When the negation is applied to a phi node source (e.g., x = -(...);
at the end of a basic block). This was the original case that caught
my attention while looking at shader-db dumps.
3. When the negation is applied to the source of an instruction that
cannot have source modifiers. This includes texture instructions and
math box instructions on pre-Gen7 platforms (see more details below).
In many these cases the negation can be propagated into the instructions
that generate the value (e.g., -(a*b) = (-a)*b).
In addition to the operations implemtned in this patch, I also tried:
- frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on
pre-Gen7. On Gen6 and earlier, frcp is a math box instruction, and
math box instructions cannot have source modifiers.
I suspect this is why so many more shaders are helped on Gen6 than on
Gen5 or Gen7. Gen6 supports OpenGL 3.3, so a lot more shaders
compile on it. A lot of these shaders may have things like cos(-x)
or rcp(-x) that could result in an explicit negation instruction.
- bcsel - Hurt a few shaders with none helped. bcsel operates on
integer sources, so the fabs or fneg cannot be a source modifier in
the bcsel itself.
- Integer instructions - No changes on any Intel platform.
Some notes about the shader-db results below.
- On Tiger Lake, a single Deus Ex fragment shader is hurt for both
spills and fills.
- On Haswell, a different Deus Ex fragment shader is hurt for both
spills and fills.
- On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2
(very high graphics settings, lol) fragment shader that upgrades
from SIMD8 to SIMD16.
v2: Add support for fsign. Add some patterns that remove redundant
negations and redundant absolute value rather than trying to push them
down the tree.
Tiger Lake
total instructions in shared programs: 17611333 -> 17586465 (-0.14%)
instructions in affected programs: 3033734 -> 3008866 (-0.82%)
helped: 10310
HURT: 632
helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1
helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01%
HURT stats (abs) min: 1 max: 47 x̄: 3.21 x̃: 2
HURT stats (rel) min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63%
95% mean confidence interval for instructions value: -2.33 -2.21
95% mean confidence interval for instructions %-change: -1.32% -1.27%
Instructions are helped.
total cycles in shared programs: 338365223 -> 338262252 (-0.03%)
cycles in affected programs: 125291811 -> 125188840 (-0.08%)
helped: 5224
HURT: 2031
helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12
helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97%
HURT stats (abs) min: 1 max: 2882 x̄: 69.50 x̃: 14
HURT stats (rel) min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74%
95% mean confidence interval for cycles value: -18.71 -9.68
95% mean confidence interval for cycles %-change: -0.80% -0.63%
Cycles are helped.
total spills in shared programs: 8942 -> 8946 (0.04%)
spills in affected programs: 8 -> 12 (50.00%)
helped: 0
HURT: 1
total fills in shared programs: 9399 -> 9401 (0.02%)
fills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1
Ice Lake
total instructions in shared programs: 16124348 -> 16102258 (-0.14%)
instructions in affected programs: 2830928 -> 2808838 (-0.78%)
helped: 11294
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1
helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72%
95% mean confidence interval for instructions value: -1.99 -1.93
95% mean confidence interval for instructions %-change: -1.34% -1.29%
Instructions are helped.
total cycles in shared programs: 335393932 -> 335325794 (-0.02%)
cycles in affected programs: 123834609 -> 123766471 (-0.06%)
helped: 5034
HURT: 2128
helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11
helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00%
HURT stats (abs) min: 1 max: 2634 x̄: 70.63 x̃: 16
HURT stats (rel) min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62%
95% mean confidence interval for cycles value: -13.66 -5.37
95% mean confidence interval for cycles %-change: -0.69% -0.48%
Cycles are helped.
LOST: 0
GAINED: 2
Skylake
total instructions in shared programs: 14949240 -> 14927930 (-0.14%)
instructions in affected programs: 2594756 -> 2573446 (-0.82%)
helped: 11000
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.91
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.
total cycles in shared programs: 324829346 -> 324821596 (<.01%)
cycles in affected programs: 121566087 -> 121558337 (<.01%)
helped: 4611
HURT: 2147
helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10
helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00%
HURT stats (abs) min: 1 max: 2551 x̄: 67.88 x̃: 16
HURT stats (rel) min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89%
95% mean confidence interval for cycles value: -4.25 1.96
95% mean confidence interval for cycles %-change: -0.28% -0.02%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total instructions in shared programs: 14971203 -> 14949957 (-0.14%)
instructions in affected programs: 2635699 -> 2614453 (-0.81%)
helped: 10982
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.90
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.
total cycles in shared programs: 336215033 -> 336086458 (-0.04%)
cycles in affected programs: 127383198 -> 127254623 (-0.10%)
helped: 4884
HURT: 1963
helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12
helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05%
HURT stats (abs) min: 1 max: 3401 x̄: 63.33 x̃: 16
HURT stats (rel) min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70%
95% mean confidence interval for cycles value: -29.99 -7.57
95% mean confidence interval for cycles %-change: -0.89% -0.71%
Cycles are helped.
total fills in shared programs: 24905 -> 24901 (-0.02%)
fills in affected programs: 117 -> 113 (-3.42%)
helped: 4
HURT: 0
LOST: 0
GAINED: 16
Haswell
total instructions in shared programs: 13148927 -> 13131528 (-0.13%)
instructions in affected programs: 2220941 -> 2203542 (-0.78%)
helped: 8017
HURT: 4
helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93%
HURT stats (abs) min: 1 max: 7 x̄: 2.50 x̃: 1
HURT stats (rel) min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91%
95% mean confidence interval for instructions value: -2.21 -2.13
95% mean confidence interval for instructions %-change: -1.43% -1.37%
Instructions are helped.
total cycles in shared programs: 321221791 -> 321079870 (-0.04%)
cycles in affected programs: 126886055 -> 126744134 (-0.11%)
helped: 4674
HURT: 1729
helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16
helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05%
HURT stats (abs) min: 1 max: 3694 x̄: 70.58 x̃: 18
HURT stats (rel) min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90%
95% mean confidence interval for cycles value: -33.31 -11.02
95% mean confidence interval for cycles %-change: -0.99% -0.78%
Cycles are helped.
total spills in shared programs: 19872 -> 19874 (0.01%)
spills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1
total fills in shared programs: 20941 -> 20941 (0.00%)
fills in affected programs: 62 -> 62 (0.00%)
helped: 1
HURT: 1
LOST: 0
GAINED: 8
Ivy Bridge
total instructions in shared programs: 11875553 -> 11853839 (-0.18%)
instructions in affected programs: 1553112 -> 1531398 (-1.40%)
helped: 7304
HURT: 3
helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94%
95% mean confidence interval for instructions value: -3.04 -2.90
95% mean confidence interval for instructions %-change: -1.65% -1.59%
Instructions are helped.
total cycles in shared programs: 178246425 -> 178184484 (-0.03%)
cycles in affected programs: 13702146 -> 13640205 (-0.45%)
helped: 4409
HURT: 1566
helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13
helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02%
HURT stats (abs) min: 1 max: 356 x̄: 29.48 x̃: 10
HURT stats (rel) min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70%
95% mean confidence interval for cycles value: -11.60 -9.14
95% mean confidence interval for cycles %-change: -1.19% -0.99%
Cycles are helped.
LOST: 0
GAINED: 10
Sandy Bridge
total instructions in shared programs: 10695740 -> 10667483 (-0.26%)
instructions in affected programs: 2337607 -> 2309350 (-1.21%)
helped: 10720
HURT: 1
helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2
helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04%
95% mean confidence interval for instructions value: -2.69 -2.58
95% mean confidence interval for instructions %-change: -1.57% -1.51%
Instructions are helped.
total cycles in shared programs: 153478839 -> 153416223 (-0.04%)
cycles in affected programs: 22050900 -> 21988284 (-0.28%)
helped: 5342
HURT: 2200
helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16
helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86%
HURT stats (abs) min: 1 max: 335 x̄: 20.93 x̃: 6
HURT stats (rel) min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30%
95% mean confidence interval for cycles value: -9.18 -7.42
95% mean confidence interval for cycles %-change: -0.82% -0.71%
Cycles are helped.
Iron Lake
total instructions in shared programs: 8114882 -> 8105574 (-0.11%)
instructions in affected programs: 1232504 -> 1223196 (-0.76%)
helped: 4109
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.31 -2.21
95% mean confidence interval for instructions %-change: -1.01% -0.96%
Instructions are helped.
total cycles in shared programs: 188504036 -> 188466296 (-0.02%)
cycles in affected programs: 31203798 -> 31166058 (-0.12%)
helped: 3447
HURT: 36
helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13%
HURT stats (abs) min: 2 max: 30 x̄: 7.33 x̃: 6
HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10%
95% mean confidence interval for cycles value: -11.16 -10.51
95% mean confidence interval for cycles %-change: -0.22% -0.20%
Cycles are helped.
LOST: 0
GAINED: 1
GM45
total instructions in shared programs: 4989697 -> 4984531 (-0.10%)
instructions in affected programs: 703952 -> 698786 (-0.73%)
helped: 2493
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.13 -2.01
95% mean confidence interval for instructions %-change: -1.07% -0.99%
Instructions are helped.
total cycles in shared programs: 128929136 -> 128903886 (-0.02%)
cycles in affected programs: 21583096 -> 21557846 (-0.12%)
helped: 2214
HURT: 17
helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13%
HURT stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4
HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09%
95% mean confidence interval for cycles value: -11.75 -10.88
95% mean confidence interval for cycles %-change: -0.25% -0.22%
Cycles are helped.
LOST: 1
GAINED: 1
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00
Jason Ekstrand
84ab61160a
nir/algebraic: Add downcast-of-pack opts
...
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365 >
2020-03-31 00:18:05 +00:00
Jason Ekstrand
b2db84153a
nir: Add b2b opcodes
...
These exist to convert between different types of boolean values. In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants. In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.
The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
2020-03-30 15:46:19 +00:00
Samuel Pitoiset
3935a729d9
nir/algebraic: add fexp2(fmul(flog2(a), 0.5) -> fsqrt(a) optimization
...
Helps some Wolfenstein II and Wolfenstein Youngblood shaders.
pipeline-db (VEGA10/ACO):
Totals from affected shaders:
SGPRS: 17904 -> 17904 (0.00 %)
VGPRS: 14492 -> 14492 (0.00 %)
Spilled SGPRs: 20 -> 20 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1753152 -> 1749708 (-0.20 %) bytes
Max Waves: 2581 -> 2581 (0.00 %)
pipeline-db (VEGA10/LLVM):
Totals from affected shaders:
SGPRS: 26656 -> 26656 (0.00 %)
VGPRS: 23780 -> 23780 (0.00 %)
Spilled SGPRs: 2112 -> 2112 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2552712 -> 2549236 (-0.14 %) bytes
Max Waves: 3359 -> 3359 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353 >
2020-03-30 14:07:43 +00:00
Ian Romanick
b421c0466d
soft-fp64/flt: Perform checks in a different order
...
The change to nir_opt_algebraic cleans up a pattern that was never
produced before the rest of this commit was added.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 843005 -> 841666 (-0.16%)
instructions in affected programs: 460655 -> 459316 (-0.29%)
helped: 64
HURT: 17
helped stats (abs) min: 1 max: 72 x̄: 21.72 x̃: 20
helped stats (rel) min: 0.01% max: 28.07% x̄: 12.67% x̃: 16.07%
HURT stats (abs) min: 1 max: 7 x̄: 3.00 x̃: 2
HURT stats (rel) min: 0.01% max: 0.04% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -20.87 -12.19
95% mean confidence interval for instructions %-change: -12.35% -7.66%
Instructions are helped.
total cycles in shared programs: 6944998 -> 6927246 (-0.26%)
cycles in affected programs: 3891872 -> 3874120 (-0.46%)
helped: 71
HURT: 10
helped stats (abs) min: 2 max: 772 x̄: 254.21 x̃: 156
helped stats (rel) min: <.01% max: 66.44% x̄: 21.72% x̃: 18.40%
HURT stats (abs) min: 18 max: 69 x̄: 29.70 x̃: 20
HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -270.82 -167.50
95% mean confidence interval for cycles %-change: -24.41% -13.65%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
4e3d69ad07
nir/algebraic: Simplify a contradiction that can occur in __flt64_nonnan
...
The pattern is added to opt_algebraic because, for example, comparisons
with constant 0.0 will produce (a1 < 0).
Even with a pass that optimized Boolean expressions, I think this would
be very difficult to automatically recognize and optimize.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 933054 -> 929619 (-0.37%)
instructions in affected programs: 784041 -> 780606 (-0.44%)
helped: 59
HURT: 0
helped stats (abs) min: 2 max: 213 x̄: 58.22 x̃: 44
helped stats (rel) min: 0.02% max: 2.51% x̄: 0.72% x̃: 0.46%
95% mean confidence interval for instructions value: -70.80 -45.64
95% mean confidence interval for instructions %-change: -0.92% -0.53%
Instructions are helped.
total cycles in shared programs: 7304712 -> 7280180 (-0.34%)
cycles in affected programs: 7176260 -> 7151728 (-0.34%)
helped: 92
HURT: 0
helped stats (abs) min: 8 max: 1414 x̄: 266.65 x̃: 166
helped stats (rel) min: 0.04% max: 2.34% x̄: 0.43% x̃: 0.22%
95% mean confidence interval for cycles value: -333.05 -200.26
95% mean confidence interval for cycles %-change: -0.54% -0.31%
Cycles are helped.
Regular shader-db changes:
No changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
e0cefc5a23
nir/algebraic: Constant reassociation for bitwise operations too
...
Like 5886cd79a0
, but for iand, ior, and ixor.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake
total instructions in shared programs: 903108 -> 902830 (-0.03%)
instructions in affected programs: 654910 -> 654632 (-0.04%)
helped: 31
HURT: 5
helped stats (abs) min: 2 max: 31 x̄: 9.58 x̃: 7
helped stats (rel) min: 0.01% max: 0.23% x̄: 0.06% x̃: 0.04%
HURT stats (abs) min: 1 max: 10 x̄: 3.80 x̃: 3
HURT stats (rel) min: 0.01% max: 0.10% x̄: 0.03% x̃: 0.02%
95% mean confidence interval for instructions value: -10.55 -4.89
95% mean confidence interval for instructions %-change: -0.07% -0.03%
Instructions are helped.
total cycles in shared programs: 7059681 -> 7058006 (-0.02%)
cycles in affected programs: 5081309 -> 5079634 (-0.03%)
helped: 33
HURT: 12
helped stats (abs) min: 1 max: 444 x̄: 60.91 x̃: 18
helped stats (rel) min: <.01% max: 2.17% x̄: 0.25% x̃: 0.05%
HURT stats (abs) min: 1 max: 288 x̄: 27.92 x̃: 2
HURT stats (rel) min: <.01% max: 1.00% x̄: 0.23% x̃: 0.02%
95% mean confidence interval for cycles value: -68.32 -6.12
95% mean confidence interval for cycles %-change: -0.28% 0.03%
Inconclusive result (%-change mean confidence interval includes 0).
Ice Lake
total instructions in shared programs: 895384 -> 895159 (-0.03%)
instructions in affected programs: 658678 -> 658453 (-0.03%)
helped: 37
HURT: 0
helped stats (abs) min: 3 max: 16 x̄: 6.08 x̃: 4
helped stats (rel) min: <.01% max: 0.07% x̄: 0.04% x̃: 0.04%
95% mean confidence interval for instructions value: -7.46 -4.70
95% mean confidence interval for instructions %-change: -0.04% -0.03%
Instructions are helped.
total cycles in shared programs: 7092224 -> 7091195 (-0.01%)
cycles in affected programs: 5221666 -> 5220637 (-0.02%)
helped: 35
HURT: 11
helped stats (abs) min: 1 max: 247 x̄: 43.46 x̃: 12
helped stats (rel) min: <.01% max: 2.17% x̄: 0.23% x̃: 0.05%
HURT stats (abs) min: 2 max: 432 x̄: 44.73 x̃: 5
HURT stats (rel) min: <.01% max: 1.00% x̄: 0.25% x̃: 0.02%
95% mean confidence interval for cycles value: -49.00 4.26
95% mean confidence interval for cycles %-change: -0.27% 0.03%
Inconclusive result (value mean confidence interval includes 0).
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611408 -> 17611398 (<.01%)
instructions in affected programs: 1648 -> 1638 (-0.61%)
helped: 2
HURT: 0
total cycles in shared programs: 338366148 -> 338366124 (<.01%)
cycles in affected programs: 124048 -> 124024 (-0.02%)
helped: 2
HURT: 0
No changes on any earlier Intel platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
1d36af9338
nir/algebraic: Generalize some and-of-shift-right patterns [v2]
...
Generalizes some of the patterns from 76289fbfa8
and 905ff86198
. In
particular, some of the soft-fp64 code generates (a & 0x7fffffff) << 1
when constant 0.0 is compared (flt or feq).
v2: Reduce the set of added patterns to those that actually help
something. This reduces the size of the state transition tables by
about 29k. Suggested by Jason. Remove the existing patterns that this
commit subsumes.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake
total instructions in shared programs: 903171 -> 903108 (<.01%)
instructions in affected programs: 635903 -> 635840 (<.01%)
helped: 25
HURT: 11
helped stats (abs) min: 1 max: 16 x̄: 5.04 x̃: 3
helped stats (rel) min: <.01% max: 0.15% x̄: 0.04% x̃: 0.03%
HURT stats (abs) min: 2 max: 14 x̄: 5.73 x̃: 5
HURT stats (rel) min: <.01% max: 0.11% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for instructions value: -3.91 0.41
95% mean confidence interval for instructions %-change: -0.03% <.01%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 7059527 -> 7059681 (<.01%)
cycles in affected programs: 5249401 -> 5249555 (<.01%)
helped: 41
HURT: 9
helped stats (abs) min: 2 max: 76 x̄: 11.90 x̃: 10
helped stats (rel) min: <.01% max: 11.86% x̄: 0.99% x̃: 0.01%
HURT stats (abs) min: 2 max: 380 x̄: 71.33 x̃: 12
HURT stats (rel) min: <.01% max: 0.22% x̄: 0.04% x̃: 0.01%
95% mean confidence interval for cycles value: -14.93 21.09
95% mean confidence interval for cycles %-change: -1.40% -0.20%
Inconclusive result (value mean confidence interval includes 0).
Ice Lake
total instructions in shared programs: 895506 -> 895384 (-0.01%)
instructions in affected programs: 658800 -> 658678 (-0.02%)
helped: 37
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.30 x̃: 2
helped stats (rel) min: <.01% max: 0.03% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -4.00 -2.59
95% mean confidence interval for instructions %-change: -0.02% -0.02%
Instructions are helped.
total cycles in shared programs: 7092748 -> 7092224 (<.01%)
cycles in affected programs: 5272008 -> 5271484 (<.01%)
helped: 36
HURT: 14
helped stats (abs) min: 2 max: 440 x̄: 21.67 x̃: 8
helped stats (rel) min: <.01% max: 11.86% x̄: 1.12% x̃: 0.02%
HURT stats (abs) min: 2 max: 122 x̄: 18.29 x̃: 6
HURT stats (rel) min: <.01% max: 0.07% x̄: 0.01% x̃: <.01%
95% mean confidence interval for cycles value: -29.24 8.28
95% mean confidence interval for cycles %-change: -1.40% -0.21%
Inconclusive result (value mean confidence interval includes 0).
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611489 -> 17611408 (<.01%)
instructions in affected programs: 21188 -> 21107 (-0.38%)
helped: 23
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 3.78 x̃: 3
helped stats (rel) min: 0.03% max: 5.82% x̄: 1.13% x̃: 0.85%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.60% max: 0.60% x̄: 0.60% x̃: 0.60%
95% mean confidence interval for instructions value: -5.27 -1.48
95% mean confidence interval for instructions %-change: -1.70% -0.42%
Instructions are helped.
total cycles in shared programs: 338418502 -> 338366148 (-0.02%)
cycles in affected programs: 2289052 -> 2236698 (-2.29%)
helped: 18
HURT: 3
helped stats (abs) min: 4 max: 18000 x̄: 2909.67 x̃: 38
helped stats (rel) min: 0.09% max: 4.07% x̄: 0.96% x̃: 0.43%
HURT stats (abs) min: 2 max: 14 x̄: 6.67 x̃: 4
HURT stats (rel) min: 0.22% max: 1.13% x̄: 0.66% x̃: 0.64%
95% mean confidence interval for cycles value: -5204.00 217.91
95% mean confidence interval for cycles %-change: -1.31% -0.14%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total instructions in shared programs: 11875617 -> 11875615 (<.01%)
instructions in affected programs: 1339 -> 1337 (-0.15%)
helped: 2
HURT: 0
No changes on any earlier Intel platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
d6d63aec18
nir/algebraic: optimize ior(ine(a, 0), ine(b, 0)) to ine(ior(a, b), 0)
...
Like 70f9e2589e
. Also scrub the unnecessary size qualifier in both
replacement patterns.
This occurs in a handful of places in the soft-fp64 code, and that is
the primary reason for the change.
Perhaps the patterns that generate umin should be conditioned on
something, but I'm not sure what. lower_bitops might cover the cases
that matter, but it seems ugly.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 936505 -> 933388 (-0.33%)
instructions in affected programs: 925719 -> 922602 (-0.34%)
helped: 154
HURT: 1
helped stats (abs) min: 1 max: 211 x̄: 35.45 x̃: 16
helped stats (rel) min: 0.34% max: 9.30% x̄: 2.28% x̃: 0.96%
HURT stats (abs) min: 2342 max: 2342 x̄: 2342.00 x̃: 2342
HURT stats (rel) min: 2.28% max: 2.28% x̄: 2.28% x̃: 2.28%
95% mean confidence interval for instructions value: -51.21 10.99
95% mean confidence interval for instructions %-change: -2.61% -1.89%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 7323502 -> 7306184 (-0.24%)
cycles in affected programs: 7220376 -> 7203058 (-0.24%)
helped: 126
HURT: 1
helped stats (abs) min: 2 max: 946 x̄: 159.10 x̃: 95
helped stats (rel) min: 0.01% max: 9.62% x̄: 0.80% x̃: 0.37%
HURT stats (abs) min: 2728 max: 2728 x̄: 2728.00 x̃: 2728
HURT stats (rel) min: 0.37% max: 0.37% x̄: 0.37% x̃: 0.37%
95% mean confidence interval for cycles value: -192.07 -80.66
95% mean confidence interval for cycles %-change: -1.07% -0.51%
Cycles are helped.
total spills in shared programs: 635 -> 817 (28.66%)
spills in affected programs: 635 -> 817 (28.66%)
helped: 0
HURT: 3
total fills in shared programs: 2065 -> 2438 (18.06%)
fills in affected programs: 2019 -> 2392 (18.47%)
helped: 0
HURT: 2
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611506 -> 17611489 (<.01%)
instructions in affected programs: 33442 -> 33425 (-0.05%)
helped: 32
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 1.69 x̃: 1
helped stats (rel) min: 0.08% max: 1.90% x̄: 0.27% x̃: 0.11%
HURT stats (abs) min: 1 max: 15 x̄: 6.17 x̃: 5
HURT stats (rel) min: 0.09% max: 1.50% x̄: 0.65% x̃: 0.55%
95% mean confidence interval for instructions value: -1.70 0.80
95% mean confidence interval for instructions %-change: -0.30% 0.05%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 338419218 -> 338418502 (<.01%)
cycles in affected programs: 385795 -> 385079 (-0.19%)
helped: 42
HURT: 3
helped stats (abs) min: 2 max: 192 x̄: 24.57 x̃: 16
helped stats (rel) min: 0.04% max: 2.09% x̄: 0.33% x̃: 0.22%
HURT stats (abs) min: 64 max: 164 x̄: 105.33 x̃: 88
HURT stats (rel) min: 0.77% max: 1.58% x̄: 1.09% x̃: 0.93%
95% mean confidence interval for cycles value: -29.76 -2.06
95% mean confidence interval for cycles %-change: -0.40% -0.07%
Cycles are helped.
Ivy Bridge and Sandy Bridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11875620 -> 11875617 (<.01%)
instructions in affected programs: 421 -> 418 (-0.71%)
helped: 2
HURT: 0
total cycles in shared programs: 178245336 -> 178245326 (<.01%)
cycles in affected programs: 3425 -> 3415 (-0.29%)
helped: 2
HURT: 0
No changes on Gen4 or Gen5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
88eb8f190b
nir/algebraic: Simplify logic to detect sign of an integer
...
This occurs in a handful of places in the soft-fp64 code, and that is
the primary reason for the change.
v2: Fix a typo in a comment. Noticed by Matt. Copy the correct fp64
shader-db results to the commit message. I realized that I used
accidentally used the results from the next commit.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 906235 -> 906149 (<.01%)
instructions in affected programs: 353966 -> 353880 (-0.02%)
helped: 31
HURT: 2
helped stats (abs) min: 1 max: 8 x̄: 3.03 x̃: 3
helped stats (rel) min: 0.01% max: 1.59% x̄: 0.10% x̃: 0.04%
HURT stats (abs) min: 3 max: 5 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -3.51 -1.70
95% mean confidence interval for instructions %-change: -0.19% <.01%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 7076552 -> 7076173 (<.01%)
cycles in affected programs: 2878361 -> 2877982 (-0.01%)
helped: 37
HURT: 2
helped stats (abs) min: 2 max: 48 x̄: 10.81 x̃: 6
helped stats (rel) min: <.01% max: 2.17% x̄: 0.47% x̃: 0.01%
HURT stats (abs) min: 1 max: 20 x̄: 10.50 x̃: 10
HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -13.96 -5.48
95% mean confidence interval for cycles %-change: -0.72% -0.16%
Cycles are helped.
total fills in shared programs: 2064 -> 2065 (0.05%)
fills in affected programs: 45 -> 46 (2.22%)
helped: 0
HURT: 1
Regular shader-db results:
All Gen7+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611530 -> 17611506 (<.01%)
instructions in affected programs: 5934 -> 5910 (-0.40%)
helped: 10
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.14% max: 1.24% x̄: 0.47% x̃: 0.34%
95% mean confidence interval for instructions value: -3.53 -1.27
95% mean confidence interval for instructions %-change: -0.78% -0.17%
Instructions are helped.
total cycles in shared programs: 338419178 -> 338419218 (<.01%)
cycles in affected programs: 19244 -> 19284 (0.21%)
helped: 4
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.05% max: 0.11% x̄: 0.08% x̃: 0.08%
HURT stats (abs) min: 26 max: 26 x̄: 26.00 x̃: 26
HURT stats (rel) min: 1.20% max: 1.20% x̄: 1.20% x̃: 1.20%
95% mean confidence interval for cycles value: -9.08 22.41
95% mean confidence interval for cycles %-change: -0.35% 1.04%
Inconclusive result (value mean confidence interval includes 0).
No changes on any earlier Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Juan A. Suarez Romero
90550b2a3e
nir/algebraic: coalesce fmod lowering
...
As fmod for 16/32/64 bits lowering does the same, let's merge all of
them in a single case.
Fixes dEQP-VK.glsl.builtin.precision_double.mod.compute.* on ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118 >
2020-03-12 16:42:52 +00:00
Jason Ekstrand
b20693be41
nir: Flush to zero with OOB low exponents in ldexp
...
Reviewed-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2020-03-04 11:39:50 -06:00
Hyunjun Ko
9e8466a866
nir: Add optimization for doing removing f16/f32 conversions
...
This eliminates conversions between f16 and f32 where possible. We can
always remove an upcast followed by a down cast, that is:
f2f16 ( f2f32 (a) ) -> a
f2fmp ( f2f32 (a) ) -> a
In the other direction, f2f16 loses precision and can't be undone by a
f2f32. However, by definition it's always safe to elminate f2fmp:
f2f32 ( f2fmp (a) ) -> a
v2. [Neil Roberts (nroberts@igalia.com )]
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822 >
2020-02-24 17:24:13 +00:00
Neil Roberts
125f867d3d
nir/opcodes: Add nir_op_f2fmp
...
This opcode is the same as the f2f16 opcode except that it comes with
a promise that it is safe to optimise it out if the result is
immediately converted back to a 32-bit float again. Normally this
would be a lossy conversion and so it would be visible to the
application, but if the conversion is generated as part of the mediump
lowering process then this removal doesn’t matter. The opcode is
eventually replaced with a regular f2f16 in the late optimisations so
the backends don’t need to handle it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822 >
2020-02-24 17:24:13 +00:00
Erik Faye-Lund
0c1ba69a27
Revert "nir: Add a couple trivial abs optimizations"
...
These were already added in 9fdaeb7776
("nir: add min/max optimisation"),
and there's no point in doing them twice.
This reverts commit e4d346c86d
.
Fixes: e4d346c86d
("nir: Add a couple trivial abs optimizations")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3786 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3786 >
2020-02-13 09:18:27 +00:00
Samuel Pitoiset
8e77280774
nir: do not use De Morgan's Law rules for flt and fge
...
In presence of NaNs, "!(flt(a, b) && flt(c, d))" is NOT EQUAL
to "fge(a, b) || fge(c, d)". These optimizations are unsafe for
apps that rely on NaN behaviour.
pipeline-db (GFX9/LLVM):
Totals from affected shaders:
SGPRS: 3176 -> 3136 (-1.26 %)
VGPRS: 2188 -> 2144 (-2.01 %)
Spilled SGPRs: 227 -> 169 (-25.55 %)
Code Size: 150572 -> 151800 (0.82 %) bytes
Max Waves: 307 -> 310 (0.98 %)
pipeline-db (GFX9/ACO):
Totals from affected shaders:
SGPRS: 18744 -> 18744 (0.00 %)
VGPRS: 15576 -> 15580 (0.03 %)
Spilled SGPRs: 164 -> 164 (0.00 %)
Code Size: 1573012 -> 1576492 (0.22 %) bytes
Max Waves: 1534 -> 1532 (-0.13 %)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2127
Fixes: d1ed4ffe0b
("nir: Use De Morgan's Law on logic compounded comparisons")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3696 >
2020-02-11 08:36:00 +01:00
Rhys Perry
1f72857739
nir/algebraic: add some half packing optimizations
...
pipeline-db (ACO):
Totals from affected shaders:
SGPRS: 29200 -> 29200 (0.00 %)
VGPRS: 17372 -> 17372 (0.00 %)
Spilled SGPRs: 105 -> 105 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1406576 -> 1389256 (-1.23 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Max Waves: 3976 -> 3976 (0.00 %)
pipeline-db (LLVM):
Totals from affected shaders:
SGPRS: 21320 -> 21320 (0.00 %)
VGPRS: 17056 -> 17036 (-0.12 %)
Spilled SGPRs: 22 -> 22 (0.00 %)
Spilled VGPRs: 503 -> 487 (-3.18 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 396 -> 396 (0.00 %) dwords per thread
Code Size: 1441244 -> 1423292 (-1.25 %) bytes
LDS: 463 -> 463 (0.00 %) blocks
Max Waves: 3609 -> 3611 (0.06 %)
v2: add pattern for ishr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2271 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2271 >
2020-01-29 14:30:33 +00:00
Rhys Perry
5476d18183
nir/algebraic: add patterns for a >> #b << #b
...
Fixes compilation of a Battlefront 2 shader with ACO by removing VGPR
spilling. The reassociation makes it worse on LLVM though.
pipeline-db (ACO):
Totals from affected shaders:
SGPRS: 10704 -> 10688 (-0.15 %)
VGPRS: 18736 -> 18528 (-1.11 %)
Spilled SGPRs: 70 -> 70 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 909696 -> 885796 (-2.63 %) bytes
LDS: 225 -> 225 (0.00 %) blocks
Max Waves: 1115 -> 1129 (1.26 %)
pipeline-db (LLVM):
Totals from affected shaders:
SGPRS: 8472 -> 8424 (-0.57 %)
VGPRS: 14284 -> 14368 (0.59 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 442 -> 503 (13.80 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 268 -> 396 (47.76 %) dwords per thread
Code Size: 862568 -> 853028 (-1.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 971 -> 964 (-0.72 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2271 >
2020-01-29 14:30:33 +00:00
Ian Romanick
b065d8fb8c
nir/algebraic: Optimize some 64-bit integer comparisons involving zero
...
I noticed that we can do better for these kinds of comparisons while
working on the lowering for iadd_sat@64 and isub_sat@64. This
eliminated 11 instruction from the fs-addSaturate-int64.shader_test.
My hope is that this will improve the run-time of int64 tests on Ice
Lake. I have no data to support or refute this.
Unsurprisingly, no changes on shader-db.
v2: Condition the min and max patterns with nir_lower_minmax64.
Suggested by Caio. Very long discussion in the MR. :)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Ian Romanick
d3d970166c
nir/algebraic: Add lowering for 64-bit iadd_sat and isub_sat
...
v2: Rearranged and expand the comment about the optimizations applied to
the lowering. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Ian Romanick
dcadbd2dd2
nir/algebraic: Add lowering for 64-bit uadd_sat
...
Fixes piglit fs-addsaturate-uint64 and vs-addsaturate-uint64 on Ice
Lake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Ian Romanick
1bdfc6d7cb
nir/algebraic: Add lowering for 64-bit usub_sat
...
v2: Rebase on 272e927d0e
("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_usub_sat64 flag that only applies to the 64-bit
version of the nir_op_usub_sat instruction.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Ian Romanick
a483771045
nir/algebraic: Add lowering for 64-bit hadd and rhadd
...
v2: Rebase on 272e927d0e
("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_hadd64 flag that only applies to the 64-bit versions
of the instructions.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Ian Romanick
ea435560ee
nir/algebraic: Add lowering for uabs_usub and uabs_isub
...
v2: Remove some rebase failures noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767 >
2020-01-23 00:18:57 +00:00
Elie Tournier
22c5c54a4f
nir/algebraic: sqrt(x)*sqrt(x) -> fabs(x)
...
total instructions in shared programs: 12840840 -> 12839341 (-0.01%)
instructions in affected programs: 122581 -> 121082 (-1.22%)
helped: 559
HURT: 0
total cycles in shared programs: 302505756 -> 302490031 (<.01%)
cycles in affected programs: 2022900 -> 2007175 (-0.78%)
helped: 1090
HURT: 130
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948 >
2020-01-15 00:30:52 +00:00
Elie Tournier
6f394343b1
nir/algebraic: i2f(f2i()) -> trunc()
...
total instructions in shared programs: 12840968 -> 12840784 (<.01%)
instructions in affected programs: 17886 -> 17702 (-1.03%)
helped: 77
HURT: 0
total cycles in shared programs: 302508917 -> 302505592 (<.01%)
cycles in affected programs: 249964 -> 246639 (-1.33%)
helped: 70
HURT: 7
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948 >
2020-01-15 00:30:52 +00:00
Rhys Perry
cc3ef3643a
nir/algebraic: a & ~(a >> 31) -> imax(a, 0)
...
Found in some Doom shaders
Totals from affected shaders:
SGPRS: 30056 -> 30064 (0.03 %)
VGPRS: 28024 -> 28024 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 4278648 -> 4270852 (-0.18 %) bytes
Max Waves: 1476 -> 1476 (0.00 %)
Instructions: 835287 -> 833338 (-0.23 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3089 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3089 >
2020-01-14 17:54:40 +00:00
Jonathan Marek
004797002f
nir: add option to lower half packing opcodes
...
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106 >
2019-12-16 19:20:07 -05:00
Ian Romanick
fbd5359a0a
nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select
...
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
instructions in affected programs: 316166 -> 309237 (-2.19%)
helped: 905
HURT: 10
helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
95% mean confidence interval for instructions value: -7.91 -7.23
95% mean confidence interval for instructions %-change: -4.46% -3.99%
Instructions are helped.
total cycles in shared programs: 228571646 -> 228549759 (<.01%)
cycles in affected programs: 56239919 -> 56218032 (-0.04%)
helped: 681
HURT: 216
helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
HURT stats (abs) min: 1 max: 320 x̄: 42.09 x̃: 14
HURT stats (rel) min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
95% mean confidence interval for cycles value: -41.51 -7.29
95% mean confidence interval for cycles %-change: -0.80% -0.49%
Cycles are helped.
LOST: 1
GAINED: 0
2019-12-02 16:46:20 -08:00
Ian Romanick
780b5c1037
nir/algebraic: Simplify some Inf and NaN avoidance code
...
Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq
should only return Inf when fsqrt returns 0.
The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.
An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max. That's more explicit, but we may also want to
have specific bit encodings of float values later. I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.
total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.
Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
2019-12-02 16:46:20 -08:00
Ian Romanick
9be4a422a0
nir/algebraic: Mark other comparison exact when removing a == a
...
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958 . However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).
This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd
("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-04 14:05:49 -08:00
Ian Romanick
ea19f2fb68
nir/algebraic: Add the ability to mark a replacement as exact
...
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-04 14:05:49 -08:00
Rob Clark
5e08f070f0
nir: add nir_lower_amul pass
...
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Rob Clark
1bdde31392
nir: add address calc related opt rules
...
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
6320e37d4b
nir: add amul instruction
...
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size. For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Daniel Schürmann
239423d234
nir: Remove unnecessary subtraction optimizations
...
These optimizations are already covered after lowering.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann
99848a57b7
nir: recombine nir_op_*sub when lower_sub = false
...
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Ian Romanick
317a88b920
nir/algebraic: Additional D3D Boolean optimization
...
I observed this pattern in several shaders in Hand of Fate 2 while
investigating bugzilla #111490 . This also led to the related
bugzilla #111578 . The shaders from HoF2 are *not* in shader-db.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Skylake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
instructions in affected programs: 798418 -> 781216 (-2.15%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
95% mean confidence interval for instructions value: -33.22 -29.56
95% mean confidence interval for instructions %-change: -3.11% -2.56%
Instructions are helped.
total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
cycles in affected programs: 112810504 -> 111480058 (-1.18%)
helped: 546
HURT: 7
helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43
HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
95% mean confidence interval for cycles value: -2884.33 -1927.41
95% mean confidence interval for cycles %-change: -1.59% -1.21%
Cycles are helped.
total spills in shared programs: 8870 -> 8514 (-4.01%)
spills in affected programs: 1230 -> 874 (-28.94%)
helped: 161
HURT: 0
total fills in shared programs: 21901 -> 21348 (-2.52%)
fills in affected programs: 2120 -> 1567 (-26.08%)
helped: 155
HURT: 5
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
instructions in affected programs: 839033 -> 819618 (-2.31%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
95% mean confidence interval for instructions value: -37.46 -33.40
95% mean confidence interval for instructions %-change: -3.12% -2.70%
Instructions are helped.
total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
cycles in affected programs: 117807357 -> 116225626 (-1.34%)
helped: 547
HURT: 6
helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29
HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
95% mean confidence interval for cycles value: -3060.28 -2660.27
95% mean confidence interval for cycles %-change: -1.59% -1.37%
Cycles are helped.
total spills in shared programs: 23372 -> 21869 (-6.43%)
spills in affected programs: 11730 -> 10227 (-12.81%)
helped: 352
HURT: 0
total fills in shared programs: 34747 -> 35351 (1.74%)
fills in affected programs: 11013 -> 11617 (5.48%)
helped: 3
HURT: 347
Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956420 -> 11956126 (<.01%)
instructions in affected programs: 14898 -> 14604 (-1.97%)
helped: 98
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -2.18% -1.98%
Instructions are helped.
total cycles in shared programs: 178791217 -> 178790792 (<.01%)
cycles in affected programs: 149763 -> 149338 (-0.28%)
helped: 91
HURT: 7
helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322
HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
95% mean confidence interval for cycles value: -18.94 10.27
95% mean confidence interval for cycles %-change: -1.28% 0.49%
Inconclusive result (value mean confidence interval includes 0).
2019-09-19 14:22:22 -07:00
Ian Romanick
92f70df8c3
nir/algebraic: Do not apply late DPH optimization in vertex processing stages
...
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant. For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry. This
can result in Z-fighting artifacts (at best). For now, disable this
optimization in these stages.
In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7
("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.
total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
2019-09-19 14:21:31 -07:00
Andres Gomez
3f782cdd25
nir/algebraic: mark float optimizations returning one parameter as inexact
...
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.
For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.
Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-17 23:39:18 +03:00
Ian Romanick
e07248d2a8
nir/algebraic: Clean up value range analysis-based optimizations
...
Fix the a / b ordering in some compares. Delete duplicate patterns.
Add a table explaining things. While I was cleaning this up, I managed
to confuse myself. The table helped sort that out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-29 13:15:52 -07:00
Ian Romanick
ccb236d1bc
nir/algebraic: Mark some value range analysis-based optimizations imprecise
...
This didn't fix bug #111308 , but it was found will trying to find the
actual cause of that bug.
Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test
No shader-db changes on any Intel platform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293
("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-29 13:15:52 -07:00
Ian Romanick
d3fd1c761a
nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled
...
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend. This was encountered in some Unreal4 tech
demos in shader-db. The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.
The fixes tag is a bit a lie. The actual bug was introduced about
26,000 commits earlier in 371c4b3c48
("nir: Recognize open-coded
bitfield_reverse."). Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist. Hopefully nobody will care to
fix this on an earlier Mesa release.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3
("nir: Add lowering for nir_op_bitfield_reverse.")
2019-08-28 11:38:51 -07:00
Daniel Schürmann
7fa1740035
nir/algebraic: some subtraction optimizations
...
Changes with RADV/ACO:
Totals from affected shaders:
SGPRS: 444087 -> 455543 (2.58 %)
VGPRS: 436468 -> 436768 (0.07 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13448928 -> 13353520 (-0.71 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 68060 -> 67979 (-0.12 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-21 08:51:49 +00:00
Rhys Perry
0a790c3019
nir/algebraic: add a few masking-before-unpack optimizations
...
Helps some Dawn of War 3 and F1 2017 shaders with ACO:
Totals from affected shaders:
SGPRS: 2136 -> 2128 (-0.37 %)
VGPRS: 1624 -> 1628 (0.25 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 168068 -> 164332 (-2.22 %) bytes
LDS: 44 -> 44 (0.00 %) blocks
Max Waves: 222 -> 221 (-0.45 %)
Wait states: 0 -> 0 (0.00 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-16 12:13:01 +01:00
Ian Romanick
0e6581b87d
nir/algebraic: Reassociate shift-by-constant of shift-by-constant
...
v2: After some review discussion with Alyssa, the replacements now
correct account for cases where (b+c) >= bitsize.
v3: Use a temporary to simplify the Python code quite a bit. Suggested
by Jason.
Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16251155 -> 16249576 (<.01%)
instructions in affected programs: 232627 -> 231048 (-0.68%)
helped: 547
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -3.12 -2.65
95% mean confidence interval for instructions %-change: -1.20% -1.06%
Instructions are helped.
total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
cycles in affected programs: 59207053 -> 58654764 (-0.93%)
helped: 497
HURT: 34
helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63
HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
95% mean confidence interval for cycles value: -1426.41 -653.77
95% mean confidence interval for cycles %-change: -1.66% -1.15%
Cycles are helped.
total spills in shared programs: 8870 -> 8871 (0.01%)
spills in affected programs: 104 -> 105 (0.96%)
helped: 0
HURT: 1
Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956236 -> 11955635 (<.01%)
instructions in affected programs: 94110 -> 93509 (-0.64%)
helped: 106
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
95% mean confidence interval for instructions value: -6.62 -4.72
95% mean confidence interval for instructions %-change: -2.27% -1.64%
Instructions are helped.
total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
cycles in affected programs: 51009603 -> 50501307 (-1.00%)
helped: 82
HURT: 7
helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2
HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
95% mean confidence interval for cycles value: -7649.38 -3773.00
95% mean confidence interval for cycles %-change: -2.71% -1.99%
Cycles are helped.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:37 -07:00
Ian Romanick
73aaeac0a3
nir/algebraic: Reassociate add-and-shift to be shift-and-add
...
A common thing in many shaders:
uniform vs { vec4 bones[...]; };
...
x = some_calculation(bones[i + 0]);
y = some_calculation(bones[i + 1]);
z = some_calculation(bones[i + 2]);
This turns into stuff like
vec1 32 ssa_12 = iadd ssa_11, ssa_0
vec1 32 ssa_13 = ishl ssa_12, ssa_3
vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
vec1 32 ssa_15 = iadd ssa_11, ssa_1
vec1 32 ssa_16 = ishl ssa_15, ssa_3
vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
vec1 32 ssa_18 = iadd ssa_11, ssa_2
vec1 32 ssa_19 = ishl ssa_18, ssa_3
vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
By reassociating the shift and the add, we can reduce this to
vec1 32 ssa_12 = ishl ssa_11, ssa_3
vec1 32 ssa_13 = iadd ssa_12, ssa_0
vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
vec1 32 ssa_16 = iadd ssa_12, ssa_1
vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
vec1 32 ssa_19 = iadd ssa_12, ssa_2
vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)
v2: Add some commentary from Rhys Perry's nearly identical patch.
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
instructions in affected programs: 1440284 -> 1413230 (-1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: -5.67 -5.31
95% mean confidence interval for instructions %-change: -2.26% -2.16%
Instructions are helped.
total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
cycles in affected programs: 93504145 -> 92280977 (-1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: -387.31 -220.78
95% mean confidence interval for cycles %-change: -2.11% -1.68%
Cycles are helped.
LOST: 1
GAINED: 1
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:32 -07:00