Commit Graph

382 Commits

Author SHA1 Message Date
Eric Engestrom f1eae2f8bb python: drop python2 support
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
2021-08-14 21:44:32 +00:00
Ian Romanick 84d2e53789 Revert "nir/algebraic: Convert some f2u to f2i"
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false.  Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.

All other hardware mentioned in that thread seems to behave the same
way.

The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.

This reverts commit ad05920258.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
2021-08-10 22:16:13 +00:00
Rhys Perry 4e2b94331b nir/algebraic: improve irem by power-of-two optimization
Requires one less instruction.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry b009467b81 nir/algebraic: add optimizations for imul(a, INT_MIN)
is_pos_power_of_two would catch this, but nir_op_imul has signed sources,
so is_neg_power_of_two catches it instead, which creates a useless
nir_op_ineg.

fossil-db (Sienna Cichlid):
Totals from 1014 (0.68% of 150170) affected shaders:
CodeSize: 3592296 -> 3592288 (-0.00%); split: -0.00%, +0.00%
Instrs: 671211 -> 670426 (-0.12%)
Latency: 5268917 -> 5268479 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 2187349 -> 2187343 (-0.00%); split: -0.00%, +0.00%
VClause: 8634 -> 8636 (+0.02%)
Copies: 97585 -> 97604 (+0.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry 65cd5a0f22 nir/algebraic: don't optimize umod/imod/irem if lower_bitops=true
Match the udiv/idiv/imul by power-of-two optimizations.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Rhys Perry ec4b425f59 nir/algebraic: fix imod by negative power-of-two
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
2021-08-09 11:00:39 +00:00
Dave Airlie ad92c2b253 nir: add fisnormal lowering
just lower the 32-bit version for now.

Thanks to alyssa for this suggested lowering.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>
2021-08-06 14:27:48 +10:00
Sagar Ghuge 06ab737686 nir: Add optimizations for iadd3
This patch also adds has_iadd3 bit to give more control if backend
supports ternary add instruction or not.

v2:
- Add patterns in late optimization (Connor Abbott)

Suggested-by: Alyssa/Jason

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Jason Ekstrand 2e08bae9b3 nir,vc4: Suffix a bunch of unorm 4x8 opcodes _vc4
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>
2021-06-21 09:04:08 -05:00
Rhys Perry 1cbcfb8b38 nir, nir/algebraic: add byte/word insertion instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Rhys Perry edae3e5623 nir/algebraic: optimize extract of extract
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Ian Romanick d246c31ec1 nir/algebraic: Add algebraic opt for float comparisons with identical operands.
The flt version could have been added in 56e21647e2, but our
collective understanding of NaN and comparisons was poor in 2015.  The
new "is_a_number" predicate makes the others possible.

All of the helped shaders in shader-db are either from Mad Max or Skia.
Some of the Skia shaders just get decimated by this change:

instructions helped:   shaders/skia/580-4.shader_test FS SIMD8:          81 -> 29 (-64.20%) (scheduled: top-down)

I looked at a couple of those shaders, and they had sequences like:

        vec1 32 ssa_44 = flt32 ssa_32, ssa_32
        vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0
        vec1 32 ssa_46 = fge32 ssa_32, ssa_32
        vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45
        vec1 32 ssa_48 = iand ssa_46, ssa_44
        vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0

ssa_44 is replaced with False.  Then ssa_47 selects between ssa_0 and
ssa_0, so ssa_47 and ssa_46 are eliminated.  ssa_48 is (False && don't
care), so ssa_48 and ssa_49 are eliminated.  After that, many
calculations now involve constants of zero, so they are optimized down
too.  So it continues until there's not much left!

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21072238 -> 21071386 (<.01%)
instructions in affected programs: 33722 -> 32870 (-2.53%)
helped: 146
HURT: 1
helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2
helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20%
95% mean confidence interval for instructions value: -7.94 -3.65
95% mean confidence interval for instructions %-change: -5.87% -2.25%
Instructions are helped.

total cycles in shared programs: 856203326 -> 856192238 (<.01%)
cycles in affected programs: 749966 -> 738878 (-1.48%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18
helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46%
95% mean confidence interval for cycles value: -104.82 -45.02
95% mean confidence interval for cycles %-change: -4.01% -1.37%
Cycles are helped.

LOST:   4
GAINED: 0

Fossil-db results:

Tiger Lake
Instructions in all programs: 160915223 -> 160898354 (-0.0%)
SENDs in all programs: 6812780 -> 6812780 (+0.0%)
Loops in all programs: 38340 -> 38340 (+0.0%)
Cycles in all programs: 7434144207 -> 7433978462 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304537 -> 304537 (+0.0%)

Ice Lake
Instructions in all programs: 145296298 -> 145279531 (-0.0%)
SENDs in all programs: 6863692 -> 6863692 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800257014 -> 8800088384 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)

Skylake
Instructions in all programs: 135891664 -> 135874910 (-0.0%)
SENDs in all programs: 6802946 -> 6802946 (+0.0%)
Loops in all programs: 38331 -> 38331 (+0.0%)
Cycles in all programs: 8444273433 -> 8444130932 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301114 -> 301114 (+0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 64bcfc3a17 nir/algebraic: Rearrange some logic-joined comparisons and reduce
On Skylake and Broadwell, a single big compute shader in Dirt Rally has
spills and fills *REALLY* helped.  That same shader is hurt very
slightly for spills and fills on Ice Lake.

v2: Move the patterns earlier to be nearer other patterns that are
similar.  Mark the replacement fmin and fmax exact.  Both suggested by
Rhys.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21073812 -> 21073041 (<.01%)
instructions in affected programs: 77608 -> 76837 (-0.99%)
helped: 522
HURT: 33
helped stats (abs) min: 1 max: 26 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.22% max: 14.29% x̄: 1.29% x̃: 1.02%
HURT stats (abs)   min: 1 max: 8 x̄: 1.67 x̃: 1
HURT stats (rel)   min: 0.25% max: 3.42% x̄: 1.06% x̃: 0.86%
95% mean confidence interval for instructions value: -1.57 -1.20
95% mean confidence interval for instructions %-change: -1.25% -1.05%
Instructions are helped.

total cycles in shared programs: 856224346 -> 856211096 (<.01%)
cycles in affected programs: 2394231 -> 2380981 (-0.55%)
helped: 603
HURT: 25
helped stats (abs) min: 1 max: 5218 x̄: 59.37 x̃: 28
helped stats (rel) min: 0.06% max: 5.61% x̄: 1.52% x̃: 1.37%
HURT stats (abs)   min: 2 max: 21394 x̄: 901.92 x̃: 10
HURT stats (rel)   min: 0.02% max: 5.90% x̄: 0.95% x̃: 0.59%
95% mean confidence interval for cycles value: -93.61 51.41
95% mean confidence interval for cycles %-change: -1.50% -1.34%
Inconclusive result (value mean confidence interval includes 0).

LOST:   1
GAINED: 1

Ice Lake
total instructions in shared programs: 20025692 -> 20024554 (<.01%)
instructions in affected programs: 104981 -> 103843 (-1.08%)
helped: 738
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 1.54 x̃: 1
helped stats (rel) min: 0.31% max: 10.53% x̄: 1.20% x̃: 1.06%
95% mean confidence interval for instructions value: -1.66 -1.43
95% mean confidence interval for instructions %-change: -1.26% -1.14%
Instructions are helped.

total cycles in shared programs: 979474407 -> 979422333 (<.01%)
cycles in affected programs: 4136364 -> 4084290 (-1.26%)
helped: 759
HURT: 59
helped stats (abs) min: 2 max: 11010 x̄: 72.78 x̃: 28
helped stats (rel) min: 0.03% max: 6.43% x̄: 1.23% x̃: 1.02%
HURT stats (abs)   min: 1 max: 698 x̄: 53.66 x̃: 8
HURT stats (rel)   min: 0.02% max: 24.05% x̄: 1.64% x̃: 0.33%
95% mean confidence interval for cycles value: -97.08 -30.24
95% mean confidence interval for cycles %-change: -1.14% -0.91%
Cycles are helped.

total spills in shared programs: 10568 -> 10569 (<.01%)
spills in affected programs: 102 -> 103 (0.98%)
helped: 0
HURT: 1

total fills in shared programs: 11347 -> 11349 (0.02%)
fills in affected programs: 277 -> 279 (0.72%)
helped: 0
HURT: 1

LOST:   2
GAINED: 2

Skylake
total instructions in shared programs: 18190419 -> 18188523 (-0.01%)
instructions in affected programs: 102502 -> 100606 (-1.85%)
helped: 791
HURT: 0
helped stats (abs) min: 1 max: 676 x̄: 2.40 x̃: 1
helped stats (rel) min: 0.34% max: 20.23% x̄: 1.41% x̃: 1.23%
95% mean confidence interval for instructions value: -4.07 -0.72
95% mean confidence interval for instructions %-change: -1.47% -1.34%
Instructions are helped.

total cycles in shared programs: 960737969 -> 960498951 (-0.02%)
cycles in affected programs: 4435351 -> 4196333 (-5.39%)
helped: 804
HURT: 67
helped stats (abs) min: 1 max: 198540 x̄: 300.54 x̃: 24
helped stats (rel) min: 0.03% max: 25.41% x̄: 1.21% x̃: 0.92%
HURT stats (abs)   min: 2 max: 680 x̄: 39.06 x̃: 6
HURT stats (rel)   min: 0.05% max: 23.98% x̄: 1.12% x̃: 0.19%
95% mean confidence interval for cycles value: -722.03 173.20
95% mean confidence interval for cycles %-change: -1.15% -0.91%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 9757 -> 9722 (-0.36%)
spills in affected programs: 138 -> 103 (-25.36%)
helped: 1
HURT: 0

total fills in shared programs: 9861 -> 9576 (-2.89%)
fills in affected programs: 564 -> 279 (-50.53%)
helped: 1
HURT: 0

LOST:   5
GAINED: 2

Broadwell
total instructions in shared programs: 17853870 -> 17852414 (<.01%)
instructions in affected programs: 101276 -> 99820 (-1.44%)
helped: 777
HURT: 0
helped stats (abs) min: 1 max: 264 x̄: 1.87 x̃: 1
helped stats (rel) min: 0.34% max: 8.44% x̄: 1.37% x̃: 1.23%
95% mean confidence interval for instructions value: -2.54 -1.21
95% mean confidence interval for instructions %-change: -1.42% -1.32%
Instructions are helped.

total cycles in shared programs: 1029846029 -> 1029725458 (-0.01%)
cycles in affected programs: 4435791 -> 4315220 (-2.72%)
helped: 813
HURT: 43
helped stats (abs) min: 2 max: 68560 x̄: 149.95 x̃: 24
helped stats (rel) min: 0.02% max: 73.73% x̄: 1.43% x̃: 0.92%
HURT stats (abs)   min: 2 max: 726 x̄: 31.12 x̃: 13
HURT stats (rel)   min: 0.01% max: 8.43% x̄: 0.62% x̃: 0.31%
95% mean confidence interval for cycles value: -299.58 17.87
95% mean confidence interval for cycles %-change: -1.63% -1.02%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 20333 -> 20307 (-0.13%)
spills in affected programs: 151 -> 125 (-17.22%)
helped: 1
HURT: 0

total fills in shared programs: 25899 -> 25775 (-0.48%)
fills in affected programs: 573 -> 449 (-21.64%)
helped: 1
HURT: 0

LOST:   5
GAINED: 0

Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 16417658 -> 16416320 (<.01%)
instructions in affected programs: 96495 -> 95157 (-1.39%)
helped: 774
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.73 x̃: 1
helped stats (rel) min: 0.33% max: 9.80% x̄: 1.52% x̃: 1.20%
95% mean confidence interval for instructions value: -1.83 -1.63
95% mean confidence interval for instructions %-change: -1.59% -1.46%
Instructions are helped.

total cycles in shared programs: 1037104346 -> 1037080579 (<.01%)
cycles in affected programs: 3787747 -> 3763980 (-0.63%)
helped: 791
HURT: 53
helped stats (abs) min: 1 max: 5411 x̄: 65.87 x̃: 32
helped stats (rel) min: 0.02% max: 21.17% x̄: 1.44% x̃: 1.18%
HURT stats (abs)   min: 2 max: 14160 x̄: 534.72 x̃: 18
HURT stats (rel)   min: 0.02% max: 15.37% x̄: 5.70% x̃: 0.54%
95% mean confidence interval for cycles value: -69.39 13.07
95% mean confidence interval for cycles %-change: -1.19% -0.80%
Inconclusive result (value mean confidence interval includes 0).

LOST:   12
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8132855 -> 8132703 (<.01%)
instructions in affected programs: 8782 -> 8630 (-1.73%)
helped: 38
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.66% max: 3.23% x̄: 1.77% x̃: 1.72%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.88% -1.65%
Instructions are helped.

total cycles in shared programs: 238300850 -> 238298568 (<.01%)
cycles in affected programs: 257202 -> 254920 (-0.89%)
helped: 62
HURT: 2
helped stats (abs) min: 4 max: 58 x̄: 36.90 x̃: 50
helped stats (rel) min: 0.15% max: 1.55% x̄: 0.87% x̃: 1.12%
HURT stats (abs)   min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.12% max: 0.22% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for cycles value: -41.34 -29.98
95% mean confidence interval for cycles %-change: -0.95% -0.73%
Cycles are helped.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 145296888 -> 145296346 (-0.0%)
SENDs in all programs: 6863696 -> 6863696 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800262303 -> 8800258950 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick adc2835646 nir/algebraic: Mark some more logic-joined comparison reductions as exact
If the values are known to be numbers, the the replacements are exact.
This is only applied to the patterns with constants.  Constants should
always be numbers, and shaders with NaN constants should be handled in a
different way.

No shader-db or fossil-db changes on any Intel platform.  The intention
is to make these patterns more future proof.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 23bbf3932b nir/algebraic: Mark some more comparison reductions exact
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Haswell and later Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049056 -> 21048939 (<.01%)
instructions in affected programs: 4716 -> 4599 (-2.48%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51%
95% mean confidence interval for instructions value: -3.46 -2.54
95% mean confidence interval for instructions %-change: -3.22% -2.38%
Instructions are helped.

total cycles in shared programs: 855141411 -> 855141159 (<.01%)
cycles in affected programs: 54491 -> 54239 (-0.46%)
helped: 28
HURT: 5
helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12
helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75%
HURT stats (abs)   min: 2 max: 52 x̄: 21.40 x̃: 6
HURT stats (rel)   min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56%
95% mean confidence interval for cycles value: -13.72 -1.55
95% mean confidence interval for cycles %-change: -1.01% -0.31%
Cycles are helped.

Tiger Lake
Instructions in all programs: 160902191 -> 160899554 (-0.0%)
SENDs in all programs: 6812435 -> 6812435 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7428581420 -> 7428555881 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)

A lot of fragment shaders in Shadow of the Tomb Raider were helped, and
a bunch of vertex shaders in Octopath Traveler were hurt.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 7d85dc4f35 nir/algebraic: Equality comparison inversions require sources be numbers
v2: Update A630 expected image checksum for minetest.trace.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21036690 -> 21049485 (0.06%)
instructions in affected programs: 852085 -> 864880 (1.50%)
helped: 240
HURT: 2514
helped stats (abs) min: 1 max: 46 x̄: 2.45 x̃: 2
helped stats (rel) min: 0.15% max: 4.30% x̄: 0.79% x̃: 0.55%
HURT stats (abs)   min: 1 max: 198 x̄: 5.32 x̃: 2
HURT stats (rel)   min: 0.06% max: 10.71% x̄: 1.48% x̃: 1.04%
95% mean confidence interval for instructions value: 4.14 5.15
95% mean confidence interval for instructions %-change: 1.23% 1.34%
Instructions are HURT.

total cycles in shared programs: 856045255 -> 855816220 (-0.03%)
cycles in affected programs: 16743786 -> 16514751 (-1.37%)
helped: 790
HURT: 1973
helped stats (abs) min: 1 max: 10766 x̄: 627.97 x̃: 18
helped stats (rel) min: <.01% max: 32.59% x̄: 3.01% x̃: 0.64%
HURT stats (abs)   min: 1 max: 4078 x̄: 135.36 x̃: 18
HURT stats (rel)   min: <.01% max: 54.56% x̄: 2.80% x̃: 0.82%
95% mean confidence interval for cycles value: -131.36 -34.42
95% mean confidence interval for cycles %-change: 0.88% 1.40%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total spills in shared programs: 9771 -> 9766 (-0.05%)
spills in affected programs: 47 -> 42 (-10.64%)
helped: 1
HURT: 0

total fills in shared programs: 9451 -> 9430 (-0.22%)
fills in affected programs: 91 -> 70 (-23.08%)
helped: 1
HURT: 0

LOST:   16
GAINED: 51

All Intel GPUs from Sandybridge through Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20024781 -> 20025568 (<.01%)
instructions in affected programs: 103309 -> 104096 (0.76%)
helped: 12
HURT: 389
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.20% max: 2.70% x̄: 1.36% x̃: 1.37%
HURT stats (abs)   min: 1 max: 8 x̄: 2.06 x̃: 1
HURT stats (rel)   min: 0.05% max: 7.14% x̄: 1.25% x̃: 0.95%
95% mean confidence interval for instructions value: 1.78 2.15
95% mean confidence interval for instructions %-change: 1.06% 1.28%
Instructions are HURT.

total cycles in shared programs: 979419070 -> 979439180 (<.01%)
cycles in affected programs: 4968711 -> 4988821 (0.40%)
helped: 60
HURT: 381
helped stats (abs) min: 1 max: 1296 x̄: 96.92 x̃: 26
helped stats (rel) min: <.01% max: 27.10% x̄: 1.64% x̃: 0.65%
HURT stats (abs)   min: 1 max: 7320 x̄: 68.04 x̃: 30
HURT stats (rel)   min: <.01% max: 19.77% x̄: 1.32% x̃: 0.87%
95% mean confidence interval for cycles value: 10.25 80.95
95% mean confidence interval for cycles %-change: 0.69% 1.15%
Cycles are HURT.

LOST:   1
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8128474 -> 8132527 (0.05%)
instructions in affected programs: 642323 -> 646376 (0.63%)
helped: 12
HURT: 1972
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 0.72% max: 1.72% x̄: 1.09% x̃: 0.83%
HURT stats (abs)   min: 1 max: 16 x̄: 2.07 x̃: 3
HURT stats (rel)   min: 0.12% max: 7.14% x̄: 0.77% x̃: 0.70%
95% mean confidence interval for instructions value: 1.99 2.10
95% mean confidence interval for instructions %-change: 0.74% 0.79%
Instructions are HURT.

total cycles in shared programs: 238280994 -> 238294376 (<.01%)
cycles in affected programs: 8841250 -> 8854632 (0.15%)
helped: 84
HURT: 1192
helped stats (abs) min: 4 max: 64 x̄: 12.50 x̃: 8
helped stats (rel) min: 0.02% max: 1.61% x̄: 0.28% x̃: 0.17%
HURT stats (abs)   min: 2 max: 198 x̄: 12.11 x̃: 12
HURT stats (rel)   min: 0.02% max: 8.03% x̄: 0.28% x̃: 0.14%
95% mean confidence interval for cycles value: 9.65 11.32
95% mean confidence interval for cycles %-change: 0.22% 0.27%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 4246c2869c nir/algebraic: Invert comparisons less often
This fixes the piglit test range_analysis_fsat_of_nan.shader_test.  That
test contains some code like

    o = saturate(X) > 0 ? vec4(1.0, 0.0, 0.0, 1.0)
                        : vec4(0.0, 1.0, 0.0, 1.0);

A clever optimizer will convert this to

    o = vec4(float(saturate(X) > 0),
             float(!(saturate(X) > 0)),
             0, 1);

Due to the ordering of optimizations in the compiler, the `saturate`
operations are removed.  This is safe even in the presense of NaN.

    o = vec4(float(X > 0), float(!(X > 0)), 0, 1);

Since the calculations are not marked precise, an overzealous
optimizer may reduce this to

    o = vec4(float(X > 0), float(X <= 0), 0, 1);

This will result in black being output.  The GLSL spec gives quite a bit
of leeway with respect to NaN, but that seems too far.  The shader
author asked for a result of red or green.  A result of black is still
"undefined behavior," but it's also a little mean.

This also enables CSE to do its job better.

v2: Update A530 expected image checksum for minetest.trace.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4531
Fixes: 0dbda153aa ("nir/algebraic: Flag inexact optimizations")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21041563 -> 21041789 (<.01%)
instructions in affected programs: 992066 -> 992292 (0.02%)
helped: 526
HURT: 548
helped stats (abs) min: 1 max: 16 x̄: 2.48 x̃: 2
helped stats (rel) min: 0.04% max: 5.56% x̄: 0.74% x̃: 0.49%
HURT stats (abs)   min: 1 max: 27 x̄: 2.80 x̃: 2
HURT stats (rel)   min: 0.04% max: 4.55% x̄: 0.59% x̃: 0.38%
95% mean confidence interval for instructions value: -0.00 0.42
95% mean confidence interval for instructions %-change: -0.12% <.01%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 855885569 -> 856118189 (0.03%)
cycles in affected programs: 343637248 -> 343869868 (0.07%)
helped: 907
HURT: 541
helped stats (abs) min: 1 max: 7724 x̄: 206.45 x̃: 36
helped stats (rel) min: <.01% max: 29.97% x̄: 1.01% x̃: 0.37%
HURT stats (abs)   min: 1 max: 14177 x̄: 776.09 x̃: 31
HURT stats (rel)   min: <.01% max: 29.94% x̄: 1.24% x̃: 0.35%
95% mean confidence interval for cycles value: 84.30 237.00
95% mean confidence interval for cycles %-change: -0.32% -0.01%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

LOST:   3
GAINED: 5

Ice Lake
total instructions in shared programs: 20027107 -> 20025352 (<.01%)
instructions in affected programs: 1068856 -> 1067101 (-0.16%)
helped: 1153
HURT: 273
helped stats (abs) min: 1 max: 14 x̄: 1.83 x̃: 1
helped stats (rel) min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.35%
HURT stats (abs)   min: 1 max: 15 x̄: 1.29 x̃: 1
HURT stats (rel)   min: 0.16% max: 1.30% x̄: 0.58% x̃: 0.60%
95% mean confidence interval for instructions value: -1.33 -1.13
95% mean confidence interval for instructions %-change: -0.43% -0.34%
Instructions are helped.

total cycles in shared programs: 979499227 -> 979448725 (<.01%)
cycles in affected programs: 344261539 -> 344211037 (-0.01%)
helped: 1079
HURT: 441
helped stats (abs) min: 1 max: 9384 x̄: 147.78 x̃: 48
helped stats (rel) min: <.01% max: 31.83% x̄: 0.90% x̃: 0.33%
HURT stats (abs)   min: 1 max: 7220 x̄: 247.07 x̃: 32
HURT stats (rel)   min: <.01% max: 31.30% x̄: 1.52% x̃: 0.53%
95% mean confidence interval for cycles value: -70.01 3.56
95% mean confidence interval for cycles %-change: -0.35% -0.05%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 10564 -> 10568 (0.04%)
spills in affected programs: 143 -> 147 (2.80%)
helped: 0
HURT: 1

total fills in shared programs: 11343 -> 11347 (0.04%)
fills in affected programs: 287 -> 291 (1.39%)
helped: 0
HURT: 1

LOST:   3
GAINED: 2

Skylake
total instructions in shared programs: 18192274 -> 18190128 (-0.01%)
instructions in affected programs: 1000188 -> 998042 (-0.21%)
helped: 1149
HURT: 55
helped stats (abs) min: 1 max: 14 x̄: 1.92 x̃: 1
helped stats (rel) min: 0.04% max: 6.67% x̄: 0.67% x̃: 0.42%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.55% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.87 -1.69
95% mean confidence interval for instructions %-change: -0.67% -0.58%
Instructions are helped.

total cycles in shared programs: 960856054 -> 960728040 (-0.01%)
cycles in affected programs: 340840968 -> 340712954 (-0.04%)
helped: 1079
HURT: 233
helped stats (abs) min: 1 max: 7640 x̄: 170.95 x̃: 46
helped stats (rel) min: <.01% max: 30.20% x̄: 0.96% x̃: 0.28%
HURT stats (abs)   min: 1 max: 6864 x̄: 242.23 x̃: 26
HURT stats (rel)   min: <.01% max: 34.64% x̄: 2.10% x̃: 0.22%
95% mean confidence interval for cycles value: -135.62 -59.53
95% mean confidence interval for cycles %-change: -0.59% -0.25%
Cycles are helped.

LOST:   15
GAINED: 1

Broadwell
total instructions in shared programs: 17855624 -> 17853580 (-0.01%)
instructions in affected programs: 1012209 -> 1010165 (-0.20%)
helped: 1105
HURT: 52
helped stats (abs) min: 1 max: 13 x̄: 1.90 x̃: 1
helped stats (rel) min: 0.03% max: 6.67% x̄: 0.67% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.13% max: 0.52% x̄: 0.26% x̃: 0.25%
95% mean confidence interval for instructions value: -1.86 -1.67
95% mean confidence interval for instructions %-change: -0.68% -0.58%
Instructions are helped.

total cycles in shared programs: 1029905447 -> 1029840699 (<.01%)
cycles in affected programs: 347102680 -> 347037932 (-0.02%)
helped: 1007
HURT: 211
helped stats (abs) min: 1 max: 1360 x̄: 89.76 x̃: 48
helped stats (rel) min: <.01% max: 16.26% x̄: 0.69% x̃: 0.25%
HURT stats (abs)   min: 1 max: 1297 x̄: 121.51 x̃: 20
HURT stats (rel)   min: <.01% max: 31.31% x̄: 1.21% x̃: 0.20%
95% mean confidence interval for cycles value: -62.39 -43.92
95% mean confidence interval for cycles %-change: -0.47% -0.25%
Cycles are helped.

total spills in shared programs: 20335 -> 20333 (<.01%)
spills in affected programs: 19 -> 17 (-10.53%)
helped: 2
HURT: 0

total fills in shared programs: 25905 -> 25899 (-0.02%)
fills in affected programs: 23 -> 17 (-26.09%)
helped: 2
HURT: 0

LOST:   9
GAINED: 0

Haswell
total instructions in shared programs: 16418516 -> 16417293 (<.01%)
instructions in affected programs: 223785 -> 222562 (-0.55%)
helped: 590
HURT: 67
helped stats (abs) min: 1 max: 15 x̄: 2.19 x̃: 1
helped stats (rel) min: 0.03% max: 6.52% x̄: 0.87% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 0.04% max: 1.85% x̄: 0.44% x̃: 0.25%
95% mean confidence interval for instructions value: -2.01 -1.71
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 1037179754 -> 1037084874 (<.01%)
cycles in affected programs: 352541071 -> 352446191 (-0.03%)
helped: 1093
HURT: 182
helped stats (abs) min: 1 max: 888 x̄: 111.03 x̃: 64
helped stats (rel) min: <.01% max: 27.30% x̄: 0.84% x̃: 0.20%
HURT stats (abs)   min: 1 max: 6777 x̄: 145.49 x̃: 21
HURT stats (rel)   min: <.01% max: 24.10% x̄: 1.99% x̃: 0.29%
95% mean confidence interval for cycles value: -88.10 -60.73
95% mean confidence interval for cycles %-change: -0.58% -0.29%
Cycles are helped.

total spills in shared programs: 17457 -> 17456 (<.01%)
spills in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0

total fills in shared programs: 20387 -> 20385 (<.01%)
fills in affected programs: 15 -> 13 (-13.33%)
helped: 1
HURT: 0

LOST:   6
GAINED: 1

Ivy Bridge and earlier platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15515482 -> 15513998 (<.01%)
instructions in affected programs: 239739 -> 238255 (-0.62%)
helped: 573
HURT: 57
helped stats (abs) min: 1 max: 20 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.03% max: 9.84% x̄: 0.94% x̃: 0.55%
HURT stats (abs)   min: 1 max: 2 x̄: 1.39 x̃: 1
HURT stats (rel)   min: 0.09% max: 1.85% x̄: 0.52% x̃: 0.35%
95% mean confidence interval for instructions value: -2.57 -2.14
95% mean confidence interval for instructions %-change: -0.89% -0.73%
Instructions are helped.

total cycles in shared programs: 584509880 -> 584463152 (<.01%)
cycles in affected programs: 11765280 -> 11718552 (-0.40%)
helped: 661
HURT: 152
helped stats (abs) min: 1 max: 3073 x̄: 101.99 x̃: 32
helped stats (rel) min: <.01% max: 34.38% x̄: 1.46% x̃: 0.50%
HURT stats (abs)   min: 1 max: 6637 x̄: 136.10 x̃: 15
HURT stats (rel)   min: <.01% max: 24.19% x̄: 1.75% x̃: 0.25%
95% mean confidence interval for cycles value: -82.79 -32.16
95% mean confidence interval for cycles %-change: -1.11% -0.61%
Cycles are helped.

LOST:   9
GAINED: 0

Tiger Lake
Instructions in all programs: 160905127 -> 160900949 (-0.0%)
SENDs in all programs: 6812418 -> 6812085 (-0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7431911114 -> 7433914697 (+0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304537 (-0.0%)

Ice Lake
Instructions in all programs: 145296733 -> 145292370 (-0.0%)
SENDs in all programs: 6863818 -> 6863485 (-0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798257570 -> 8800204360 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334248 (-0.0%)

Skylake
Instructions in all programs: 135891485 -> 135887357 (-0.0%)
SENDs in all programs: 6803031 -> 6802698 (-0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442221881 -> 8444201959 (+0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301114 (-0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 49177b9e2f nir/algebraic: Tautology replacements require sources be numbers
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs)   min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel)   min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.

total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs)   min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel)   min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick d69ba58644 nir/algebraic: Remove some optimizations of comparisons with fsat
When most of these patterns were created, we believed, incorrectly, that
fsat(NaN) was NaN.  We have since realized that fsat(NaN) is zero.
Originally, this changed the patterns to use is_a_number.  This didn't
help any shaders, so it's easier to just drop the optimizations.

This commit crossed paths with 4c3ad4d065 ("nir/algebraic: mark more
optimization with fsat(NaN) as inexact") and bc123c396a
("nir/algebraic: mark some optimizations with fsat(NaN) as inexact").
Given that these don't impact very many shaders, it seems safer to just
remove them.

As discussed in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried
modifying these patterns to use !(b cmp a).  Unfortunately, on Intel
GPUs, the results were much worse than just removing the patterns
altogether.

Some other related patterns will be addressed in later commits.

There are still a number of patterns that use the identity fsat(1-X) ==
1 - fsat(X).  If X is NaN, the former is zero while the latter is 1.0.
I haven't evaluted these patterns yet.  If changes are needed in these
patterns, it should be a separate commit anyway.

v2: Replace arrow `=>` with `->` in comments because the `=>` looks a
lot like `<=` comparison.  Suggested by Rhys.

Fixes: 92b75c126b ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]")
Fixes: a7f0c57673 ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel hardware had similar results. (Ice Lake shown)
total instructions in shared programs: 20029060 -> 20029670 (<.01%)
instructions in affected programs: 69236 -> 69846 (0.88%)
helped: 0
HURT: 263
HURT stats (abs)   min: 1 max: 20 x̄: 2.32 x̃: 1
HURT stats (rel)   min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98%
95% mean confidence interval for instructions value: 1.86 2.78
95% mean confidence interval for instructions %-change: 1.18% 1.52%
Instructions are HURT.

total cycles in shared programs: 979821278 -> 979834425 (<.01%)
cycles in affected programs: 1476848 -> 1489995 (0.89%)
helped: 49
HURT: 204
helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20
helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52%
HURT stats (abs)   min: 2 max: 2600 x̄: 89.02 x̃: 16
HURT stats (rel)   min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72%
95% mean confidence interval for cycles value: 13.18 90.75
95% mean confidence interval for cycles %-change: 0.29% 1.25%
Cycles are HURT.

No fossil-db changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Jesse Natalie d7ca0319d7 nir: Add relaxed 24bit opcodes
These are equivalent to the 32bit opcodes if there are no more efficient
24bit opcodes available, but inputs are guaranteed to already be 24bit,
so the 24bit opcodes can be used instead if they exist and are efficient.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>
2021-05-05 22:06:42 +00:00
Gert Wollny a199697642 nir/opt_algebraic: optimizations for add umax/umin with zero
For unsigned comparisons with zero these ops can be eliminated.

v2: Add comparison optimizations with -1 (Rhys Perry)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10583>
2021-05-04 09:33:32 +02:00
Jesse Natalie 3c8bcdc863 nir: Add a new opcode for [un]packing doubles
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.

Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
2021-04-09 01:54:33 +00:00
Gert Wollny 0f5b3c37c5 nir: Add opcodes for fused comp + csel and optimizations
Some backends, like r600 support a fused version of int and float compare
against zero and and csel. Adding these opcodes here makes it possible to
optimize this in nir.

v2: Add rules for float compare + csel

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
2021-03-22 15:19:46 +01:00
Gert Wollny a5747f8ab3 nir: add opcodes for *find_msb_rev and lowering
Some hardware supports a version of find_msb where the bits are counted
starting at the high bit, and this needs some lowering to obtain the
value that is expected by *find_msb

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
2021-03-22 15:19:46 +01:00
Timur Kristóf 132171dc4e nir: Add a few more algebraic optimizations to help address calculation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9201>
2021-03-17 12:42:23 +00:00
Ian Romanick 2c4fd24c01 nir/algebraic: Apply addition property of equality to the other ordering too
Inequality comparison operations are not commutative, so `foo < bar` and
`bar < foo` both have to be explicitly listed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 20027051 -> 20026899 (<.01%)
instructions in affected programs: 37181 -> 37029 (-0.41%)
helped: 85
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.79 x̃: 1
helped stats (rel) min: 0.05% max: 6.78% x̄: 0.92% x̃: 0.68%
95% mean confidence interval for instructions value: -2.42 -1.15
95% mean confidence interval for instructions %-change: -1.23% -0.61%
Instructions are helped.

total cycles in shared programs: 979762793 -> 979753527 (<.01%)
cycles in affected programs: 2653905 -> 2644639 (-0.35%)
helped: 104
HURT: 50
helped stats (abs) min: 1 max: 1048 x̄: 119.99 x̃: 11
helped stats (rel) min: <.01% max: 9.88% x̄: 0.77% x̃: 0.20%
HURT stats (abs)   min: 1 max: 734 x̄: 64.26 x̃: 8
HURT stats (rel)   min: <.01% max: 3.06% x̄: 0.36% x̃: 0.10%
95% mean confidence interval for cycles value: -98.65 -21.68
95% mean confidence interval for cycles %-change: -0.66% -0.15%
Cycles are helped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9374>
2021-03-04 22:50:53 +00:00
Ian Romanick 33031bdab6 nir/algebraic: Apply addition property of equality more conservatively
This allows a lot more CSE.  Depending on where the addition and the
comparison are scheduled, it may also reduce register pressure by
reducing the live range of the addends.

Across all the platforms, the shaders affected for spills or fills were
all fragment shaders from Dirt Rally.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043103 -> 21038804 (-0.02%)
instructions in affected programs: 892878 -> 888579 (-0.48%)
helped: 1549
HURT: 724
helped stats (abs) min: 1 max: 225 x̄: 4.14 x̃: 2
helped stats (rel) min: 0.05% max: 11.18% x̄: 1.04% x̃: 0.78%
HURT stats (abs)   min: 1 max: 71 x̄: 2.93 x̃: 1
HURT stats (rel)   min: 0.07% max: 6.90% x̄: 0.80% x̃: 0.56%
95% mean confidence interval for instructions value: -2.33 -1.45
95% mean confidence interval for instructions %-change: -0.50% -0.40%
Instructions are helped.

total cycles in shared programs: 855054155 -> 855757566 (0.08%)
cycles in affected programs: 58275918 -> 58979329 (1.21%)
helped: 1213
HURT: 1680
helped stats (abs) min: 1 max: 107405 x̄: 1684.00 x̃: 10
helped stats (rel) min: <.01% max: 38.09% x̄: 1.51% x̃: 0.25%
HURT stats (abs)   min: 1 max: 126632 x̄: 1634.59 x̃: 12
HURT stats (rel)   min: <.01% max: 85.91% x̄: 2.75% x̃: 0.49%
95% mean confidence interval for cycles value: -98.06 584.35
95% mean confidence interval for cycles %-change: 0.71% 1.22%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 9843 -> 9771 (-0.73%)
spills in affected programs: 72 -> 0
helped: 5
HURT: 0

total fills in shared programs: 9600 -> 9451 (-1.55%)
fills in affected programs: 149 -> 0
helped: 5
HURT: 0

LOST:   14
GAINED: 9

Skylake
total instructions in shared programs: 18185074 -> 18183866 (<.01%)
instructions in affected programs: 575180 -> 573972 (-0.21%)
helped: 1286
HURT: 468
helped stats (abs) min: 1 max: 15 x̄: 1.55 x̃: 1
helped stats (rel) min: 0.03% max: 4.08% x̄: 0.67% x̃: 0.65%
HURT stats (abs)   min: 1 max: 8 x̄: 1.69 x̃: 1
HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.87% x̃: 0.45%
95% mean confidence interval for instructions value: -0.77 -0.60
95% mean confidence interval for instructions %-change: -0.30% -0.22%
Instructions are helped.

total cycles in shared programs: 960518105 -> 960608234 (<.01%)
cycles in affected programs: 42536073 -> 42626202 (0.21%)
helped: 1210
HURT: 1714
helped stats (abs) min: 1 max: 7015 x̄: 123.41 x̃: 10
helped stats (rel) min: <.01% max: 33.76% x̄: 1.32% x̃: 0.26%
HURT stats (abs)   min: 1 max: 14474 x̄: 139.71 x̃: 14
HURT stats (rel)   min: <.01% max: 58.94% x̄: 2.00% x̃: 0.44%
95% mean confidence interval for cycles value: 4.02 57.63
95% mean confidence interval for cycles %-change: 0.43% 0.82%
Cycles are HURT.

LOST:   16
GAINED: 42

Broadwell
total instructions in shared programs: 17856880 -> 17852158 (-0.03%)
instructions in affected programs: 564836 -> 560114 (-0.84%)
helped: 1243
HURT: 418
helped stats (abs) min: 1 max: 115 x̄: 4.36 x̃: 1
helped stats (rel) min: 0.03% max: 9.67% x̄: 0.90% x̃: 0.67%
HURT stats (abs)   min: 1 max: 8 x̄: 1.67 x̃: 1
HURT stats (rel)   min: 0.14% max: 7.69% x̄: 0.89% x̃: 0.46%
95% mean confidence interval for instructions value: -3.45 -2.23
95% mean confidence interval for instructions %-change: -0.51% -0.38%
Instructions are helped.

total cycles in shared programs: 1031140321 -> 1029856892 (-0.12%)
cycles in affected programs: 66986946 -> 65703517 (-1.92%)
helped: 1084
HURT: 1653
helped stats (abs) min: 1 max: 415168 x̄: 1835.32 x̃: 10
helped stats (rel) min: <.01% max: 57.16% x̄: 1.19% x̃: 0.28%
HURT stats (abs)   min: 1 max: 43930 x̄: 427.14 x̃: 12
HURT stats (rel)   min: <.01% max: 57.53% x̄: 1.32% x̃: 0.39%
95% mean confidence interval for cycles value: -915.76 -22.07
95% mean confidence interval for cycles %-change: 0.17% 0.47%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total spills in shared programs: 20891 -> 20335 (-2.66%)
spills in affected programs: 1567 -> 1011 (-35.48%)
helped: 70
HURT: 0

total fills in shared programs: 27307 -> 25905 (-5.13%)
fills in affected programs: 5381 -> 3979 (-26.05%)
helped: 71
HURT: 0

LOST:   17
GAINED: 20

Haswell
total instructions in shared programs: 16411850 -> 16409414 (-0.01%)
instructions in affected programs: 602666 -> 600230 (-0.40%)
helped: 1152
HURT: 781
helped stats (abs) min: 1 max: 103 x̄: 3.59 x̃: 1
helped stats (rel) min: 0.03% max: 8.61% x̄: 0.85% x̃: 0.65%
HURT stats (abs)   min: 1 max: 41 x̄: 2.18 x̃: 1
HURT stats (rel)   min: 0.12% max: 7.69% x̄: 0.88% x̃: 0.69%
95% mean confidence interval for instructions value: -1.74 -0.78
95% mean confidence interval for instructions %-change: -0.21% -0.10%
Instructions are helped.

total cycles in shared programs: 1035338781 -> 1036977801 (0.16%)
cycles in affected programs: 68961096 -> 70600116 (2.38%)
helped: 1246
HURT: 2206
helped stats (abs) min: 1 max: 392022 x̄: 1040.28 x̃: 14
helped stats (rel) min: <.01% max: 56.44% x̄: 2.32% x̃: 0.38%
HURT stats (abs)   min: 1 max: 68630 x̄: 1330.56 x̃: 18
HURT stats (rel)   min: <.01% max: 69.97% x̄: 3.31% x̃: 0.61%
95% mean confidence interval for cycles value: 90.43 859.17
95% mean confidence interval for cycles %-change: 1.02% 1.54%
Cycles are HURT.

total spills in shared programs: 17805 -> 17457 (-1.95%)
spills in affected programs: 1202 -> 854 (-28.95%)
helped: 34
HURT: 31

total fills in shared programs: 20939 -> 20387 (-2.64%)
fills in affected programs: 2702 -> 2150 (-20.43%)
helped: 34
HURT: 31

LOST:   24
GAINED: 45

Ivy Bridge and earlier Intel GPUs had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15515912 -> 15516757 (<.01%)
instructions in affected programs: 396569 -> 397414 (0.21%)
helped: 578
HURT: 858
helped stats (abs) min: 1 max: 9 x̄: 1.32 x̃: 1
helped stats (rel) min: 0.04% max: 3.70% x̄: 0.65% x̃: 0.65%
HURT stats (abs)   min: 1 max: 11 x̄: 1.87 x̃: 1
HURT stats (rel)   min: 0.08% max: 12.90% x̄: 0.95% x̃: 0.53%
95% mean confidence interval for instructions value: 0.47 0.70
95% mean confidence interval for instructions %-change: 0.24% 0.37%
Instructions are HURT.

total cycles in shared programs: 584395455 -> 584466352 (0.01%)
cycles in affected programs: 20346570 -> 20417467 (0.35%)
helped: 1192
HURT: 1896
helped stats (abs) min: 1 max: 4108 x̄: 123.27 x̃: 14
helped stats (rel) min: <.01% max: 37.20% x̄: 2.27% x̃: 0.46%
HURT stats (abs)   min: 1 max: 3698 x̄: 114.89 x̃: 19
HURT stats (rel)   min: <.01% max: 70.28% x̄: 3.02% x̃: 0.71%
95% mean confidence interval for cycles value: 10.75 35.16
95% mean confidence interval for cycles %-change: 0.73% 1.23%
Cycles are HURT.

LOST:   20
GAINED: 12
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9374>
2021-03-04 22:50:53 +00:00
Timothy Arceri 9f474bd4b4 nir: handle negatives in ffma reassociation optimisation
shader-db results Iris (BDW):

total instructions in shared programs: 16632076 -> 16631057 (<.01%)
instructions in affected programs: 48010 -> 46991 (-2.12%)
helped: 47
HURT: 6

total cycles in shared programs: 915266726 -> 915263622 (<.01%)
cycles in affected programs: 1182283 -> 1179179 (-0.26%)
helped: 18
HURT: 27

total loops in shared programs: 4929 -> 4929 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 18834 -> 18801 (-0.18%)
spills in affected programs: 525 -> 492 (-6.29%)
helped: 3
HURT: 0

total fills in shared programs: 23008 -> 22981 (-0.12%)
fills in affected programs: 435 -> 408 (-6.21%)
helped: 3
HURT: 0

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8608>
2021-02-22 00:49:13 +00:00
Ian Romanick 3250e04d25 nir/algebraic: Add some max/min optimizations with 3 variables
Specifically, ARB assembly shaders with code like

    SLT    r0, r0, c[0].xxxx;
    ...
    KIL    r0.xyzx;

can result in this pattern.  The other cases (e.g., 'KIL r0.xxxx' and
'KIL r0.xyxx') are handled by existing patterns.

Reviewed-by: Matt Turner <mattst88@gmail.com>

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21050098 -> 21050065 (<.01%)
instructions in affected programs: 2062 -> 2029 (-1.60%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.10 x̃: 1
helped stats (rel) min: 1.14% max: 4.35% x̄: 1.89% x̃: 1.69%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.65% max: 0.65% x̄: 0.65% x̃: 0.65%
95% mean confidence interval for instructions value: -1.23 -0.84
95% mean confidence interval for instructions %-change: -2.12% -1.50%
Instructions are helped.

total cycles in shared programs: 855105466 -> 855105055 (<.01%)
cycles in affected programs: 50136 -> 49725 (-0.82%)
helped: 33
HURT: 0
helped stats (abs) min: 3 max: 22 x̄: 12.45 x̃: 12
helped stats (rel) min: 0.13% max: 1.57% x̄: 0.86% x̃: 0.92%
95% mean confidence interval for cycles value: -13.78 -11.13
95% mean confidence interval for cycles %-change: -0.97% -0.76%
Cycles are helped.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9122>
2021-02-19 17:31:27 -08:00
Ian Romanick d9b5bce85a nir/algebraic: Remove some redundant b2f logic-op reduction patterns
There are patterns that will re-write the fmin or fmax part into a form
that other patterns will gradually convert to the same ior or iand.  For
example,

    fmax(b2f(a), b2f(b)) != 0
    b2f(a || b) != 0
    a || b

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9122>
2021-02-19 17:31:24 -08:00
Ian Romanick 7e127c1fca nir/algebraic: Fix some min/max of b2f replacements
fmin(-A, -B) is -fmax(A, B), and fmax(-A, -B) is -fmin(A, B).  Therefore
the logic joining A and B should toggle between ior and iand for the
negated versions.

At the very least, a shader from Euro Truck Simulator 2 in shader-db is
affected by this.  The KIL instruction in the (ARB assembly) shader ends
up with the wrong logic.  This is _probably_ the source of
https://gitlab.freedesktop.org/mesa/mesa/-/issues/1346.

That said, the issue mentions that Mesa 18.0.5 works, but commit
68420d8322 ("nir: Simplify min and max of b2f") was added in 17.3.
Moreover, I was not able to reproduce the error in the ETS2 shader from
shader-db from any Mesa commit near the time the original fd.o bugzilla
was submitted (December 2018). 🤷

In fact, the current error in that shader starts with 9167324a86
("nir/algebraic: Mark some logic-joined comparison reductions as
exact").  That's a bit of a red herring as 9167324a86 just sets off a
chain of replacements that eventually leads to the incorrect min/max of
b2f patterns fixed by this commit.

The other affected shaders in the shader-db results are from Cargo
Commander.  These are also ARB assembly shaders.

I think any ARB assembly shader that uses the pattern

    SLT    r0, ...;
    ...
    KIL    -r0;

will suffer from issues related to this.

This change fixes the piglit
tests/spec/arb_fragment_program/kil-of-slt.shader_test test added in
https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/454.

shader-db results:

All Gen6+ platforms had similar result. (Ice Lake shown)
total instructions in shared programs: 20034604 -> 20034486 (<.01%)
instructions in affected programs: 3885 -> 3767 (-3.04%)
helped: 47
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 2.64 x̃: 2
helped stats (rel) min: 2.33% max: 8.33% x̄: 3.48% x̃: 3.39%
HURT stats (abs)   min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 13.64% max: 16.67% x̄: 15.15% x̃: 15.15%
95% mean confidence interval for instructions value: -2.83 -1.99
95% mean confidence interval for instructions %-change: -3.84% -1.60%
Instructions are helped.

total cycles in shared programs: 979881379 -> 979879406 (<.01%)
cycles in affected programs: 119873 -> 117900 (-1.65%)
helped: 46
HURT: 3
helped stats (abs) min: 10 max: 756 x̄: 45.41 x̃: 26
helped stats (rel) min: 0.53% max: 19.72% x̄: 1.67% x̃: 1.26%
HURT stats (abs)   min: 28 max: 56 x̄: 38.67 x̃: 32
HURT stats (rel)   min: 1.44% max: 3.54% x̄: 2.75% x̃: 3.27%
95% mean confidence interval for cycles value: -70.83 -9.70
95% mean confidence interval for cycles %-change: -2.23% -0.57%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8115098 -> 8115076 (<.01%)
instructions in affected programs: 2592 -> 2570 (-0.85%)
helped: 32
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.88% max: 2.70% x̄: 1.35% x̃: 1.31%
HURT stats (abs)   min: 5 max: 5 x̄: 5.00 x̃: 5
HURT stats (rel)   min: 17.24% max: 18.52% x̄: 17.88% x̃: 17.88%
95% mean confidence interval for instructions value: -1.15 -0.15
95% mean confidence interval for instructions %-change: -1.83% 1.39%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 238189718 -> 238189802 (<.01%)
cycles in affected programs: 75076 -> 75160 (0.11%)
helped: 3
HURT: 31
helped stats (abs) min: 2 max: 130 x̄: 44.67 x̃: 2
helped stats (rel) min: 0.18% max: 5.70% x̄: 2.02% x̃: 0.19%
HURT stats (abs)   min: 2 max: 70 x̄: 7.03 x̃: 4
HURT stats (rel)   min: 0.07% max: 6.41% x̄: 0.53% x̃: 0.15%
95% mean confidence interval for cycles value: -7.27 12.21
95% mean confidence interval for cycles %-change: -0.33% 0.94%
Inconclusive result (value mean confidence interval includes 0).

No fossil-db changes on any Intel platform.

Fixes: 68420d8322 ("nir: Simplify min and max of b2f")
Closes: #1346
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9122>
2021-02-19 17:30:53 -08:00
Jason Ekstrand 2491d5a662 nir/algebraic: Covert up-cast of down-cast to extract on Intel
This starts generating extract for bit sizes other than 32 but our
back-end handles that just fine.

Reviewed-by: Ian Romanick <ian.d.romanick@.intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8872>
2021-02-16 16:36:31 +00:00
Jason Ekstrand f9b3be09e1 nir/algebraic: Clean up up-cast of down-cast when we can
There are a bunch of cases where we can pretty quickly determine that
the high bits don't matter.  In these cases, delete the casts.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8872>
2021-02-16 16:36:31 +00:00
Ian Romanick ed138f2861 nir/algebraic: Partially revert 3f782cdd25
I'm not sure what the logic was, but there is no opportunity for
anything to flush to zero here.  'a' is a Boolean value, and b2f
produces 1.0 or 0.0.

This was originally part of
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3765/.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Andres Gomez <agomez@igalia.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8910>
2021-02-07 18:31:01 -08:00
Ian Romanick 5923742356 nir/algebraic: add patterns for a >> #b << #b and a << #b >> #b
Commit 5476d18183 ("nir/algebraic: add patterns for a >> #b << #b")
added the ushr version, but it missed the ishr.  A bunch of compute
shaders with stores to shared storage generate the ishr pattern.

Enabling this optimization also enables the iadd/iand reassociation
(right after this hunk), and that enables merging of stores to shared
storage.  A couple shaders have spills and fills hurt on some
platforms.  These all occur in shaders that also have SENDs helped.
On Gen9 and Gen11, the helped SENDs more than makes up for the extra
spills and fills.

On Gen7 and Gen8, it's not as clear.  All of the shaders affected are
compute shaders in DiRT Rally 2 or Bioshock Inifinite.  The most
affected Bioshock shader on Broadwell looks like:

Before: CS SIMD8 shader: 1335 inst, 0 loops, 22411 cycles, 42:36 spills:fills, 159 sends, scheduled with mode lifo, Promoted 2 constants, compacted 21360 to 16528 bytes.

After:  CS SIMD8 shader: 1175 inst, 0 loops, 25916 cycles, 96:135 spills:fills, 72 sends, scheduled with mode lifo, Promoted 2 constants, compacted 18800 to 13648 bytes.

The results on Haswell and Ivy Bridge are similar.  Given that there
are only 2 promoted constants, MR !7698 won't have any effect.

There were no statistically significant changes on Gen9+ in Bioshock in
our performance CI.  Gen8 isn't in that CI, and DiRT Showdown 2 is also
not included in that CI.  It is possible that these shaders aren't used
in the settings or demos used in the CI.

The other pattern, which switches the order of the shifts, only helps a
couple shaders.  If I wasn't already adding another pattern, I
definitely wouldn't bother with that one.

v2: s/ishr/ushr/ in the replacement for the ushr pattern.  Noticed by
Rhys.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21052760 -> 21049269 (-0.02%)
instructions in affected programs: 59497 -> 56006 (-5.87%)
helped: 46
HURT: 0
helped stats (abs) min: 2 max: 552 x̄: 75.89 x̃: 53
helped stats (rel) min: 0.28% max: 43.43% x̄: 5.87% x̃: 4.10%
95% mean confidence interval for instructions value: -108.96 -42.82
95% mean confidence interval for instructions %-change: -8.38% -3.35%
Instructions are helped.

total cycles in shared programs: 855229761 -> 855148518 (<.01%)
cycles in affected programs: 8491373 -> 8410130 (-0.96%)
helped: 33
HURT: 15
helped stats (abs) min: 42 max: 26940 x̄: 6200.70 x̃: 4329
helped stats (rel) min: 0.09% max: 38.78% x̄: 7.97% x̃: 4.29%
HURT stats (abs)   min: 2 max: 18132 x̄: 8225.33 x̃: 7288
HURT stats (rel)   min: <.01% max: 13.37% x̄: 5.72% x̃: 4.53%
95% mean confidence interval for cycles value: -4331.52 946.40
95% mean confidence interval for cycles %-change: -6.78% -0.61%
Inconclusive result (value mean confidence interval includes 0).

total sends in shared programs: 989947 -> 989694 (-0.03%)
sends in affected programs: 523 -> 270 (-48.37%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 87 x̄: 50.60 x̃: 37
helped stats (rel) min: 25.71% max: 54.72% x̄: 43.49% x̃: 42.53%
95% mean confidence interval for sends value: -93.95 -7.25
95% mean confidence interval for sends %-change: -58.48% -28.50%
Sends are helped.

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20033498 -> 20030552 (-0.01%)
instructions in affected programs: 59220 -> 56274 (-4.97%)
helped: 48
HURT: 0
helped stats (abs) min: 1 max: 465 x̄: 61.38 x̃: 39
helped stats (rel) min: 0.03% max: 42.27% x̄: 5.19% x̃: 3.90%
95% mean confidence interval for instructions value: -89.57 -33.18
95% mean confidence interval for instructions %-change: -7.49% -2.89%
Instructions are helped.

total cycles in shared programs: 979993675 -> 979840773 (-0.02%)
cycles in affected programs: 6738454 -> 6585552 (-2.27%)
helped: 46
HURT: 0
helped stats (abs) min: 42 max: 6265 x̄: 3323.96 x̃: 3579
helped stats (rel) min: 0.09% max: 37.38% x̄: 4.34% x̃: 2.39%
95% mean confidence interval for cycles value: -3664.70 -2983.21
95% mean confidence interval for cycles %-change: -6.63% -2.06%
Cycles are helped.

total spills in shared programs: 10659 -> 10661 (0.02%)
spills in affected programs: 36 -> 38 (5.56%)
helped: 1
HURT: 1

total fills in shared programs: 11551 -> 11551 (0.00%)
fills in affected programs: 70 -> 70 (0.00%)
helped: 1
HURT: 1

total sends in shared programs: 1032117 -> 1031785 (-0.03%)
sends in affected programs: 711 -> 379 (-46.69%)
helped: 5
HURT: 0
helped stats (abs) min: 18 max: 87 x̄: 66.40 x̃: 74
helped stats (rel) min: 27.69% max: 54.72% x̄: 44.49% x̃: 44.31%
95% mean confidence interval for sends value: -101.79 -31.01
95% mean confidence interval for sends %-change: -58.42% -30.55%
Sends are helped.

Broadwell
total instructions in shared programs: 17865005 -> 17862757 (-0.01%)
instructions in affected programs: 66438 -> 64190 (-3.38%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 266 x̄: 45.88 x̃: 39
helped stats (rel) min: 0.03% max: 11.99% x̄: 3.73% x̃: 3.92%
95% mean confidence interval for instructions value: -59.15 -32.61
95% mean confidence interval for instructions %-change: -4.35% -3.12%
Instructions are helped.

total cycles in shared programs: 1031298803 -> 1031219023 (<.01%)
cycles in affected programs: 7253602 -> 7173822 (-1.10%)
helped: 45
HURT: 2
helped stats (abs) min: 18 max: 7828 x̄: 1928.33 x̃: 1918
helped stats (rel) min: <.01% max: 10.51% x̄: 1.58% x̃: 1.31%
HURT stats (abs)   min: 3490 max: 3505 x̄: 3497.50 x̃: 3497
HURT stats (rel)   min: 15.56% max: 15.64% x̄: 15.60% x̃: 15.60%
95% mean confidence interval for cycles value: -2174.88 -1220.01
95% mean confidence interval for cycles %-change: -2.00% 0.30%
Inconclusive result (%-change mean confidence interval includes 0).

total spills in shared programs: 20799 -> 20924 (0.60%)
spills in affected programs: 843 -> 968 (14.83%)
helped: 0
HURT: 4

total fills in shared programs: 27110 -> 27334 (0.83%)
fills in affected programs: 1824 -> 2048 (12.28%)
helped: 1
HURT: 4

total sends in shared programs: 1017935 -> 1017603 (-0.03%)
sends in affected programs: 711 -> 379 (-46.69%)
helped: 5
HURT: 0
helped stats (abs) min: 18 max: 87 x̄: 66.40 x̃: 74
helped stats (rel) min: 27.69% max: 54.72% x̄: 44.49% x̃: 44.31%
95% mean confidence interval for sends value: -101.79 -31.01
95% mean confidence interval for sends %-change: -58.42% -30.55%
Sends are helped.

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16397496 -> 16395411 (-0.01%)
instructions in affected programs: 59384 -> 57299 (-3.51%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 208 x̄: 42.55 x̃: 39
helped stats (rel) min: 0.03% max: 8.18% x̄: 3.74% x̃: 3.91%
95% mean confidence interval for instructions value: -53.59 -31.51
95% mean confidence interval for instructions %-change: -4.24% -3.23%
Instructions are helped.

total cycles in shared programs: 1035483504 -> 1035397592 (<.01%)
cycles in affected programs: 9379739 -> 9293827 (-0.92%)
helped: 45
HURT: 4
helped stats (abs) min: 10 max: 5600 x̄: 2164.51 x̃: 2350
helped stats (rel) min: <.01% max: 11.61% x̄: 1.93% x̃: 1.56%
HURT stats (abs)   min: 2 max: 5756 x̄: 2872.75 x̃: 2866
HURT stats (rel)   min: <.01% max: 24.65% x̄: 12.29% x̃: 12.26%
95% mean confidence interval for cycles value: -2293.06 -1213.56
95% mean confidence interval for cycles %-change: -2.42% 0.88%
Inconclusive result (%-change mean confidence interval includes 0).

total spills in shared programs: 17672 -> 17803 (0.74%)
spills in affected programs: 364 -> 495 (35.99%)
helped: 2
HURT: 2

total fills in shared programs: 20752 -> 20937 (0.89%)
fills in affected programs: 656 -> 841 (28.20%)
helped: 2
HURT: 2

total sends in shared programs: 1044703 -> 1044450 (-0.02%)
sends in affected programs: 523 -> 270 (-48.37%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 87 x̄: 50.60 x̃: 37
helped stats (rel) min: 25.71% max: 54.72% x̄: 43.49% x̃: 42.53%
95% mean confidence interval for sends value: -93.95 -7.25
95% mean confidence interval for sends %-change: -58.48% -28.50%
Sends are helped.

No changes on Gen6 or earlier GPUs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8852>
2021-02-08 00:25:22 +00:00
Ian Romanick 6b0443a900 nir/algebraic: Fix a >> #b << #b for sizes other than 32-bit
The base mask previously used was 0xffffffff.  This is not correct (but
should still work) for 16-bit and 8-bit values, but it means the high
32-bits of 64-bit values will get chopped off.

Instead of just restricting the pattern to 32-bits (as was done before
00b28a50b2), this extends the optimization in two ways:

1. Make it correct for other bit sizes.
2. Make it work for arbitrary shift counts.

This has the added benefit of reducing the number of patterns actually
added (7 previously, 4 now).

The "Reassociate for improved CSE" part is just reverted to its
pre-00b28a50b2c behavior.  I doubt that pattern is likely to have much
impact outside 32-bits.

This change fixes the piglit tests
tests/spec/arb_gpu_shader_int64/fs-shl-of-shr-int64.shader_test and
tests/spec/arb_gpu_shader_int64/fs-iand-of-iadd-int64.shader_test.

All of the shaders helped in shader-db are vertex shaders on platforms
with vector-oriented vertex processing.  The shaders contain ((x >> 16)
<< 16).  These platforms set lower_extract_word, so the optimization
that transforms (x >> 16) to extract_u16 doesn't trigger.  With only ~60
shaders involved, I didn't bother trying to add extract_XYZ versions of
these patterns to try to get those cases.

Fixes: 00b28a50b2 ("nir/algebraic: trivially enable existing 32-bit patterns for all bit sizes")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Haswell and earlier Intel GPUs had simlar results. (Haswell shown)
total instructions in shared programs: 16397554 -> 16397496 (<.01%)
instructions in affected programs: 7961 -> 7903 (-0.73%)
helped: 58
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 1.89% x̄: 0.99% x̃: 0.78%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.13% -0.85%
Instructions are helped.

total cycles in shared programs: 1035483770 -> 1035483504 (<.01%)
cycles in affected programs: 75922 -> 75656 (-0.35%)
helped: 44
HURT: 2
helped stats (abs) min: 2 max: 12 x̄: 6.14 x̃: 2
helped stats (rel) min: 0.05% max: 1.67% x̄: 0.87% x̃: 0.72%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -7.28 -4.29
95% mean confidence interval for cycles %-change: -1.03% -0.63%
Cycles are helped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8852>
2021-02-08 00:25:22 +00:00
Samuel Pitoiset 4c3ad4d065 nir/algebraic: mark more optimization with fsat(NaN) as inexact
These optimizations are duplicated from the main optimization table
to the late one... And I missed some in the original fix.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3368
Fixes: bc123c396a ("nir/algebraic: mark some optimizations with fsat(NaN) as inexact")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716>
2021-01-26 17:06:23 +00:00
Rhys Perry b729cd58d7 nir/algebraic: eliminate exact a*0.0 if float execution mode allow it
fossil-db (GFX10):
Totals from 611 (0.44% of 139391) affected shaders:
SGPRs: 40528 -> 40288 (-0.59%)
VGPRs: 16136 -> 16152 (+0.10%); split: -0.15%, +0.25%
CodeSize: 970192 -> 951036 (-1.97%)
MaxWaves: 10561 -> 10557 (-0.04%); split: +0.08%, -0.11%
Instrs: 174874 -> 172879 (-1.14%); split: -1.18%, +0.04%

fossil-db (GFX10.3):
Totals from 611 (0.44% of 139391) affected shaders:
SGPRs: 40680 -> 40488 (-0.47%)
VGPRs: 18368 -> 18276 (-0.50%); split: -0.57%, +0.07%
CodeSize: 1050712 -> 1033624 (-1.63%); split: -1.64%, +0.02%
MaxWaves: 8658 -> 8674 (+0.18%)
Instrs: 205364 -> 201220 (-2.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Rhys Perry 614ab26afd nir/algebraic: optimize out exact a+0.0 if it's used only as a float
fossil-db (GFX10):
Totals from 133 (0.10% of 139391) affected shaders:
SGPRs: 7864 -> 7856 (-0.10%); split: -0.20%, +0.10%
VGPRs: 4884 -> 4836 (-0.98%)
CodeSize: 288932 -> 287084 (-0.64%)
MaxWaves: 1973 -> 1979 (+0.30%)
Instrs: 53899 -> 53550 (-0.65%)

fossil-db (GFX10.3):
Totals from 133 (0.10% of 139391) affected shaders:
SGPRs: 7832 -> 7835 (+0.04%); split: -0.06%, +0.10%
VGPRs: 5144 -> 5088 (-1.09%)
CodeSize: 318912 -> 316696 (-0.69%)
MaxWaves: 1735 -> 1746 (+0.63%)
Instrs: 65367 -> 64853 (-0.79%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Rhys Perry 2849f0b5aa nir/algebraic: optimize out exact a*1.0 if it's used only as a float
fossil-db (GFX10):
Totals from 10180 (7.30% of 139391) affected shaders:
SGPRs: 549392 -> 549448 (+0.01%); split: -0.00%, +0.01%
VGPRs: 243228 -> 243008 (-0.09%); split: -0.11%, +0.02%
CodeSize: 12939080 -> 12603996 (-2.59%); split: -2.59%, +0.00%
MaxWaves: 186948 -> 186976 (+0.01%)
Instrs: 2497266 -> 2414648 (-3.31%)

fossil-db (GFX10.3):
Totals from 10180 (7.30% of 139391) affected shaders:
SGPRs: 549672 -> 549280 (-0.07%); split: -0.23%, +0.16%
VGPRs: 289296 -> 283672 (-1.94%); split: -2.83%, +0.88%
CodeSize: 13920180 -> 13255560 (-4.77%); split: -4.77%, +0.00%
MaxWaves: 151789 -> 153165 (+0.91%)
Instrs: 2756978 -> 2671517 (-3.10%); split: -3.10%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5523>
2021-01-26 11:36:13 +00:00
Daniel Schürmann bd8e84eb8d nir: replace .lower_sub with .has_fsub and .has_isub
This allows a more fine-grained control about whether
a backend supports one of these instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>
2021-01-11 19:13:51 +00:00
Daniel Schürmann b3ce55b445 nir,vc4: Lower fneg to fmul(x, -1.0)
This patch also replaces lower_negate with lower_ineg / lower_fneg.

The fneg semantics have been clarified as of Version 1.5, Revision 1
of the SPIR-V specification, which means that the previous lowering
to fsub is not a viable solution anymore, and is replaced with
lowering to fmul(x, -1.0).

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>
2021-01-11 19:13:51 +00:00
Ian Romanick 539c25c2da nir/algebraic: Move the flrp -> bcsel rule earlier
If multiple rules could match, the rule that appears first in the file
is used.

Only Tiger Lake and Ice Lake are affected.  Other platforms either have
a LRP instruction or can't run any shaders from shader-db that would
benefit.

v2: Fix issues created when this commit was rebased on top of
3c8934a644 ("nir/algebraic: add flrp patterns for 16 and 64 bits").
Noticed by Caio.

Tiger Lake and Ice Lake had similar results.
total instructions in shared programs: 20908672 -> 20908661 (<.01%)
instructions in affected programs: 419 -> 408 (-2.63%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.20 x̃: 3
helped stats (rel) min: 1.85% max: 3.19% x̄: 2.49% x̃: 2.65%
95% mean confidence interval for instructions value: -3.56 -0.84
95% mean confidence interval for instructions %-change: -3.24% -1.73%
Instructions are helped.

total cycles in shared programs: 473513940 -> 473513793 (<.01%)
cycles in affected programs: 7176 -> 7029 (-2.05%)
helped: 12
HURT: 0
helped stats (abs) min: 5 max: 22 x̄: 12.25 x̃: 12
helped stats (rel) min: 0.84% max: 3.24% x̄: 2.09% x̃: 1.80%
95% mean confidence interval for cycles value: -15.43 -9.07
95% mean confidence interval for cycles %-change: -2.57% -1.61%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick ec16f935fe nir/algebraic: Mark comparisons generated from lowered fsign precise
This prevents other transformations from converting them to 'a != 0'.
For example, both of these transformations can do this:

   (('~flt', 0.0, ('fabs', a)), ('fne', a, 0.0)),
   (('~flt', ('fneg', ('fabs', a)), 0.0), ('fne', a, 0.0)),

Both fsign(fabs(NaN)) and fsign(fneg(fabs(NaN))) should produce zero,
but, since 'NaN != 0.0' is true, cascading these transformations could
cause them to generate 1.0 or -1.0 respecively.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick 9771af5dde nir/algebraic: Fix broken NaN and -0.0 behavior
No shader-db or fossil-db changes on any Intel platform.

v2: Add a coding line to fix SCons build problems caused by the ±
character.

Fixes: 25bfba3335 ("nir/algebraic: Recognize open-coded copysign(1.0, a)")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick 55621c6d1c nir/algebraic: Add some compare-with-zero optimizations that are exact
This prevents some fossil-db regressions in "spir-v: Mark floating point
comparisons exact".

v2: Note that the patterns and replacements produce the same value when
isnan(b).  Suggested by Caio.

v3: Use C99 isfinite() instead of (obsolete) BSD finite().  Fixes
various Windows builds.

No fossil-db changes on any Inetl platform, Vega, or Polaris10.

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908670 -> 20908672 (<.01%)
instructions in affected programs: 69 -> 71 (2.90%)
helped: 0
HURT: 1

total cycles in shared programs: 473515288 -> 473513940 (<.01%)
cycles in affected programs: 4942 -> 3594 (-27.28%)
helped: 2
HURT: 0

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick 9167324a86 nir/algebraic: Mark some logic-joined comparison reductions as exact
This also prevents some fossil-db regressions in "spir-v: Mark floating
point comparisons exact".

v2: Mark the fmin / fmax in the replacement exact to prevent other
optimizations from ruining the NaN-clensing property of the fmin / fmax.
Suggested by Rhys.  Don't assume that constants are not NaN because some
components of a vector might be NaN while others are numbers.  Noticed
by Rhys.  This causes ~8 more shaders in Age of Wonders III (dxvk) to
regress on cycles (not instructions) by less than 1% when "spir-v: Mark
floating point comparisons exact" is applied.  This difference is too
small to care.

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908668 -> 20908670 (<.01%)
instructions in affected programs: 9196 -> 9198 (0.02%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 2 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.02% max: 5.41% x̄: 2.20% x̃: 2.16%
HURT stats (abs)   min: 2 max: 6 x̄: 3.20 x̃: 3
HURT stats (rel)   min: 2.44% max: 16.67% x̄: 9.39% x̃: 12.50%
95% mean confidence interval for instructions value: -1.22 1.49
95% mean confidence interval for instructions %-change: -2.08% 5.41%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 473515330 -> 473515288 (<.01%)
cycles in affected programs: 67146 -> 67104 (-0.06%)
helped: 10
HURT: 7
helped stats (abs) min: 1 max: 36 x̄: 15.90 x̃: 17
helped stats (rel) min: 0.01% max: 1.29% x̄: 0.66% x̃: 0.89%
HURT stats (abs)   min: 1 max: 48 x̄: 16.71 x̃: 4
HURT stats (rel)   min: 0.08% max: 1.94% x̄: 0.87% x̃: 0.19%
95% mean confidence interval for cycles value: -13.88 8.94
95% mean confidence interval for cycles %-change: -0.56% 0.49%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick fe3c518277 nir/algebraic: Don't add reordered version of patterns for commutative instructions
The reordered are automatically considered by nir_algebraic rules for
commutative instructions.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick 314a40c902 Revert "nir: Replace an odd comparison involving fmin of -b2f"
I originally noticed that 3b30814791 ("nir/algebraic: Optimize 1-bit
Booleans") caused this pattern no longer be matched by incorrectly
replacing b@32 with b@1.  Making that correct had no effect on
shader-db.  When this pattern originally was added, it only affected 4
shaders, so it's not worth the effort to debug further.

This reverts commit f50400cc80.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00
Ian Romanick aec0547838 nir/algebraic: Make some notes about comparison rearrangements versus infinity
The original comment was a little terse and a little incorrect.  The
rearrangements are fine w.r.t. NaN.  However, they produce incorrect
results if one operand is +Inf and the other is -Inf.

A later commit, "nir/algebraic: Add some compare-with-zero optimizations
that are exact", will add some more patterns here.  It may be reasonable
to squash this commit (forward) into that commit.

v2: Fix some incorrect comparisons operators in the comment (<= vs >=).
Add commentary that subtraction works like addition w.r.t. NaN.  Both
noticed / suggested by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
2021-01-05 02:07:09 +00:00