Commit Graph

329 Commits

Author SHA1 Message Date
Jason Ekstrand 615cc26b97 nir/algebraic: Generalize an optimization
This just makes it nicely scale across bit sizes.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand b569093566 nir/algebraic: Make an optimization more specific
Later in this series, bool is not going to imply 32-bit.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 517099809a nir: Drop support for lower_b2f
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 4bb1a34727 nir/algebraic: Optimize x2b(xneg(a)) -> a
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15072525 -> 15072525 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This helps prevent regressions in later commits.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand dca6cd9ce6 nir: Make boolean conversions sized just like the others
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64.  This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:07 -06:00
Jason Ekstrand be98b1db38 nir/opt_algebraic: Add 32-bit specifiers to a bunch of booleans
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:03 -06:00
Jason Ekstrand 2715080d65 nir/opt_algebraic: Drop bit-size suffixes from conversions
Suffixes are dropped from a bunch of conversion opcodes when it makes
sense to do so.  Others are kept if we really do want the bit-size
restriction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:01 -06:00
Jason Ekstrand ff8e3d3b7b nir/opt_algebraic: Simplify an optimization using the new search ops
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:02:58 -06:00
Jonathan Marek 3e7186d472 nir: add fceil lowering
lowers ceil(x) as -floor(-x)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-11-27 15:44:02 -05:00
Christian Gmeiner c6aaafa3a1 nir: add lowering for ffloor
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-11-12 21:57:25 +01:00
Jason Ekstrand 6068be543b nir/algebraic: Generalize an optimization
There's nothing boolean about (a | ~a) ~> -1

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-10-22 16:00:18 -05:00
Jason Ekstrand d7e0d47b9d nir: Add a bunch of b2[if] optimizations
The b2f and b2i conversions always produce zero or one which are both
representable in every type and size.  Since b2i and b2f support all bit
sizes, we can just get rid of the conversion opcode.

total instructions in shared programs: 15089335 -> 15084368 (-0.03%)
instructions in affected programs: 212564 -> 207597 (-2.34%)
helped: 896
HURT: 0

total cycles in shared programs: 369831123 -> 369826267 (<.01%)
cycles in affected programs: 2008647 -> 2003791 (-0.24%)
helped: 693
HURT: 216

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-11 15:21:19 -05:00
Ian Romanick a68dd47b91 nir/algebraic: Simplify fsat of fsign
These allows us to not support fsign.sat in the Intel compiler backend,
and that will simplify some later changes.

No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-10-09 13:56:42 -07:00
Ian Romanick 1546204cdd nir/algebraic: sign(x)*x*x is abs(x)*x
shader-db results:

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15106023 -> 15105981 (<.01%)
instructions in affected programs: 300 -> 258 (-14.00%)
helped: 6
HURT: 0
helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7
helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00%
95% mean confidence interval for instructions value: -7.00 -7.00
95% mean confidence interval for instructions %-change: -14.00% -14.00%
Instructions are helped.

total cycles in shared programs: 566050327 -> 566050075 (<.01%)
cycles in affected programs: 2826 -> 2574 (-8.92%)
helped: 6
HURT: 0
helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42
helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92%
95% mean confidence interval for cycles value: -44.30 -39.70
95% mean confidence interval for cycles %-change: -8.95% -8.88%
Cycles are helped.

No changes on Gen6 or earlier.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-10-09 13:56:42 -07:00
Jason Ekstrand d448fa3ae3 nir/algebraic: Add some max/min optimizations
Found by inspection.  This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166916 -> 15166910 (<.01%)
    instructions in affected programs: 761 -> 755 (-0.79%)
    helped: 6
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand 4dd5263663 nir/algebraic: Add more extract_[iu](8|16) optimizations
This adds the "(a << N) >> M" family of mask or sign-extensions.  Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166918 -> 15166916 (<.01%)
    instructions in affected programs: 36 -> 34 (-5.56%)
    helped: 2
    HURT: 0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Jason Ekstrand 116b47fe3c nir/algebraic: Be more careful converting ushr to extract_u8/16
If it's not the right bit-size, it may not actually be the correct
extraction.  For now, we'll only worry about 32-bit versions.

Fixes: 905ff86198 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa8 "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Mathieu Bridon d9ca4a172e python: Use the right function for the job
The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.

This does change the order of the results slightly, but it should be ok.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-09 16:49:18 -07:00
Ian Romanick 3b07d28f81 nir: Transform expressions of b2f(a) and b2f(b) to a == b
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276886 -> 14276838 (<.01%)
instructions in affected programs: 312 -> 264 (-15.38%)
helped: 2
HURT: 0

total cycles in shared programs: 532578395 -> 532570985 (<.01%)
cycles in affected programs: 682562 -> 675152 (-1.09%)
helped: 374
HURT: 4
helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18
helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28%
HURT stats (abs)   min: 2 max: 114 x̄: 53.50 x̃: 49
HURT stats (rel)   min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15%
95% mean confidence interval for cycles value: -21.30 -17.91
95% mean confidence interval for cycles %-change: -1.30% -1.06%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10488123 -> 10488075 (<.01%)
instructions in affected programs: 336 -> 288 (-14.29%)
helped: 2
HURT: 0

total cycles in shared programs: 150260379 -> 150260439 (<.01%)
cycles in affected programs: 4726 -> 4786 (1.27%)
helped: 0
HURT: 2

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick c658b6c4c8 nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276892 -> 14276886 (<.01%)
instructions in affected programs: 484 -> 478 (-1.24%)
helped: 2
HURT: 0

total cycles in shared programs: 532578397 -> 532578395 (<.01%)
cycles in affected programs: 3522 -> 3520 (-0.06%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick 3aca80aabc nir: Transform expressions of b2f(a) and b2f(b) to !(a && b)
All Gen platforms had pretty similar results. (Skylake shown)
total cycles in shared programs: 532578400 -> 532578397 (<.01%)
cycles in affected programs: 2784 -> 2781 (-0.11%)
helped: 1
HURT: 1
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%

v2: s/fmax/fmin/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick 1713c97181 nir: Transform expressions of b2f(a) and b2f(b) to a && b
No changes on any Gen platform.

v2: s/fmax/fmin/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick 4425f4786a nir: Transform expressions of b2f(a) and b2f(b) to !(a || b)
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14276961 -> 14276892 (<.01%)
instructions in affected programs: 3215 -> 3146 (-2.15%)
helped: 28
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2
helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92%
95% mean confidence interval for instructions value: -2.87 -2.06
95% mean confidence interval for instructions %-change: -5.73% -2.95%
Instructions are helped.

total cycles in shared programs: 532577068 -> 532578400 (<.01%)
cycles in affected programs: 121864 -> 123196 (1.09%)
helped: 35
HURT: 30
helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22
helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86%
HURT stats (abs)   min: 2 max: 246 x̄: 93.80 x̃: 36
HURT stats (rel)   min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58%
95% mean confidence interval for cycles value: -5.02 46.01
95% mean confidence interval for cycles %-change: -0.99% 1.65%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781299 -> 7781342 (<.01%)
instructions in affected programs: 22300 -> 22343 (0.19%)
helped: 13
HURT: 40
helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3
helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43%
95% mean confidence interval for instructions value: 0.23 1.39
95% mean confidence interval for instructions %-change: -1.18% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 177878928 -> 177879332 (<.01%)
cycles in affected programs: 383298 -> 383702 (0.11%)
helped: 7
HURT: 43
helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10
helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40%
HURT stats (abs)   min: 2 max: 38 x̄: 11.02 x̃: 12
HURT stats (rel)   min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09%
95% mean confidence interval for cycles value: 5.21 10.95
95% mean confidence interval for cycles %-change: -0.51% 0.21%
Inconclusive result (%-change mean confidence interval includes 0).

v2: s/fmin/fmax/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick 6b3670ae80 nir: Transform -fabs(a) >= 0 to a == 0
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276964 -> 14276961 (<.01%)
instructions in affected programs: 411 -> 408 (-0.73%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68%

total cycles in shared programs: 532577062 -> 532577068 (<.01%)
cycles in affected programs: 1093 -> 1099 (0.55%)
helped: 1
HURT: 1
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77%
HURT stats (abs)   min: 22 max: 22 x̄: 22.00 x̃: 22
HURT stats (rel)   min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick 46e7c340d4 nir: Transform expressions of b2f(a) and b2f(b) to a || b
All Gen6+ platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277184 -> 14276964 (<.01%)
instructions in affected programs: 10082 -> 9862 (-2.18%)
helped: 37
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4
helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for instructions value: -7.87 -3.71
95% mean confidence interval for instructions %-change: -6.98% -3.16%
Instructions are helped.

total cycles in shared programs: 532577990 -> 532577062 (<.01%)
cycles in affected programs: 170959 -> 170031 (-0.54%)
helped: 33
HURT: 9
helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30
helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13%
HURT stats (abs)   min: 2 max: 24 x̄: 10.22 x̃: 8
HURT stats (rel)   min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22%
95% mean confidence interval for cycles value: -31.23 -12.96
95% mean confidence interval for cycles %-change: -2.90% -1.02%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 7781539 -> 7781301 (<.01%)
instructions in affected programs: 10169 -> 9931 (-2.34%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6
helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88%
95% mean confidence interval for instructions value: -9.53 -5.34
95% mean confidence interval for instructions %-change: -5.94% -2.12%
Instructions are helped.

total cycles in shared programs: 177878590 -> 177878932 (<.01%)
cycles in affected programs: 78706 -> 79048 (0.43%)
helped: 7
HURT: 21
helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28
helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37%
HURT stats (abs)   min: 2 max: 86 x̄: 24.48 x̃: 22
HURT stats (rel)   min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70%
95% mean confidence interval for cycles value: 0.30 24.13
95% mean confidence interval for cycles %-change: -1.52% 1.01%
Inconclusive result (%-change mean confidence interval includes 0).

v2: s/fmin/fmax/.  Noticed by Thomas Helland.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick be7d3ba34a nir: Transform -fabs(a) < 0 to a != 0
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise.  The behavior changes if a is NaN.

All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277216 -> 14277184 (<.01%)
instructions in affected programs: 2300 -> 2268 (-1.39%)
helped: 8
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3
helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01%
95% mean confidence interval for instructions value: -6.45 -1.55
95% mean confidence interval for instructions %-change: -9.96% 1.13%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 532577848 -> 532577990 (<.01%)
cycles in affected programs: 17486 -> 17628 (0.81%)
helped: 2
HURT: 5
helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93%
HURT stats (abs)   min: 6 max: 50 x̄: 30.00 x̃: 26
HURT stats (rel)   min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02%
95% mean confidence interval for cycles value: -1.06 41.63
95% mean confidence interval for cycles %-change: -0.58% 1.74%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick d49eab2757 nir: Rearrange bcsel with two bcsel sources
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277220 -> 14277216 (<.01%)
instructions in affected programs: 422 -> 418 (-0.95%)
helped: 2
HURT: 0

total cycles in shared programs: 532577908 -> 532577848 (<.01%)
cycles in affected programs: 2800 -> 2740 (-2.14%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick b92fded6eb nir: Collapse more repeated bcsels on the same argument
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277230 -> 14277220 (<.01%)
instructions in affected programs: 751 -> 741 (-1.33%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32%
95% mean confidence interval for instructions value: -3.42 -1.58
95% mean confidence interval for instructions %-change: -1.47% -1.17%
Instructions are helped.

total cycles in shared programs: 532577947 -> 532577908 (<.01%)
cycles in affected programs: 10641 -> 10602 (-0.37%)
helped: 4
HURT: 3
helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7
helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60%
HURT stats (abs)   min: 2 max: 8 x̄: 5.33 x̃: 6
HURT stats (rel)   min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23%
95% mean confidence interval for cycles value: -20.69 9.55
95% mean confidence interval for cycles %-change: -1.63% 0.63%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-08-04 01:12:03 -07:00
Ian Romanick 408330ed48 nir: Don't compare i2f or u2i with zero
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277620 -> 14277230 (<.01%)
instructions in affected programs: 36905 -> 36515 (-1.06%)
helped: 101
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6
helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51%
HURT stats (abs)   min: 1 max: 28 x̄: 10.00 x̃: 1
HURT stats (rel)   min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47%
95% mean confidence interval for instructions value: -4.59 -2.70
95% mean confidence interval for instructions %-change: -1.90% -1.41%
Instructions are helped.

total cycles in shared programs: 532580716 -> 532577947 (<.01%)
cycles in affected programs: 940575 -> 937806 (-0.29%)
helped: 92
HURT: 12
helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62
helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41%
HURT stats (abs)   min: 10 max: 1112 x̄: 160.58 x̃: 63
HURT stats (rel)   min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20%
95% mean confidence interval for cycles value: -50.66 -2.59
95% mean confidence interval for cycles %-change: -2.09% -0.73%
Cycles are helped.

total spills in shared programs: 8116 -> 8124 (0.10%)
spills in affected programs: 200 -> 208 (4.00%)
helped: 0
HURT: 2

total fills in shared programs: 11086 -> 11094 (0.07%)
fills in affected programs: 436 -> 444 (1.83%)
helped: 0
HURT: 2

Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12979054 -> 12978067 (<.01%)
instructions in affected programs: 33633 -> 32646 (-2.93%)
helped: 120
HURT: 2
helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13
helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17%
HURT stats (abs)   min: 18 max: 18 x̄: 18.00 x̃: 18
HURT stats (rel)   min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00%
95% mean confidence interval for instructions value: -9.19 -6.99
95% mean confidence interval for instructions %-change: -5.27% -3.62%
Instructions are helped.

total cycles in shared programs: 411212880 -> 411199636 (<.01%)
cycles in affected programs: 696441 -> 683197 (-1.90%)
helped: 107
HURT: 5
helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146
helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88%
HURT stats (abs)   min: 2 max: 50 x̄: 24.00 x̃: 22
HURT stats (rel)   min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25%
95% mean confidence interval for cycles value: -136.96 -99.54
95% mean confidence interval for cycles %-change: -9.75% -6.53%
Cycles are helped.

total spills in shared programs: 78623 -> 78631 (0.01%)
spills in affected programs: 66 -> 74 (12.12%)
helped: 0
HURT: 2

total fills in shared programs: 80104 -> 80108 (<.01%)
fills in affected programs: 133 -> 137 (3.01%)
helped: 0
HURT: 2

No changes on Sandy Bridge, Iron Lake, or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick a3845616a2 nir: Remove f2i(i2f(x)) conversions
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14277978 -> 14277620 (<.01%)
instructions in affected programs: 36957 -> 36599 (-0.97%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4
helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87%
HURT stats (abs)   min: 14 max: 14 x̄: 14.00 x̃: 14
HURT stats (rel)   min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36%
95% mean confidence interval for instructions value: -7.06 -2.24
95% mean confidence interval for instructions %-change: -1.28% -0.77%
Instructions are helped.

total cycles in shared programs: 532584581 -> 532580716 (<.01%)
cycles in affected programs: 973591 -> 969726 (-0.40%)
helped: 76
HURT: 1
helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32
helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19%
HURT stats (abs)   min: 8280 max: 8280 x̄: 8280.00 x̃: 8280
HURT stats (rel)   min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10%
95% mean confidence interval for cycles value: -386.98 286.59
95% mean confidence interval for cycles %-change: -1.41% -0.81%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8127 -> 8116 (-0.14%)
spills in affected programs: 108 -> 97 (-10.19%)
helped: 1
HURT: 0

total fills in shared programs: 11090 -> 11086 (-0.04%)
fills in affected programs: 440 -> 436 (-0.91%)
helped: 1
HURT: 1

Haswell
total instructions in shared programs: 12979174 -> 12979054 (<.01%)
instructions in affected programs: 9040 -> 8920 (-1.33%)
helped: 14
HURT: 1
helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6
helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14%
HURT stats (abs)   min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19%
95% mean confidence interval for instructions value: -13.58 -2.42
95% mean confidence interval for instructions %-change: -3.94% -1.01%
Instructions are helped.

total cycles in shared programs: 411227148 -> 411212880 (<.01%)
cycles in affected programs: 630506 -> 616238 (-2.26%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38
helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17%
95% mean confidence interval for cycles value: -2544.28 641.88
95% mean confidence interval for cycles %-change: -6.89% -0.94%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 78626 -> 78623 (<.01%)
spills in affected programs: 42 -> 39 (-7.14%)
helped: 1
HURT: 0

total fills in shared programs: 80111 -> 80104 (<.01%)
fills in affected programs: 140 -> 133 (-5.00%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11684101 -> 11684030 (<.01%)
instructions in affected programs: 3080 -> 3009 (-2.31%)
helped: 4
HURT: 1
helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5
helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99%
HURT stats (abs)   min: 3 max: 3 x̄: 3.00 x̃: 3
HURT stats (rel)   min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for instructions value: -45.59 17.19
95% mean confidence interval for instructions %-change: -9.38% -1.56%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 258407697 -> 258389653 (<.01%)
cycles in affected programs: 328323 -> 310279 (-5.50%)
helped: 5
HURT: 0
helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32
helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60%
95% mean confidence interval for cycles value: -11616.71 4399.11
95% mean confidence interval for cycles %-change: -16.56% -2.03%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4537 -> 4528 (-0.20%)
spills in affected programs: 64 -> 55 (-14.06%)
helped: 1
HURT: 0

total fills in shared programs: 4823 -> 4815 (-0.17%)
fills in affected programs: 189 -> 181 (-4.23%)
helped: 1
HURT: 1

Sandy Bridge
total instructions in shared programs: 10488464 -> 10488449 (<.01%)
instructions in affected programs: 272 -> 257 (-5.51%)
helped: 3
HURT: 0
helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5
helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49%

total cycles in shared programs: 150263359 -> 150263263 (<.01%)
cycles in affected programs: 7978 -> 7882 (-1.20%)
helped: 3
HURT: 0
helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32
helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23%

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Ian Romanick ea6c276436 nir: Mark the 0.0 < abs(a) transformation as imprecise
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise.  The behavior changes if the source is NaN.

No shader-db changes on any platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-04 01:12:03 -07:00
Jason Ekstrand b3b170ade9 nir: Add a couple of iand/ior optimizations
Spotted in a shader in Batman: Arkham City.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-24 20:39:43 -07:00
Jason Ekstrand e4d346c86d nir: Add a couple trivial abs optimizations
Spotted in a shader in Batman: Arkham City.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-07-23 10:48:21 -07:00
Timothy Arceri e105b0ca30 nir: add a couple of ior opts to nir_opt_algebraic
One of these was seen in a Deus Ex shader.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-18 09:53:27 +10:00
Mathieu Bridon 0f7b18fa0d python: Use the print function
In Python 2, `print` was a statement, but it became a function in
Python 3.

Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2018-07-06 10:04:22 -07:00
Mathieu Bridon fe8a153648 python: Stabilize some script outputs
In Python, dictionaries and sets are unordered, and as a result their
is no guarantee that running this script twice will produce the same
output.

Using ordered dicts and explicitly sorting items makes the build more
reproducible, and will make it possible to verify that we're not
breaking anything when we move the build scripts to Python 3.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-07-05 12:52:12 +01:00
Eric Anholt 6a0db5f08f nir: Add lowering for find_lsb.
There is a fairly simple relation to turn this into ufind_msb.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt d4c7c3c225 nir: Add lowering for ifind_msb to ufind_msb.
ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb
to that.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt af88acf4c4 nir: Add lowering from ibitfield_extract/ubitfield_extract to shifts.
V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to
glsl/lower_instructions.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Eric Anholt 74618ccbca nir: Add lowering for bitfieldInsert without using bfi.
If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes
things harder than keeping the "bits" argument around.

This still uses bfm, but I've added the obvious lowering of bfm if you
need it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-06-06 13:44:28 -07:00
Ian Romanick f00fcfb7a2 nir: Lower !f2b(x) to x == 0.0
Some trivial help now, but it also prevents ~40 regressions caused by
Samuel's "nir: implement the GLSL equivalent of if simplication in
nir_opt_if" patch.

All Gen4+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14369557 -> 14369555 (<.01%)
instructions in affected programs: 442 -> 440 (-0.45%)
helped: 2
HURT: 0

total cycles in shared programs: 532425772 -> 532425743 (<.01%)
cycles in affected programs: 6086 -> 6057 (-0.48%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-06-01 10:14:53 -07:00
Ian Romanick 619c51722b nir: Add some missing "optimization undo" patterns
d8d18516b0 and 03fb13f646 added some patterns to undo conversions like

   (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c)))

If further optimization cause some of the operands to either be the same
or be constants, undoing the transformation can lead to further savings.

I don't know why these patterns were not added in those patches.  I did
not check to see which specific patterns actually helped.  I just added
all of them for symmetry.  This prevents some loop unrolling regressions
Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if
simplication in nir_opt_if" patch.

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 14369768 -> 14369557 (<.01%)
instructions in affected programs: 44076 -> 43865 (-0.48%)
helped: 141
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1
helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60%
95% mean confidence interval for instructions value: -1.67 -1.32
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 532430629 -> 532425772 (<.01%)
cycles in affected programs: 1170832 -> 1165975 (-0.41%)
helped: 101
HURT: 5
helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32
helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03%
HURT stats (abs)   min: 2 max: 22 x̄: 9.20 x̃: 4
HURT stats (rel)   min: <.01% max: 0.05% x̄: 0.02% x̃: <.01%
95% mean confidence interval for cycles value: -53.64 -38.00
95% mean confidence interval for cycles %-change: -3.06% -2.20%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-01 10:13:16 -07:00
Samuel Pitoiset 70f9e2589e nir: optimize iand(ieq(a, 0), ieq(b, 0)) to ieq(ior(a, b), 0)
Totals from affected shaders:
SGPRS: 80 -> 80 (0.00 %)
VGPRS: 48 -> 48 (0.00 %)
Code Size: 2120 -> 2096 (-1.13 %) bytes
Max Waves: 16 -> 16 (0.00 %)

Only two Rise of Tomb Raider shaders are affected on my side.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-05-31 10:57:16 +02:00
Timothy Arceri e8b368ad1c nir: add unsigned comparison simplifications
This avoids loop unrolling regressions in Wolfenstein II on DXVK
with an upcoming optimisation series from Samuel.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-30 22:48:37 +10:00
Alyssa Rosenzweig 5d85a0a55b nir: Implement optional b2f->iand lowering
This pass is required by the Midgard compiler; our instruction set uses
NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
Normally, this lowering pass would be implemented in a backend-specific
algebraic pass, but this conflicts with the existing iand->b2f pass in
nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
existing pass optional (default on -- all other backends should remain
unaffected), adding an optional pass for lowering the opposite
direction.

v2: Defer lowering until late algebraic optimisations to allow
optimising the b2f instruction itself.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2018-05-18 22:44:09 +02:00
Ian Romanick 2c643fd978 nir: Don't condition 'a-b < 0' -> 'a < b' on is_not_used_by_conditional
Now that i965 recognizes that a-b generates the same conditions as 'a <
b', there is no reason to condition this transformation on 'is not used
by conditional.'

Since this was the only user of the is_not_used_by_conditional function,
delete it.

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14400775 -> 14400595 (<.01%)
instructions in affected programs: 36712 -> 36532 (-0.49%)
helped: 182
HURT: 26
helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.15% max: 1.82% x̄: 0.70% x̃: 0.62%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 1.02% x̄: 0.82% x̃: 0.90%
95% mean confidence interval for instructions value: -0.97 -0.76
95% mean confidence interval for instructions %-change: -0.59% -0.43%
Instructions are helped.

total cycles in shared programs: 532929592 -> 532926345 (<.01%)
cycles in affected programs: 478660 -> 475413 (-0.68%)
helped: 187
HURT: 22
helped stats (abs) min: 2 max: 200 x̄: 20.99 x̃: 18
helped stats (rel) min: 0.23% max: 24.10% x̄: 1.48% x̃: 1.03%
HURT stats (abs)   min: 1 max: 214 x̄: 30.86 x̃: 11
HURT stats (rel)   min: 0.01% max: 23.06% x̄: 3.12% x̃: 0.86%
95% mean confidence interval for cycles value: -19.50 -11.57
95% mean confidence interval for cycles %-change: -1.42% -0.58%
Cycles are helped.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 177851578 -> 177851810 (<.01%)
cycles in affected programs: 24408 -> 24640 (0.95%)
helped: 2
HURT: 4
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.42% max: 0.47% x̄: 0.44% x̃: 0.44%
HURT stats (abs)   min: 24 max: 108 x̄: 60.00 x̃: 54
HURT stats (rel)   min: 0.52% max: 1.62% x̄: 1.04% x̃: 1.02%
95% mean confidence interval for cycles value: -7.75 85.08
95% mean confidence interval for cycles %-change: -0.39% 1.49%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Ian Romanick 6aeaa7d363 nir: Don't compare b2f or b2i with zero
All of the shaders that had loops changed were in Tomb Raider.  The one
shader that lost SIMD16 is one of those.

Skylake
total instructions in shared programs: 14391653 -> 14390468 (<.01%)
instructions in affected programs: 111891 -> 110706 (-1.06%)
helped: 501
HURT: 0
helped stats (abs) min: 1 max: 155 x̄: 2.37 x̃: 1
helped stats (rel) min: 0.05% max: 21.54% x̄: 1.61% x̃: 1.01%
95% mean confidence interval for instructions value: -3.23 -1.50
95% mean confidence interval for instructions %-change: -1.77% -1.45%
Instructions are helped.

total cycles in shared programs: 532793024 -> 532776598 (<.01%)
cycles in affected programs: 987682 -> 971256 (-1.66%)
helped: 348
nnHURT: 41
helped stats (abs) min: 1 max: 3074 x̄: 54.91 x̃: 18
helped stats (rel) min: 0.05% max: 32.24% x̄: 3.36% x̃: 1.68%
HURT stats (abs)   min: 1 max: 422 x̄: 65.39 x̃: 24
HURT stats (rel)   min: 0.09% max: 39.29% x̄: 9.50% x̃: 2.02%
95% mean confidence interval for cycles value: -64.08 -20.38
95% mean confidence interval for cycles %-change: -2.78% -1.23%
Cycles are helped.

total loops in shared programs: 4854 -> 4829 (-0.52%)
loops in affected programs: 27 -> 2 (-92.59%)
helped: 18
HURT: 0

LOST:   1
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-19 13:52:35 -07:00
Ian Romanick 6878c9aabc nir: Don't i2b a value that is already Boolean
A bunch of shaders have sequences like:

    i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0))))

Other optimizations (and NIR's typeless nature) reduce this to

    i2b(x == y)

which is silly.

Skylake
total instructions in shared programs: 14498698 -> 14497948 (<.01%)
instructions in affected programs: 74480 -> 73730 (-1.01%)
helped: 277
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68%
95% mean confidence interval for instructions value: -3.35 -2.06
95% mean confidence interval for instructions %-change: -1.74% -1.16%
Instructions are helped.

total cycles in shared programs: 532015500 -> 531999238 (<.01%)
cycles in affected programs: 5943878 -> 5927616 (-0.27%)
helped: 251
HURT: 74
helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14
helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53%
HURT stats (abs)   min: 1 max: 4550 x̄: 214.04 x̃: 15
HURT stats (rel)   min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33%
95% mean confidence interval for cycles value: -158.51 58.43
95% mean confidence interval for cycles %-change: -1.07% -0.04%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4753 -> 4735 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

Haswell and Broadwell had simliar results. (Broadwell shown)
total instructions in shared programs: 14791877 -> 14791127 (<.01%)
instructions in affected programs: 77326 -> 76576 (-0.97%)
helped: 278
HURT: 1
helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49%
95% mean confidence interval for instructions value: -3.33 -2.05
95% mean confidence interval for instructions %-change: -1.70% -1.13%
Instructions are helped.

total cycles in shared programs: 558250067 -> 558252872 (<.01%)
cycles in affected programs: 5806328 -> 5809133 (0.05%)
helped: 235
HURT: 83
helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16
helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51%
HURT stats (abs)   min: 1 max: 10590 x̄: 265.19 x̃: 20
HURT stats (rel)   min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54%
95% mean confidence interval for cycles value: -89.87 107.51
95% mean confidence interval for cycles %-change: -1.06% -0.32%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4735 -> 4717 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

total fills in shared programs: 83111 -> 83110 (<.01%)
fills in affected programs: 28 -> 27 (-3.57%)
helped: 1
HURT: 0

Ivy Bridge
total instructions in shared programs: 11774173 -> 11773436 (<.01%)
instructions in affected programs: 70819 -> 70082 (-1.04%)
helped: 267
HURT: 0
helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2
helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63%
95% mean confidence interval for instructions value: -3.51 -2.01
95% mean confidence interval for instructions %-change: -1.94% -1.21%
Instructions are helped.

total cycles in shared programs: 257153833 -> 257148932 (<.01%)
cycles in affected programs: 585341 -> 580440 (-0.84%)
helped: 167
HURT: 100
helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16
helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88%
HURT stats (abs)   min: 1 max: 200 x̄: 25.95 x̃: 16
HURT stats (rel)   min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65%
95% mean confidence interval for cycles value: -33.25 -3.46
95% mean confidence interval for cycles %-change: -1.47% -0.54%
Cycles are helped.

total loops in shared programs: 3416 -> 3398 (-0.53%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

LOST:   2
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10499306 -> 10499094 (<.01%)
instructions in affected programs: 6051 -> 5839 (-3.50%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2
helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45%
95% mean confidence interval for instructions value: -7.66 -2.20
95% mean confidence interval for instructions %-change: -5.47% -3.12%
Instructions are helped.

total cycles in shared programs: 145862568 -> 145861370 (<.01%)
cycles in affected programs: 61733 -> 60535 (-1.94%)
helped: 36
HURT: 2
helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35
helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81%
HURT stats (abs)   min: 18 max: 102 x̄: 60.00 x̃: 60
HURT stats (rel)   min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48%
95% mean confidence interval for cycles value: -41.28 -21.77
95% mean confidence interval for cycles %-change: -6.16% -3.00%
Cycles are helped.

total loops in shared programs: 1803 -> 1785 (-1.00%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

LOST:   4
GAINED: 0

No changes on Iron Lake of GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-08 15:26:26 -08:00
Ian Romanick 54e8d2268d nir: Narrow some dot product operations
On vector platforms, this helps elide some constant loads.

v2: Reorder the transformations.

No changes on Broadwell or Skylake.

Haswell
total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
instructions in affected programs: 1277532 -> 1243902 (-2.63%)
helped: 13216
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.57 -2.49
95% mean confidence interval for instructions %-change: -3.65% -3.54%
Instructions are helped.

total cycles in shared programs: 409580819 -> 409268463 (-0.08%)
cycles in affected programs: 71730652 -> 71418296 (-0.44%)
helped: 9898
HURT: 2352
helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
HURT stats (abs)   min: 2 max: 276 x̄: 23.25 x̃: 6
HURT stats (rel)   min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
95% mean confidence interval for cycles value: -33.19 -17.80
95% mean confidence interval for cycles %-change: -4.50% -4.26%
Cycles are helped.

total fills in shared programs: 82059 -> 82052 (<.01%)
fills in affected programs: 21 -> 14 (-33.33%)
helped: 7
HURT: 0

Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
total instructions in shared programs: 11811851 -> 11780605 (-0.26%)
instructions in affected programs: 1155007 -> 1123761 (-2.71%)
helped: 12304
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.56 -2.48
95% mean confidence interval for instructions %-change: -3.71% -3.59%
Instructions are helped.

total cycles in shared programs: 257618409 -> 257316805 (-0.12%)
cycles in affected programs: 71999580 -> 71697976 (-0.42%)
helped: 9155
HURT: 2380
helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
HURT stats (abs)   min: 2 max: 290 x̄: 21.14 x̃: 4
HURT stats (rel)   min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
95% mean confidence interval for cycles value: -34.32 -17.97
95% mean confidence interval for cycles %-change: -4.55% -4.29%
Cycles are helped.

GM45 and Iron Lake had nearly identical results (Iron Lake shown)
total instructions in shared programs: 7886750 -> 7879944 (-0.09%)
instructions in affected programs: 373781 -> 366975 (-1.82%)
helped: 3715
HURT: 47
helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
HURT stats (abs)   min: 1 max: 6 x̄: 2.55 x̃: 2
HURT stats (rel)   min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
95% mean confidence interval for instructions value: -1.85 -1.77
95% mean confidence interval for instructions %-change: -2.91% -2.73%
Instructions are helped.

total cycles in shared programs: 178114636 -> 178095452 (-0.01%)
cycles in affected programs: 7227666 -> 7208482 (-0.27%)
helped: 3349
HURT: 301
helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
HURT stats (abs)   min: 2 max: 42 x̄: 9.13 x̃: 10
HURT stats (rel)   min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
95% mean confidence interval for cycles value: -5.52 -4.99
95% mean confidence interval for cycles %-change: -0.81% -0.73%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
2018-03-08 15:26:26 -08:00
Ian Romanick e3ea166a2c nir: Simplify some comparisons like a+b < a
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14514555 -> 14514547 (<.01%)
instructions in affected programs: 1972 -> 1964 (-0.41%)
helped: 8
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.39% max: 0.42% x̄: 0.41% x̃: 0.41%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.41% -0.40%
Instructions are helped.

total cycles in shared programs: 533141444 -> 533136780 (<.01%)
cycles in affected programs: 164728 -> 160064 (-2.83%)
helped: 181
HURT: 3
helped stats (abs) min: 2 max: 94 x̄: 26.17 x̃: 30
helped stats (rel) min: 0.12% max: 5.33% x̄: 3.42% x̃: 3.80%
HURT stats (abs)   min: 4 max: 54 x̄: 24.00 x̃: 14
HURT stats (rel)   min: 0.20% max: 2.39% x̄: 1.09% x̃: 0.68%
95% mean confidence interval for cycles value: -27.12 -23.58
95% mean confidence interval for cycles %-change: -3.54% -3.16%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10533667 -> 10533539 (<.01%)
instructions in affected programs: 10148 -> 10020 (-1.26%)
helped: 124
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.39% max: 4.35% x̄: 2.20% x̃: 2.04%
95% mean confidence interval for instructions value: -1.06 -1.00
95% mean confidence interval for instructions %-change: -2.46% -1.95%
Instructions are helped.

total cycles in shared programs: 146136887 -> 146132122 (<.01%)
cycles in affected programs: 206382 -> 201617 (-2.31%)
helped: 171
HURT: 0
helped stats (abs) min: 2 max: 40 x̄: 27.87 x̃: 30
helped stats (rel) min: 0.08% max: 5.73% x̄: 2.98% x̃: 2.67%
95% mean confidence interval for cycles value: -29.19 -26.54
95% mean confidence interval for cycles %-change: -3.20% -2.76%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886515 -> 7886507 (<.01%)
instructions in affected programs: 3016 -> 3008 (-0.27%)
helped: 8
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.25% max: 0.28% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.27% -0.26%
Instructions are helped.

total cycles in shared programs: 178100396 -> 178100388 (<.01%)
cycles in affected programs: 156128 -> 156120 (<.01%)
helped: 4
HURT: 4
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -3.68 1.68
95% mean confidence interval for cycles %-change: -0.03% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857872 -> 4857868 (<.01%)
instructions in affected programs: 1544 -> 1540 (-0.26%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.25% max: 0.27% x̄: 0.26% x̃: 0.26%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.28% -0.24%
Instructions are helped.

total cycles in shared programs: 122167654 -> 122167662 (<.01%)
cycles in affected programs: 96248 -> 96256 (<.01%)
helped: 0
HURT: 4
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: <.01% 0.02%
Cycles are HURT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-06 11:17:30 -08:00
Ian Romanick d1ed4ffe0b nir: Use De Morgan's Law on logic compounded comparisons
The replacement of the comparison operators must happen during this
step.  If it does not, the next pass of nir_opt_algebraic will reapply
De Morgan's Law in the "opposite direction" before performing dead code
elimination.  The resulting infinite loop will eventually get OOM
killed.

Haswell, Broadwell, and Skylake had similar results. (Broadwell shown)
total instructions in shared programs: 14808185 -> 14808036 (<.01%)
instructions in affected programs: 13758 -> 13609 (-1.08%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3
helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01%
95% mean confidence interval for instructions value: -4.67 -2.97
95% mean confidence interval for instructions %-change: -1.09% -0.88%
Instructions are helped.

total cycles in shared programs: 559438333 -> 559435832 (<.01%)
cycles in affected programs: 199160 -> 196659 (-1.26%)
helped: 42
HURT: 3
helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51
helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40%
HURT stats (abs)   min: 2 max: 40 x̄: 27.33 x̃: 40
HURT stats (rel)   min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74%
95% mean confidence interval for cycles value: -71.47 -39.69
95% mean confidence interval for cycles %-change: -1.64% -0.93%
Cycles are helped.

Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11811776 -> 11811553 (<.01%)
instructions in affected programs: 15201 -> 14978 (-1.47%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6
helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26%
95% mean confidence interval for instructions value: -7.21 -4.23
95% mean confidence interval for instructions %-change: -1.48% -1.12%
Instructions are helped.

total cycles in shared programs: 257617270 -> 257614589 (<.01%)
cycles in affected programs: 212107 -> 209426 (-1.26%)
helped: 45
HURT: 0
helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54
helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32%
95% mean confidence interval for cycles value: -74.02 -45.14
95% mean confidence interval for cycles %-change: -1.59% -1.01%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886648 -> 7886515 (<.01%)
instructions in affected programs: 14106 -> 13973 (-0.94%)
helped: 29
HURT: 0
helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4
helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81%
95% mean confidence interval for instructions value: -5.65 -3.52
95% mean confidence interval for instructions %-change: -1.03% -0.76%
Instructions are helped.

total cycles in shared programs: 178100812 -> 178100396 (<.01%)
cycles in affected programs: 67970 -> 67554 (-0.61%)
helped: 29
HURT: 0
helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12
helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54%
95% mean confidence interval for cycles value: -18.30 -10.39
95% mean confidence interval for cycles %-change: -0.71% -0.45%
Cycles are helped.

GM45
total instructions in shared programs: 4857939 -> 4857872 (<.01%)
instructions in affected programs: 7426 -> 7359 (-0.90%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4
helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77%
95% mean confidence interval for instructions value: -6.06 -2.87
95% mean confidence interval for instructions %-change: -1.06% -0.67%
Instructions are helped.

total cycles in shared programs: 122167930 -> 122167654 (<.01%)
cycles in affected programs: 43118 -> 42842 (-0.64%)
helped: 15
HURT: 0
helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16
helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54%
95% mean confidence interval for cycles value: -25.03 -11.77
95% mean confidence interval for cycles %-change: -0.82% -0.41%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-06 11:17:29 -08:00
Ian Romanick 52607658ff nir: Replace fmin(b2f(a), b) with a bcsel
All of the affected shaders are HDR mappers from Serious Sam 3.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14516285 -> 14516273 (<.01%)
instructions in affected programs: 348 -> 336 (-3.45%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 2.08% max: 6.67% x̄: 4.31% x̃: 4.17%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -5.55% -3.06%
Instructions are helped.

total cycles in shared programs: 533163876 -> 533163808 (<.01%)
cycles in affected programs: 1144 -> 1076 (-5.94%)
helped: 4
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 5.80% max: 6.08% x̄: 5.94% x̃: 5.94%
95% mean confidence interval for cycles value: -18.84 -15.16
95% mean confidence interval for cycles %-change: -6.20% -5.68%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10533321 -> 10533309 (<.01%)
instructions in affected programs: 372 -> 360 (-3.23%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 2.00% max: 5.88% x̄: 3.91% x̃: 3.85%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -4.96% -2.86%
Instructions are helped.

total cycles in shared programs: 146136632 -> 146136428 (<.01%)
cycles in affected programs: 11668 -> 11464 (-1.75%)
helped: 12
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 0.99% max: 3.44% x̄: 2.20% x̃: 2.29%
95% mean confidence interval for cycles value: -17.66 -16.34
95% mean confidence interval for cycles %-change: -2.82% -1.58%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886301 -> 7886277 (<.01%)
instructions in affected programs: 576 -> 552 (-4.17%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 2.94% max: 6.06% x̄: 4.51% x̃: 4.65%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -5.30% -3.72%
Instructions are helped.

total cycles in shared programs: 178113176 -> 178113176 (0.00%)
cycles in affected programs: 2116 -> 2116 (0.00%)
helped: 2
HURT: 4
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.14% max: 1.14% x̄: 1.14% x̃: 1.14%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.50% max: 0.65% x̄: 0.58% x̃: 0.58%
95% mean confidence interval for cycles value: -3.25 3.25
95% mean confidence interval for cycles %-change: -0.93% 0.94%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857756 -> 4857744 (<.01%)
instructions in affected programs: 294 -> 282 (-4.08%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 2.94% max: 5.71% x̄: 4.40% x̃: 4.55%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -5.71% -3.09%
Instructions are helped.

total cycles in shared programs: 122178730 -> 122178722 (<.01%)
cycles in affected programs: 700 -> 692 (-1.14%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-06 11:17:29 -08:00
Ian Romanick b974dfee11 nir: Pull b2f out of bcsel
All platforms had similar results. (Skylake shown)
total instructions in shared programs: 14516592 -> 14516586 (<.01%)
instructions in affected programs: 500 -> 494 (-1.20%)
helped: 2
HURT: 0

total cycles in shared programs: 533167044 -> 533166998 (<.01%)
cycles in affected programs: 6988 -> 6942 (-0.66%)
helped: 2
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-06 11:17:29 -08:00
Ian Romanick f50400cc80 nir: Replace an odd comparison involving fmin of -b2f
I noticed the fge version while looking at a shader for an unrelated
reason.  The feq version prevents a regression in a later change that
performs strength reduction of some compares.

Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14514808 -> 14514796 (<.01%)
instructions in affected programs: 750 -> 738 (-1.60%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.83% max: 1.96% x̄: 1.40% x̃: 1.40%
95% mean confidence interval for instructions value: -6.67 0.67
95% mean confidence interval for instructions %-change: -2.43% -0.36%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 533144939 -> 533144853 (<.01%)
cycles in affected programs: 8911 -> 8825 (-0.97%)
helped: 4
HURT: 0
helped stats (abs) min: 16 max: 32 x̄: 21.50 x̃: 19
helped stats (rel) min: 0.60% max: 1.89% x̄: 1.28% x̃: 1.31%
95% mean confidence interval for cycles value: -32.94 -10.06
95% mean confidence interval for cycles %-change: -2.30% -0.26%
Cycles are helped.

Haswell
total instructions in shared programs: 13093785 -> 13093775 (<.01%)
instructions in affected programs: 924 -> 914 (-1.08%)
helped: 4
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.82% max: 1.95% x̄: 1.39% x̃: 1.39%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19%
95% mean confidence interval for instructions value: -4.53 1.20
95% mean confidence interval for instructions %-change: -2.02% 0.97%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 409580553 -> 409580118 (<.01%)
cycles in affected programs: 10909 -> 10474 (-3.99%)
helped: 5
HURT: 1
helped stats (abs) min: 6 max: 222 x̄: 89.60 x̃: 18
helped stats (rel) min: 0.16% max: 24.72% x̄: 9.54% x̃: 1.78%
HURT stats (abs)   min: 13 max: 13 x̄: 13.00 x̃: 13
HURT stats (rel)   min: 0.39% max: 0.39% x̄: 0.39% x̃: 0.39%
95% mean confidence interval for cycles value: -180.68 35.68
95% mean confidence interval for cycles %-change: -19.55% 3.79%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 11811851 -> 11811840 (<.01%)
instructions in affected programs: 1032 -> 1021 (-1.07%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 1
helped stats (rel) min: 0.63% max: 1.95% x̄: 1.13% x̃: 0.97%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19%
95% mean confidence interval for instructions value: -4.17 0.51
95% mean confidence interval for instructions %-change: -1.86% 0.36%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 257618403 -> 257618168 (<.01%)
cycles in affected programs: 10784 -> 10549 (-2.18%)
helped: 4
HURT: 2
helped stats (abs) min: 4 max: 220 x̄: 64.50 x̃: 17
helped stats (rel) min: 0.50% max: 24.34% x̄: 7.07% x̃: 1.72%
HURT stats (abs)   min: 9 max: 14 x̄: 11.50 x̃: 11
HURT stats (rel)   min: 0.24% max: 0.42% x̄: 0.33% x̃: 0.33%
95% mean confidence interval for cycles value: -133.11 54.78
95% mean confidence interval for cycles %-change: -14.79% 5.59%
Inconclusive result (value mean confidence interval includes 0).

GM45, Iron Lake, and Sandy Bridge had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10533871 -> 10533859 (<.01%)
instructions in affected programs: 865 -> 853 (-1.39%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.63% max: 1.83% x̄: 1.22% x̃: 1.21%
95% mean confidence interval for instructions value: -6.67 0.67
95% mean confidence interval for instructions %-change: -2.16% -0.29%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 146139904 -> 146139852 (<.01%)
cycles in affected programs: 15213 -> 15161 (-0.34%)
helped: 4
HURT: 0
helped stats (abs) min: 3 max: 18 x̄: 13.00 x̃: 15
helped stats (rel) min: 0.15% max: 0.84% x̄: 0.39% x̃: 0.29%
95% mean confidence interval for cycles value: -23.79 -2.21
95% mean confidence interval for cycles %-change: -0.88% 0.09%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-06 11:17:29 -08:00
Ian Romanick 380136e998 nir: Mark bcsel-to-fmin (or fmax) transformations as inexact
These transformations are inexact because section 4.7.1 (Range and
Precision) says:

    Operations and built-in functions that operate on a NaN are not
    required to return a NaN as the result.

The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-06 11:17:14 -08:00
Ian Romanick 4addd34b04 nir: Recognize some more open-coded fmin / fmax
This transformation is inexact because section 4.7.1 (Range and
Precision) says:

    Operations and built-in functions that operate on a NaN are not
    required to return a NaN as the result.

The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.

v2: Reorder operands and mark as inexact.  The latter suggested by
Jason.

shader-db results:

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14514817 -> 14514808 (<.01%)
instructions in affected programs: 229 -> 220 (-3.93%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%

total cycles in shared programs: 533145211 -> 533144939 (<.01%)
cycles in affected programs: 37268 -> 36996 (-0.73%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%

Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
total cycles in shared programs: 257618409 -> 257618403 (<.01%)
cycles in affected programs: 12582 -> 12576 (-0.05%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-06 11:17:14 -08:00
Timothy Arceri a050ea60ee nir: add lower_ldexp to nir compiler options
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-28 09:23:49 +11:00
Samuel Pitoiset 63fb30c674 nir: lower fexp2(fmul(flog2(a), 2)) to fmul(a, a)
Similar for the 4 case.

Suggested by Bas.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:45 +01:00
Samuel Pitoiset b18997876f nir: add is_used_once for fmul(fexp2(a), fexp2(b)) to fexp2(fadd(a, b))
Otherwise the code size increases because the original fexp2()
instructions can't be deleted.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-22 20:40:43 +01:00
Ian Romanick ee63933a73 nir: Distribute binary operations with constants into bcsel
This was specifically designed to simplify 1+mix(0, a-1, condition) to
mix(1, a, condition) by pushing the 1+ inside.

Skylake, Broadwell, and Haswell had similar results.  Skylake shown.
total instructions in shared programs: 14521753 -> 14521716 (<.01%)
instructions in affected programs: 10619 -> 10582 (-0.35%)
helped: 51
HURT: 14
helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1
helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95%
HURT stats (abs)   min: 1 max: 11 x̄: 2.57 x̃: 1
HURT stats (rel)   min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32%
95% mean confidence interval for instructions value: -1.31 0.17
95% mean confidence interval for instructions %-change: -0.80% -0.27%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 533000205 -> 533003533 (<.01%)
cycles in affected programs: 110610 -> 113938 (3.01%)
helped: 43
HURT: 28
helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16
helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67%
HURT stats (abs)   min: 2 max: 3066 x̄: 160.50 x̃: 14
HURT stats (rel)   min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62%
95% mean confidence interval for cycles value: -43.81 137.56
95% mean confidence interval for cycles %-change: -1.47% 3.60%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 10018840 -> 10018713 (<.01%)
instructions in affected programs: 9431 -> 9304 (-1.35%)
helped: 51
HURT: 3
helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1
helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81%
HURT stats (abs)   min: 1 max: 12 x̄: 4.67 x̃: 1
HURT stats (rel)   min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22%
95% mean confidence interval for instructions value: -5.36 0.66
95% mean confidence interval for instructions %-change: -1.66% -0.46%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 87571944 -> 87572785 (<.01%)
cycles in affected programs: 117234 -> 118075 (0.72%)
helped: 42
HURT: 23
helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30
helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74%
HURT stats (abs)   min: 1 max: 2341 x̄: 131.35 x̃: 10
HURT stats (rel)   min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61%
95% mean confidence interval for cycles value: -61.05 86.93
95% mean confidence interval for cycles %-change: -3.47% -0.33%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10542933 -> 10542844 (<.01%)
instructions in affected programs: 11487 -> 11398 (-0.77%)
helped: 52
HURT: 3
helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1
helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72%
HURT stats (abs)   min: 1 max: 11 x̄: 4.33 x̃: 1
HURT stats (rel)   min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22%
95% mean confidence interval for instructions value: -3.17 -0.07
95% mean confidence interval for instructions %-change: -1.13% -0.52%
Instructions are helped.

total cycles in shared programs: 146098397 -> 146097094 (<.01%)
cycles in affected programs: 128140 -> 126837 (-1.02%)
helped: 47
HURT: 8
helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18
helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95%
HURT stats (abs)   min: 1 max: 16 x̄: 8.75 x̃: 9
HURT stats (rel)   min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34%
95% mean confidence interval for cycles value: -37.49 -9.90
95% mean confidence interval for cycles %-change: -1.22% -0.71%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886711 -> 7886509 (<.01%)
instructions in affected programs: 10425 -> 10223 (-1.94%)
helped: 50
HURT: 2
helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89%
95% mean confidence interval for instructions value: -8.05 0.28
95% mean confidence interval for instructions %-change: -1.83% -0.26%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 178115324 -> 178114612 (<.01%)
cycles in affected programs: 765726 -> 765014 (-0.09%)
helped: 39
HURT: 1
helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8
helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -32.07 -3.53
95% mean confidence interval for cycles %-change: -0.86% 0.10%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857762 -> 4857661 (<.01%)
instructions in affected programs: 5523 -> 5422 (-1.83%)
helped: 25
HURT: 1
helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1
helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86%
95% mean confidence interval for instructions value: -9.99 2.22
95% mean confidence interval for instructions %-change: -2.01% 0.08%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 122179674 -> 122179194 (<.01%)
cycles in affected programs: 530162 -> 529682 (-0.09%)
helped: 22
HURT: 1
helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7
helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -46.56 4.82
95% mean confidence interval for cycles %-change: -1.20% 0.36%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:15 -08:00
Ian Romanick 03fb13f646 nir: Rearrange logic op-compounded integer compares
Skylake and Broadwell had similar results.  Skylake shown.
total instructions in shared programs: 14521769 -> 14521753 (<.01%)
instructions in affected programs: 8782 -> 8766 (-0.18%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.40% x̄: 0.20% x̃: 0.18%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.23% -0.16%
Instructions are helped.

total cycles in shared programs: 533000376 -> 533000205 (<.01%)
cycles in affected programs: 447035 -> 446864 (-0.04%)
helped: 9
HURT: 9
helped stats (abs) min: 2 max: 40 x̄: 35.78 x̃: 40
helped stats (rel) min: 0.02% max: 0.18% x̄: 0.10% x̃: 0.09%
HURT stats (abs)   min: 1 max: 52 x̄: 16.78 x̃: 10
HURT stats (rel)   min: <.01% max: 1.11% x̄: 0.29% x̃: 0.12%
95% mean confidence interval for cycles value: -25.07 6.07
95% mean confidence interval for cycles %-change: -0.08% 0.27%
Inconclusive result (value mean confidence interval includes 0).

No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 053be9f020 nir: Rearrange and-compounded float compares
If both comparisons are used as sources for instructions other than the
iand, this transformation is detrimental.  If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.

It is interesting to me that on Iron Lake only 81 shaders have
instruction counts changed, but 726 shaders have cycle counts changed.

shader-db results:

Skylake
total instructions in shared programs: 14525728 -> 14521017 (-0.03%)
instructions in affected programs: 1164726 -> 1160015 (-0.40%)
helped: 1692
HURT: 5
helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2
helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs)   min: 1 max: 12 x̄: 3.20 x̃: 1
HURT stats (rel)   min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86%
95% mean confidence interval for instructions value: -3.52 -2.03
95% mean confidence interval for instructions %-change: -0.86% -0.74%
Instructions are helped.

total cycles in shared programs: 533115449 -> 532991404 (-0.02%)
cycles in affected programs: 119401803 -> 119277758 (-0.10%)
helped: 1145
HURT: 467
helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18
helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42%
HURT stats (abs)   min: 1 max: 1590 x̄: 92.15 x̃: 15
HURT stats (rel)   min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39%
95% mean confidence interval for cycles value: -122.16 -31.74
95% mean confidence interval for cycles %-change: -0.94% -0.57%
Cycles are helped.

total spills in shared programs: 9597 -> 9534 (-0.66%)
spills in affected programs: 403 -> 340 (-15.63%)
helped: 1
HURT: 1

total fills in shared programs: 13904 -> 13790 (-0.82%)
fills in affected programs: 1627 -> 1513 (-7.01%)
helped: 2
HURT: 1

LOST:   0
GAINED: 2

Broadwell
total instructions in shared programs: 14816966 -> 14812590 (-0.03%)
instructions in affected programs: 1499885 -> 1495509 (-0.29%)
helped: 1672
HURT: 15
helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs)   min: 1 max: 21 x̄: 9.20 x̃: 8
HURT stats (rel)   min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53%
95% mean confidence interval for instructions value: -3.14 -2.05
95% mean confidence interval for instructions %-change: -0.85% -0.73%
Instructions are helped.

total cycles in shared programs: 559353622 -> 559345595 (<.01%)
cycles in affected programs: 139893703 -> 139885676 (<.01%)
helped: 921
HURT: 697
helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18
helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87%
HURT stats (abs)   min: 1 max: 2370 x̄: 178.03 x̃: 38
HURT stats (rel)   min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14%
95% mean confidence interval for cycles value: -59.64 49.72
95% mean confidence interval for cycles %-change: -1.02% -0.66%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 78902 -> 78861 (-0.05%)
spills in affected programs: 2418 -> 2377 (-1.70%)
helped: 1
HURT: 11

total fills in shared programs: 83782 -> 83678 (-0.12%)
fills in affected programs: 3515 -> 3411 (-2.96%)
helped: 2
HURT: 11

LOST:   0
GAINED: 5

Haswell and Ivy Bridge had similar results. Haswell shown.
total instructions in shared programs: 9033898 -> 9032010 (-0.02%)
instructions in affected programs: 308064 -> 306176 (-0.61%)
helped: 921
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1
helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
95% mean confidence interval for instructions value: -2.21 -1.87
95% mean confidence interval for instructions %-change: -0.88% -0.68%
Instructions are helped.

total cycles in shared programs: 84628949 -> 84620520 (<.01%)
cycles in affected programs: 2164913 -> 2156484 (-0.39%)
helped: 518
HURT: 359
helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20
helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01%
HURT stats (abs)   min: 1 max: 586 x̄: 36.43 x̃: 8
HURT stats (rel)   min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40%
95% mean confidence interval for cycles value: -15.17 -4.05
95% mean confidence interval for cycles %-change: -0.77% -0.32%
Cycles are helped.

LOST:   0
GAINED: 4

Sandy Bridge
total instructions in shared programs: 10544860 -> 10542933 (-0.02%)
instructions in affected programs: 360019 -> 358092 (-0.54%)
helped: 931
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33%
95% mean confidence interval for instructions value: -2.23 -1.89
95% mean confidence interval for instructions %-change: -0.76% -0.58%
Instructions are helped.

total cycles in shared programs: 146106820 -> 146098397 (<.01%)
cycles in affected programs: 3435047 -> 3426624 (-0.25%)
helped: 572
HURT: 329
helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15
helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33%
HURT stats (abs)   min: 1 max: 1714 x̄: 30.93 x̃: 6
HURT stats (rel)   min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19%
95% mean confidence interval for cycles value: -16.85 -1.85
95% mean confidence interval for cycles %-change: -0.39% -0.01%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 7886925 -> 7886711 (<.01%)
instructions in affected programs: 25763 -> 25549 (-0.83%)
helped: 75
HURT: 6
helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1
helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53%
HURT stats (abs)   min: 1 max: 16 x̄: 6.00 x̃: 1
HURT stats (rel)   min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86%
95% mean confidence interval for instructions value: -3.69 -1.60
95% mean confidence interval for instructions %-change: -2.54% -0.57%
Instructions are helped.

total cycles in shared programs: 178116888 -> 178115324 (<.01%)
cycles in affected programs: 5858790 -> 5857226 (-0.03%)
helped: 484
HURT: 242
helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs)   min: 2 max: 76 x̄: 4.07 x̃: 2
HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03%
95% mean confidence interval for cycles value: -2.76 -1.55
95% mean confidence interval for cycles %-change: -0.12% 0.01%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857870 -> 4857762 (<.01%)
instructions in affected programs: 13994 -> 13886 (-0.77%)
helped: 39
HURT: 5
helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48%
HURT stats (abs)   min: 1 max: 16 x̄: 4.00 x̃: 1
HURT stats (rel)   min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86%
95% mean confidence interval for instructions value: -3.86 -1.05
95% mean confidence interval for instructions %-change: -2.61% 0.04%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 122180744 -> 122179674 (<.01%)
cycles in affected programs: 3686646 -> 3685576 (-0.03%)
helped: 273
HURT: 141
helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs)   min: 2 max: 76 x̄: 3.66 x̃: 2
HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02%
95% mean confidence interval for cycles value: -3.42 -1.75
95% mean confidence interval for cycles %-change: -0.15% 0.03%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 821e7a4d32 nir: Separate a weird compare with zero to two compares with zero
min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0).

No shader-db changes, but it does prevent 6 to 12 instruction
regressions in the next patch on all measured Intel platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 68420d8322 nir: Simplify min and max of b2f
v2: Rebase on almost 2 years.  Require that one of the arguments to fmin
or fmax be used only once.  This prevents some regressions.

shader-db results:

Skylake and Broadwell had similar results.  Skylake shown.
total instructions in shared programs: 14526021 -> 14525913 (<.01%)
instructions in affected programs: 4613 -> 4505 (-2.34%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%

total cycles in shared programs: 533118710 -> 533118403 (<.01%)
cycles in affected programs: 34334 -> 34027 (-0.89%)
helped: 24
HURT: 0
helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%

No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick d8d18516b0 nir: Undo possible damage caused by rearranging or-compounded float compares
shader-db results:

Skylake and Broadwell had similar results (Skylake shown)
total instructions in shared programs: 14525898 -> 14525836 (<.01%)
instructions in affected programs: 1964 -> 1902 (-3.16%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1
helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86%
95% mean confidence interval for instructions value: -9.46 0.60
95% mean confidence interval for instructions %-change: -3.97% -0.24%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 533119892 -> 533115756 (<.01%)
cycles in affected programs: 96061 -> 91925 (-4.31%)
helped: 13
HURT: 1
helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300
helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46%
95% mean confidence interval for cycles value: -379.43 -211.43
95% mean confidence interval for cycles %-change: -4.84% -3.01%
Cycles are helped.

Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown).
total instructions in shared programs: 9033948 -> 9033898 (<.01%)
instructions in affected programs: 535 -> 485 (-9.35%)
helped: 2
HURT: 0

total cycles in shared programs: 84631402 -> 84628949 (<.01%)
cycles in affected programs: 63197 -> 60744 (-3.88%)
helped: 13
HURT: 2
helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140
helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01%
HURT stats (abs)   min: 4 max: 8 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for cycles value: -253.40 -73.67
95% mean confidence interval for cycles %-change: -4.24% -2.25%
Cycles are helped.

No changes on GM45 or Iron Lake.

v2: Add a couple more tautological compares.  Suggested by Elie.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 3941cba0f7 nir: Be more conservative about rearranging or-compounded compares
If both comparisons are used as sources for instructions other than the
ior, this transformation is detrimental.  If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.

shader-db results:

Skylake
total instructions in shared programs: 14526147 -> 14525898 (<.01%)
instructions in affected programs: 70239 -> 69990 (-0.35%)
helped: 102
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.86 -2.02
95% mean confidence interval for instructions %-change: -0.46% -0.31%
Instructions are helped.

total cycles in shared programs: 533120531 -> 533119892 (<.01%)
cycles in affected programs: 994875 -> 994236 (-0.06%)
helped: 76
HURT: 26
helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13
helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18%
HURT stats (abs)   min: 1 max: 167 x̄: 54.62 x̃: 26
HURT stats (rel)   min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39%
95% mean confidence interval for cycles value: -19.44 6.91
95% mean confidence interval for cycles %-change: -0.30% 0.15%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 14816005 -> 14815787 (<.01%)
instructions in affected programs: 64658 -> 64440 (-0.34%)
helped: 97
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.62 -1.87
95% mean confidence interval for instructions %-change: -0.45% -0.30%
Instructions are helped.

total cycles in shared programs: 559340386 -> 559339907 (<.01%)
cycles in affected programs: 1090491 -> 1090012 (-0.04%)
helped: 66
HURT: 28
helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16
helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27%
HURT stats (abs)   min: 2 max: 226 x̄: 39.07 x̃: 11
HURT stats (rel)   min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20%
95% mean confidence interval for cycles value: -15.94 5.75
95% mean confidence interval for cycles %-change: -0.35% 0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

Haswell
total instructions in shared programs: 9034106 -> 9033948 (<.01%)
instructions in affected programs: 24096 -> 23938 (-0.66%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4
helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64%
95% mean confidence interval for instructions value: -4.71 -3.60
95% mean confidence interval for instructions %-change: -0.84% -0.58%
Instructions are helped.

total cycles in shared programs: 84631628 -> 84631402 (<.01%)
cycles in affected programs: 148674 -> 148448 (-0.15%)
helped: 14
HURT: 14
helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12
helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21%
HURT stats (abs)   min: 1 max: 10 x̄: 6.00 x̃: 5
HURT stats (rel)   min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: -19.42 3.28
95% mean confidence interval for cycles %-change: -0.59% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 10015456 -> 10015293 (<.01%)
instructions in affected programs: 27701 -> 27538 (-0.59%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4
helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52%
95% mean confidence interval for instructions value: -4.87 -3.71
95% mean confidence interval for instructions %-change: -0.82% -0.51%
Instructions are helped.

total cycles in shared programs: 87524771 -> 87524569 (<.01%)
cycles in affected programs: 112324 -> 112122 (-0.18%)
helped: 6
HURT: 12
helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20
helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26%
HURT stats (abs)   min: 1 max: 16 x̄: 5.50 x̃: 5
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -29.14 6.69
95% mean confidence interval for cycles %-change: -0.93% 0.08%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Sandy Bridge
total instructions in shared programs: 10545655 -> 10545465 (<.01%)
instructions in affected programs: 37198 -> 37008 (-0.51%)
helped: 42
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4
helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49%
95% mean confidence interval for instructions value: -5.14 -3.91
95% mean confidence interval for instructions %-change: -0.68% -0.47%
Instructions are helped.

total cycles in shared programs: 146113059 -> 146112427 (<.01%)
cycles in affected programs: 423514 -> 422882 (-0.15%)
helped: 32
HURT: 10
helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12
helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11%
HURT stats (abs)   min: 12 max: 19 x̄: 14.70 x̃: 14
HURT stats (rel)   min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14%
95% mean confidence interval for cycles value: -26.03 -4.07
95% mean confidence interval for cycles %-change: -0.43% -0.05%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886959 -> 7886925 (<.01%)
instructions in affected programs: 1340 -> 1306 (-2.54%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8
helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43%
95% mean confidence interval for instructions value: -20.44 3.44
95% mean confidence interval for instructions %-change: -5.78% 0.89%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 178116996 -> 178116888 (<.01%)
cycles in affected programs: 6262 -> 6154 (-1.72%)
helped: 2
HURT: 2
helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61
helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62%
HURT stats (abs)   min: 6 max: 8 x̄: 7.00 x̃: 7
HURT stats (rel)   min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -93.27 39.27
95% mean confidence interval for cycles %-change: -5.38% 2.27%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857887 -> 4857870 (<.01%)
instructions in affected programs: 674 -> 657 (-2.52%)
helped: 2
HURT: 0

total cycles in shared programs: 122180816 -> 122180744 (<.01%)
cycles in affected programs: 3764 -> 3692 (-1.91%)
helped: 1
HURT: 1
helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78
helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick cfc0d34802 nir: See through an fneg to apply existing optimizations
Doing the same for the existing feq and fne transformations didn't help
anything in shader-db.

shader-db results:

Broadwell and Skylake (Skylake shown)
total instructions in shared programs: 14529463 -> 14526147 (-0.02%)
instructions in affected programs: 402420 -> 399104 (-0.82%)
helped: 2136
HURT: 131
helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1
helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57%
95% mean confidence interval for instructions value: -1.51 -1.41
95% mean confidence interval for instructions %-change: -3.06% -2.78%
Instructions are helped.

total cycles in shared programs: 533146915 -> 533120531 (<.01%)
cycles in affected programs: 10356261 -> 10329877 (-0.25%)
helped: 1933
HURT: 844
helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16
helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88%
HURT stats (abs)   min: 1 max: 423 x̄: 36.17 x̃: 12
HURT stats (rel)   min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59%
95% mean confidence interval for cycles value: -11.78 -7.22
95% mean confidence interval for cycles %-change: -1.98% -1.65%
Cycles are helped.

Haswell
total instructions in shared programs: 9037416 -> 9034106 (-0.04%)
instructions in affected programs: 389831 -> 386521 (-0.85%)
helped: 2184
HURT: 120
helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57%
95% mean confidence interval for instructions value: -1.49 -1.39
95% mean confidence interval for instructions %-change: -2.68% -2.41%
Instructions are helped.

total cycles in shared programs: 84636243 -> 84631628 (<.01%)
cycles in affected programs: 4745058 -> 4740443 (-0.10%)
helped: 1904
HURT: 960
helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18
helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38%
HURT stats (abs)   min: 1 max: 1080 x̄: 55.11 x̃: 14
HURT stats (rel)   min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81%
95% mean confidence interval for cycles value: -4.51 1.29
95% mean confidence interval for cycles %-change: -1.64% -1.25%
Inconclusive result (value mean confidence interval includes 0).

LOST:   1
GAINED: 0

Sandy Bridge and Ivy Bridge (Ivy Bridge shown)
total instructions in shared programs: 10018873 -> 10015456 (-0.03%)
instructions in affected programs: 512820 -> 509403 (-0.67%)
helped: 2268
HURT: 162
helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88%
HURT stats (abs)   min: 1 max: 4 x̄: 1.59 x̃: 1
HURT stats (rel)   min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50%
95% mean confidence interval for instructions value: -1.46 -1.35
95% mean confidence interval for instructions %-change: -2.38% -2.12%
Instructions are helped.

total cycles in shared programs: 87538223 -> 87524771 (-0.02%)
cycles in affected programs: 5435520 -> 5422068 (-0.25%)
helped: 1916
HURT: 946
helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18
helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97%
HURT stats (abs)   min: 1 max: 633 x̄: 45.41 x̃: 11
HURT stats (rel)   min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62%
95% mean confidence interval for cycles value: -7.34 -2.06
95% mean confidence interval for cycles %-change: -1.62% -1.26%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 7888446 -> 7886959 (-0.02%)
instructions in affected programs: 331581 -> 330094 (-0.45%)
helped: 1160
HURT: 97
helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1
helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25%
95% mean confidence interval for instructions value: -1.25 -1.12
95% mean confidence interval for instructions %-change: -0.91% -0.75%
Instructions are helped.

total cycles in shared programs: 178130766 -> 178116996 (<.01%)
cycles in affected programs: 12534564 -> 12520794 (-0.11%)
helped: 1856
HURT: 187
helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4
helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.55 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02%
95% mean confidence interval for cycles value: -7.41 -6.07
95% mean confidence interval for cycles %-change: -0.28% -0.22%
Cycles are helped.

GM45
total instructions in shared programs: 4858912 -> 4857887 (-0.02%)
instructions in affected programs: 237565 -> 236540 (-0.43%)
helped: 867
HURT: 57
helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: -1.18 -1.04
95% mean confidence interval for instructions %-change: -0.88% -0.71%
Instructions are helped.

total cycles in shared programs: 122189118 -> 122180816 (<.01%)
cycles in affected programs: 8776418 -> 8768116 (-0.09%)
helped: 1213
HURT: 166
helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4
helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.35 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02%
95% mean confidence interval for cycles value: -6.78 -5.26
95% mean confidence interval for cycles %-change: -0.24% -0.18%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Connor Abbott de91461575 nir: fix algebraic optimizations
The optimizations are only valid for 32-bit integers. They were
mistakenly firing for 64-bit integers as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-08-01 12:20:49 -07:00
Matt Turner aff108f2fd nir: Optimize find_lsb/imsb/umsb error checks
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-07-20 16:56:50 -07:00
Timothy Arceri 7a7ee40c2d nir/i965: add before ffma algebraic opts
This shuffles constants down in the reverse of what the previous
patch does and applies some simpilifications that may be made
possible from doing so.

Shader-db results BDW:

total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128

total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378

V2: mark float opts as inexact

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri fb2269fed1 nir: shuffle constants to the top
V2: mark float opts as inexact

If one of the inputs to an mul/add is the result of another
mul/add there is a chance that we can reuse the result of that
mul/add in other calls if we do the multiplication in the right
order.

Also by attempting to move all constants to the top we increase
the chance of constant folding.

For example it is a fairly common pattern for shaders to do something
similar to this:

  const float a = 0.5;
  in vec4 b;
  in float c;

  ...

  b.x = b.x * c;
  b.y = b.y * c;

  ...

  b.x = b.x * a + a;
  b.y = b.y * a + a;

So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:

  ...

  c = a * c;

  ...

  b.x = b.x * c + a;
  b.y = b.y * c + a;

Shader-db results BDW:

total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
instructions in affected programs: 4118366 -> 4075204 (-1.05%)
helped: 17739
HURT: 1343

total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
cycles in affected programs: 166870802 -> 166563566 (-0.18%)
helped: 18493
HURT: 7965

total spills in shared programs: 14937 -> 14560 (-2.52%)
spills in affected programs: 9331 -> 8954 (-4.04%)
helped: 284
HURT: 33

total fills in shared programs: 20211 -> 19671 (-2.67%)
fills in affected programs: 12586 -> 12046 (-4.29%)
helped: 286
HURT: 33

LOST:   39
GAINED: 33

Some of the hurt will go away when we shuffle things back down to the
bottom in the following patch. It's also noteworthy that almost all of the
spill changes are in Deus Ex both hurt and helped.

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Timothy Arceri 83f7fdf83a nir: add flt comparision simplification
Didn't turn out as useful as I'd hoped, but it will help alot more on
i965 by reducing regressions when we drop brw_do_channel_expressions()
and brw_do_vector_splitting().

I'm not sure how much sense 'is_not_used_by_conditional' makes on
platforms other than i965 but since this is a new opt it at least
won't do any harm.

shader-db BDW:

total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
instructions in affected programs: 15268 -> 15102 (-1.09%)
helped: 86
HURT: 0

total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
cycles in affected programs: 692634 -> 690486 (-0.31%)
helped: 183
HURT: 27

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-04-24 12:08:14 +10:00
Jason Ekstrand 762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Emil Velikov e4c7911150 nir: remove shebang from python scripts
Analogous to earlier commit(s).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-03-10 14:12:47 +00:00
Jason Ekstrand 70e86a3f2d nir/algebraic: Optimize 64bit pack/unpack
This reduces the instruction count in some fp64 and int64 piglit tests

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Jason Ekstrand 161d3e81be nir: Combine the int and double [un]pack opcodes
NIR is a typeless IR and the two opcodes, when considered bitwise, do
exactly the same thing.  There's no reason to have two versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-02-16 17:28:03 -08:00
Ian Romanick fda33e09d8 nir: Shift count for shift opcodes is always 32-bits
Previously both sources were unsized.  This caused problems when the
thing being shifted was 64-bit but the shift count was 32-bit.  The
expectation in NIR is that all unsized sources (and destination) will
ultimately have the same size.

The changes in nir_opt_algebraic.py are to prevent errors like:

 Failed to parse transformation:
03:12:25   (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
03:12:25 Traceback (most recent call last):
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
03:12:25     xform = SearchAndReplace(xform)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
03:12:25     BitSizeValidator(varset).validate(self.search, self.replace)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
03:12:25     validate_dst_class = self._validate_bit_class_up(replace)
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
03:12:25     src_class = self._validate_bit_class_up(val.sources[i])
03:12:25   File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
03:12:25     assert src_class == src_type_bits
03:12:25 AssertionError

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
2017-01-20 15:41:23 -08:00
Elie TOURNIER 9fdaeb7776 nir: add min/max optimisation
Add the following optimisations:

min(x, -x) = -abs(x)
min(x, -abs(x)) = -abs(x)
min(x, abs(x)) = x
max(x, -abs(x)) = x
max(x, abs(x)) = abs(x)
max(x, -x) = abs(x)

shader-db:

total instructions in shared programs: 13067779 -> 13067775 (-0.00%)
instructions in affected programs: 249 -> 245 (-1.61%)
helped: 4
HURT: 0

total cycles in shared programs: 252054838 -> 252054806 (-0.00%)
cycles in affected programs: 504 -> 472 (-6.35%)
helped: 2
HURT: 0

Signed-off-by: Elie Tournier <tournier.elie@gmail.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-19 21:44:28 -08:00
Timothy Arceri 772cd31048 nir: optimise min/max fadd combos
shader-db results BDW:

total instructions in shared programs: 13060410 -> 13060313 (-0.00%)
instructions in affected programs: 24533 -> 24436 (-0.40%)
helped: 88
HURT: 0

total cycles in shared programs: 256585692 -> 256586698 (0.00%)
cycles in affected programs: 647290 -> 648296 (0.16%)
helped: 35
HURT: 30

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-14 23:26:22 +11:00
Timothy Arceri de8b03f5fb nir: don't turn ieq/ine into inot if used by an if
Otherwise we will end up with an extra instruction to compare the
result of the inot.

On BDW:

total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0

total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri 7acc865226 nir: add late opt to turn inot/b2f combos back to bcsel
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.

On BDW:

total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2

total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240

LOST:   0
GAINED: 1

The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri 8f37fc7066 nir: add imprecise flrp optimisation
On BDW:

total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
instructions in affected programs: 2441 -> 2428 (-0.53%)
helped: 13
HURT: 0

total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
cycles in affected programs: 16418 -> 15948 (-2.86%)
helped: 10
HURT: 2

V2: don't use ffma directly

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri 1130f82a88 nir: add another comparison simplification
On BDW:

total instructions in shared programs: 13061877 -> 13060965 (-0.01%)
instructions in affected programs: 133569 -> 132657 (-0.68%)
helped: 566
HURT: 0

total cycles in shared programs: 256611784 -> 256599536 (-0.00%)
cycles in affected programs: 861016 -> 848768 (-1.42%)
helped: 379
HURT: 73

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke 3371de38f2 nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On BDW:

total instructions in shared programs: 13074882 -> 13068703 (-0.05%)
instructions in affected programs: 1823116 -> 1816937 (-0.34%)
helped: 4187
HURT: 537

total cycles in shared programs: 256622718 -> 256425382 (-0.08%)
cycles in affected programs: 123790120 -> 123592784 (-0.16%)
helped: 3823
HURT: 2037

total spills in shared programs: 15276 -> 14929 (-2.27%)
spills in affected programs: 9446 -> 9099 (-3.67%)
helped: 352
HURT: 1

total fills in shared programs: 20496 -> 20144 (-1.72%)
fills in affected programs: 13040 -> 12688 (-2.70%)
helped: 352
HURT: 1

LOST:   2
GAINED: 21

v2: Rely on 'a' being a well-formed boolean (Connor, Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke 1c50d31c26 nir: Convert ineg(b2i(a)) to a if it's a boolean.
On BDW:

total instructions in shared programs: 13071119 -> 13070371 (-0.01%)
instructions in affected programs: 83424 -> 82676 (-0.90%)
helped: 505
HURT: 45 (all TCS, all hurt by a single instruction)

total cycles in shared programs: 256601322 -> 256588932 (-0.00%)
cycles in affected programs: 819410 -> 807020 (-1.51%)
helped: 450
HURT: 57

total loops in shared programs: 2950 -> 2942 (-0.27%)
loops in affected programs: 8 -> 0
helped: 7
HURT: 0

v2: Drop unnecessary 'a@bool' annotation (Connor, Eric).
    Add a comment explaining the rule (Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-09 12:32:16 +11:00
Jason Ekstrand d55835b8bd nir/algebraic: Add optimizations for "a == a && a CMP b"
This sequence shows up The Talos Principal, at least under Vulkan,
and prevents loop analysis from properly computing trip counts in a
few loops.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-22 16:27:19 -08:00
Matt Turner ac6646129f nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.
instructions in affected programs: 550 -> 544 (-1.09%)
helped: 6

cycles in affected programs: 6952 -> 6850 (-1.47%)
helped: 6

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:39:27 -08:00
Ian Romanick 4d35683d91 nir: Optimize integer division and modulus with 1
The previous power-of-two rules didn't catch idiv (because i965 doesn't
set lower_idiv) and imod cases.  The udiv and umod cases should have
been caught, but I included them for orthogonality.

This fixes silly code observed from compute shaders with local_size_[xy]
= 1.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 14:25:10 -07:00
Kenneth Graunke 7d0554f341 nir: Rely on the fact that bcsel takes a well formed boolean.
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.

https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html

With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-19 02:05:23 -07:00
Ian Romanick cceb50e14e nir/algebraic: Optimize common array indexing sequence
Some shaders include code that looks like:

   uniform int i;
   uniform vec4 bones[...];

   foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);

CSE would do some work on this:

   x = i * 3
   foo(bones[x], bones[x + 1], bones[x + 2]);

The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like

   x = i * 3
   foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);

Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.

   x = i * 48;
   foo(bones[x], bones[x + 16], bones[x + 32]);

So, ~6 instructions becomes ~3.

Some individual shader-db results look pretty bad.  However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.

G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0

total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317

Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307

Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261

Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5

total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415

Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5

total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-17 10:52:38 +01:00
Ian Romanick 0b626d7524 nir/algebraic: Optimize fabs(u2f(x))
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass.  The code generated for frexp() uses
fabs, and this resulted in an extra instruction.  Ultimately I ended up
not using frexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
Eric Anholt c93f6938d5 nir: Add optimization for (a || True == True)
This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
producing some constants that were getting compared.

total instructions in shared programs: 112276 -> 112198 (-0.07%)
instructions in affected programs:     2239 -> 2161 (-3.48%)
total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
estimated cycles in affected programs:     2365 -> 2301 (-2.71%)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-12 15:46:09 -07:00
Jason Ekstrand 68e308d853 nir/algebraic: Remove imprecise flog2 optimizations
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original.  For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite.  However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).

This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2.  This is done 6 or 8 times and summed to produce
the final result which is written to a red texture.  In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 11:56:57 -07:00
Rob Clark dfbae7d64f nir/algebraic: support for power-of-two optimizations
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two.  Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 16:05:03 -04:00
Jason Ekstrand 1b72c31e1f nir/algebraic: Separate ffma lowering from fusing
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Samuel Iglesias Gonsálvez 2ab2d2e588 nir: Separate 32 and 64-bit fmod lowering
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Jason Ekstrand 6d4a426745 nir/algebraic: Support lowering for both 64 and 32-bit ldexp
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Jason Ekstrand f0af5b87ec nir/opcodes: Make ldexp take an explicitly 32-bit int
There is no sense in having the double version of ldexp take a 64-bit
integer.  Instead, let's just take a 32-bit int all the time.  This also
matches what GLSL does where both variants of ldexp take a regular integer
for the exponent argument.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Samuel Iglesias Gonsálvez db07b46f2c nir: Add lrp lowering for doubles in opt_algebraic
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.

Add 'lower_flrp64' flag to lower this instruction in that case.

v2:
   - Rename lower_flrp_double to lower_flrp64 (Jason)
   - Fix typo (Jason)
   - Adapt the code to define bit_size information in the opcodes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Samuel Iglesias Gonsálvez 443600d51e nir: rename lower_flrp to lower_flrp32
A later patch will add lower_flrp64 option to NIR.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Jason Ekstrand 8a3e344180 nir/opt_algebraic: Fix some expressions with ambiguous bit sizes
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand 7e0ee3a38b nir/search: Respect the bit_size parameter on nir_search_value
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand fcc1c8a437 nir/algebraic: Add a mechanism for specifying the bit size of a value
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand 4455bfa9a0 nir/algebraic: Add lowering for ldexp
The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today.  Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:19 -07:00
Jason Ekstrand 745b3d295e nir: Add more modulus opcodes
These are all needed for SPIR-V

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:00 -07:00
Markus Wick 18c8b927e2 nir: Merge redudant integer clamping.
Dolphin uses them a lot. Range tracking would be better in the long term,
but this two lines works fine for now.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 18:48:50 -07:00
Kenneth Graunke 5886cd79a0 nir: Do basic constant reassociation.
Many shaders contain expression trees of the form:

    const_1 * (value * const_2)

Reorganizing these to

    (const_1 * const_2) * value

will allow constant folding to combine the constants.  Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether.  Other
times, it can create more immediate constants, which can actually hurt.

Finding a good balance here is tricky.  While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.

shader-db results on Broadwell:

total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245

total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:43:55 -07:00
Ian Romanick 08ff5f4d1f nir: Simplify a bcsel to logical-or
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement.  This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.

shader-db results:

Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0

total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0

Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0

total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1

No change on G45 or Ironlake.

I also tried these expressions, but none of them affected any shaders in
shader-db:

   (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
   (('bcsel', a, 'b@bool', False),    ('iand', a, b)),
   (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Matt Turner 05ee6627d6 nir: Fix typo from commit 6702f1acde. 2016-03-30 19:18:35 -07:00
Matt Turner 6702f1acde nir: Propagate negates up multiplication chains.
total instructions in shared programs: 7112159 -> 7088092 (-0.34%)
instructions in affected programs: 1374915 -> 1350848 (-1.75%)
helped: 7392
HURT: 621

GAINED: 2
LOST:   2
2016-03-30 13:12:34 -07:00
Jason Ekstrand 0dbda153aa nir/algebraic: Flag inexact optimizations
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe.  This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact".  It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:02 -07:00
Jason Ekstrand ed3a029e80 nir/algebraic: Fix fmin detection to match the spec
The previous transformation got the arguments to fmin backwards.  When NaNs
are involved, the GLSL min/max aren't commutative so it matters.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:00 -07:00
Jason Ekstrand 89545b1314 nir/algebraic: Get rid of an invlid fxor optimization
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:58 -07:00
Jason Ekstrand 3a7cb6534c nir/algebraic: Allow for flagging operations as being inexact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:55 -07:00
Ian Romanick d7a25a9def nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 2bb006af68 nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.

G4X
total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0

total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
cycles in affected programs: 1767232 -> 1764954 (-0.13%)
helped: 274
HURT: 81

Ironlake
total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0

total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
cycles in affected programs: 2867452 -> 2864174 (-0.11%)
helped: 422
HURT: 137

Sandy Bridge
total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0

total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
cycles in affected programs: 1239166 -> 1230988 (-0.66%)
helped: 369
HURT: 150

Ivy Bridge
total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0

total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
cycles in affected programs: 1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141

Haswell
total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0

total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149

LOST:   2
GAINED: 0

Broadwell
total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0

total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111

LOST:   2
GAINED: 0

Skylake
total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0

total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116

LOST:   2
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 348e5a71d8 nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0

total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0

Broadwell / Skylake
total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0

total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0

G45 and Ironlake showed no change.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:47:56 -07:00
Ian Romanick 564a8b8a26 nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.

Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0

total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0

No changes on earlier platforms.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:44:57 -07:00
Ian Romanick bf0d60aa11 nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:

                 vec1 ssa_200 = flt ssa_199, ssa_194
                 vec1 ssa_201 = b2i ssa_200
                 vec1 ssa_202 = i2b -ssa_201

shader-db results:

Sandy Bridge
total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0

total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1

Ivy Bridge
total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0

total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2

Haswell
total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0

total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1

Broadwell / Skylake
total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0

total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6

G45 and Ironlake showed no change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:43:28 -07:00
Ian Romanick a4079f1cb2 nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse.  On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1

total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53

Unsurprisingly, no changes on later platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Matt Turner 905ff86198 nir: Recognize open-coded extract_u16.
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner 76289fbfa8 nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >>  8u) & 0xffu)
   float((val >>  0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)          g15<1>UD        g25<8,8,1>UD    0x00000010UD
   and(8)          g16<1>UD        g15<8,8,1>UD    0x000000ffUD
   mov(8)          g17<1>F         g16<8,8,1>UD

   shr(8)          g18<1>UD        g25<8,8,1>UD    0x00000008UD
   and(8)          g19<1>UD        g18<8,8,1>UD    0x000000ffUD
   mov(8)          g20<1>F         g19<8,8,1>UD

   and(8)          g21<1>UD        g25<8,8,1>UD    0x000000ffUD
   mov(8)          g22<1>F         g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)          g17<1>F         g25.2<32,8,4>UB
   mov(8)          g20<1>F         g25.1<32,8,4>UB
   mov(8)          g22<1>F         g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.

   instructions in affected programs: 28568 -> 27450 (-3.91%)
   helped: 7

   cycles in affected programs: 210076 -> 203144 (-3.30%)
   helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4520 -> 4374 instructions (-3.23%)
 #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner 371c4b3c48 nir: Recognize open-coded bitfield_reverse.
Helps 11 shaders in UnrealEngine4 demos.

I seriously hope they would have given us bitfieldReverse() if we
exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
anyway?).

instructions in affected programs: 4875 -> 4633 (-4.96%)
cycles in affected programs: 270516 -> 244516 (-9.61%)

I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
handling of this. We'd actually like to match, e.g., step2 by matching
step1 once and then doing a pointer comparison for the second instance
of step1, but unfortunately we generate an enormous tuple for instead.

The .text size increases by 6.5% and the .data by 17.5%.

   text     data  bss    dec    hex  filename
  22957    45224    0  68181  10a55  nir_libnir_la-nir_opt_algebraic.o
  24461    53160    0  77621  12f35  nir_libnir_la-nir_opt_algebraic.o

I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
in a GL 4.0 context once we expose GL 4.0.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 21:20:58 -08:00
Matt Turner a8f0960816 nir: Recognize product of open-coded pow()s.
Prevents regressions in the next commit.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner 9f02e3ab03 nir: Add opt_algebraic rules for xor with zero.
instructions in affected programs: 668 -> 664 (-0.60%)
helped: 4

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner 955d052058 nir: Add lowering support for unpacking opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 9b8786eba9 nir: Add lowering support for packing opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 68f8c5730b nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Emil Velikov a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00
Renamed from src/glsl/nir/nir_opt_algebraic.py (Browse further)