Commit Graph

249 Commits

Author SHA1 Message Date
Timothy Arceri 7acc865226 nir: add late opt to turn inot/b2f combos back to bcsel
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.

On BDW:

total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2

total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240

LOST:   0
GAINED: 1

The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri 8f37fc7066 nir: add imprecise flrp optimisation
On BDW:

total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
instructions in affected programs: 2441 -> 2428 (-0.53%)
helped: 13
HURT: 0

total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
cycles in affected programs: 16418 -> 15948 (-2.86%)
helped: 10
HURT: 2

V2: don't use ffma directly

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-12 09:47:29 +11:00
Timothy Arceri 1130f82a88 nir: add another comparison simplification
On BDW:

total instructions in shared programs: 13061877 -> 13060965 (-0.01%)
instructions in affected programs: 133569 -> 132657 (-0.68%)
helped: 566
HURT: 0

total cycles in shared programs: 256611784 -> 256599536 (-0.00%)
cycles in affected programs: 861016 -> 848768 (-1.42%)
helped: 379
HURT: 73

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke 3371de38f2 nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.
On BDW:

total instructions in shared programs: 13074882 -> 13068703 (-0.05%)
instructions in affected programs: 1823116 -> 1816937 (-0.34%)
helped: 4187
HURT: 537

total cycles in shared programs: 256622718 -> 256425382 (-0.08%)
cycles in affected programs: 123790120 -> 123592784 (-0.16%)
helped: 3823
HURT: 2037

total spills in shared programs: 15276 -> 14929 (-2.27%)
spills in affected programs: 9446 -> 9099 (-3.67%)
helped: 352
HURT: 1

total fills in shared programs: 20496 -> 20144 (-1.72%)
fills in affected programs: 13040 -> 12688 (-2.70%)
helped: 352
HURT: 1

LOST:   2
GAINED: 21

v2: Rely on 'a' being a well-formed boolean (Connor, Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 12:32:16 +11:00
Kenneth Graunke 1c50d31c26 nir: Convert ineg(b2i(a)) to a if it's a boolean.
On BDW:

total instructions in shared programs: 13071119 -> 13070371 (-0.01%)
instructions in affected programs: 83424 -> 82676 (-0.90%)
helped: 505
HURT: 45 (all TCS, all hurt by a single instruction)

total cycles in shared programs: 256601322 -> 256588932 (-0.00%)
cycles in affected programs: 819410 -> 807020 (-1.51%)
helped: 450
HURT: 57

total loops in shared programs: 2950 -> 2942 (-0.27%)
loops in affected programs: 8 -> 0
helped: 7
HURT: 0

v2: Drop unnecessary 'a@bool' annotation (Connor, Eric).
    Add a comment explaining the rule (Ian).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-09 12:32:16 +11:00
Jason Ekstrand d55835b8bd nir/algebraic: Add optimizations for "a == a && a CMP b"
This sequence shows up The Talos Principal, at least under Vulkan,
and prevents loop analysis from properly computing trip counts in a
few loops.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-12-22 16:27:19 -08:00
Matt Turner ac6646129f nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.
instructions in affected programs: 550 -> 544 (-1.09%)
helped: 6

cycles in affected programs: 6952 -> 6850 (-1.47%)
helped: 6

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:39:27 -08:00
Ian Romanick 4d35683d91 nir: Optimize integer division and modulus with 1
The previous power-of-two rules didn't catch idiv (because i965 doesn't
set lower_idiv) and imod cases.  The udiv and umod cases should have
been caught, but I included them for orthogonality.

This fixes silly code observed from compute shaders with local_size_[xy]
= 1.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-19 14:25:10 -07:00
Kenneth Graunke 7d0554f341 nir: Rely on the fact that bcsel takes a well formed boolean.
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.

https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html

With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-19 02:05:23 -07:00
Ian Romanick cceb50e14e nir/algebraic: Optimize common array indexing sequence
Some shaders include code that looks like:

   uniform int i;
   uniform vec4 bones[...];

   foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);

CSE would do some work on this:

   x = i * 3
   foo(bones[x], bones[x + 1], bones[x + 2]);

The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like

   x = i * 3
   foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);

Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.

   x = i * 48;
   foo(bones[x], bones[x + 16], bones[x + 32]);

So, ~6 instructions becomes ~3.

Some individual shader-db results look pretty bad.  However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.

G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0

total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0

Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0

total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317

Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307

Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0

total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261

Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5

total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415

Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5

total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-17 10:52:38 +01:00
Ian Romanick 0b626d7524 nir/algebraic: Optimize fabs(u2f(x))
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass.  The code generated for frexp() uses
fabs, and this resulted in an extra instruction.  Ultimately I ended up
not using frexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-19 12:19:30 -07:00
Eric Anholt c93f6938d5 nir: Add optimization for (a || True == True)
This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
producing some constants that were getting compared.

total instructions in shared programs: 112276 -> 112198 (-0.07%)
instructions in affected programs:     2239 -> 2161 (-3.48%)
total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
estimated cycles in affected programs:     2365 -> 2301 (-2.71%)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-12 15:46:09 -07:00
Jason Ekstrand 68e308d853 nir/algebraic: Remove imprecise flog2 optimizations
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original.  For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite.  However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).

This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2.  This is done 6 or 8 times and summed to produce
the final result which is written to a red texture.  In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 11:56:57 -07:00
Rob Clark dfbae7d64f nir/algebraic: support for power-of-two optimizations
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two.  Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 16:05:03 -04:00
Jason Ekstrand 1b72c31e1f nir/algebraic: Separate ffma lowering from fusing
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Samuel Iglesias Gonsálvez 2ab2d2e588 nir: Separate 32 and 64-bit fmod lowering
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Jason Ekstrand 6d4a426745 nir/algebraic: Support lowering for both 64 and 32-bit ldexp
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Jason Ekstrand f0af5b87ec nir/opcodes: Make ldexp take an explicitly 32-bit int
There is no sense in having the double version of ldexp take a 64-bit
integer.  Instead, let's just take a 32-bit int all the time.  This also
matches what GLSL does where both variants of ldexp take a regular integer
for the exponent argument.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-28 21:36:52 -07:00
Samuel Iglesias Gonsálvez db07b46f2c nir: Add lrp lowering for doubles in opt_algebraic
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.

Add 'lower_flrp64' flag to lower this instruction in that case.

v2:
   - Rename lower_flrp_double to lower_flrp64 (Jason)
   - Fix typo (Jason)
   - Adapt the code to define bit_size information in the opcodes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Samuel Iglesias Gonsálvez 443600d51e nir: rename lower_flrp to lower_flrp32
A later patch will add lower_flrp64 option to NIR.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Jason Ekstrand 8a3e344180 nir/opt_algebraic: Fix some expressions with ambiguous bit sizes
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand 7e0ee3a38b nir/search: Respect the bit_size parameter on nir_search_value
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand fcc1c8a437 nir/algebraic: Add a mechanism for specifying the bit size of a value
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-27 11:21:06 -07:00
Jason Ekstrand 4455bfa9a0 nir/algebraic: Add lowering for ldexp
The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today.  Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:19 -07:00
Jason Ekstrand 745b3d295e nir: Add more modulus opcodes
These are all needed for SPIR-V

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-13 15:44:00 -07:00
Markus Wick 18c8b927e2 nir: Merge redudant integer clamping.
Dolphin uses them a lot. Range tracking would be better in the long term,
but this two lines works fine for now.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-11 18:48:50 -07:00
Kenneth Graunke 5886cd79a0 nir: Do basic constant reassociation.
Many shaders contain expression trees of the form:

    const_1 * (value * const_2)

Reorganizing these to

    (const_1 * const_2) * value

will allow constant folding to combine the constants.  Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether.  Other
times, it can create more immediate constants, which can actually hurt.

Finding a good balance here is tricky.  While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.

shader-db results on Broadwell:

total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245

total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-11 18:43:55 -07:00
Ian Romanick 08ff5f4d1f nir: Simplify a bcsel to logical-or
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement.  This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.

shader-db results:

Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0

total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0

Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0

total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1

No change on G45 or Ironlake.

I also tried these expressions, but none of them affected any shaders in
shader-db:

   (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
   (('bcsel', a, 'b@bool', False),    ('iand', a, b)),
   (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Matt Turner 05ee6627d6 nir: Fix typo from commit 6702f1acde. 2016-03-30 19:18:35 -07:00
Matt Turner 6702f1acde nir: Propagate negates up multiplication chains.
total instructions in shared programs: 7112159 -> 7088092 (-0.34%)
instructions in affected programs: 1374915 -> 1350848 (-1.75%)
helped: 7392
HURT: 621

GAINED: 2
LOST:   2
2016-03-30 13:12:34 -07:00
Jason Ekstrand 0dbda153aa nir/algebraic: Flag inexact optimizations
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe.  This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact".  It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:02 -07:00
Jason Ekstrand ed3a029e80 nir/algebraic: Fix fmin detection to match the spec
The previous transformation got the arguments to fmin backwards.  When NaNs
are involved, the GLSL min/max aren't commutative so it matters.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:00 -07:00
Jason Ekstrand 89545b1314 nir/algebraic: Get rid of an invlid fxor optimization
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:58 -07:00
Jason Ekstrand 3a7cb6534c nir/algebraic: Allow for flagging operations as being inexact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:55 -07:00
Ian Romanick d7a25a9def nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 2bb006af68 nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.

G4X
total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0

total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
cycles in affected programs: 1767232 -> 1764954 (-0.13%)
helped: 274
HURT: 81

Ironlake
total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0

total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
cycles in affected programs: 2867452 -> 2864174 (-0.11%)
helped: 422
HURT: 137

Sandy Bridge
total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0

total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
cycles in affected programs: 1239166 -> 1230988 (-0.66%)
helped: 369
HURT: 150

Ivy Bridge
total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0

total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
cycles in affected programs: 1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141

Haswell
total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0

total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149

LOST:   2
GAINED: 0

Broadwell
total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0

total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111

LOST:   2
GAINED: 0

Skylake
total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0

total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116

LOST:   2
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 348e5a71d8 nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0

total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0

Broadwell / Skylake
total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0

total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0

G45 and Ironlake showed no change.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:47:56 -07:00
Ian Romanick 564a8b8a26 nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.

Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0

total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0

No changes on earlier platforms.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:44:57 -07:00
Ian Romanick bf0d60aa11 nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:

                 vec1 ssa_200 = flt ssa_199, ssa_194
                 vec1 ssa_201 = b2i ssa_200
                 vec1 ssa_202 = i2b -ssa_201

shader-db results:

Sandy Bridge
total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0

total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1

Ivy Bridge
total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0

total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2

Haswell
total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0

total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1

Broadwell / Skylake
total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0

total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6

G45 and Ironlake showed no change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:43:28 -07:00
Ian Romanick a4079f1cb2 nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse.  On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1

total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53

Unsurprisingly, no changes on later platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Matt Turner 905ff86198 nir: Recognize open-coded extract_u16.
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner 76289fbfa8 nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >>  8u) & 0xffu)
   float((val >>  0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)          g15<1>UD        g25<8,8,1>UD    0x00000010UD
   and(8)          g16<1>UD        g15<8,8,1>UD    0x000000ffUD
   mov(8)          g17<1>F         g16<8,8,1>UD

   shr(8)          g18<1>UD        g25<8,8,1>UD    0x00000008UD
   and(8)          g19<1>UD        g18<8,8,1>UD    0x000000ffUD
   mov(8)          g20<1>F         g19<8,8,1>UD

   and(8)          g21<1>UD        g25<8,8,1>UD    0x000000ffUD
   mov(8)          g22<1>F         g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)          g17<1>F         g25.2<32,8,4>UB
   mov(8)          g20<1>F         g25.1<32,8,4>UB
   mov(8)          g22<1>F         g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.

   instructions in affected programs: 28568 -> 27450 (-3.91%)
   helped: 7

   cycles in affected programs: 210076 -> 203144 (-3.30%)
   helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4520 -> 4374 instructions (-3.23%)
 #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner 371c4b3c48 nir: Recognize open-coded bitfield_reverse.
Helps 11 shaders in UnrealEngine4 demos.

I seriously hope they would have given us bitfieldReverse() if we
exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
anyway?).

instructions in affected programs: 4875 -> 4633 (-4.96%)
cycles in affected programs: 270516 -> 244516 (-9.61%)

I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
handling of this. We'd actually like to match, e.g., step2 by matching
step1 once and then doing a pointer comparison for the second instance
of step1, but unfortunately we generate an enormous tuple for instead.

The .text size increases by 6.5% and the .data by 17.5%.

   text     data  bss    dec    hex  filename
  22957    45224    0  68181  10a55  nir_libnir_la-nir_opt_algebraic.o
  24461    53160    0  77621  12f35  nir_libnir_la-nir_opt_algebraic.o

I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
in a GL 4.0 context once we expose GL 4.0.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 21:20:58 -08:00
Matt Turner a8f0960816 nir: Recognize product of open-coded pow()s.
Prevents regressions in the next commit.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner 9f02e3ab03 nir: Add opt_algebraic rules for xor with zero.
instructions in affected programs: 668 -> 664 (-0.60%)
helped: 4

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner 955d052058 nir: Add lowering support for unpacking opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 9b8786eba9 nir: Add lowering support for packing opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 68f8c5730b nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Emil Velikov a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00
Renamed from src/glsl/nir/nir_opt_algebraic.py (Browse further)