Commit Graph

69 Commits

Author SHA1 Message Date
Emil Velikov a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00
Matt Turner b82e26a6a4 nir: Lower bitfield_extract.
The OpenGL specifications for bitfieldExtract() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as

   bitfieldExtract:
      bits > 31 ? value : bfe(value, offset, bits)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-14 09:28:01 -08:00
Matt Turner 15640ee77a nir: Handle <bits>=32 case in bitfield_insert lowering.
The OpenGL specifications for bitfieldInsert() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.

This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as

   bitfieldInsert:
      bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-14 09:27:52 -08:00
Jason Ekstrand d00abcc283 nir/algebraic: Add more lowering
This commit adds lowering options for the following opcodes:

 - nir_op_fmod
 - nir_op_bitfield_insert
 - nir_op_uadd_carry
 - nir_op_usub_borrow

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-07 16:14:38 -08:00
Kenneth Graunke 7295f4fcc2 nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.
The nir_opt_algebraic rule

(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),

can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv.  (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point.  So, make NIR do it for us.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-05 19:22:11 -08:00
Kristian Høgsberg Kristensen f9283f2668 nir: Teach nir_opt_algebraic about adding and subtracting the same thing
This optimizes a + b - b to just a. Modest shader-db results (BDW):

  total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
  instructions in affected programs:     61938 -> 61348 (-0.95%)
  total loops in shared programs:        2131 -> 2131 (0.00%)
  helped:                                263
  HURT:                                  0
  GAINED:                                0
  LOST:                                  0

but the optimization turns

  gl_VertexID - gl_BaseVertexARB

into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the
i965 hardware supports natively. That means we can avoid using the
internal vertex buffer for gl_BaseVertexARB in this case.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-29 10:39:25 -08:00
Matt Turner 3a7f95b3aa nir: Optimize useless comparisons against true/false.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]

v2: Move new rule to Boolean simplification section
    Add a a@bool != true simplification

Suggested-by: Neil Roberts <neil@linux.intel.com>
2015-12-08 15:41:08 -08:00
Eric Anholt 5b2fb138bc nir: Add opcodes for saturated vector math.
This corresponds to instructions used on vc4 for its blending inside of
shaders.  I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.

v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
    by jekstrand)
2015-10-23 18:11:21 +01:00
Jason Ekstrand e5a9346d00 nir: Add fdph and fdph_replicated opcodes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Rob Clark d9efe40dc9 nir: add lowering for ffract
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 08:27:36 -04:00
Jason Ekstrand 47739c7df4 nir: Add a fdot instruction that replicates the result to a vec4
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Thomas Helland 49d0a36bd6 nir: Simplify feq(fneg(a), a)) -> feq(a, 0.0)
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Thomas Helland a39167d594 nir: Simplify fne(fneg(a), a) -> fne(a, 0.0)
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.

Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs:     15785 -> 15585 (-1.27%)
helped:                                36
HURT:                                  0
GAINED:                                0
LOST:                                  0

Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26

Make that:
vec1 ssa_27 = fne ssa_21, 0.0f

This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Eric Anholt a70f63ab20 nir: Add algebraic opt for no-op iand.
I lazily generated some of these in VC4 NIR lowering.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-08-04 17:19:25 -07:00
Kenneth Graunke 6026f7e8fb nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs:     11928 -> 11670 (-2.16%)
helped:                                64
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-06-25 02:12:32 -07:00
Matt Turner 5614bcc416 nir: Remove sRGB colorspace conversion round-trip.
Some shaders in Civilization V and Beyond Earth do

   pow(pow(x, 2.2), 0.454545)

which is converting to and from sRGB colorspace.

A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.

instructions in affected programs:     934 -> 886 (-5.14%)
helped:                                16
2015-05-22 11:26:36 -07:00
Ian Romanick 3bdbc1e436 nir: Delete all traces of nir_op_flog
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick e0a17f6e31 nir: Delete all traces of nir_op_fexp
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Matt Turner 8e029105c2 nir: Allow feq/fne/ieq/ine to be optimized with inot.
instructions in affected programs:     380 -> 376 (-1.05%)
helped:                                2

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner f5cf74d8ba nir: Recognize (a < c || b < c) as min(a, b) < c.
... and (a >= c) || (b >= c) as max(a, b) >= c.

Similar to commit 97e6c1b9.

total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs:     6400 -> 6304 (-1.50%)
helped:                                68
HURT:                                  4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner ceb8b739ce nir: Recognize trivial min/max.
No changes, but does prevent some regressions in the next commit.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 8ae559971a nir: Recognize i2b(b2i(x)) as x.
Helps the same set of programs as the previous commit.

instructions in affected programs:     4490 -> 4346 (-3.21%)
helped:                                8

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 74697e2844 nir: Recognize imul(b2i(a), b2i(b)) as a logical AND.
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.

instructions in affected programs:     2353 -> 2245 (-4.59%)
helped:                                4
GAINED:                                4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner f251ea393b nir: Transform pow(x, 4) into (x*x)*(x*x). 2015-04-24 11:39:01 -07:00
Ian Romanick bc672e261c nir: Fix typo in "ushr by 0" algebraic replacement
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "10.5" <mesa-stable@lists.freedestkop.org>
2015-04-14 16:41:04 -07:00
Rob Clark 58add76791 nir: split out lower_sub from lower_negate
Originally you had to have one or the other.  But actually I don't want
either.  (Or rather I want whatever is the minimum # of instructions.)

TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 10:43:16 -04:00
Rob Clark 6829d76e02 nir: add option to lower slt/sge/seq/sne
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0.  But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).

v2: use b2f rather than u2f

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-05 08:56:24 -04:00
Matt Turner 781badee7a nir: Remove useless ftrunc inside f2i/f2u.
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 97e6c1b957 nir: Recognize (a < b || a < c) as a < max(b, c).
Doesn't work for analogous && cases, because of NaNs.

total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs:     42000 -> 41117 (-2.10%)
helped:                                403

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner a2b6e908cf nir: Add addition/multiplication identities of exp/log.
instructions in affected programs:     2858 -> 2808 (-1.75%)
helped:                                12

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 099c729b4c nir: Add identities for the log function.
The rcp(log(x)) pattern affects instruction counts.

instructions in affected programs:     144 -> 138 (-4.17%)
helped:                                6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 8a6ae384b2 nir: Add identities for the exponential function.
No changes in shader-db.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e26783d445 nir: Recognize another open coded lrp.
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs:     4876 -> 4720 (-3.20%)
helped:                                58
HURT:                                  10

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e82437e141 nir: Recognize open coded lrp.
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs:     34773 -> 33083 (-4.86%)
helped:                                147
HURT:                                  6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Jason Ekstrand e06a3d0282 nir: Move the compare-with-zero optimizations to the late section
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs:     4230 -> 4286 (1.32%)
helped:                                0
HURT:                                  12

While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand da294f9b2f nir/algebraic: Add a seperate section for "late" optimizations
i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 1779dc060f nir/algebraic: Remove a duplicate optimization
This optimization is repeated verbatim above

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Eric Anholt 15b03b7964 nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value.  Only, in NIR we want a proper
bool once again, so we compare with 0.  This is a lot of pointless extra
instructions.

total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs:     1342 -> 1309 (-2.46%)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Eric Anholt 6e8d4a2f80 nir: Recognize a pattern for doing b2f without the opcode.
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand.  This is common when generating NIR from TGSI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Kenneth Graunke bf2c3bc316 nir: Lower subtraction to add with negation when !lower_negate.
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.

Suggested by Connor Abbott.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:34 -07:00
Eric Anholt afa9fc1561 nir: Add optional lowering of flrp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-27 13:29:48 -07:00
Matt Turner 3fb56805f0 nir: Recognize sat(add(b2f(a), b2f(b))) as a logical OR.
Transform this into b2f(or(a, b)).

instructions in affected programs:     432 -> 430 (-0.46%)
helped:                                2

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-24 14:43:37 -07:00
Matt Turner c31158d2cb nir: Recognize mul(b2f(a), b2f(b)) as a logical AND.
Transform this into b2f(and(a, b)).

total instructions in shared programs: 6205448 -> 6204391 (-0.02%)
instructions in affected programs:     284030 -> 282973 (-0.37%)
helped:                                903
HURT:                                  6

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-24 14:43:37 -07:00
Thomas Helland 8fb8fe46fa nir: Optimize a + neg(a)
Shader-db i965 instructions:
total instructions in shared programs: 1711180 -> 1711159 (-0.00%)
instructions in affected programs:     825 -> 804 (-2.55%)
helped:                                9
HURT:                                  0
GAINED:                                3
LOST:                                  3

Shader-db NIR instructions:
total instructions in shared programs: 606187 -> 606179 (-0.00%)
instructions in affected programs:     298 -> 290 (-2.68%)
helped:                                4
HURT:                                  0
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
2015-03-11 14:21:05 -07:00
Thomas Helland 0525f2e851 nir: Optimize (a*b)+(a*c) -> a*(b+c)
Shader-db i965 instructions:
total instructions in shared programs: 1715894 -> 1710802 (-0.30%)
instructions in affected programs:     443080 -> 437988 (-1.15%)
helped:                                1502
HURT:                                  13
GAINED:                                4
LOST:                                  4

Shader-db NIR instructions:
total instructions in shared programs: 607710 -> 606187 (-0.25%)
instructions in affected programs:     208285 -> 206762 (-0.73%)
helped:                                769
HURT:                                  8
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
2015-03-11 14:21:05 -07:00
Eric Anholt 4359954d84 nir: Generalize the optimization of subs of subs from 0.
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful.  In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.

No shader-db effects.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt 345c2b288a nir: Collapse repeated bcsels on the same argument.
vc4 results:
total instructions in shared programs: 39881 -> 39794 (-0.22%)
instructions in affected programs:     6302 -> 6215 (-1.38%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt a38038ca5e nir: When faced with a csel on !condition, just flip the arguments.
total NIR instructions in shared programs: 39426 -> 39411 (-0.04%)
NIR instructions in affected programs:     3748 -> 3733 (-0.40%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt dc982f4a85 nir: Add a couple of simplifications of csel operations.
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt 6eadde51bb nir: Recognize and reduce duplicated fsats.
No effect on vc4 shader-db.

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:51 -08:00