The OpenGL specifications for bitfieldExtract() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.
This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as
bitfieldExtract:
bits > 31 ? value : bfe(value, offset, bits)
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
The OpenGL specifications for bitfieldInsert() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.
This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as
bitfieldInsert:
bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
The nir_opt_algebraic rule
(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
This optimizes a + b - b to just a. Modest shader-db results (BDW):
total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
instructions in affected programs: 61938 -> 61348 (-0.95%)
total loops in shared programs: 2131 -> 2131 (0.00%)
helped: 263
HURT: 0
GAINED: 0
LOST: 0
but the optimization turns
gl_VertexID - gl_BaseVertexARB
into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the
i965 hardware supports natively. That means we can avoid using the
internal vertex buffer for gl_BaseVertexARB in this case.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
v2: Move new rule to Boolean simplification section
Add a a@bool != true simplification
Suggested-by: Neil Roberts <neil@linux.intel.com>
This corresponds to instructions used on vc4 for its blending inside of
shaders. I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.
v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
by jekstrand)
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.
Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs: 15785 -> 15585 (-1.27%)
helped: 36
HURT: 0
GAINED: 0
LOST: 0
Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26
Make that:
vec1 ssa_27 = fne ssa_21, 0.0f
This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).
shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped: 64
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Some shaders in Civilization V and Beyond Earth do
pow(pow(x, 2.2), 0.454545)
which is converting to and from sRGB colorspace.
A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.
instructions in affected programs: 934 -> 886 (-5.14%)
helped: 16
Nothing produces it, and nothing can consume it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nothing produces it, and nothing can consume it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
... and (a >= c) || (b >= c) as max(a, b) >= c.
Similar to commit 97e6c1b9.
total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs: 6400 -> 6304 (-1.50%)
helped: 68
HURT: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
No changes, but does prevent some regressions in the next commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Helps the same set of programs as the previous commit.
instructions in affected programs: 4490 -> 4346 (-3.21%)
helped: 8
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.
instructions in affected programs: 2353 -> 2245 (-4.59%)
helped: 4
GAINED: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Originally you had to have one or the other. But actually I don't want
either. (Or rather I want whatever is the minimum # of instructions.)
TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <eric@anholt.net>
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <eric@anholt.net>
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.
Reviewed-by: Matt Turner <mattst88@gmail.com>
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value. Only, in NIR we want a proper
bool once again, so we compare with 0. This is a lot of pointless extra
instructions.
total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs: 1342 -> 1309 (-2.46%)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand. This is common when generating NIR from TGSI.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful. In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.
No shader-db effects.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>