KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Emil Velikov	a39a8fbbaa	nir: move to compiler/ Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>	2016-01-26 16:08:30 +00:00
Matt Turner	b82e26a6a4	nir: Lower bitfield_extract. The OpenGL specifications for bitfieldExtract() says: The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Therefore passing bits=32, offset=0 is legal and defined in GLSL. But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits of the width operand, making them not able to implement the GLSL-specified behavior directly. This commit adds ubfe/ibfe operations from SM5 and a lowering pass for bitfield_extract to handle the trivial case of <bits> = 32 as bitfieldExtract: bits > 31 ? value : bfe(value, offset, bits) Fixes: ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>	2016-01-14 09:28:01 -08:00
Matt Turner	15640ee77a	nir: Handle <bits>=32 case in bitfield_insert lowering. The OpenGL specifications for bitfieldInsert() says: The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Therefore passing bits=32, offset=0 is legal and defined in GLSL. But the earlier SM5 bfi opcode is specified to accept a bitfield width ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits of the width operand, making them not able to implement the GLSL-specified behavior directly. This commit fixes the lowering of bitfield_insert to handle the trivial case of <bits> = 32 as bitfieldInsert: bits > 31 ? insert : bfi(bfm(bits, offset), insert, base) Fixes: ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2 ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>	2016-01-14 09:27:52 -08:00
Jason Ekstrand	d00abcc283	nir/algebraic: Add more lowering This commit adds lowering options for the following opcodes: - nir_op_fmod - nir_op_bitfield_insert - nir_op_uadd_carry - nir_op_usub_borrow Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-01-07 16:14:38 -08:00
Kenneth Graunke	7295f4fcc2	nir: Add a lower_fdiv option, turn fdiv into fmul/frcp. The nir_opt_algebraic rule (('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))), can produce new fdiv operations, which need to be lowered on i965, as we don't actually implement fdiv. (Normally, we handle this in GLSL IR's lower_instructions pass, but in the above case we introduce an fdiv after that point. So, make NIR do it for us.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2016-01-05 19:22:11 -08:00
Kristian Høgsberg Kristensen	f9283f2668	nir: Teach nir_opt_algebraic about adding and subtracting the same thing This optimizes a + b - b to just a. Modest shader-db results (BDW): total instructions in shared programs: 7842452 -> 7841862 (-0.01%) instructions in affected programs: 61938 -> 61348 (-0.95%) total loops in shared programs: 2131 -> 2131 (0.00%) helped: 263 HURT: 0 GAINED: 0 LOST: 0 but the optimization turns gl_VertexID - gl_BaseVertexARB into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the i965 hardware supports natively. That means we can avoid using the internal vertex buffer for gl_BaseVertexARB in this case. Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-12-29 10:39:25 -08:00
Matt Turner	3a7f95b3aa	nir: Optimize useless comparisons against true/false. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v1] Reviewed-by: Eric Anholt <eric@anholt.net> [v1] v2: Move new rule to Boolean simplification section Add a a@bool != true simplification Suggested-by: Neil Roberts <neil@linux.intel.com>	2015-12-08 15:41:08 -08:00
Eric Anholt	5b2fb138bc	nir: Add opcodes for saturated vector math. This corresponds to instructions used on vc4 for its blending inside of shaders. I've seen these opcodes on other architectures before, but I think it's the first time these are needed in Mesa. v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review by jekstrand)	2015-10-23 18:11:21 +01:00
Jason Ekstrand	e5a9346d00	nir: Add fdph and fdph_replicated opcodes Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-09-22 20:37:35 -07:00
Rob Clark	d9efe40dc9	nir: add lowering for ffract Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-09-16 08:27:36 -04:00
Jason Ekstrand	47739c7df4	nir: Add a fdot instruction that replicates the result to a vec4 Fortunately, nir_constant_expr already auto-splats if "dst" never shows up in the constant expression field so we don't need to do anything there. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2015-09-15 12:38:48 -07:00
Thomas Helland	49d0a36bd6	nir: Simplify feq(fneg(a), a) -> feq(a, 0.0) The positive and negative value of a float can only be equal to each other if it is -0.0f and 0.0f. This is safe for NaN and Inf, as -NaN != NaN, and -Inf != Inf. This gives no changes in my shader-db. Signed-off-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-08-18 11:34:44 -07:00
Thomas Helland	a39167d594	nir: Simplify fne(fneg(a), a) -> fne(a, 0.0) -NaN != NaN, and -Inf != Inf, so this should be safe. Found while working on my VRP pass. Shader-db results on my IVB: total instructions in shared programs: 1698267 -> 1698067 (-0.01%) instructions in affected programs: 15785 -> 15585 (-1.27%) helped: 36 HURT: 0 GAINED: 0 LOST: 0 Some shaders were found to have the following pattern in NIR: vec1 ssa_26 = fneg ssa_21 vec1 ssa_27 = fne ssa_21, ssa_26 Make that: vec1 ssa_27 = fne ssa_21, 0.0f This is found in Dota2 and Brutal Legend. One shader is cut by 8%, from 323 -> 296 instructions in SIMD8 Signed-off-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-08-18 11:34:44 -07:00
Eric Anholt	a70f63ab20	nir: Add algebraic opt for no-op iand. I lazily generated some of these in VC4 NIR lowering. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-08-04 17:19:25 -07:00
Kenneth Graunke	6026f7e8fb	nir: Recognize max(min(a, 1.0), 0.0) as fsat(a). We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected this variant (which is also handled by the GLSL IR pass). shader-db results on Broadwell: total instructions in shared programs: 7363046 -> 7362788 (-0.00%) instructions in affected programs: 11928 -> 11670 (-2.16%) helped: 64 HURT: 0 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-06-25 02:12:32 -07:00
Matt Turner	5614bcc416	nir: Remove sRGB colorspace conversion round-trip. Some shaders in Civilization V and Beyond Earth do pow(pow(x, 2.2), 0.454545) which is converting to and from sRGB colorspace. A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c) actually regresses two shaders in Sun Temple in which the result of the inner pow is used twice, once by another pow and once by another instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more general pattern would have still left us with a pow, and I'm 2.2 * 0.454545 percent sure that's not what they want. instructions in affected programs: 934 -> 886 (-5.14%) helped: 16	2015-05-22 11:26:36 -07:00
Ian Romanick	3bdbc1e436	nir: Delete all traces of nir_op_flog Nothing produces it, and nothing can consume it. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Ian Romanick	e0a17f6e31	nir: Delete all traces of nir_op_fexp Nothing produces it, and nothing can consume it. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Matt Turner	8e029105c2	nir: Allow feq/fne/ieq/ine to be optimized with inot. instructions in affected programs: 380 -> 376 (-1.05%) helped: 2 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	f5cf74d8ba	nir: Recognize (a < c || b < c) as min(a, b) < c. ... and (a >= c) || (b >= c) as max(a, b) >= c. Similar to commit `97e6c1b9`. total instructions in shared programs: 6182276 -> 6182180 (-0.00%) instructions in affected programs: 6400 -> 6304 (-1.50%) helped: 68 HURT: 4 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	ceb8b739ce	nir: Recognize trivial min/max. No changes, but does prevent some regressions in the next commit. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	8ae559971a	nir: Recognize i2b(b2i(x)) as x. Helps the same set of programs as the previous commit. instructions in affected programs: 4490 -> 4346 (-3.21%) helped: 8 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	74697e2844	nir: Recognize imul(b2i(a), b2i(b)) as a logical AND. Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16 because we avoid an integer multiplication. instructions in affected programs: 2353 -> 2245 (-4.59%) helped: 4 GAINED: 4 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	f251ea393b	nir: Transform pow(x, 4) into (x*x)*(x*x).	2015-04-24 11:39:01 -07:00
Ian Romanick	bc672e261c	nir: Fix typo in "ushr by 0" algebraic replacement Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "10.5" <mesa-stable@lists.freedestkop.org>	2015-04-14 16:41:04 -07:00
Rob Clark	58add76791	nir: split out lower_sub from lower_negate Originally you had to have one or the other. But actually I don't want either. (Or rather I want whatever is the minimum # of instructions.) TODO: not sure where the best place to insert a check that driver hasn't set both lower_negate and lower_sub? Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 10:43:16 -04:00
Rob Clark	6829d76e02	nir: add option to lower slt/sge/seq/sne In freedreno these get implemented as the matching f* instruction plus a u2f to convert the result to float 1.0/0.0. But less lines of code to just let nir_opt_algebraic handle this for us, plus opens up some small window for other opt passes to improve (ie. if some shader ended up with both a flt and slt with same src args, for example). v2: use b2f rather than u2f Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-05 08:56:24 -04:00
Matt Turner	781badee7a	nir: Remove useless ftrunc inside f2i/f2u. No shader-db changes, probably because they're all removed by the GLSL compiler optimization added in commit `69ad5fd4`. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	97e6c1b957	nir: Recognize (a < b || a < c) as a < max(b, c). Doesn't work for analogous && cases, because of NaNs. total instructions in shared programs: 6195712 -> 6194829 (-0.01%) instructions in affected programs: 42000 -> 41117 (-2.10%) helped: 403 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	a2b6e908cf	nir: Add addition/multiplication identities of exp/log. instructions in affected programs: 2858 -> 2808 (-1.75%) helped: 12 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	099c729b4c	nir: Add identities for the log function. The rcp(log(x)) pattern affects instruction counts. instructions in affected programs: 144 -> 138 (-4.17%) helped: 6 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	8a6ae384b2	nir: Add identities for the exponential function. No changes in shader-db. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	e26783d445	nir: Recognize another open coded lrp. total instructions in shared programs: 6195924 -> 6195768 (-0.00%) instructions in affected programs: 4876 -> 4720 (-3.20%) helped: 58 HURT: 10 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	e82437e141	nir: Recognize open coded lrp. total instructions in shared programs: 6197614 -> 6195924 (-0.03%) instructions in affected programs: 34773 -> 33083 (-4.86%) helped: 147 HURT: 6 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Jason Ekstrand	e06a3d0282	nir: Move the compare-with-zero optimizations to the late section total instructions in shared programs: 4422307 -> 4422363 (0.00%) instructions in affected programs: 4230 -> 4286 (1.32%) helped: 0 HURT: 12 While this does hurt some things, the losses are minor and it prevents the compare-with-zero optimization from fighting with ffma which is much more important. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	da294f9b2f	nir/algebraic: Add a separate section for "late" optimizations i965/nir: Use the late optimizations Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	1779dc060f	nir/algebraic: Remove a duplicate optimization This optimization is repeated verbatim above Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Eric Anholt	15b03b7964	nir: Recognize a pattern of bool frobbing from TGSI KILL_IF. TGSI's conditional discards take float arg and negate it, so GLSL to TGSI generates a b2f and negates that value. Only, in NIR we want a proper bool once again, so we compare with 0. This is a lot of pointless extra instructions. total instructions in shared programs: 39735 -> 39702 (-0.08%) instructions in affected programs: 1342 -> 1309 (-2.46%) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-01 10:57:01 -07:00
Eric Anholt	6e8d4a2f80	nir: Recognize a pattern for doing b2f without the opcode. Since we have patterns based on b2f, generate them if we see the b2f equivalent using an iand. This is common when generating NIR from TGSI. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-01 10:57:01 -07:00
Kenneth Graunke	bf2c3bc316	nir: Lower subtraction to add with negation when !lower_negate. prog->nir will generate fsub opcodes, but i965 doesn't implement them. We may as well lower them at the NIR level, since it's trivial to do. Suggested by Connor Abbott. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-03-27 21:16:34 -07:00
Eric Anholt	afa9fc1561	nir: Add optional lowering of flrp. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-27 13:29:48 -07:00
Matt Turner	3fb56805f0	nir: Recognize sat(add(b2f(a), b2f(b))) as a logical OR. Transform this into b2f(or(a, b)). instructions in affected programs: 432 -> 430 (-0.46%) helped: 2 Acked-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-24 14:43:37 -07:00
Matt Turner	c31158d2cb	nir: Recognize mul(b2f(a), b2f(b)) as a logical AND. Transform this into b2f(and(a, b)). total instructions in shared programs: 6205448 -> 6204391 (-0.02%) instructions in affected programs: 284030 -> 282973 (-0.37%) helped: 903 HURT: 6 Acked-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-24 14:43:37 -07:00
Thomas Helland	8fb8fe46fa	nir: Optimize a + neg(a) Shader-db i965 instructions: total instructions in shared programs: 1711180 -> 1711159 (-0.00%) instructions in affected programs: 825 -> 804 (-2.55%) helped: 9 HURT: 0 GAINED: 3 LOST: 3 Shader-db NIR instructions: total instructions in shared programs: 606187 -> 606179 (-0.00%) instructions in affected programs: 298 -> 290 (-2.68%) helped: 4 HURT: 0 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Thomas Helland <thomashelland90@gmail.com>	2015-03-11 14:21:05 -07:00
Thomas Helland	0525f2e851	nir: Optimize (a*b)+(a*c) -> a*(b+c) Shader-db i965 instructions: total instructions in shared programs: 1715894 -> 1710802 (-0.30%) instructions in affected programs: 443080 -> 437988 (-1.15%) helped: 1502 HURT: 13 GAINED: 4 LOST: 4 Shader-db NIR instructions: total instructions in shared programs: 607710 -> 606187 (-0.25%) instructions in affected programs: 208285 -> 206762 (-0.73%) helped: 769 HURT: 8 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Thomas Helland <thomashelland90@gmail.com>	2015-03-11 14:21:05 -07:00
Eric Anholt	4359954d84	nir: Generalize the optimization of subs of subs from 0. I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above, but we can generalize it and make it more potentially useful. In the specific original case of a 0 for our new 'a' argument, it'll get further algebraic optimization once the 0 is an argument to the new add. No shader-db effects. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	345c2b288a	nir: Collapse repeated bcsels on the same argument. vc4 results: total instructions in shared programs: 39881 -> 39794 (-0.22%) instructions in affected programs: 6302 -> 6215 (-1.38%) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	a38038ca5e	nir: When faced with a csel on !condition, just flip the arguments. total NIR instructions in shared programs: 39426 -> 39411 (-0.04%) NIR instructions in affected programs: 3748 -> 3733 (-0.40%) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	dc982f4a85	nir: Add a couple of simplifications of csel operations. vc4 was already cleaning these up, but it does shave 4 NIR instructions in shader-db. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	6eadde51bb	nir: Recognize and reduce duplicated fsats. No effect on vc4 shader-db. v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:51 -08:00
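
A rough illustration of the bitfield_extract lowering described in commit b82e26a6a4 in the table above, written as a Python sketch. `hw_ubfe` is a hypothetical model of an instruction that reads only the low 5 bits of its width and offset operands, as the commit message describes for Intel and AMD hardware; it is not Mesa code, and the function names are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def hw_ubfe(value, offset, bits):
    """Hypothetical model of an SM5-style unsigned bitfield extract.

    Only the low 5 bits of `offset` and `bits` are read, so a requested
    width of 32 is seen by the hardware as a width of 0.
    """
    value &= MASK32
    offset &= 31
    bits &= 31
    if bits == 0:
        return 0
    return (value >> offset) & ((1 << bits) - 1)


def lowered_bitfield_extract(value, offset, bits):
    """The lowering quoted in the commit message:
    bits > 31 ? value : bfe(value, offset, bits)
    """
    if bits > 31:
        return value & MASK32
    return hw_ubfe(value, offset, bits)


if __name__ == "__main__":
    v = 0xDEADBEEF
    # The raw instruction model truncates a width of 32 to 0.
    print(hex(hw_ubfe(v, 0, 32)))
    # The lowered form returns the whole operand for bits == 32.
    print(hex(lowered_bitfield_extract(v, 0, 32)))
    # Ordinary extractions are unaffected by the lowering.
    print(hex(lowered_bitfield_extract(v, 8, 8)))
```

Under this model, the unlowered instruction yields 0 for bits=32, offset=0, while the lowered form returns the operand unchanged, which is the GLSL-defined result the commit message refers to.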


69 Commits