KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Kenneth Graunke	1762568fd3	nir: Allow vec2/vec3/vec4 instructions in the select peephole pass. These are basically just moves, so they should be safe as well. When disabling i965's GLSL IR level scalarizer (channel expressions) pass, I started seeing NIR code like this: if ssa_21 { block block_1: /* preds: block_0 */ vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30 /* succs: block_3 */ } else { block block_2: /* preds: block_0 */ /* succs: block_3 */ } block block_3: /* preds: block_1 block_2 */ vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2 Previously, the GLSL IR scalarizer pass would break the vec4 into a series of fmovs, which were allowed by the peephole pass. But with the vec4 operation, they were not. We want to keep getting selects. Normal i965 on Broadwell: instructions in affected programs: 200 -> 176 (-12.00%) helped: 4 With brw_fs_channel_expressions() disabled: instructions in affected programs: 1832 -> 1646 (-10.15%) helped: 30 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-06-22 14:08:36 -07:00
Jordan Justen	2867f2e8cd	nir: Add barrier intrinsic function Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-06-12 15:12:40 -07:00
Chris Forbes	e7f628c2fc	glsl: Add ir node for barrier v2: * Changes suggested by mattst88 [jordan.l.justen@intel.com: Add nir support] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>	2015-06-12 15:12:39 -07:00
Timothy Arceri	86a74e9b6b	nir: use src for ssa helper Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-06-03 06:50:39 +10:00
Timothy Arceri	5f7b8fa481	nir: remove extra semicolon Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-06-03 06:50:33 +10:00
Eduardo Lima Mitev	5b226a1242	nir: prevent use-after-free condition in should_lower_phi() lower_phis_to_scalar() pass recurses the instruction dependence graph to determine if all the sources of a given instruction are scalarizable. To prevent cycles, it temporarily marks the phi instruction before recursing in, then updates the entry with the resulting value. However, it does not consider that the entry value may have changed after a recursion pass, hence causing a use-after-free situation and a crash. This patch fixes this by reloading the entry corresponding to the 'phi' after recursing and before updating its value. The crash can be reproduced ~20% of the time with the dEQP test: dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-06-02 20:21:49 +02:00
Iago Toral Quiroga	2231cf0ba3	nir: Fix output swizzle in get_mul_for_src When we compute the output swizzle we want to consider the number of components in the add operation. So far we were using the writemask of the multiplication for this instead, which is not correct. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-28 18:25:37 +02:00
Matt Turner	5614bcc416	nir: Remove sRGB colorspace conversion round-trip. Some shaders in Civilization V and Beyond Earth do pow(pow(x, 2.2), 0.454545) which is converting to and from sRGB colorspace. A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c) actually regresses two shaders in Sun Temple in which the result of the inner pow is used twice, once by another pow and once by another instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more general pattern would have still left us with a pow, and I'm 2.2 * 0.454545 percent sure that's not what they want. instructions in affected programs: 934 -> 886 (-5.14%) helped: 16	2015-05-22 11:26:36 -07:00
Jason Ekstrand	2126c68e5c	nir: Get rid of the array elements parameter on load/store intrinsics Previously, we used intrinsic->const_index[1] to represent "the number of array elements to load" for load/store intrinsics. However, this is set to 1 by every pass that ever creates a load/store intrinsic. Also, while it might make some sense for registers, it makes no sense whatsoever in SSA. On top of that, the i965 backend was the only backend to ever support it; freedreno and vc4 just assert that it's always 1. Let's just delete it. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-05-20 09:28:06 -07:00
Francisco Jerez	d91d6b3f03	nir: Translate memory barrier intrinsics from GLSL IR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 15:47:57 +03:00
Francisco Jerez	f8f8b31847	nir: Translate image load, store and atomic intrinsics from GLSL IR. v2: Undefine coordinate components not applicable to the target. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 15:47:57 +03:00
Francisco Jerez	6de78e6b0c	nir: Fix indexing of atomic counter arrays with a constant value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 15:47:57 +03:00
Francisco Jerez	f1269a3e01	nir: Add memory barrier intrinsic. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 15:47:57 +03:00
Francisco Jerez	d9e930997f	nir: Define image load, store and atomic intrinsics. v2: Undefine coordinate components not applicable to the target. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 15:47:57 +03:00
Tapani Pälli	95774ca258	nir: fix sampler lowering pass for arrays This fixes bugs with special cases where we have arrays of structures containing samplers or arrays of samplers. I've verified that the patch calculates the same index value as returned by _mesa_get_sampler_uniform_value for IR. The patch makes the following ES3 conformance test pass: ES3-CTS.shaders.struct.uniform.sampler_array_fragment v2: remove unnecessary comment (Topi) simplify changes and the overall code (Jason) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114	2015-05-12 14:28:16 +03:00
Kenneth Graunke	d6fb155f30	nir: Fix aggressive typos in nir_from_ssa.c. s/agressive/aggressive/g Trivial.	2015-05-08 19:38:14 -07:00
Jason Ekstrand	fb5f411248	nir/search: Save/restore the variables_seen bitmask when matching Shader-db results on Broadwell: total instructions in shared programs: 7152330 -> 7137006 (-0.21%) instructions in affected programs: 1330548 -> 1315224 (-1.15%) helped: 5797 HURT: 76 GAINED: 0 LOST: 8 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:29:15 -07:00
Jason Ekstrand	e0cfe59c37	nir/search: Assert that variable id's are in range Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:29:15 -07:00
Jason Ekstrand	13facfbd5b	nir/search: handle explicitly sized sources in match_value Previously, this case was being handled in match_expression prior to calling match_value. However, there is really no good reason for this given that match_value has all of the information it needs. Also, they weren't being handled properly in the commutative case and putting it in match_value gives us that for free. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:29:14 -07:00
Jason Ekstrand	f752effa08	nir/nir: Use a linked list instead of a hash set for use/def sets This commit switches us from the current setup of using hash sets for use/def sets to using linked lists. Doing so should save us quite a bit of memory because we aren't carrying around 3 hash sets per register and 2 per SSA value. It should also save us CPU time because adding/removing things from use/def sets is 4 pointer manipulations instead of a hash lookup. Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists: GLSL IR Only: 586.4 +/- 1.653833 NIR with hash sets: 675.4 +/- 2.502108 NIR + use/def lists: 641.2 +/- 1.557043 I also ran a memory usage experiment with Ken's patch to delete GLSL IR and keep NIR. This patch cuts an additional 42.9 MiB of ralloc'd memory over and above what we gained by deleting the GLSL IR on the same dota trace. On the code complexity side of things, some things are now much easier and others are a bit harder. One of the operations we perform constantly in optimization passes is to replace one source with another. Due to the fact that an instruction can use the same SSA value multiple times, we had to iterate through the sources of the instruction and determine if the use we were replacing was the only one before removing it from the set of uses. With this patch, uses are per-source not per-instruction so we can just remove it safely. On the other hand, trying to iterate over all of the instructions that use a given value is more difficult. Fortunately, the two places we do that are the ffma peephole where it doesn't matter and GCM where we already gracefully handle duplicate visits to an instruction. Another aspect here is that using linked lists in this way can be tricky to get right. With sets, things were quite forgiving and the worst that happened if you didn't properly remove a use was that it would get caught in the validator. With linked lists, it can lead to linked list corruption which can be harder to track. However, we do just as much validation of the linked lists as we did of the sets so the validator should still catch these problems. While working on this series, the vast majority of the bugs I had to fix were caught by assertions. I don't think the lists are going to be that much worse than the sets. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Jason Ekstrand	ecc2cfc8b6	nir: Use nir_instr_rewrite_src in copy propagation We were rolling our own rewrite_src variant in copy-propagation. Let's stop doing that and use the ones in core NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Jason Ekstrand	f72a8d1cf0	nir: Add a function for rewriting the condition of an if statement Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Jason Ekstrand	300d729436	nir: Add and use initializer #defines for nir_src and nir_dest Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Jason Ekstrand	6702ebce57	nir: Modernize the out-of-SSA pass The out-of-SSA pass was one of the first passes written when getting SSA up-and-going (for obvious reasons). As such, it came before a lot of the nifty SSA-based helpers were introduced. This commit modernizes it so that we're no longer doing nearly as much manual banging on use/def sets. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Jason Ekstrand	7ee0216e2d	nir/validate: Validate SSA def parent instructions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-05-08 17:16:13 -07:00
Ian Romanick	3bdbc1e436	nir: Delete all traces of nir_op_flog Nothing produces it, and nothing can consume it. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Ian Romanick	ad51f9b421	nir: Don't produce nir_op_flog from GLSL IR All paths that produce GLSL IR for NIR lower ir_unop_log. All paths that consume NIR will explode if they get a nir_op_flog. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Ian Romanick	e0a17f6e31	nir: Delete all traces of nir_op_fexp Nothing produces it, and nothing can consume it. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Ian Romanick	a45d55f17c	nir: Don't produce nir_op_fexp from GLSL IR All paths that produce GLSL IR for NIR lower ir_unop_exp. All paths that consume NIR will explode if they get a nir_op_fexp. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-05-08 12:12:54 -07:00
Matt Turner	8e029105c2	nir: Allow feq/fne/ieq/ine to be optimized with inot. instructions in affected programs: 380 -> 376 (-1.05%) helped: 2 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	f5cf74d8ba	nir: Recognize (a < c || b < c) as min(a, b) < c. ... and (a >= c) || (b >= c) as max(a, b) >= c. Similar to commit `97e6c1b9`. total instructions in shared programs: 6182276 -> 6182180 (-0.00%) instructions in affected programs: 6400 -> 6304 (-1.50%) helped: 68 HURT: 4 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	ceb8b739ce	nir: Recognize trivial min/max. No changes, but does prevent some regressions in the next commit. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	8ae559971a	nir: Recognize i2b(b2i(x)) as x. Helps the same set of programs as the previous commit. instructions in affected programs: 4490 -> 4346 (-3.21%) helped: 8 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Matt Turner	74697e2844	nir: Recognize imul(b2i(a), b2i(b)) as a logical AND. Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16 because we avoid an integer multiplication. instructions in affected programs: 2353 -> 2245 (-4.59%) helped: 4 GAINED: 4 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-05-07 10:51:05 -07:00
Zoë Blade	05e7f7f438	Fix a few typos Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-04-27 17:28:29 +03:00
Matt Turner	f251ea393b	nir: Transform pow(x, 4) into (x*x)*(x*x).	2015-04-24 11:39:01 -07:00
Jason Ekstrand	125574d1ef	nir/lower_source_mods: Don't propagate register sources The nir_lower_source_mods pass does a weak form of copy propagation to clean up all of the mov-with-negate's that get generated. However, we weren't properly checking that the sources were SSA and so we could end up moving a register read which is not, in general, valid. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	296131f467	nir: Rewrite instr_rewrite_src The old code wasn't correctly handling the case where the new value of the source contains an indirect. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	d61bd972d8	nir/locals_to_regs: Handle indirect accesses of length-1 arrays Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	06f3c98b9d	nir/locals_to_regs: Initialize registers with constant initializers Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	4e9b376594	nir/locals_to_regs: Pass around the nir_shader rather than a void * mem_ctx Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	f50f59d3d9	nir: Add a simple growing array data structure Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:41 -07:00
Jason Ekstrand	8b900e7405	nir/types: Make glsl_get_length smarter Previously, this function returned the number of elements for structures and arrays and 0 for everything else. In NIR, this is almost never what you want because we also treat matrices as arrays so you have to special-case constantly. This commit makes glsl_get_length treat matrices as an array of columns by returning the number of columns instead of 0. This also fixes a bug in locals_to_regs caused by not checking for the matrix case in one place. v2: Only special-case for matrices and return a length of 0 for vectors as we did before. This was needed to not break the TGSI-based drivers and doesn't really affect NIR at the moment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	7e1d21edbf	nir: Move get_const_initializer_load from vars_to_ssa to NIR core Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	ba88760202	nir/lower_vars_to_ssa: Pass around the nir_shader instead of a void * mem_ctx Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	e79120afdc	nir/print: Print the closing paren on load_const instructions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	02f03fc0f1	nir/tex: Use the correct return size for query_levels and lod Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	94669cb534	nir: Refactor tex_instr_dest_size to use a switch statement Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:40 -07:00
Jason Ekstrand	73cc76362d	nir/lower_vars_to_ssa: Actually look for indirects when determining aliasing Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-22 18:10:39 -07:00
Matt Turner	4dacb212fd	nir: Allow abs/neg in select peephole pass. total instructions in shared programs: 4314531 -> 4308949 (-0.13%) instructions in affected programs: 429085 -> 423503 (-1.30%) helped: 1680 HURT: 0 GAINED: 0 LOST: 111 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-17 11:01:34 -07:00
Rob Clark	e14af4c067	nir/builder: add nir_builder_insert_after_instr() For lowering if/else, I need a way to insert at the end of the previous block. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:34:15 -04:00
Ian Romanick	94aab6cde6	nir: Convert the if-test for num_inputs == 2 to an assertion Suggested by Jason on a different patch after some comments / questions by Ilia. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-16 09:56:49 -07:00
Ian Romanick	4cf5ca5ca5	nir: Try commutative sources in CSE Shader-db results: GM45 NIR: total instructions in shared programs: 4082044 -> 4081919 (-0.00%) instructions in affected programs: 27609 -> 27484 (-0.45%) helped: 44 Iron Lake NIR: total instructions in shared programs: 5678776 -> 5678646 (-0.00%) instructions in affected programs: 27406 -> 27276 (-0.47%) helped: 45 Sandy Bridge NIR: total instructions in shared programs: 7329995 -> 7329096 (-0.01%) instructions in affected programs: 142035 -> 141136 (-0.63%) helped: 406 HURT: 19 Ivy Bridge NIR: total instructions in shared programs: 6769314 -> 6768359 (-0.01%) instructions in affected programs: 140820 -> 139865 (-0.68%) helped: 423 HURT: 2 Haswell NIR: total instructions in shared programs: 6183693 -> 6183298 (-0.01%) instructions in affected programs: 96538 -> 96143 (-0.41%) helped: 303 HURT: 4 Broadwell NIR: total instructions in shared programs: 7501711 -> 7498170 (-0.05%) instructions in affected programs: 266403 -> 262862 (-1.33%) helped: 705 HURT: 5 GAINED: 4 v2: Rebase on top of Connor's fix. v3: Convert the if-test for num_inputs == 2 to an assertion. Suggested by Jason after some comments / questions by Ilia. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Connor Abbott <cwabbott0@gmail.com>	2015-04-15 18:15:59 -07:00
Ian Romanick	bc672e261c	nir: Fix typo in "ushr by 0" algebraic replacement Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "10.5" <mesa-stable@lists.freedesktop.org>	2015-04-14 16:41:04 -07:00
Ian Romanick	67a8610caf	nir: Silence unused parameter warnings nir/nir.h: In function 'nir_validate_shader': nir/nir.h:1567:56: warning: unused parameter 'shader' [-Wunused-parameter] static inline void nir_validate_shader(nir_shader *shader) { } ^ nir/nir_opt_cse.c: In function 'src_is_ssa': nir/nir_opt_cse.c:165:32: warning: unused parameter 'data' [-Wunused-parameter] src_is_ssa(nir_src *src, void *data) ^ nir/nir_opt_cse.c: In function 'dest_is_ssa': nir/nir_opt_cse.c:171:35: warning: unused parameter 'data' [-Wunused-parameter] dest_is_ssa(nir_dest *dest, void *data) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-14 16:41:04 -07:00
Connor Abbott	47a1b4841d	nir/cse: fix bug with comparing non-per-component sources We weren't comparing the right number of components when checking swizzles. Use nir_ssa_alu_instr_num_src_components() to do the right thing. No piglit regressions, and no fixes either. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-14 19:07:44 -04:00
Kenneth Graunke	b3e286c457	nir: Store num_direct_uniforms in the nir_shader. Storing this here is pretty sketchy - I don't know if any driver other than i965 will want to use it. But this will make it a lot easier to generate NIR code at link time. We'll probably rework it anyway. (Ian suggested making nir_assign_var_locations_scalar_direct_first simply modify the nir_shader's fields, rather than passing pointers to them. If this stays long term, we should do that. But Jason and I suspect we'll be reworking this area again in the near future.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-11 11:39:48 -07:00
Rob Clark	f596135616	nir: fix bit of cargo-culting in lower_idiv I guess I was looking too much at how lower_system_values worked when writing lower_idiv. Since ttn wasn't emitting load_var for sysvals and the only drivers using lower_idiv were using ttn, I think nothing was broken as a result. But might as well fix this before it becomes a problem. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-11 10:43:16 -04:00
Rob Clark	58add76791	nir: split out lower_sub from lower_negate Originally you had to have one or the other. But actually I don't want either. (Or rather I want whatever is the minimum # of instructions.) TODO: not sure where the best place to insert a check that driver hasn't set both lower_negate and lower_sub? Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 10:43:16 -04:00
Kenneth Graunke	500da98e0b	nir: Constify nir_lower_sampler's gl_shader_program pointer. Now that we're not generating linker errors, we don't actually modify this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-04-10 02:16:33 -07:00
Kenneth Graunke	709b88ccd8	nir: Remove linker_error calls from nir_lower_samplers(). These should never happen. Plus, NIR passes really shouldn't be reporting linker errors - this is past link time. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-04-10 02:16:31 -07:00
Kenneth Graunke	99264b7f37	nir: Make nir_lower_samplers take a gl_shader_stage, not a gl_program *. We don't actually need a gl_program struct. We only used it to translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage (i.e. MESA_SHADER_VERTEX). We may as well just pass that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-04-10 02:16:29 -07:00
Kenneth Graunke	4b27391cad	nir: Move gl_shader_stage enum from mtypes.h to shader_enums.h. I want to use this in some code that doesn't currently include mtypes.h. It seems like a better place for it anyway. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-04-10 02:16:27 -07:00
Jason Ekstrand	11694737fc	nir: Make nir_*_instr_create take a nir_shader instead of a void * context Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-04-07 14:34:21 -07:00
Kenneth Graunke	a10d493715	nir: Implement a nir_sweep() pass. This pass performs a mark and sweep pass over a nir_shader's associated memory - anything still connected to the program will be kept, and any dead memory we dropped on the floor will be freed. The expectation is that this will be called when finished building and optimizing the shader. However, it's also fine to call it earlier, and many times, to free up memory earlier. v2: (feedback from Jason Ekstrand) - Skip sweeping impl->start_block, as it's already in the CF list. - Don't sweep SSA defs (they're owned by their defining instruction) - Don't steal phi sources (they're owned by nir_phi_instr). - Don't steal tex->src (it's owned by the tex_inst itself) - Don't sweep dereference chains (top-level dereferences are owned by the instruction; sub-dereferences are owned by the parent deref). - Don't sweep sources and destinations (SSA defs are handled as part of the defining instruction, and registers are handled as part of function implementations). - Just steal instructions; don't walk them (no longer required). v3: (feedback from Jason Ekstrand) - Steal indirect sources from nir_src/nir_dest. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-07 14:34:14 -07:00
Kenneth Graunke	de2014cf1e	nir: Allocate dereferences out of their parent instruction or deref. Jason pointed out that variable dereferences in NIR are really part of their parent instruction, and should have the same lifetime. Unlike in GLSL IR, they're not used very often - just for intrinsic variables, call parameters & return, and indirect samplers for texturing. Also, nir_deref_var is the top-level concept, and nir_deref_array/nir_deref_record are child nodes. This patch attempts to allocate nir_deref_vars out of their parent instruction, and any sub-dereferences out of their parent deref. It enforces these restrictions in the validator as well. This means that freeing an instruction should free its associated dereference chain as well. The memory sweeper pass can also happily ignore them. v2: Rename make_deref to evaluate_deref and make it take a nir_instr * instead of void *. This involves adding &instr->instr everywhere. (Requested by Jason Ekstrand.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-07 14:34:14 -07:00
Kenneth Graunke	4f4b04b7c7	nir: Allocate nir_ssa_def::uses/if_uses out of the instruction. We can't allocate them out of the nir_ssa_def itself, because it may not be ralloc'd (for example, nir_dest embeds a nir_ssa_def). However, allocating them out of the instruction should work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-07 14:34:13 -07:00
Kenneth Graunke	900498bd11	nir: Allocate nir_phi_src values out of the nir_phi_instr. Phi sources are part of the phi instruction and should have the same lifetime. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-07 14:34:13 -07:00
Kenneth Graunke	b05d53404c	nir: Allocate nir_call_instr::params out of the nir_call itself. The lifetime of the params array needs to match the nir_call_instr itself. So, allocate it using the instruction itself as the context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-07 14:34:13 -07:00
Jason Ekstrand	2e3b35a1cb	nir/lower_tex_projector: Don't use designated initializers These don't work in MSVC or in older versions of GCC Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89899 Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2015-04-07 11:49:39 -07:00
Matt Turner	d131630c08	nir: Remove fsin_reduced/fcos_reduced. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-06 10:13:22 -07:00
Matt Turner	5c71cf8531	glsl: Remove never used sin_reduced/cos_reduced. These were added in commit `f2616e56`, presumably in preparation for translating ARB vp/fp into GLSL IR. That never happened, and neither did a lowering pass that actually generated these instructions. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-06 10:13:22 -07:00
Rob Clark	f2ecc95e44	nir: add lowering for idiv/udiv/umod Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD(). See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an adaptation of the nv50 code from Ilia Mirkin). A python/numpy script which implements the same algorithm (and is possibly useful for debugging or analysis) can be found here: http://people.freedesktop.org/~robclark/div-lowering.py I've tested this on i965 hacked up to insert the idiv lowering pass, and on freedreno with NIR frontend. Signed-off-by: Rob Clark <robclark@freedesktop.org> Tested-by: Eric Anholt <eric@anholt.net> (vc4)	2015-04-05 09:20:35 -04:00
Rob Clark	7880bea2fb	nir: fix typo for f2b/i2b/b2i expressions (v2) v2: discovered that i2b/b2i are also confused Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-05 08:56:24 -04:00
Rob Clark	6829d76e02	nir: add option to lower slt/sge/seq/sne In freedreno these get implemented as the matching f* instruction plus a u2f to convert the result to float 1.0/0.0. But less lines of code to just let nir_opt_algebraic handle this for us, plus opens up some small window for other opt passes to improve (ie. if some shader ended up with both a flt and slt with same src args, for example). v2: use b2f rather than u2f Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-05 08:56:24 -04:00
Jason Ekstrand	9c53e80b9b	nir/lower_samplers: Use the right memory context for realloc'ing tex sources As of `da5ec2a`, we allocate instruction sources out of the instruction itself. When we realloc the texture sources we need to use the right memory context or ralloc will get angry and assert-fail Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-04-03 17:02:20 -07:00
Jason Ekstrand	52e718097f	nir: Add a cubemap normalizing pass This commit adds a pass to L1-normalize cube-map coordinates. Some hardware such as i965 requires that largest cube-map coordinate is +-1. We had a pass to perform this normalization in GLSL IR but we need it in NIR for cube maps on ARB programs to work correctly. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> v2 (Suggested by Eric): - Do a vector fabs and split into components later - Move to core NIR Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-03 14:12:49 -07:00
Jason Ekstrand	dccc57eaba	nir/from_ssa: Don't set reg->parent_instr for ssa_undef instructions Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2015-04-03 14:04:31 -07:00
Jason Ekstrand	7bdba4a245	nir: Add a src_get_parent_instr function Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2015-04-03 14:04:12 -07:00
Eric Anholt	ea811b7868	nir: Add a lowering pass for texture projectors. Not much hardware wants them these days, and it might give us a chance to do CSE or algebraic at the NIR level. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-03 11:50:24 -07:00
Eric Anholt	64bdfc698d	nir: Add an interface to turn a nir_src into a nir_ssa_def. We use nir_ssa_defs for nir_builder args, so this takes a nir_src and makes one so it can be passed in. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-03 11:50:22 -07:00
Eric Anholt	ec02970205	nir: Add an interface for the builder to insert instructions before. So far we'd only used nir_builder to build brand new programs. But if we're doing modifications to instructions (like in a lowering pass), then we want to generate new stuff before the instruction we're modifying. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-03 11:50:18 -07:00
Kenneth Graunke	da5ec2ac0b	nir: Allocate nir_tex_instr::sources out of the instruction itself. The lifetime of the sources array needs to match the nir_tex_instr itself. So, allocate it using the instruction itself as the context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:20:03 -07:00
Kenneth Graunke	7380c641b1	nir: Allocate predecessor and dominance frontier sets from block itself. These sets are part of the block, and their lifetime needs to match the block itself. So, allocate them using the block itself as the context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:20:02 -07:00
Kenneth Graunke	131444e1c5	nir: Allocate register fields out of the register itself. The lifetime of each register's use/def/if_use sets needs to match the register itself. So, allocate them using the register itself as the context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:20:01 -07:00
Kenneth Graunke	587b3a20a1	nir: Make nir_create_function() strdup the function name. glsl_to_nir passes in the ir_function's name field; we were copying the pointer, but not duplicating the memory. We want to be able to free the linked GLSL IR program after translating to NIR, so we'll need to create a copy of the function name that the NIR shader actually owns. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:20:00 -07:00
Kenneth Graunke	f61b6c3e48	nir: Free dead variables when removing them. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:19:58 -07:00
Kenneth Graunke	f4e4491080	nir: Combine remove_dead_local_vars() and remove_dead_global_vars(). We can just pass a pointer to the list of variables, and reuse the code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-04-02 14:19:56 -07:00
Jason Ekstrand	ca3b4d6d17	nir/opt_peephole_ffma: Fix a couple typos in a comment Acked-by: Matt Turner <mattst88@gmail.com>	2015-04-02 11:09:37 -07:00
Jason Ekstrand	0573d0e484	nir/print: Correctly print swizzles for explicitly sized alu sources Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-02 10:21:18 -07:00
Matt Turner	781badee7a	nir: Remove useless ftrunc inside f2i/f2u. No shader-db changes, probably because they're all removed by the GLSL compiler optimization added in commit `69ad5fd4`. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	97e6c1b957	nir: Recognize (a < b || a < c) as a < max(b, c). Doesn't work for analogous && cases, because of NaNs. total instructions in shared programs: 6195712 -> 6194829 (-0.01%) instructions in affected programs: 42000 -> 41117 (-2.10%) helped: 403 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	a2b6e908cf	nir: Add addition/multiplication identities of exp/log. instructions in affected programs: 2858 -> 2808 (-1.75%) helped: 12 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	099c729b4c	nir: Add identities for the log function. The rcp(log(x)) pattern affects instruction counts. instructions in affected programs: 144 -> 138 (-4.17%) helped: 6 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	8a6ae384b2	nir: Add identities for the exponential function. No changes in shader-db. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	e26783d445	nir: Recognize another open coded lrp. total instructions in shared programs: 6195924 -> 6195768 (-0.00%) instructions in affected programs: 4876 -> 4720 (-3.20%) helped: 58 HURT: 10 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Matt Turner	e82437e141	nir: Recognize open coded lrp. total instructions in shared programs: 6197614 -> 6195924 (-0.03%) instructions in affected programs: 34773 -> 33083 (-4.86%) helped: 147 HURT: 6 Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-01 13:43:57 -07:00
Jason Ekstrand	7f344721b1	nir/peephole_ffma: Be less aggressive about fusing multiply-adds shader-db results for fragment shaders on Haswell: total instructions in shared programs: 4395688 -> 4389623 (-0.14%) instructions in affected programs: 355876 -> 349811 (-1.70%) helped: 1455 HURT: 14 GAINED: 5 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:04 -07:00
Jason Ekstrand	a8c8b3b872	nir: Add a dedicated ffma peephole optimization i965/nir: Use the dedicated ffma peephole total instructions in shared programs: 4418748 -> 4394618 (-0.55%) instructions in affected programs: 1292790 -> 1268660 (-1.87%) helped: 5999 HURT: 457 GAINED: 4 LOST: 9 Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:04 -07:00
Jason Ekstrand	e06a3d0282	nir: Move the compare-with-zero optimizations to the late section total instructions in shared programs: 4422307 -> 4422363 (0.00%) instructions in affected programs: 4230 -> 4286 (1.32%) helped: 0 HURT: 12 While this does hurt some things, the losses are minor and it prevents the compare-with-zero optimization from fighting with ffma which is much more important. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	da294f9b2f	nir/algebraic: Add a separate section for "late" optimizations i965/nir: Use the late optimizations Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	1779dc060f	nir/algebraic: Remove a duplicate optimization This optimization is repeated verbatim above Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	22ee7eeb4e	nir/algebraic: #define around structure definitions Previously, we couldn't generate two algebraic passes in the same file because of multiple structure definitions. To solve this, we play the age-old header file trick and just #define around it. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-04-01 12:51:03 -07:00
Jason Ekstrand	793a94d6b5	nir/print: Don't print extra swizzle components Previously, NIR would just print 4 swizzle components if the swizzle was anything other than foo.xyzw. This creates lots of noise if, for example, you have a one-component element with a swizzle of foo.xxxx. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-04-01 12:49:49 -07:00
Eric Anholt	15b03b7964	nir: Recognize a pattern of bool frobbing from TGSI KILL_IF. TGSI's conditional discards take float arg and negate it, so GLSL to TGSI generates a b2f and negates that value. Only, in NIR we want a proper bool once again, so we compare with 0. This is a lot of pointless extra instructions. total instructions in shared programs: 39735 -> 39702 (-0.08%) instructions in affected programs: 1342 -> 1309 (-2.46%) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-01 10:57:01 -07:00
Eric Anholt	6e8d4a2f80	nir: Recognize a pattern for doing b2f without the opcode. Since we have patterns based on b2f, generate them if we see the b2f equivalent using an iand. This is common when generating NIR from TGSI. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-04-01 10:57:01 -07:00
Kenneth Graunke	72b06fb08e	nir: Fix copy and pasted error message in nir_validate. These are nir_cf_nodes, not ALU instructions. Also, use unreachable() to preempt said review feedback. v2: Do it right (thanks Ilia). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-28 09:36:46 -07:00
Kenneth Graunke	bf2c3bc316	nir: Lower subtraction to add with negation when !lower_negate. prog->nir will generate fsub opcodes, but i965 doesn't implement them. We may as well lower them at the NIR level, since it's trivial to do. Suggested by Connor Abbott. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-03-27 21:16:34 -07:00
Kenneth Graunke	06f7bea96a	nir: Add builder helpers for MOVs with ALU sources and swizzling MOVs. These will be useful for prog->nir and tgsi->nir. v2: Don't forget to mark nir_swizzle as inline (Eric). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-03-27 21:16:33 -07:00
Kenneth Graunke	75c922e0fe	nir: Add nir_builder helpers for creating load_const intrinsics. Both prog->nir and tgsi->nir will want to use these. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-03-27 21:16:33 -07:00
Eric Anholt	afa9fc1561	nir: Add optional lowering of flrp. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-27 13:29:48 -07:00
Kenneth Graunke	3120345f40	nir: Add glsl_float_type() wrapper. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2015-03-25 16:17:19 -07:00
Matt Turner	babd0fa3e2	nir: Fix typo.	2015-03-24 19:14:40 -07:00
Matt Turner	3fb56805f0	nir: Recognize sat(add(b2f(a), b2f(b))) as a logical OR. Transform this into b2f(or(a, b)). instructions in affected programs: 432 -> 430 (-0.46%) helped: 2 Acked-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-24 14:43:37 -07:00
Matt Turner	c31158d2cb	nir: Recognize mul(b2f(a), b2f(b)) as a logical AND. Transform this into b2f(and(a, b)). total instructions in shared programs: 6205448 -> 6204391 (-0.02%) instructions in affected programs: 284030 -> 282973 (-0.37%) helped: 903 HURT: 6 Acked-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-24 14:43:37 -07:00
Matt Turner	95729d2458	nir: Handle mixed scalar/vector arguments to logical and/or/xor. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-24 14:43:37 -07:00
Jason Ekstrand	8a33f95b7a	nir/lower_io: Add an assign_locations function that sorts by [in]direct use v2: Delete the set of indirectly accessed variables when we're done with it v3: Rename from _packed to _scalar Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-19 13:18:39 -07:00
Jason Ekstrand	25db44a845	nir/lower_io: Make variable location assignment a manual operation Previously, we just assigned variable locations in nir_lower_io. Now, we force the user to assign variable locations for us. This gives the backend a bit more control over where variables are placed. v2: Rename from _packed to _scalar Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-19 13:18:39 -07:00
Jason Ekstrand	639115123e	nir: Use a list instead of a hash_table for inputs, outputs, and uniforms We never did a single hash table lookup in the entire NIR code base that I found so there was no real benefit to doing it that way. I suppose that for linking, we'll probably want to be able to lookup by name but we can leave building that hash table to the linker. In the meantime this was causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us unique names of uniforms, etc. This was causing massive rendering issues in the unreal4 Sun Temple demo. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-19 13:18:38 -07:00
Matt Turner	dd0d3a2c0f	mesa: Replace _mesa_round_to_even() with _mesa_roundeven(). Eric's initial patch adding constant expression evaluation for ir_unop_round_even used nearbyint. The open-coded _mesa_round_to_even implementation came about without much explanation after a reviewer asked whether nearbyint depended on the application not modifying the rounding mode. Of course (as Eric commented) we rely on the application not changing the rounding mode from its default (round-to-nearest) in many other places, including the IROUND function used by _mesa_round_to_even! Worse, IROUND() is implemented using the trunc(x + 0.5) trick which fails for x = nextafterf(0.5, 0.0). Still worse, _mesa_round_to_even unexpectedly returns an int. I suspect that could cause problems when rounding large integral values not representable as an int in ir_constant_expression.cpp's ir_unop_round_even evaluation. Its use of _mesa_round_to_even is clearly broken for doubles (as noted during review). The constant expression evaluation code for the packing built-in functions also mistakenly assumed that _mesa_round_to_even returned a float, as can be seen by the cast through a signed integer type to an unsigned (since negative float -> unsigned conversions are undefined). rint() and nearbyint() implement the round-half-to-even behavior we want when the rounding mode is set to the default round-to-nearest. The only difference between them is that nearbyint() raises the inexact exception. This patch implements _mesa_roundeven{f,}, a function similar to the roundeven function added by a yet unimplemented technical specification (ISO/IEC TS 18661-1:2014), with a small difference in behavior -- we don't bother raising the inexact exception, which I don't think we care about anyway. At least recent Intel CPUs can quickly change a subset of the bits in the x87 floating-point control register, but the exception mask bits are not included. rint() does not need to change these bits, but nearbyint() does (twice: save old, set new, and restore old) in order to raise the inexact exception, which would incur some penalty. Reviewed-by: Carl Worth <cworth@cworth.org>	2015-03-18 21:06:26 -07:00
Jason Ekstrand	27bf37ba05	nir/peephole_select: Allow uniform/input loads and load_const Shader-db results on HSW: total instructions in shared programs: 4174156 -> 4157291 (-0.40%) instructions in affected programs: 145397 -> 128532 (-11.60%) helped: 383 HURT: 0 GAINED: 20 LOST: 22 There are two more tests lost than gained. However, comparing this with GLSL IR vs. NIR results, the overall delta is reduced from 85/44 gained/lost on current master to 71/32 with this commit. Therefore, I think it's probably a boon since we are getting "closer" to where we were before. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-17 17:11:05 -07:00
Jason Ekstrand	1be862c0c4	nir/peephole_select: Copy instructions into the block before the if Previously we tried to do poor-man's copy propagation as we created the select instructions. Instead, this commit just moves the instructions from the blocks inside the if into the block before. Copy propagation will take care of making sure we don't have any extra mov's in there for us. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-17 17:11:05 -07:00
Jason Ekstrand	8cf40ed05d	nir/peephole_select: Rename are_all_move_to_phi and use a switch Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-17 17:11:05 -07:00
Kenneth Graunke	f3e4b2c9d2	nir: Fix non-determinism in nir_lower_vars_to_ssa(). Previously, we stored derefs in a hash table, using the malloc'd pointer as the key. Then, we walked through the hash table and generated code, based on the order of the hash table's elements. Memory addresses returned by malloc are pretty much random, which meant that the hash was random, and the hash table's elements would be walked in some random order. This led to successive compiles of the same shader using different variable names and slightly different orderings of phi-nodes. Code could not be diff'd, and the final assembly would sometimes change slightly too. It turns out the only point of the hash table was to avoid inserting the same node multiple times for different dereferences. We never actually searched the hash table! This patch uses an intrusive linked list instead. Since exec_list uses head and tail sentinels, checking prev or next against NULL will tell us whether the node is already in the list. Pair programming with Jason Ekstrand. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-12 13:25:39 -07:00
Kenneth Graunke	2c79f6f9c3	nir: Add intrinsics for SYSTEM_VALUE_BASE_VERTEX and VERTEX_ID_ZERO_BASE Ian and I added these around the time Connor was developing NIR. Now that both exist, we should make them work together! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-12 08:29:48 -07:00
Jason Ekstrand	90e50908d7	nir/worklist: Don't change the start index when computing the tail index Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2015-03-11 15:18:16 -07:00
Thomas Helland	8fb8fe46fa	nir: Optimize a + neg(a) Shader-db i965 instructions: total instructions in shared programs: 1711180 -> 1711159 (-0.00%) instructions in affected programs: 825 -> 804 (-2.55%) helped: 9 HURT: 0 GAINED: 3 LOST: 3 Shader-db NIR instructions: total instructions in shared programs: 606187 -> 606179 (-0.00%) instructions in affected programs: 298 -> 290 (-2.68%) helped: 4 HURT: 0 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Thomas Helland <thomashelland90@gmail.com>	2015-03-11 14:21:05 -07:00
Thomas Helland	0525f2e851	nir: Optimize (a*b)+(a*c) -> a*(b+c) Shader-db i965 instructions: total instructions in shared programs: 1715894 -> 1710802 (-0.30%) instructions in affected programs: 443080 -> 437988 (-1.15%) helped: 1502 HURT: 13 GAINED: 4 LOST: 4 Shader-db NIR instructions: total instructions in shared programs: 607710 -> 606187 (-0.25%) instructions in affected programs: 208285 -> 206762 (-0.73%) helped: 769 HURT: 8 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Thomas Helland <thomashelland90@gmail.com>	2015-03-11 14:21:05 -07:00
Kenneth Graunke	b9c2fa15e3	nir: Make the printer include nir_variable::location too. Being able to see both location and driver_location can be useful when debugging IO mistakes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-03-09 01:34:03 -07:00
Kenneth Graunke	8dcc1f2c10	nir: Only do gl_FrontFacing workaround in glsl_to_nir for the FS. Vertex shaders can have shader inputs where location happens to be VARYING_SLOT_FACE. Without predicating this on the shader stage, we suddenly end up with load_front_face intrinsics in vertex shaders, which is nonsensical. Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-08 20:04:02 -07:00
Kenneth Graunke	c6f2abe67e	nir: Plumb the shader stage into glsl_to_nir(). The next commit needs to know the shader stage in glsl_to_nir(). To facilitate that, we pass the gl_shader rather than the raw exec_list of instructions. This has both the exec_list and the stage. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-08 20:04:01 -07:00
Kenneth Graunke	b200cbb0a4	nir: Add native_integers to nir_shader_compiler_options. glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the driver supports native integers. Presumably other passes may as well. Adding this to nir_shader_compiler_options is an easy way to provide that information, as it's accessible via nir_shader::options. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-08 20:03:57 -07:00
Kenneth Graunke	a55da73be4	nir: Try to make sense of the nir_shader_compiler_options code. The code in glsl_to_nir is entirely dead, as we translate from GLSL to NIR at link time, when there isn't a _mesa_glsl_parse_state to pass, so every caller passes NULL. glsl_to_nir seems like the wrong place to try and create the shader compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other translators all would have to duplicate that code. The driver should set this up once with whatever settings it wants, and pass it in. Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[] and left a comment saying: "The memory for the options is expected to be kept in a single static copy by the driver." This suggests the plan was to do exactly that. That pointer was not marked const, however, and the dead code used a mix of static structures and ralloced ones. This patch deletes the dead code in glsl_to_nir, instead making it take the shader compiler options as a mandatory argument. It creates an (empty) options struct in the i965 driver, and makes NirOptions point to that. It marks the pointer const so that we can actually do so without generating "discards const qualifier" compiler warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-03-08 20:03:46 -07:00
Kenneth Graunke	2561aea6b3	nir: Delete nir_shader::user_structures and num_user_structures. Nothing actually uses these, and the only caller of glsl_to_nir() (brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state pointer, meaning they'll always be NULL and 0, respectively. Just delete them. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-08 20:03:44 -07:00
Jose Fonseca	40a4797384	nir: Use helper macros for dealing with VLAs. v2: - Single statement, by using memset return value as suggested by Ian Romanick. - No internal declaration, as suggested by Jason Ekstrand. - Move macros to a header. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-03-04 10:52:02 +00:00
Jose Fonseca	f320ecf218	nir: Use alloca instead of variable length arrays. This is to enable the code to build with -Werror=vla in the short term, and enable the code to build with MSVC2013 soon after. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-27 14:30:36 +00:00
Kenneth Graunke	8e62bd52f8	nir: Introduce nir_intrinsic_discard_if. This is a conditional discard, which takes a boolean source. Note that we don't generate ir_discard::condition today, so this shouldn't break drivers (since none implement this intrinsic yet). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-02-24 15:24:52 -08:00
Jason Ekstrand	c750ecaa12	nir/register: Add a parent_instr field This adds a parent_instr field similar to the one for ssa_def. The difference here is that the parent_instr field on a nir_register can be NULL if the register does not have a unique definition or if that definition does not dominate all its uses. We set this field in the out-of-SSA pass so that backends can get SSA-like information even after they have gone out of SSA. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-02-24 14:08:04 -08:00
Jason Ekstrand	9b9ef2aeee	nir/gcm: Add some missing break statements Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-23 13:20:13 -08:00
Jason Ekstrand	cb4b2ad44a	nir: Copy-propagate vecN operations that are actually moves We were already doing this for ALU operations but we haven't for non-ALU operations. This changes that. total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%) NIR instructions in affected programs: 1768850 -> 1751305 (-0.99%) helped: 14244 HURT: 124 total FS instructions in shared programs: 4083960 -> 4084036 (0.00%) FS instructions in affected programs: 7302 -> 7378 (1.04%) helped: 12 HURT: 51 Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-23 13:19:05 -08:00
Eric Anholt	4359954d84	nir: Generalize the optimization of subs of subs from 0. I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above, but we can generalize it and make it more potentially useful. In the specific original case of a 0 for our new 'a' argument, it'll get further algebraic optimization once the 0 is an argument to the new add. No shader-db effects. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	345c2b288a	nir: Collapse repeated bcsels on the same argument. vc4 results: total instructions in shared programs: 39881 -> 39794 (-0.22%) instructions in affected programs: 6302 -> 6215 (-1.38%) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	a38038ca5e	nir: When faced with a csel on !condition, just flip the arguments. total NIR instructions in shared programs: 39426 -> 39411 (-0.04%) NIR instructions in affected programs: 3748 -> 3733 (-0.40%) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	8e1152cb33	nir: Allow nir_opt_algebraic to see booleanness through &&, ||, ^, !. We have some useful optimizations to drop things like 'ine a, 0' on a boolean argument, but if 'a' came from logical operations on bools, it couldn't tell. These kinds of constructs appear as a result of TGSI->NIR quite frequently (at least with if flattening), so being a little more aggressive in detecting booleans can pay off. v2: Add ixor as a booleanness-preserving op (Suggestion by Connor). vc4 results: total instructions in shared programs: 40207 -> 39881 (-0.81%) instructions in affected programs: 6677 -> 6351 (-4.88%) Reviewed-by: Matt Turner <mattst88@gmail.com> (v1) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Eric Anholt	dc982f4a85	nir: Add a couple of simplifications of csel operations. vc4 was already cleaning these up, but it does shave 4 NIR instructions in shader-db. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-21 14:57:14 -08:00
Kenneth Graunke	b6393d7040	nir: Fix the Mesa build without -DDEBUG. With -DDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which doesn't exist, breaking the build: assert(state->states[index].index < state->states[index].stack_size); Switch it to ifndef NDEBUG, so the field will exist if the assertion actually generates code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-02-20 13:43:44 -08:00
Eric Anholt	bef38f62e0	nir: Drop dependency on mtypes.h for core NIR. One less new directory necessary for gallium code that wants to interact with NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-02-20 11:36:34 -08:00
Eric Anholt	b53d035825	util: Move Mesa's bitset.h to util/. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-02-20 11:36:34 -08:00
Jason Ekstrand	c7002fad90	nir/GCM: Pull unpinned instructions out of blocks while pinning This lets us be slightly more efficient by not walking the CFG extra times. Also, it may make it easier to ensure that GVN happens on only unpinned instructions. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	8dfe6f672f	nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	190073c737	nir: Add a global code motion (GCM) pass v2 Jason Ekstrand <jason.ekstrand@intel.com>: - Use nir_dominance_lca for computing least common ancestors - Use the block index for comparing dominance tree depths - Pin things that do partial derivatives Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	a52a4b5223	nir/instr: Change "live" to a more generic "pass_flags" field Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	3d25afc51c	nir: Make nir_[cf_node/instr]_[prev/next] return null if at the end Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	902b0ccc9a	nir/from_ssa: Don't try to read an invalid instruction Right now, the nir_instr_prev function blindly looks up the previous element in the exec list and casts it to an instruction even if it's the tail sentinel. The next commit will change this to return null if it's the first instruction. Making this change first avoids getting a segfault between commits. The only reason we never noticed is that, thanks to the way things are laid out in nir_block, the casted instruction's type was never parallel_copy. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	0281fd0786	nir/validate: Validate SSA defs the same way we do for registers Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	34952b5671	nir/validate: Validate if_uses on registers Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	98ecb25f89	nir: Properly clean up CF nodes when we remove them Previously, if you removed a CF node that still had instructions in it, none of the use/def information from those instructions would get cleaned up. Also, we weren't removing if statements from the if_uses of the corresponding register or SSA def. This commit fixes both of these problems. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	e025943134	nir: use nir_foreach_ssa_def for indexing ssa defs This is both simpler and more correct. The old code didn't properly index load_const instructions. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	0167c38cac	nir/from_ssa: Use the nir_block_dominance function instead of our own Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	f481a9425c	nir/dominance: Add a constant-time mechanism for comparing blocks This is mostly thanks to Connor. The idea is to do a depth-first search that computes pre and post indices for all the blocks. We can then figure out if one block dominates another in constant time by two simple comparison operations. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:17 -08:00
Jason Ekstrand	b4c5489c8a	nir/dominance: Expose the dominance intersection function Being able to find the least common ancestor in the dominance tree is a useful thing that we may want to do in other passes. In particular, we need it for GCM. v2: Handle NULL inputs by returning the other block Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-19 17:06:16 -08:00
Brian Paul	2f5597787c	nir: add missing GLSL_TYPE_DOUBLE case in type_size() To silence compiler warning about unhandled switch case. v2: move GLSL_TYPE_DOUBLE to the "not reached" section, per Ilia. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-19 15:36:59 -07:00
Eric Anholt	2a135c470e	nir: Add an ALU op builder kind of like ir_builder.h v2: Rebase on the nir_opcodes.h python code generation support. v3: Use SSA values, and set an appropriate writemask on dot products. v4: Make the arguments be SSA references as well. This lets you stack up expressions in the arguments of other expressions, at the cost of having to insert a fmov/imov if you want to swizzle. Also, add the generated file to NIR_GENERATED_FILES. v5: Use more pythonish style for iterating the list. v6: Infer the size of the dest from the size of the srcs, and auto-swizzle a single small src out to the appropriate size. v7: Add little helpers for initializing the struct, add a typedef for the struct like other nir types have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v6) Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v7)	2015-02-18 22:28:42 -08:00
Eric Anholt	6eadde51bb	nir: Recognize and reduce duplicated fsats. No effect on vc4 shader-db. v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:51 -08:00
Eric Anholt	1907a3a7ee	nir: Add a flag for lowering fsat. vc4 cse/algebraic-disabled stats: total instructions in shared programs: 44356 -> 44354 (-0.00%) instructions in affected programs: 55 -> 53 (-3.64%) v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:51 -08:00
Eric Anholt	e5ecf8e427	nir: Add a flag for lowering ffma. vc4 cse/algebraic-disabled stats: total uniforms in shared programs: 13966 -> 13791 (-1.25%) uniforms in affected programs: 435 -> 260 (-40.23%) total instructions in shared programs: 44732 -> 44356 (-0.84%) instructions in affected programs: 9599 -> 9223 (-3.92%) v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:51 -08:00
Eric Anholt	42a8ace66e	nir: Add a flag for lowering fneg/ineg. vc4 cse/algebraic-disabled stats: total instructions in shared programs: 44911 -> 44732 (-0.40%) instructions in affected programs: 11371 -> 11192 (-1.57%) v2: Fix broken iabs(isub(0, a)) transformation. v3: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:51 -08:00
Eric Anholt	cb95a228e8	nir: Add a flag for lowering fsqrt(x) to frcp(frsqrt(x)). vc4 cse/algebraic-disabled stats: total uniforms in shared programs: 13972 -> 13966 (-0.04%) uniforms in affected programs: 408 -> 402 (-1.47%) total instructions in shared programs: 44973 -> 44911 (-0.14%) instructions in affected programs: 1551 -> 1489 (-4.00%) v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:50 -08:00
Eric Anholt	ccf14bca4b	nir: Add lowering of POW instructions if the lower flag is set. This could be done in a separate pass like we do in GLSL IR, but it seems to me like having the definitions of the transformations in the two directions next to each other makes a lot of sense. v2: Reorder the comment about the transformation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-18 14:47:50 -08:00
Eric Anholt	8e9dbfff17	nir: Conditionalize the POW reconstruction on shader compiler options. Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic and other passes should try and generate certain types of opcodes or patterns. Extend that to NIR by defining our own struct, which is automatically generated from the Mesa struct in glsl_to_nir and provided directly by the driver in TGSI-to-NIR. v2: Split out the previous two prep patches. v3: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)	2015-02-18 14:47:50 -08:00
Eric Anholt	955a6bb57d	nir: Add an optional expression controlling nir_algebraic xforms. This will be used so that we can customize the transforms for the target GPU, so we don't un-lower expressions that had already been lowered (or introduce new lowering transformations that not all GPUs want) v2: Drop the complication of having the condition->index dictionary, since we don't actually expect there to be many different conditions (change by Kenneth). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-18 14:47:50 -08:00
Eric Anholt	f90bb54734	nir: Add a nir_shader_compiler_options struct pointed to by the shaders. This will be used to give the optimization passes a chance to customize behavior for the particular target device. v2: Rebase to master (no TGSI->NIR present) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)	2015-02-18 14:47:50 -08:00
Jason Ekstrand	dd110cdfd8	nir: Make gl_FrontFacing a system_value GLSL IR labels gl_FrontFacing as an input variable and not a system value. This commit makes NIR silently translate gl_FrontFacing to a system value so that it properly gets translated into a load_system_value intrinsic. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-02-14 13:47:16 -08:00
Jason Ekstrand	929f43851e	nir/lower_phis_to_scalar: Fix some logic in is_phi_scalarizable Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-14 13:46:59 -08:00
Matt Turner	4c42e1116b	nir: Recognize open-coded fmin/fmax. And unfortunately other shaders do the same thing but with >=/<= which we can't apply this optimization to because of NaNs. instructions in affected programs: 23309 -> 22938 (-1.59%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-11 13:50:19 -08:00
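The NaN caveat in 4c42e1116b can be seen with a small experiment. The sketch below is Python, not NIR, and the two helper names are made up for the example: it compares a `<`-based select with the superficially equivalent `>=`-based select. They agree on ordinary numbers but pick different operands when one input is NaN, because every ordered comparison with NaN is false.

```python
import math


def select_lt(a, b):
    # Open-coded minimum using a strict comparison: a < b ? a : b
    return a if a < b else b


def select_ge(a, b):
    # Looks like the same minimum, written with >=: a >= b ? b : a
    return b if a >= b else a


print(select_lt(2.0, 3.0), select_ge(2.0, 3.0))  # 2.0 2.0

nan = float("nan")
print(select_lt(nan, 1.0))              # 1.0
print(math.isnan(select_ge(nan, 1.0)))  # True
```

Since the two forms are not interchangeable in the presence of NaN, a rewrite that is valid for one cannot be applied blindly to the other.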
Eric Anholt	56e21647e2	nir: Add algebraic opt for int comparisons with identical operands. No change on shader-db on i965. v2: Reword the comment due to feedback from Erik Faye-Lund Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)	2015-02-11 11:52:38 -08:00
Eric Anholt	2919bdf466	nir: Fix load_const comparisons for CSE. We want the size of a float per component, not the size of a whole vec4. NIR instructions on i965: total instructions in shared programs: 1261937 -> 1261929 (-0.00%) instructions in affected programs: 114 -> 106 (-7.02%) Looking at one of these examples (tesseract), it's from vec4 load_consts for a MRT solid fill, which do get CSEed now that we don't memcmp off the end of the const value and into the SSA def. For the 1-component loads that are common in i965, we were only memcmping off into the rest of the usually zero-filled const_value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-11 11:52:38 -08:00
Matt Turner	a9065cef48	nir: Remove casts from void*. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-10 17:48:42 -08:00
Matt Turner	bb1e007157	nir: Replace assert(0) with unreachable(). Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-10 17:48:31 -08:00
Matt Turner	942b56ad05	nir: Remove unused has_indirect variable. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-10 17:48:16 -08:00
Kenneth Graunke	480ee1f0b4	nir: Mark nir_print_instr's instr pointer as const. Printing instructions doesn't modify them, so we can mark the parameter const. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-10 03:37:55 -08:00
Eric Anholt	bff4cbdafa	nir: Fix broken fsat recognizer. We've probably never seen this ridiculous pattern in the wild, so it didn't matter. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-06 15:57:55 -08:00
Eric Anholt	6706537dd4	nir: Slightly simplify algebraic code generation by reusing a struct. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-02-06 15:57:55 -08:00
Connor Abbott	a135f34080	nir: add an optimization to remove useless phi nodes This removes phi nodes whose sources all point to the same thing. Shader-db results: total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%) NIR instructions in affected programs: 126564 -> 122480 (-3.23%) helped: 615 HURT: 0 total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%) FS instructions in affected programs: 24622 -> 23174 (-5.88%) helped: 138 HURT: 0 Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Tested-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 16:00:13 -05:00
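A minimal sketch of the idea behind a135f34080, assuming a toy representation where each phi is a destination name mapped to its list of source names (this is not the NIR data model, and the function name is hypothetical): a phi whose sources are all the same value can be replaced by that value.

```python
def find_useless_phis(phis):
    """Return a map from phi destination to the single value it forwards.

    `phis` maps a destination name to a list of source names, one per
    predecessor block. Phis with differing sources are left alone.
    """
    replacements = {}
    for dest, srcs in phis.items():
        if srcs and all(src == srcs[0] for src in srcs):
            replacements[dest] = srcs[0]
    return replacements


phis = {
    "ssa_5": ["ssa_2", "ssa_2"],  # both predecessors supply ssa_2: useless
    "ssa_6": ["ssa_3", "ssa_4"],  # genuinely merges two values: keep
}
print(find_useless_phis(phis))  # {'ssa_5': 'ssa_2'}
```

A real pass would then rewrite every use of the phi's destination to the forwarded value and delete the phi.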
Jason Ekstrand	572d1f6e41	nir/validate: Ensure that phi sources are SSA-only Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:52:42 -08:00
Jason Ekstrand	5420774510	nir/validate: Validate that only float ALU outputs are saturated Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:46:55 -08:00
Jason Ekstrand	c0df85cca4	nir/lower_source_mods: Don't lower saturate for non-float outputs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-03 12:46:38 -08:00
Jason Ekstrand	f2adcd36cb	nir: Add a pass to lower vector phi nodes to scalar phi nodes v2 Jason Ekstrand <jason.ekstrand@intel.com>: - Add better comments - Use nir_ssa_dest_init and nir_src_for_ssa more places - Fix some void * casts v3 Jason Ekstrand <jason.ekstrand@intel.com>: - Rework the way we determine whether or not to scalarize a phi node to make the recursion non-bogus - Treat load_const instructions as scalarizable v4 Jason Ekstrand <jason.ekstrand@intel.com>: - Allow uniform and input loads to be scalarizable v5 Jason Ekstrand <jason.ekstrand@intel.com>: - Also consider loads of inputs (varying, uniform, or ubo) to be scalarizable. We were already doing this for load_var on uniforms and inputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-03 12:33:11 -08:00
Jason Ekstrand	604ae33c8b	nir/opt_algebraic: Add some constant bcsel reductions total instructions in shared programs: 5998190 -> 5997603 (-0.01%) instructions in affected programs: 54276 -> 53689 (-1.08%) helped: 293 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:11:13 -08:00
Jason Ekstrand	7f19cd5a56	nir/opt_algebraic: Add some boolean simplifications total instructions in shared programs: 5998321 -> 5998287 (-0.00%) instructions in affected programs: 4520 -> 4486 (-0.75%) helped: 8 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:11:10 -08:00
Jason Ekstrand	70273c5cd5	nir/algebraic: Support specifying variable as constant or by type Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	81f77e4f3a	nir/algebraic: Fail to compile if a variable is used in a replace but not the search Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	026b5cc792	nir/search: Allow for matching variables based on types This allows you to match on an unknown value but only if it is of a given type. 90% of the uses of this are for matching only booleans, but adding the generality of arbitrary types is no more complex. nir_algebraic.py doesn't handle this yet but that's ok because the C language will ensure that the default type on all variables is void. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	d8999bcdce	nir/search: Add support for matching unknown constants There are some algebraic transformations that we want to do but only if certain things are constants. For instance, we may want to replace a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant. While this generates more instructions, some of it will get constant folded. nir_algebraic.py doesn't handle this yet, but that's ok because the C language will make sure that false is the default for now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Jason Ekstrand	5ab1489ae6	nir: Add an invalid type This allows us to indicate a concept of an invalid type. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-29 17:07:45 -08:00
Eric Anholt	fc884eadf1	nir: Add variants of some of the comparison simplifications. We end up with these from TGSI-to-NIR because the pass generating the comparisons doesn't know if the arg is actually a bool input or not. vc4 results: total instructions in shared programs: 41801 -> 41508 (-0.70%) instructions in affected programs: 4253 -> 3960 (-6.89%) Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-29 11:44:06 -08:00
Eric Anholt	9a3a60cb13	nir: Don't try to to-SSA ALU instructions that are already SSA. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:43:33 -08:00
Eric Anholt	68d476167c	nir: Fix a bit of broken indentation. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:42:08 -08:00
Eric Anholt	36c604c824	nir: Add a couple of helpers for glsl types. This will be used by tgsi_to_nir, which needs to get vec4 types for declaring shader input/output variables. v2: Add a missing space. Reviewed-by: Matt Turner <mattst88@gmail.com> (v2) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-29 11:41:17 -08:00
Eric Anholt	dd4d9a4e62	nir: Make vec-to-movs handle src/dest aliasing. It now emits vector MOVs instead of a series of individual MOVs, which should be useful to any vector backends. This pushes the problem of src/dest aliasing of channels on a scalar chip to the backend, but if there are any vector operations in your shader then you needed to be handling this already. Fixes fs-swap-problem with my scalarizing patches. v2: Rename to insert_mov(), and add a comment about what it does. v3: Rewrite the comment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)	2015-01-28 16:33:34 -08:00
Jason Ekstrand	bb26ebac13	nir/opcodes: Use a return type of tfloat for ldexp Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-28 13:21:40 -08:00
Jason Ekstrand	f0340ff625	Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp" This reverts commit `d7d340fb2f`. We have an isnormal() implementation available, the only problem was that we had the wrong return type (fixed in a later patch). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Acked-by: Matt Turner <mattst88@gmail.com>	2015-01-28 13:19:47 -08:00
Jason Ekstrand	d7d340fb2f	nir/opcodes: Use fpclassify() instead of isnormal() for ldexp Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2015-01-28 03:42:41 -08:00
Connor Abbott	f1a9252def	nir: fix a bug with constant folding non-per-component instructions Before, we were only copying the first N channels, where N is the size of the SSA destination, which is fine for per-component instructions, but non-per-component instructions like fdot3 can have more source components than destination components. Fix this using the helper function introduced in the last patch. v2: use new helper name Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 21:26:36 -05:00
Connor Abbott	816f0515a2	nir: add a helper function for getting the number of source components Unlike with non-SSA ALU instructions, where if they're per-component you have to look at the writemask to know which source channels are being used, SSA ALU instructions always have all the possible channels enabled so we can just look at the number of components in the SSA definition for per-component instructions to say how many source components are being used. v2: use new name nir_ssa_alu_instr_src_components() Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 21:26:36 -05:00
Jason Ekstrand	dd74369a0a	nir/opcodes: Don't go through doubles when constant-folding iabs Previously, we called the abs() function in math.h. However, this involves unnecessarily going through double. This commit changes it to use integers directly with a ternary. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-26 11:25:02 -08:00
Jason Ekstrand	9bd28fe3a3	nir/opcodes: Simplify and fix the unpack_half__split_ constant expressions Previously, these functions were explicitly writing to dst.x and dst.y. However they both return only one component so writing to dst.y is invalid. Also, since they only return one component, we don't need the explicit assignment in the expression and can simplify it use an implicit assignment. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 11:25:02 -08:00
Jason Ekstrand	27c6e3e4ca	nir: Use pointers for nir_src_copy and nir_dest_copy This avoids the overhead of copying structures and better matches the newly added nir_alu_src_copy and nir_alu_dest_copy. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-26 11:24:58 -08:00
Connor Abbott	0aa31bf9c3	nir/constant_folding: use the new constant folding infrastructure Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-24 21:35:35 -08:00
Jason Ekstrand	89285e4d47	nir: add new constant folding infrastructure Add a required field to the Opcode class, const_expr, that contains an expression or statement that computes the result of the opcode given known constant inputs. Then take those const_expr's and expand them into a function that takes an opcode and an array of constant inputs and spits out the constant result. This means that when adding opcodes, there's one less place to update, and almost all the opcodes are self-documenting since the information on how to compute the result is right next to the definition. The helper functions in nir_constant_expressions.c were taken from ir_constant_expressions.cpp. v3 Jason Ekstrand <jason.ekstrand@iastate.edu> - Use mako to generate one function per opcode instead of doing piles of string splicing v4 Jason Ekstrand <jason.ekstrand@iastate.edu> - More comments and better indentation in the mako - Add a description of the constant expression language in nir_opcodes.py - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-24 21:35:35 -08:00
Connor Abbott	fa4bc6c130	nir: use Python to autogenerate opcode information Before, we used a system where a file, nir_opcodes.h, defined some macros that were included to generate the enum values and the nir_op_infos structure. This worked pretty well, but for development the error messages were never very useful, Python tools couldn't understand the opcode list, and it was difficult to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h, which contains all the enum names and gets included into nir.h like before. In addition to solving the above problems, using Python and Mako to generate everything means that it's much easier to keep information centralized as we add new things like constant propagation that require per-opcode information. v2: - make Opcode derive from object (Dylan) - don't use assert like it's a function (Dylan) - style fixes for fnoise, use xrange (Dylan) - use iterkeys() in nir_opcodes_h.py (Dylan) - use pydoc-style comments (Jason) - don't make fmin/fmax commutative and associative yet (Jason) Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> v3 Jason Ekstrand <jason.ekstrand@intel.com> - Alphabetize source file lists - Generate nir_opcodes.h in the builddir instead of the source dir - Include $(builddir)/src/glsl/nir in the i965 build - Rework nir_opcodes.h generation so it generates a complete header file instead of one that has to be embedded inside an enum declaration	2015-01-24 21:33:56 -08:00
Eric Anholt	0680d170d1	nir: Expose nir_print_instr() for debug prints It's nice to have this present in your default cases so you can see what instruction is triggering an abort. v2: Just pass a NULL state, now that it won't crash when you do. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-23 17:30:11 -08:00
Eric Anholt	6445a40520	nir: When asked to print with a NULL state, just use bare variable names. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-23 17:30:01 -08:00
Eric Anholt	447ddfc137	nir: Add nir_lower_alu_to_scalar. This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted for vc4. v2: Use the nir_src_for_ssa() helper, and another instance of nir_alu_src_copy(). v3: Drop the non-SSA support. All intended callers will have SSA-only ALU ops. v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused unsupported() function, drop lower_context struct. v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert about weird input_sizes[]. Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>	2015-01-23 16:37:23 -08:00
Eric Anholt	b200127816	nir: Make some helpers for copying ALU src/dests. There aren't many users yet, but I wanted to do this from my scalarizing pass. v2: Constify the src arguments. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-23 16:37:16 -08:00
Kenneth Graunke	15063d2ad0	nir: Add algebraic optimizations for division and reciprocal. These also exist in opt_algebraic.cpp. total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%) NIR instructions in affected programs: 42221 -> 42002 (-0.52%) helped: 198 total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%) i965 instructions in affected programs: 84322 -> 83885 (-0.52%) helped: 394 HURT: 1 (by 1 instruction) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	bbd60f6d79	nir: Add algebraic optimizations for exponential/logarithmic functions. Most of these exist in the GLSL IR algebraic pass already. However, SSA allows us to find more instances of the patterns. total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%) NIR instructions in affected programs: 124189 -> 120026 (-3.35%) helped: 604 total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%) i965 instructions in affected programs: 261295 -> 254507 (-2.60%) helped: 1295 HURT: 3 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	391fb32bbe	nir: Add algebraic optimizations for simplifying comparisons. The first batch removes bonus fnot/inot operations, possibly allowing other optimizations to better recognize patterns. The next batch replaces a fadd and constant 0.0 with an fneg - negation is usually free on GPUs, while addition is not. total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%) NIR instructions in affected programs: 411143 -> 405922 (-1.27%) helped: 2233 HURT: 214 A few shaders are hurt by a few instructions due to moving neg such that it has a constant operand, which is then folded, resulting in two distinct load_consts for x and -x. We can always clean that up later. total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%) i965 instructions in affected programs: 784980 -> 775093 (-1.26%) helped: 4508 HURT: 2 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	551a752a59	nir: Add algebraic optimizations for pointless shifts. The GLSL IR optimization pass contained these; we may as well include them too. v2: Fix a >> 0 and a << 0 optimizations (caught by Matt). No change in the number of NIR instructions on a shader-db run. total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%) i965 instructions in affected programs: 542 -> 537 (-0.92%) helped: 2 (in glamor) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	3e56572c49	nir: Add a bunch of algebraic optimizations on logic/bit operations. Matt and I noticed a bunch of "val <- ior a a" operations in a shader, so we decided to add an algebraic optimization for that. While there, I decided to add a bunch more of them. v2: Delete bogus fand/for optimizations (caught by Jason). total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%) NIR instructions in affected programs: 149634 -> 146937 (-1.80%) helped: 1032 total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%) i965 instructions in affected programs: 537 -> 542 (0.93%) HURT: 2 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	978b0a9cda	nir: Implement CSE on intrinsics that can be eliminated and reordered. Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had load_input and load_uniform intrinsics repeated several times, with the same parameters, but each one generating a distinct SSA value. This made ALU operations on those values appear distinct as well. Generating distinct SSA values is silly - these are read only variables. CSE'ing them makes everything use a single SSA value, which then allows other operations to be CSE'd away as well. Generalizing a bit, it seems like we should be able to safely CSE any intrinsics that can be eliminated and reordered. I didn't implement support for variables for the time being. v2: Assert that info->num_variables == 0 (requested by Jason). total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%) NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%) helped: 16872 total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%) i965 instructions in affected programs: 640654 -> 620094 (-3.21%) helped: 2071 HURT: 585 GAINED: 14 LOST: 25 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-23 14:53:26 -08:00
Kenneth Graunke	cbdd623f13	nir: Pull nir_instr_can_cse()'s SSA checks out of the switch. This should not be a change in behavior, as all current cases that potentially answer "yes" require SSA. The next patch will introduce another case that requires SSA. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-01-23 14:53:26 -08:00
Connor Abbott	68a9d0b36f	nir: add generated file to .gitignore Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-23 10:20:46 -08:00
Eric Anholt	fc6938d23e	nir: Fix setup of constant bool initializers. brw_fs_nir has only seen scalar bools so far, thanks to vector splitting, and the ralloc in glsl_to_nir.cpp will usually get you a 0-filled chunk of memory, so reading too large of a value will usually get you the right bool value. But once we start doing vector bools in a few commits, we end up getting bad values. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-22 13:52:19 -08:00
Eric Anholt	534a4ec82f	nir: Make an easier helper for setting up SSA defs. Almost all instructions we nir_ssa_def_init() for are nir_dests, and you have to keep from forgetting to set is_ssa when you do. Just provide the simpler helper, instead. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2015-01-22 13:52:19 -08:00
Matt Turner	28b7c6b285	nir: Replace assert(0) with unreachable(). Fixes a couple of warnings in the process. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-21 21:06:37 -08:00
Jason Ekstrand	f88c6a4997	nir: Stop using designated initializers Designated initializers with anonymous unions don't work in MSVC or GCC < 4.6. With a couple of constructor methods, we don't need them any more and the code is actually cleaner. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-21 19:55:02 -08:00
Jason Ekstrand	7da60eca4f	nir: Add src and dest constructors Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-21 12:21:10 -08:00
Jason Ekstrand	194f6235b3	nir: Add a nir_foreach_phi_src helper macro Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-20 16:53:29 -08:00
Vinson Lee	10a4f1e77a	nir: s/malloc.h/stdlib.h/ Fix build error on Mac OS X. CC nir_to_ssa.lo nir_to_ssa.c:29:10: fatal error: 'malloc.h' file not found ^ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88478 Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2015-01-16 16:14:51 -08:00
Jason Ekstrand	bc6e57e019	nir/live_variables: Use a worklist This is a rework of the liveness algorithm using a worklist as suggested by Connor. Doing so reduces the number of times we walk over the instructions because we don't have to do an entire pointless walk over the instructions just to figure out it's time to stop. Also, the stuff after the last loop in the function will only ever get visited once. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 16:54:21 -08:00
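The worklist approach to liveness mentioned in bc6e57e019 can be sketched generically as below. This is a textbook backward dataflow in Python, not the NIR implementation; the block representation (per-block upward-exposed `uses` and `defs` sets plus a successor map) and the function name are assumptions for the example. A block is revisited only when the live-in set of one of its successors has changed.

```python
from collections import deque


def compute_liveness(blocks, succs):
    """Backward liveness analysis driven by a worklist.

    `blocks` maps a block name to (uses, defs), where `uses` holds values
    read before any definition in the block. `succs` maps a block name to
    its successor names.
    """
    preds = {b: [] for b in blocks}
    for b, successors in succs.items():
        for s in successors:
            preds[s].append(b)

    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    worklist = deque(blocks)
    queued = set(blocks)  # avoids queueing a block twice

    while worklist:
        b = worklist.popleft()
        queued.discard(b)
        uses, defs = blocks[b]
        live_out[b] = set().union(*(live_in[s] for s in succs.get(b, ())))
        new_in = uses | (live_out[b] - defs)
        if new_in != live_in[b]:
            live_in[b] = new_in
            # Only predecessors can be affected by this change.
            for p in preds[b]:
                if p not in queued:
                    queued.add(p)
                    worklist.append(p)
    return live_in, live_out


# A loop: entry defines i; header and body read it; body redefines it.
blocks = {
    "entry": (set(), {"i"}),
    "header": ({"i"}, set()),
    "body": ({"i"}, {"i"}),
    "exit": (set(), set()),
}
succs = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"]}
live_in, live_out = compute_liveness(blocks, succs)
print(live_out["entry"], live_in["entry"])  # {'i'} set()
```

Compared with repeating full passes until nothing changes, the loop ends as soon as the queue drains, so no final pass is spent merely confirming that a fixed point was reached.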
Jason Ekstrand	4839d1aed1	nir: Add a worklist helper structure A worklist is a common concept in optimizations. This adds a structure that we can reuse for many different types of optimizations. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 16:54:21 -08:00
Brian Paul	0aaaa13ec9	nir: fix incorrect argument passed to validate_src() in validate_tex_instr() Silences a compiler warning. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 17:41:42 -07:00
Brian Paul	aa479a69d6	nir: silence compiler warning from visit_src() call v2: use proper argument Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 17:09:02 -07:00
Jason Ekstrand	153b8b3525	util/hash_set: Rework the API to know about hashing Previously, the set API required the user to do all of the hashing of keys as it passed them in. Since the hashing function is intrinsically tied to the comparison function, it makes sense for the hash set to know about it. Also, it makes for a somewhat clumsy API as the user is constantly calling hashing functions many of which have long names. This is especially bad when the standard call looks something like _mesa_set_add(ht, _mesa_pointer_hash(key), key); In the above case, there is no reason why the hash set shouldn't do the hashing for you. We leave the option for you to do your own hashing if it's more efficient, but it's no longer needed. Also, if you do do your own hashing, the hash set will assert that your hash matches what it expects out of the hashing function. This should make it harder to mess up your hashing. This is analogous to `94303a0750` where we did this for hash_table Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Jason Ekstrand	4c99e3ae78	util: Move main/set to util/hash_set Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Jason Ekstrand	8ed5305d28	hash_table: Rename insert_with_hash to insert_pre_hashed We already have search_pre_hashed. This makes the APIs match better. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-01-15 13:21:27 -08:00
Jason Ekstrand	0d05d1226e	nir/algebraic: Only replace an instruction once Without the break, it was possible that an instruction would match multiple expressions. If this happened, you could end up trying to replace it multiple times and get a segfault. This makes it so that, after a successful replacement, it moves on to the next instruction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	0f85310975	nir/vars_to_ssa: Use the copy lowering from lower_var_copies Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	d3636da902	nir: Add a pass for lowering copy instructions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	700ba5daaf	nir/vars_to_ssa: Refactor get_deref_node This refactor allows you to more easily get the deref node associated with a given variable. We then use that new functionality in the deref_may_be_aliased function instead of creating a 1-element deref chain. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	55b5058e69	nir: Rename lower_variables to lower_vars_to_ssa The original name wasn't particularly descriptive. This one indicates that it actually gives you SSA values as opposed to the old pass which lowered variables to registers. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	4aa6162f6e	nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array This solves a number of problems. First is the ability to change the number of sources that a texture instruction has. Second, it solves the dilemma that may occur if a texture instruction has more than 4 sources. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	dcb1acdea0	nir/validate: Only build in debug mode Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:24 -08:00
Jason Ekstrand	347ab2bf24	nir/lower_variables: Improve documentation Additional description was added to a variety of places. Also, we no longer use the term "leaf" to describe fully-qualified direct derefs. Instead, we simply use the term "direct" or spell it out completely. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	8016fa39e1	nir/lower_variables: Use a for loop for get_deref_node Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	0c0ca8b6ae	nir: Use the actual FNV-1a hash for hashing derefs We also switch to using loops rather than recursion. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	e4115ca9d8	nir: Make intrinsic flags into an enum This should be much better for debugging as GDB will pick up on the fact that it's an enum and actually tell you what you're looking at instead of giving you some arbitrary hex value you have to go look up. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	ed13f4e716	nir: Use static inlines instead of macros for list getters This should make debugging a lot easier as GDB handles static inlines much better than macros. Also, static inlines are typesafe. Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	b95fae034f	nir/variable: Remove the constant_value field This was a left-over relic of GLSL IR that we aren't using for anything. If we ever want that value again, we can add it back, but NIR constant folding should be just as good as GLSL IR's if not better pretty soon, so I'm not worried about it. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	8599b30c67	nir: Add some documentation Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	ad9d0a9ea6	nir/lower_variables: Follow the Cytron paper more closely Previously, our variable renaming algorithm, while similar to the one in the Cytron paper, was not the same. While I'm pretty sure it was correct, it will be easier for readers of the code in the variable renaming pass if it follows more closely. This commit removes the automatic stack popping we were doing and replaces it with explicit popping like Cytron does. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	b1d114a48c	nir/print: Various cleanups recommended by Eric Cc: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	e2763339fe	nir/lower_variables: Add a bunch of comments and re-arrange a few things This commit seeks to make the lower_variables pass much more clear by adding a pile of comments and re-arranging a few things. There are no functional or algorithmic changes. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	40ca129ed5	nir: Rename parallel_copy_copy to parallel_copy_entry and add a foreach macro parallel_copy_copy was a silly name. Also, things were getting long and annoying, so I added a foreach macro. For historical reasons, several of the original iterations over parallel copy entries in from_ssa used the _safe variants of the loop. However, all of these no longer ever remove an entry so it's ok to make them all use the normal iterator. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	1b720c6ed8	nir/from_ssa: Clean up parallel copy handling and document it better Previously, we were doing a lazy creation of the parallel copy instructions. This is confusing, hard to get right, and involves some extra state tracking of the copies. This commit adds an extra walk over the basic blocks to add the block-end parallel copies up front. This should be much less confusing and, consequently, easier to get right. This commit also adds more comments about parallel copies to help explain what all is going on. As a consequence of these changes, we can now remove the at_end parameter from nir_parallel_copy_instr. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	de73d1e173	nir: Rename nir_block_following_if to nir_block_get_following_if The new name is a little longer but less confusing. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:23 -08:00
Jason Ekstrand	813316d150	nir/opcodes: Remove the per_component info field Originally, this field was intended for determining if the given instruction acted per-component or if it had mismatching source and destination sizes that would have to be interpreted specially. However, we can easily derive this from output_size == 0, so it's not really that useful. Also, the values we were setting in nir_opcodes.h for this field were completely bogus and it was never used. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	e2a8f9e5cc	nir/search: Use nir_op_infos to determine if an operation is commutative Prior to this commit, we had a big switch statement for this. Now it's baked into the opcode metadata so we can just use that. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	46f3e1ab50	nir/opcodes: Add algebraic properties metadata This commit adds some algebraic properties to the metadata of each opcode in NIR. In particular, you now know, just from the metadata, if a given opcode is commutative or associative. This will be useful for algebraic transformation passes that want to be able to match a + b as well as b + a in one go. v2: Make algebraic properties all caps. This was more consistent with the intrinsics flags and seems better for flags in general. Also, the enums are now declared with (1 << n) rather than hex values. v3: fmin and fmax technically aren't commutative or associative. Things get funny when one of the arguments is a NaN. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	2c7da78805	nir: Make load_const SSA-only As it was, we weren't ever using load_const in a non-SSA way. This allows us to substantially simplify the load_const instruction. If we ever need a non-SSA constant load, we can do a load_const and an imov. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	675ffdef30	nir: Make nir_ssa_undef_instr_create initialize the destination Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	5c16be1c52	nir/lower_system_values: Handle SSA destinations Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	821e75a160	nir/lower_atomics: Use/support SSA Previously, lower_atomics was non-SSA only. We assert-failed if the destination of an atomic operation intrinsic was an SSA def and we used temporary registers for computing offsets. This commit changes both of these behaviors. We now use SSA values for computing offsets (so we can optimize them) and we handle SSA destinations. We also move the pass to run before we go out of SSA on i965 as it now generates SSA values. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	8ddb03d56d	nir/live_variables: Use the new ssa_def iterator Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	28a3e164e2	nir: Use nir_foreach_ssa_def for setting up ssa destinations Before, we were using foreach_dest and switching on whether the destination was an SSA value. This works, except not all destinations are SSA values so we have to special-case ssa_undef instructions. Now that we have a foreach_ssa_def function, we can iterate over all of the register destinations in one pass and iterate over the SSA destinations in a second. This way, if we add other ssa-only instructions, we won't have to worry about adding them to the special case we have for ssa_undef. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	193fea9eb6	nir: Add a foreach_ssa_def function There are some functions whose destinations are SSA-only and so aren't a nir_dest. This provides a function that is capable of iterating over the SSA definitions defined by those functions. If you want registers, you should use the old iterator. v2: Kenneth Graunke <kenneth@whitecape.org>: - Fix nir_foreach_ssa_def's return value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	bc0735857f	nir/lower_variables: Use a real dominance DFS for variable renaming Previously, we were just iterating over the program "in order" which kind-of approximates a DFS, but not really. In particular, we got the following case wrong: loop { a = 3; if (foo) { a = 5; } else { break; } use(a); } where use(a) would get 3 instead of 5 because of premature popping of the SSA def stack. Now, since we do an actual DFS, we should evaluate use(a) immediately after a = 5 and we should be ok. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:22 -08:00
Jason Ekstrand	dfb3abbaec	nir: Remove predication We stopped generating predicates in glsl_to_nir some time ago. Right now, it's all dead untested code that I'm not convinced always worked in the first place. If we decide we want them back, we can revert this patch. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	b3fd098e7d	nir: Make bcsel a fully vector operation Previously, the condition was a scalar that applied to all components simultaneously. As of this commit, the condition is a vector and each component is switched separately. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	295faf9462	nir: Call nir_metadata_preserve more places Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	b6c81b3ff4	nir/metadata: Rename metadata_dirty to metadata_preserve nir_metadata_dirty was a terrible name because the parameter it takes is the metadata to be preserved. This is really confusing because it looks like it's doing the opposite of what it is actually doing. Now it's named sensibly. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	60ec60a600	nir: Rework the way samplers are lowered v2 Jason Ekstrand <jason.ekstrand@intel.com>: - Use the nir_tex_src_sampler_offset source type instead of the sampler_indirect thing that I cooked up before. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	4cdabcc0fa	nir/tex_instr_create: Initialize all 4 sources This helps a lot with things like lowering passes that may need to add sources. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	62ac0ee804	nir/tex_instr: Rename the indirect source type and add an array size In particular, we rename nir_tex_src_sampler_index to _sampler_offset and add a sampler_array_size field to nir_tex_instr. This way we can pass the size of sampler arrays through to backends even after removing the variable information and, with it, the type. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	534d145e5e	nir: Use a source for uniform buffer indices instead of an index In GLSL-to-NIR we were just setting the base index to 0 whenever there was an indirect so having it expressed as a sum makes no sense. Also, while a base offset may make sense for the memory location (first element in the array, etc.) it makes less sense for the actual uniform buffer index. This may change later, but it seems to make more sense for now. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	6a5604ca6a	nir: Constant fold array indirects Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	cd4b995254	nir: Make texture instruction names more consistent This commit renames nir_instr_as_texture to nir_instr_as_tex and renames nir_instr_type_texture to nir_instr_type_tex to be consistent with nir_tex_instr. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	d6fe35a418	nir: Remove the ffma peephole This is no longer needed because it's now part of the algebraic optimization pass Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:21 -08:00
Jason Ekstrand	f77f4c00ce	nir: Add a basic constant folding pass Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	d5410bd8f6	nir: Add an algebraic optimization pass This pass uses the previously built algebraic transformations framework and should act as an example for anyone else wanting to make an algebraic transformation pass for NIR. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	0e145a951e	nir: Add infrastructure for generating algebraic transformation passes This commit builds on the nir_search.h infrastructure by adding a bit of python code that makes it stupid easy to write an algebraic transformation pass. The nir_algebraic.py file contains four python classes that correspond directly to the datastructures in nir_search.c and allow you to easily generate the C code to represent them. Given a list of search-and-replace operations, it can then generate a function that applies those transformations to a shader. The transformations can be specified manually, or they can be specified using nested tuples. The nested tuples make a neat little language for specifying expression trees and search-and-replace operations in a very readable and easy-to-edit fashion. The generated code is also fairly efficient. Instead of blindly calling nir_replace_instr with every single transformation and on every single instruction, it uses a switch statement on the instruction opcode to do a first-order culling and only calls nir_replace_instr if the opcode is known to match the first opcode in the search expression. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	0057dfd673	nir: Add an expression matching framework This framework provides a simple way to do simple search-and-replace operations on NIR code. The nir_search.h header provides four simple data structures for representing expressions: nir_value and three subtypes: nir_variable, nir_constant, and nir_expression. An expression tree can then be represented by nesting these data structures as needed. The nir_replace_instr function takes an instruction, an expression, and a value; if the instruction matches the expression, it is replaced with a new chain of instructions to generate the given replacement value. The framework keeps track of swizzles on sources and automatically generates the correct swizzles for the replacement value. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	a94d1c2481	nir/glsl: Emit abs, neg, and sat operations instead of source modifiers Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	8edcd1de14	nir: Make the type casting operations static inline functions Previously, the casting operations were macros. While this is usually fine, the casting macro used the input parameter twice leading to strange behavior when you passed the result of another function into it. Since we know the source and destination types explicitly, we don't lose anything by making it a function. Also, this gives us a nice little macro for creating cast function that will hopefully prevent mistyping. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	919426631b	nir: Add a lowering pass for adding source modifiers where possible Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	1d83a8eb7a	nir: Add neg, abs, and sat opcodes Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:20:20 -08:00
Jason Ekstrand	a3ad7fdf33	nir: Add a helper for getting a constant value from an SSA source Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	940ccc45ad	nir/glsl: Add support for gpu_shader5 interpolation intrinsics Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	45bdcc257e	nir: Add gpu_shader5 interpolation intrinsics Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	e3fa49c9e6	nir/validate: Validate intrinsic source/destination sizes Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	27663dbe8e	nir: Vectorize intrinsics We used to have the number of components built into the intrinsic. This meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4 variants. This led to piles of switch statements to generate the correct intrinsic names, and introspection to figure out the number of components. We can make things much nicer by allowing "vectorized" intrinsics. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	d1d12efb36	nir: Remove the old variable lowering code Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:03 -08:00
Jason Ekstrand	faad82b4e7	nir/validate: Ensure that outputs are write-only and inputs are read-only Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	29e607e5cf	nir/glsl: Generate SSA NIR With this commit, the GLSL IR -> NIR pass generates NIR in more-or-less SSA form. It's SSA in the sense that it doesn't have any registers, but it isn't really useful SSA because it still has a pile of load/store intrinsics that we will need to get rid of. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	6962c332e5	nir: Add a pass to lower global variables to local variables Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	619b2e2499	nir: Add a pass for lowering input/output loads/stores Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	aff431293b	nir: Add a pass to lower local variables to registers Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	d477beab07	nir: Add a pass to lower local variable accesses to SSA values This pass analyzes all of the load/store operations and, when a variable is never aliased (potentially used by an indirect operation), it is lowered directly to an SSA value. This pass translates to SSA directly and does not require any fixup by the original to-SSA pass. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	615ba5ad04	nir: Add a copy splitting pass Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	68778d52cd	nir: Automatically update SSA if uses Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	9318ce8c5a	nir/glsl: Don't allocate a state_slots array for 0 state slots Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	9d62df3800	nir: Validate that the sources of a phi have the same size as the destination Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	24249599b1	nir/copy_propagate: Don't cause size mismatches on phi node sources Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	6a52d2af2f	nir: Don't require a function in ssa_def_init Instead, we give SSA definitions a temporary index of 0xFFFFFFFF if the instruction does not have a block and a proper index when it actually gets added to the list. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	829aa98320	nir: Use an integer index for specifying structure fields Previously, we used a string name. It was nice for translating out of GLSL IR (which also does that) but cumbersome the rest of the time. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	4f8230e247	nir: Add a concept of a wildcard array dereference Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	b5143edaee	nir: Make array deref direct vs. indirect an enum Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:02 -08:00
Jason Ekstrand	8219ff1796	nir: Clean up nir_deref helper functions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	895eee505c	nir/lower_samplers: Use the nir_instr_rewrite_src function Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	cd01de0812	nir: Add a helper for rewriting an instruction source Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	5690c2b54c	nir/from_ssa: Don't lower constant SSA values to registers Backends want to be able to do special things with constant values such as put them into immediates or make decisions based on whether or not a value is constant. Before, constants always got lowered to a load_const into a register and then a register use. Now we leave constants as SSA values so backends can special-case them if they want. Since handling constant SSA values is trivial, this shouldn't be a problem for backends. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	6bdce55c44	nir: Add a basic CSE pass This pass is still fairly basic. It only handles ALU operations, constant loads, and phi nodes. No texture ops or intrinsics yet. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	20a5812606	nir: Add a fused multiply-add peephole	2015-01-15 07:19:01 -08:00
Jason Ekstrand	02ee1d22a1	nir: Validate that the SSA def and register indices are unique Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	13ec15bdbf	nir: Add a peephole select optimization Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	ef7ebb908e	nir/nir: Patch up phi predecessors in move_successors Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	02eef48343	nir/nir: Use safe iterators when iterating over the CFG Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	dc4e660dfa	nir/nir: Fix a bug in move_successors The unlink_blocks function moves successors around to make sure that, if there is a remaining successor, it is in the first successors slot and not the second. To fix this, we simply get both successors up front. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	10adf8fc85	nir: Differentiate between signed and unsigned versions of find_msb We also make the return types match GLSL. The GLSL spec specifies that findMSB and findLSB return a signed integer. Previously, nir had them return unsigned. This updates nir's behavior to match what GLSL expects. We also update the nir-to-fs generator to take the new instructions. While we're at it, we fix the case where the input to findMSB is zero. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	a76ccbfacf	nir/print: Don't reindex things These indices should now be reasonably stable/consistent. Redoing the indices in the print functions makes it harder to debug problems. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:01 -08:00
Jason Ekstrand	73522ec83f	nir: Validate all lists in the validator Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	943ddb9458	nir: Add a better out-of-SSA pass This commit rewrites the out-of-SSA pass to not be nearly as naive. It's based on "Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency" by Boissinot et al. It should be fairly close to state of the art. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	4f44120ff5	nir: Add a function for comparing two sources Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	366181d826	nir: Add a parallel copy instruction type Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	7de6b7fc3e	nir: Add a function for rewriting all the uses of a SSA def Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	946012f10f	nir: Automatically handle SSA uses when an instruction is inserted Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	fbc443ad56	nir: Add an initialization function for SSA definitions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	f86902e75d	nir: Add an SSA-based liveness analysis pass. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	c9a21c725d	nir: set reg_alloc and ssa_alloc when indexing registers and SSA values Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	d7e482d32c	nir: Add a function to detect if a block is immediately followed by an if Since we don't actually have an "if" instruction, this is a very common pattern when iterating over instructions. This adds a helper function for it to make things a little less painful. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	dfdf0c4673	nir: Add a foreach_block_reverse function Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	07556442a7	nir/foreach_block: Return false if the callback on the last block fails Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	49911cf4db	nir: Add a basic metadata management system Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	ea1eefe13f	nir/lower_variables_scalar: Silence a compiler warning Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	9d986d19d0	nir: Add a lower_vec_to_movs pass Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:19:00 -08:00
Jason Ekstrand	2943522d80	nir: Add a naive from-SSA pass This pass is kind of stupidly implemented but it should be enough to get us up and going. We probably want something better that doesn't generate all of the redundant moves eventually. However, the i965 backend should be able to handle the movs, so I'm not too worried about it in the short term.	2015-01-15 07:18:59 -08:00
Jason Ekstrand	b600f1a381	nir: Add intrinsics to do alternate interpolation on inputs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	4b4f90dbff	nir: Add NIR_TRUE and NIR_FALSE constants and use them for boolean immediates Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	6e46c98ec1	nir/lower_atomics: Multiply array offsets by ATOMIC_COUNTER_SIZE Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	d40b5ca5c5	nir/glsl: Add support for coarse and fine derivatives Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	8c75a7ce59	nir: Add fine and coarse derivative opcodes Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	458a6ce500	nir/glsl: Add support for saturate Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Jason Ekstrand	4bb81f6d02	Fix what I think are a few NIR typos Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-01-15 07:18:59 -08:00
Connor Abbott	7602385ac5	nir: add an SSA-based dead code elimination pass v2: Jason Ekstrand <jason.ekstrand@intel.com>: whitespace fixes	2015-01-15 07:18:58 -08:00
Connor Abbott	8b7cb7674c	nir: add an SSA-based copy propagation pass	2015-01-15 07:18:58 -08:00
Connor Abbott	4553887d4a	nir: add a pass to convert to SSA v2: Jason Ekstrand <jason.ekstrand@intel.com>: whitespace fixes	2015-01-15 07:18:58 -08:00
Connor Abbott	b559ee709b	nir: calculate dominance information	2015-01-15 07:18:58 -08:00
Connor Abbott	cff1deff72	nir: add an optimization to turn global registers into local registers After linking and inlining, this allows us to convert these registers into SSA values and optimise more code.	2015-01-15 07:18:58 -08:00
Connor Abbott	613bf6818a	nir: add a pass to lower atomics v2: Jason Ekstrand <jason.ekstrand@intel.com> whitespace fixes	2015-01-15 07:18:58 -08:00
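
The nested-tuple search-and-replace idea described in commits 0057dfd673, 0e145a951e, 46f3e1ab50, and 0d05d1226e can be illustrated with a small standalone sketch. This is not the code from nir_search.c or nir_algebraic.py: the expression encoding (tuples of an opcode followed by operands, bare strings as variables, numbers as constants), the `COMMUTATIVE` set, and the function names `match`, `substitute`, and `rewrite` are all assumptions made for illustration.

```python
# Hypothetical, simplified model of tuple-based search-and-replace.
# An expression is (opcode, operand, ...); a bare string is a variable;
# a number is a constant. None of these names come from Mesa itself.
COMMUTATIVE = {"fadd", "fmul", "iadd", "imul"}  # assumed set, for illustration


def match(pattern, expr, bindings):
    """Return True if expr matches pattern; record variables in bindings."""
    if isinstance(pattern, str):
        # A variable matches anything, but must be bound consistently.
        if pattern in bindings:
            return bindings[pattern] == expr
        bindings[pattern] = expr
        return True
    if not isinstance(pattern, tuple):
        # A constant in the pattern must equal the expression exactly.
        return pattern == expr
    if (not isinstance(expr, tuple) or expr[0] != pattern[0]
            or len(expr) != len(pattern)):
        return False
    # For commutative binary opcodes, also try the swapped operand order,
    # so that a + b and b + a are matched by one pattern.
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append((expr[2], expr[1]))
    for operands in orders:
        trial = dict(bindings)  # work on a copy; discard it on failure
        if all(match(p, e, trial) for p, e in zip(pattern[1:], operands)):
            bindings.clear()
            bindings.update(trial)
            return True
    return False


def substitute(replacement, bindings):
    """Build the replacement expression from the recorded bindings."""
    if isinstance(replacement, str):
        return bindings[replacement]
    if isinstance(replacement, tuple):
        return (replacement[0],) + tuple(
            substitute(r, bindings) for r in replacement[1:])
    return replacement


def rewrite(expr, rules):
    """Rewrite operands first, then apply at most one rule to this node."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e, rules) for e in expr[1:])
    for search, replace in rules:
        bindings = {}
        if match(search, expr, bindings):
            # Stop after the first successful replacement.
            return substitute(replace, bindings)
    return expr


RULES = [
    (("fadd", ("fmul", "a", "b"), "c"), ("ffma", "a", "b", "c")),
    (("fmul", "a", 1.0), "a"),
]
```

With these rules, `rewrite(("fadd", "z", ("fmul", "x", "y")), RULES)` yields `("ffma", "x", "y", "z")`: the swapped operand order of `fadd` is tried because the opcode is listed as commutative. Returning after the first matching rule mirrors the "only replace an instruction once" fix, in spirit only.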

665 Commits
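
Commit 0c0ca8b6ae mentions switching deref hashing to "the actual FNV-1a hash" and replacing recursion with loops. As a rough sketch of what 32-bit FNV-1a looks like when applied iteratively to a chain of components, assuming each component is already available as a byte string (the `hash_chain` helper and that representation are hypothetical, not Mesa's):

```python
# Standard 32-bit FNV-1a parameters.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes, h: int = FNV_OFFSET_BASIS_32) -> int:
    """Fold data into the running hash h: XOR each byte in, then multiply."""
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def hash_chain(components) -> int:
    """Hash a hypothetical deref chain (a list of byte strings) with a loop
    instead of recursing from link to link."""
    h = FNV_OFFSET_BASIS_32
    for part in components:
        h = fnv1a_32(part, h)
    return h
```

Because the running hash is threaded through each call, `hash_chain([b"foo", b"bar"])` gives the same value as `fnv1a_32(b"foobar")`; a real implementation would likely feed in structured fields rather than concatenated bytes.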