KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Timothy Arceri	203c8794a1	st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_program Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-19 15:45:46 +11:00
Eric Anholt	80786a67cf	nir: Avoid an extra NIR op in integer divide lowering. NIR bools are ~0 for true, so ((unsigned)a >> 31) != 0 -> ((int)a >> 31). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-11-16 19:45:01 -08:00
Timothy Arceri	6b82e957be	nir: add support for counting AoA uniforms in nir_shader_gather_info() Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2016-11-17 12:52:24 +11:00
Timothy Arceri	7372d2153a	nir: update nir_gather_info to only mark used array/matrix elements This is based on the code from the GLSL IR pass; however, unlike the GLSL IR pass, it also supports arrays of arrays. As well as implementing the logic from the GLSL IR pass we add some additional intrinsic cases to catch more system values. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-11-11 09:17:07 +11:00
Kenneth Graunke	ad9d4a4f8d	nir: Generalize the "is per-vertex variable?" helpers and export them. I want this function for nir_gather_info(), and realized it's basically the same as the ones in nir_lower_io(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-11-11 09:17:07 +11:00
Dave Airlie	b16dff2d88	nir: add conditional discard optimisation (v4) This is ported from GLSL and converts if (cond) discard; into discard_if(cond); This removes a block, but is also needed by radv to work around a bug in the LLVM backend. v2: handle if (a) discard_if(b) (nha) cleanup and drop pointless loop (Matt) make sure there are no dependent phis (Eric) v3: make sure only one instruction in the then block. v4: remove sneaky tabs, add cursor init (Eric) Reviewed-by: Eric Anholt <eric@anholt.net> Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-11-10 05:46:33 +10:00
Eric Anholt	49936364e4	nir: Make sure to set the texsrc type in nir drawpixels/bitmap lowering. We were leaving an undefined value since the ralloc zeroing changes. Fixes nir_validate() failures on vc4. v2: Fix the color-index case of drawpixels as well. Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)	2016-11-03 18:42:58 -07:00
Francisco Jerez	f3d387867f	nir: Flip gl_SamplePosition in nir_lower_wpos_ytransform(). Assuming the hardware is set up to use a screen coordinate system flipped vertically with respect to the GL's window coordinate system, the SYSTEM_VALUE_SAMPLE_POS vector will also be flipped vertically with respect to the value expected by the GL, so we need to give it the same treatment as gl_FragCoord. Fixes the following CTS tests on i965: ES31-CTS.functional.shaders.multisample_interpolation.interpolate_at_offset.at_sample_position.default_framebuffer ES31-CTS.functional.shaders.sample_variables.sample_pos.correctness.default_framebuffer when run with any multisample configuration, e.g. rgba8888d24s8ms4. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2016-11-03 11:46:44 -07:00
Timothy Arceri	903e5eae97	nir: fix nir_shader_clone() and nir_sweep() These were broken in `e1af20f18a` when the info field in nir_shader was turned into a pointer. Clone was copying the pointer rather than the data and nir_sweep was cleaning up shader_info rather than claiming it. Reviewed-by: Eric Anholt <eric@anholt.net>	2016-11-03 10:39:13 +11:00
Marek Olšák	52d2b28f7f	ralloc: use rzalloc where it's necessary No change in behavior. ralloc_size is equivalent to rzalloc_size. That will change though. Calls not switched to rzalloc_size: - ralloc_vasprintf - glsl_type::name allocation (it's filled with snprintf) - C++ classes where valgrind didn't show uninitialized values I switched most of non-glsl stuff to rzalloc without checking whether it's really needed. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-10-31 11:53:38 +01:00
Juha-Pekka Heikkila	3bf6c6c3ad	nir: zero allocated memory where needed Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-10-31 11:53:38 +01:00
Timothy Arceri	2e423ca147	nir: stop adjusting driver location for varying packing As of `59864e8e02` we just use the location assigned by the front-end and no longer need this for i965. Since there were some issues in the logic with assigning arrays the same driver location if they didn't start at the same location just remove it and let other drivers implement a solution if needed when they add ARB_enhanced_layouts support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-10-26 14:29:36 +11:00
Timothy Arceri	e1af20f18a	nir/i965/anv/radv/gallium: make shader info a pointer When restoring something from the shader cache we won't have, and don't want to create, a nir_shader; this change detaches the two. There are other advantages such as being able to reuse the shader info populated by GLSL IR. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-10-26 14:29:36 +11:00
Timothy Arceri	094fe3a959	nir: move nir_shader_info to a common compiler header This will allow us to stop copying values between structs and will also simplify handling these values in the shader cache. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-10-26 14:29:36 +11:00
Ian Romanick	4d35683d91	nir: Optimize integer division and modulus with 1 The previous power-of-two rules didn't catch idiv (because i965 doesn't set lower_idiv) and imod cases. The udiv and umod cases should have been caught, but I included them for orthogonality. This fixes silly code observed from compute shaders with local_size_[xy] = 1. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-10-19 14:25:10 -07:00
Jason Ekstrand	325b3fd668	nir: Fix the control flow tests for nir_loop_first_block changes Commit `2ed17d46de` changed nir_loop_first_cf_node and friends to return a nir_block instead of a nir_cf_node. This broke one of the NIR control flow tests. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98128	2016-10-06 15:48:30 -07:00
Jason Ekstrand	ae032e5ea6	nir: Remove some no longer needed asserts Now that the NIR casting functions have type assertions, we have a bunch of assertions that aren't needed anymore. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-10-06 09:16:39 -07:00
Jason Ekstrand	2ed17d46de	nir: Make nir_foo_first/last_cf_node return a block instead One of NIR's invariants is that control flow lists always start and end with blocks. There's no good reason why we should return a cf_node from these functions since we know that it's always a block. Making it a block lets us remove a bunch of code. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-10-06 09:16:37 -07:00
Jason Ekstrand	7a3bcadf4e	nir: Add asserts to the casting functions This makes calling nir_foo_as_bar a bit safer because we're no longer 100% trusting in the caller to ensure that it's safe. The caller still needs to do the right thing but this ensures that we catch invalid casts with an assert rather than by reading garbage data. The one downside is that we do use the casts a bit in nir_validate and it's not a validate_assert. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-10-06 09:16:24 -07:00
Kenneth Graunke	f7659e02c3	nir: Delete open coded type printing. glsl_print_type() prints arrays of arrays incorrectly. For example, a type with name float[3][7] would be printed as float[7][3]. (This is an array of length 3 containing arrays of 7 floats.) cdecl says that the type name is correct. glsl_print_type() doesn't really do anything above and beyond printing type->name, and glsl_print_struct() wasn't used at all. So, drop them. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-10-06 02:13:36 -07:00
Jason Ekstrand	28ab2570c8	nir: Use the correct infos structure for copying atomic sources Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Tested-by: Mark Janes <mark.a.janes@intel.com> Cc: "12.0" <mesa-dev@lists.freedestkop.org>	2016-10-05 13:04:54 -07:00
Ian Romanick	7cd0b3084c	nir/intrinsics: Add more atomic_counter ops v2: Delete some stray debug code noticed by Iago. v3: Massive rebase on new ir_function_signature::intrinsic_id mechanism. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1] Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-04 16:53:32 -07:00
Ian Romanick	2c9a17ac79	nir/intrinsics: Include atomic_counter_ in the names used in macro invocations Otherwise grepping for where atomic_counter_inc and friends are defined is a very frustrating experience. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-10-04 16:53:32 -07:00
Jason Ekstrand	7697b4b98b	nir: Add a nop intrinsic This intrinsic has no destination, no sources, no variables, and can be eliminated. In other words, it does nothing and will always get deleted by dead code elimination. However, it does provide a quick-and-easy way to temporarily tag a particular location in a NIR shader. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-10-03 16:17:12 -07:00
Eric Anholt	1aa8a0392f	nir: Optimize out discard_ifs with a constant 0 argument. I found this in a shader that was doing an alpha test when alpha is fixed at 1.0. v2: Rebase on master (now the const value is "u32" not "u"). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2016-09-28 08:31:14 -07:00
Eric Anholt	36f0f03182	nir: Allow opt_peephole_sel to be more aggressive in flattening IFs. VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does look like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)	2016-09-22 11:10:21 +03:00
Dave Airlie	7bf76563e2	glsl: add subpass image type (v2) SPIR-V/Vulkan have a special image type for input attachments called the subpass type. It has different characteristics than other images types. The main one being it can only be an input image to fragment shaders and loads from it are relative to the frag coord. This adds support for it to the GLSL types. Unfortunately we've run out of space in the sampler dim in types, so we need to use another bit. v2: Fixup subpass input name (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-09-16 15:16:31 +10:00
Jason Ekstrand	ed65e6ef49	nir: Add a flag to lower_io to force "sample" interpolation Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-09-15 13:31:43 -07:00
Kenneth Graunke	2d8a3fa7ea	nir: Report progress from nir_lower_phis_to_scalar. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-09-14 12:01:51 -07:00
Kenneth Graunke	32630e211e	nir: Report progress from nir_lower_alu_to_scalar. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-09-14 12:01:49 -07:00
Kenneth Graunke	e6eed3533e	nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar(). This is mandatory. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-09-14 12:01:39 -07:00
Rob Clark	bff90aedf1	nir/lower_tex: fix typo with sample_dim Numeric 2 is actually GLSL_SAMPLER_DIM_3D, which I don't think is what was intended. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-09-14 13:45:32 -04:00
Rob Clark	1a8424ceba	nir: move tex_instr_remove_src I want to re-use this in a different pass, so move to nir.h Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-09-14 13:45:32 -04:00
Rob Clark	2c3f966276	nir/lower_tex: remove tex_instr_find_src() Turns out it already exists.. so don't duplicate it. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-09-14 13:45:32 -04:00
Jason Ekstrand	88a2a2e053	nir/gcm: Add global value numbering support Unlike the current CSE pass, global value numbering is capable of detecting common values even if one does not dominate the other. For instance, if you have if (...) { ssa_1 = ssa_0 + 7; /* use ssa_1 */ } else { ssa_2 = ssa_0 + 7; /* use ssa_2 */ } Global value numbering doesn't care about dominance relationships so it figures out that ssa_1 and ssa_2 are the same and converts this to if (...) { ssa_1 = ssa_0 + 7; /* use ssa_1 */ } else { /* use ssa_1 */ } Obviously, we just broke SSA form which is bad. Global code motion, however, will repair this for us by turning this into ssa_1 = ssa_0 + 7; if (...) { /* use ssa_1 */ } else { /* use ssa_1 */ } This is intended to eventually mostly replace CSE. However, conventional CSE may still be useful because it's less of a scorched-earth approach and doesn't require GCM. This makes it a bit more appropriate for use as a clean-up in a late optimization run. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-09-08 20:53:01 -07:00
Jason Ekstrand	99ff4b3eb2	nir/gcm: Call nir_metadata_preserve Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-09-08 20:53:01 -07:00
Ilia Mirkin	8c8874eafb	nir: fix definition of pack_uvec2_to_uint Found by inspection. Untested beyond compilation. This also matches the logic used in nir_lower_alu_to_scalar. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2016-09-06 22:45:44 -04:00
Jason Ekstrand	821e366385	nir/tests: Update the CF tests to not assume fake edges In `aad4f1550`, we removed the concept of "fake" edges from NIR. Now, if you have a block at the end of an infinite loop it really has no predecessors. This updates the unit tests to match. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97587 Tested-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-09-04 20:44:59 -07:00
Timothy Arceri	1692228a38	nir: remove unused variable This was left over from `aad4f15506` Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-09-03 20:30:19 +10:00
Connor Abbott	356d101af3	nir: remove some fields from nir_shader_compiler_options I accidentally added these with `0dc4cab`. Oops!	2016-09-03 00:49:58 -04:00
Connor Abbott	c62b58c216	nir: fix bug with moves in nir_opt_remove_phis() In `144cbf8` ("nir: Make nir_opt_remove_phis see through moves."), Ken made nir_opt_remove_phis able to coalesce phi nodes whose sources are all moves with the same swizzle. However, he didn't add the logic necessary for handling the fact that the phi may now have multiple different sources, even though the sources point to the same thing. For example, if we had something like: if (...) a1 = b.yx; else a2 = b.yx; a = phi(a1, a2) ... = a then we would rewrite it to if (...) a1 = b.yx; else a2 = b.yx; ... = a1 by picking a random phi source, which in this case is invalid because the source doesn't dominate the phi. Instead, we need to change it to: if (...) a1 = b.yx; else a2 = b.yx; a3 = b.yx; ... = a3; Fixes 12 CTS tests: ES31-CTS.functional.tessellation.invariance.outer_edge_symmetry.quads* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-09-03 00:37:48 -04:00
Connor Abbott	0dc4cabee2	nir: add nir_after_phis() cursor helper And re-implement nir_after_cf_node_and_phis() using it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-09-03 00:37:48 -04:00
Jason Ekstrand	aad4f15506	nir: Remove fake edges in the CF handling code When NIR was first introduced, Connor added this fake-edge hack to work around issues related to unreachable blocks. Thanks to GLSL IR's jump lowering code, the only unreachable code you can have is a block after an infinite loop. With SPIR-V, we didn't have the jump lowering code so we could also end up with the "if (...) { break; } else { continue; }" case which generates an unreachable block after the if. Because of this, most of NIR had to be fixed up for handling unreachable blocks. The only remaining case of not handling unreachable blocks was specifically the block-after-infinite-loop case in dead_cf which was fixed by the previous commit. We can now delete the fake edge hack. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-09-02 11:24:09 -07:00
Jason Ekstrand	9a4d76e534	nir/dead_cf: Don't crash on unreachable after-loop blocks Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-09-02 11:24:09 -07:00
Eric Anholt	a99d70d105	nir: Update shader info when adding discards vc4 is about to start using the shader info field to set up discard handling. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-29 10:56:59 -07:00
Kenneth Graunke	93bfa1d7a2	nir: Change nir_shader_get_entrypoint to return an impl. Jason suggested adding an assert(function->impl) here. All callers of this function actually want ->impl, so I decided just to change the API. We also change the nir_lower_io_to_temporaries API here. All but one caller passed nir_shader_get_entrypoint(), and with the previous commit, it now uses a nir_function_impl internally. Folding this change in avoids the need to change it and change it back. v2: Fix one call I missed in ir3_compiler (caught by Eric). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-08-25 19:18:24 -07:00
Kenneth Graunke	8479b03c58	nir: Make nir_lower_io_to_temporaries store an impl internally. This changes the pass internals to work with a nir_function_impl directly rather than a nir_function. The next patch will change the API. v2: Rebase after framebuffer fetch landed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-08-25 19:18:11 -07:00
Francisco Jerez	aee3d8f0d9	nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries. This requires emitting a series of copies at the top of the program from each output variable to the corresponding temporary. The initial copy can be skipped for non-framebuffer fetch outputs whose initial value is undefined, and the final copy needs to be skipped for read-only outputs (i.e. gl_LastFragData), since it would be illegal to emit a store output intrinsic for it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:33:29 -07:00
Francisco Jerez	97ac3eba58	nir: Pass through fb_fetch_output and OutputsRead from GLSL IR. The NIR representation of framebuffer fetch is the same as the GLSL IR's until interface variables are lowered away, at which point it will be translated to load output intrinsics. The GLSL-to-NIR pass just needs to copy the bits over to the NIR program. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-25 18:33:29 -07:00
Jason Ekstrand	78715c7211	nir/phi_builder: Don't recurse in value_get_block_def In some programs, we can have very deep dominance trees and the recursion can cause us to risk stack overflows. Instead, we replace the recursion with a pair of loops, one at the start and one at the end. This is functionally equivalent to what we had before and it's actually a bit easier to read in the new form without the recursion. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-25 14:08:07 -07:00
Matt Turner	e53130cc27	nir: Walk blocks in source code order in lower_vars_to_ssa. Prior to this commit rename_variables_block() is recursively called, performing a depth-first traversal of the control flow graph. The function uses a non-trivial amount of stack space for local variables, which puts us in danger of smashing the stack, given a sufficiently deep dominance tree. XCOM: Enemy Within contains a shader with such a dominance tree (1574 nir_blocks in total, depth of at least 143). Jason tells me that he believes that any walk over the nir_blocks that respects dominance is sufficient (a DFS might have been necessary prior to the introduction of nir_phi_builder). In fact, the introduction of nir_phi_builder made the problem worse: rename_variables_block(), walks to the bottom of the dominance tree before calling nir_phi_builder_value_get_block_def() which walks back to the top of the dominance tree... In any case, this patch ensures we avoid that problem as well. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-08-25 13:45:39 -07:00
Timothy Arceri	8ee909ee42	nir: avoid segfault when ssa src not found Without this the following line will segfault and we don't get to see the results of the validate_assert() above. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2016-08-23 09:06:29 +10:00
Eric Anholt	3ef1853f7d	nir: Fix crash in nir_lower_drawpixels. Generally you'd see the gl_Color reference first and get some cursor set. However, in piglit draw-pixel-with-texture we're now seeing the TexCoord dereferenced first. Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-08-22 11:52:27 -07:00
Eric Anholt	0a8ff1681b	nir: Fix a comment typo in nir_lower_drawpixels. Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-08-22 11:52:26 -07:00
Eric Anholt	e8378fee0c	nir: Define system values for vc4's blending-lowering arguments. In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I was calling the "state uniforms" that I was putting into the NIR fighting with its other lowering passes. Instead of using magic uniform base numbers in the backend, follow the lead of load_user_clip_plane and just define system values for them. v2: Fix unintended change to channel_num, drop unspecified const_index value on blend_const_color_r_float. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-22 11:52:26 -07:00
Eric Anholt	9f1411d1ec	nir: Add an IO scalarizing pass using the intrinsic's first_component. vc4 wants to have per-scalar IO load/stores so that dead code elimination can happen on a more granular basis, which it has been doing in the backend using a multiplication by 4 of the intrinsic's driver_location. We can represent it properly in the NIR using the first_component field, though. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-19 13:11:36 -07:00
Eric Anholt	c35f979220	nir: Add nir_builder support for individual system value loads. The previous nir_load_system_value(b, nir_intrinsic_load_whatever, 0) was rather verbose, when system values should be easy to generate. The index is left out because only one system value had an index included in it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-19 13:11:36 -07:00
Eric Anholt	24728637e2	nir: Move the undef of nir_intrinsics.h macros to the .h. I wanted to include this from nir_builder as well, so it also needed the undefs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-19 13:11:36 -07:00
Eric Anholt	3f607f9e4f	nir: Use the system-value front face for twoside lowering. GLSL-to-NIR generates system value usage, and vc4/freedreno would both like the system value instead of the varying, so switch this pass over to it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-19 13:11:36 -07:00
Kenneth Graunke	7d0554f341	nir: Rely on the fact that bcsel takes a well formed boolean. According to Connor, it's safe to assume that the first operand of bcsel, as well as the operand of b2f and b2i, must be well formed booleans. https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html With the previous improvements to a@bool handling, this now has no change in shader-db instruction counts on Broadwell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-08-19 02:05:23 -07:00
Kenneth Graunke	3a9e6102b4	nir/search: Extend 'a@bool' to handle a couple of system values. load_front_face and load_helper_invocation produce booleans. On Broadwell: total instructions in shared programs: 11638956 -> 11638011 (-0.01%) instructions in affected programs: 115093 -> 114148 (-0.82%) helped: 628 HURT: 14 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-18 01:27:27 -07:00
Kenneth Graunke	e8543feba7	nir/search: Fold src_is_bool()/alu_instr_is_bool() into src_is_type(). I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave differently. Having the logic spread out over three functions makes it harder to decide where to put new logic, as well. So, combine them all. It's a bit simpler because there's now only one recursive function rather than a pair of mutually recursive functions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-18 01:27:15 -07:00
Kenneth Graunke	241870fe5b	nir/search: Introduce a src_is_type() helper for 'a@type' handling. Currently, 'a@type' can only match if 'a' is produced by an ALU instruction. This is rather limited - there are other cases we can easily detect which we should handle. Extending the code in-place would be fairly messy, so we introduce a new src_is_type() helper. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-18 01:26:47 -07:00
Kenneth Graunke	d8971128ac	nir/builder: Add bany_inequal and bany helpers. The first simply picks the bany_inequal[234] opcodes based on the SSA def's number of components. The latter implicitly compares with zero to achieve the same semantics of GLSL's any(). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2016-08-18 00:46:04 -07:00
Ian Romanick	cceb50e14e	nir/algebraic: Optimize common array indexing sequence Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking at that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-17 10:52:38 +01:00
Eric Anholt	60f1b436b9	nir: Drop an unused program/hash_table.h include. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-08-10 12:27:22 -07:00
Timothy Arceri	8c4d9afb7e	nir: make use of nir_cf_list_extract() helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-08-09 13:21:30 +10:00
Matt Turner	b1d9c742e9	nir: Always print non-identity swizzles. Previously we would not print a swizzle on ssa_52 when only its .x component is used (as seen in the definition of ssa_53): vec3 ssa_52 = fadd ssa_51, ssa_51 vec1 ssa_53 = flog2 ssa_52 vec1 ssa_54 = flog2 ssa_52.y vec1 ssa_55 = flog2 ssa_52.z But this makes the interpretation of the RHS of the definition difficult to understand and dependent on the size of the LHS. Just print swizzles when they are not the identity swizzle, so the previous example is now printed as: vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz vec1 ssa_53 = flog2 ssa_52.x vec1 ssa_54 = flog2 ssa_52.y vec1 ssa_55 = flog2 ssa_52.z Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-08 17:52:35 -07:00
Kenneth Graunke	144cbf8987	nir: Make nir_opt_remove_phis see through moves. I found a shader in Tales of Maj'Eyal that contains: if ssa_21 { block block_1: /* preds: block_0 */ ...instructions that prevent the select peephole... vec1 32 ssa_23 = imov ssa_4 vec1 32 ssa_24 = imov ssa_4.y vec1 32 ssa_25 = imov ssa_4.z /* succs: block_3 */ } else { block block_2: /* preds: block_0 */ vec1 32 ssa_26 = imov ssa_4 vec1 32 ssa_27 = imov ssa_4.y vec1 32 ssa_28 = imov ssa_4.z /* succs: block_3 */ } block block_3: /* preds: block_1 block_2 */ vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26 vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27 vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28 Here, copy propagation will bail because phis cannot perform swizzles, and CSE won't do anything because there is no dominance relationship between the imovs. By making nir_opt_remove_phis handle identical moves, we can eliminate the phis and rewrite everything to use ssa_4 directly, so all the moves become dead and get eliminated. I don't think we need to check "exact" - just the alu sources. Presumably phi sources should match in their exactness. On Broadwell: total instructions in shared programs: 11639872 -> 11638535 (-0.01%) instructions in affected programs: 134222 -> 132885 (-1.00%) helped: 338 HURT: 0 v2: Fix return value to be NULL, not false (caught by Iago). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-08-04 00:42:12 -07:00
Kenneth Graunke	7603b4d3a1	nir: Make nir_alu_srcs_equal non-static. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-08-04 00:41:07 -07:00
Kenneth Graunke	6aa730000f	nir: Turn imov/fmov of undef into undef. On Broadwell: total instructions in shared programs: 11640214 -> 11639872 (-0.00%) instructions in affected programs: 17744 -> 17402 (-1.93%) helped: 78 HURT: 0 total spills in shared programs: 2924 -> 2922 (-0.07%) spills in affected programs: 104 -> 102 (-1.92%) helped: 1 HURT: 0 total fills in shared programs: 4394 -> 4389 (-0.11%) fills in affected programs: 237 -> 232 (-2.11%) helped: 1 HURT: 0 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-08-04 00:40:59 -07:00
Eric Anholt	9128acfb57	nir: Allow opt_peephole_select to work on empty blocks. nir_opt_peephole_select has the job of removing IF statements with no side effects. However, if the IF statement's successor didn't have any instructions in it, we were skipping it, which occurred in mupen64 on vc4 with glsl_to_nir enabled: instructions in affected programs: 6134 -> 4120 (-32.83%) total uniforms in shared programs: 38268 -> 38219 (-0.13%) No changes on Haswell shader-db. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-08-03 10:25:08 -07:00
Timothy Arceri	6fb6201f71	nir: fix validation message Looks like a copy and paste error from `f752effa08` Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2016-08-03 09:31:57 +10:00
Matt Turner	d1f6f65697	glsl: Separate overlapping sentinel nodes in exec_list. I do appreciate the cleverness, but unfortunately it prevents a lot more cleverness in the form of additional compiler optimizations brought on by -fstrict-aliasing. No difference in OglBatch7 (n=20). Co-authored-by: Davin McCall <davmac@davmac.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-07-26 12:12:27 -07:00
Kenneth Graunke	0ba7288376	nir: Lower interp_var_at_* like a normal load_var for flat inputs. "flat centroid" and "flat sample" both just mean "flat", so we should ignore interpolateAtCentroid/Sample and just return the flat value. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2016-07-22 20:31:20 -07:00
Jason Ekstrand	d9156efc52	nir/lower_tex: Add support for lowering coordinate offsets On i965, we can't support coordinate offsets for texelFetch or rectangle textures. Previously, we were doing this with a GLSL pass but we need to do it in NIR if we want those workarounds for SPIR-V. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-dev@lists.freedesktop.org>	2016-07-22 16:48:53 -07:00
Jason Ekstrand	843fc8f3e7	nir/lower_tex: Add some helpers for working with tex sources Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-dev@lists.freedesktop.org>	2016-07-22 16:48:53 -07:00
Jason Ekstrand	09135cd55a	nir: Add a helper for determining the type of a texture source Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-dev@lists.freedesktop.org>	2016-07-22 16:27:35 -07:00
Kenneth Graunke	cf6f2d3ce7	nir: Add a base const_index to shared atomic intrinsics. Commit `52e75dcb8c` made nir_lower_io start using nir_intrinsic_set_base instead of writing const_index[0] directly. However, those intrinsics apparently don't /have/ a base, so this caused assert failures. However, the old code was happily setting non-existent const_index fields, so it was pretty bogus too. Jason pointed out that load_shared and store_shared have a base, and that the i965 driver uses that field. So presumably atomics should have one as well, so that loads/stores/atomics all refer to variables with consistent addressing. Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2016-07-21 21:31:41 -07:00
Timothy Arceri	cba6657d8b	nir: add doubles component packing support This makes sure we give the correct driver location for doubles when using component packing. Specifically it handles packing a dvec3 with a double which is the only packing scenario allowed which spans across two locations. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2016-07-21 09:10:53 +10:00
Jason Ekstrand	9d503aea06	nir/inline: Constant-initialize local variables in the callee if needed Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-20 15:29:55 -07:00
Jason Ekstrand	dc9f2436c3	nir: Add a nir_deref_foreach_leaf helper Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-20 15:29:55 -07:00
Kenneth Graunke	707ca00fce	nir: Add nir_load_interpolated_input lowering code. Now nir_lower_io can optionally produce load_interpolated_input and load_barycentric_* intrinsics for fragment shader inputs. flat inputs continue using regular load_input. v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean passing (in response to review feedback from Chris Forbes). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:01:00 -07:00
Kenneth Graunke	2496462479	nir: Add new intrinsics for fragment shader input interpolation. Backends can normally handle shader inputs solely by looking at load_input intrinsics, and ignore the nir_variables in nir->inputs. One exception is fragment shader inputs. load_input doesn't capture the necessary interpolation information - flat, smooth, noperspective mode, and centroid, sample, or pixel for the location. This means that backends have to interpolate based on the nir_variables, then associate those with the load_input intrinsics (say, by storing a map of which variables are at which locations). With GL_ARB_enhanced_layouts, we're going to have multiple varyings packed into a single vec4 location. The intrinsics make this easy: simply load N components from location <loc, component>. However, working with variables and correlating the two is very awkward; we'd much rather have intrinsics capture all the necessary information. Fragment shader input interpolation typically works by producing a set of barycentric coordinates, then using those to do a linear interpolation between the values at the triangle's corners. We represent this by introducing five new load_barycentric_* intrinsics: - load_barycentric_pixel (ordinary variable) - load_barycentric_centroid (centroid qualified variable) - load_barycentric_sample (sample qualified variable) - load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample()) - load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset()) Each of these take the interpolation mode (smooth or noperspective only) as a const_index, and produce a vec2. The last two also take a sample or offset source. We then introduce a new load_interpolated_input intrinsic, which is like a normal load_input intrinsic, but with an additional barycentric coordinate source. The intention is that flat inputs will still use regular load_input intrinsics. This makes them distinguishable from normal inputs that need fancy interpolation, while also providing all the necessary data. This nicely unifies regular inputs and interpolateAt functions. Qualifiers and variables become irrelevant; there are just load_barycentric intrinsics that determine the interpolation. v2: Document the interp_mode const_index value, define a new BARYCENTRIC() helper rather than using SYSTEM_VALUE() for some of them (requested by Jason Ekstrand). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-20 11:00:45 -07:00
Kenneth Graunke	f0f466214e	nir: Fix uninitialized use of 'replacement'. For intrinsics we don't care about, just skip to the next loop iteration and process the next instruction. We don't want to execute the rest of the code. This was a bug in commit `cdfc05ea6e`. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2016-07-19 17:34:59 -07:00
Ian Romanick	0b626d7524	nir/algebraic: Optimize fabs(u2f(x)) I noticed this when I tried to do frexp(float(some_unsigned)) in the ir_unop_find_lsb lowering pass. The code generated for frexp() uses fabs, and this resulted in an extra instruction. Ultimately I ended up not using frexp. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-19 12:19:30 -07:00
Kenneth Graunke	ac1181ffbe	compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*. Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Dave Airlie <airlied@redhat.com>	2016-07-17 19:26:48 -07:00
Kenneth Graunke	aa6f60f844	nir: Use dest.ssa.num_components rather than intrin->num_components. I recently refactored this to share code between load and atomic lowering. loads used intrin->num_components, while atomics used intrin->dest.ssa.num_components. They should be equivalent, but Jason wanted me to use the latter. I missed applying his review. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-15 19:42:43 -07:00
Kenneth Graunke	da3d4a4c56	nir: Update outdated intrinsic const_index comments. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:10 -07:00
Kenneth Graunke	52e75dcb8c	nir: Use nir_intrinsic_set_base in atomic lowering. This is more readable and also offers assertions that protect against setting const_index fields on the wrong kind of intrinsic. Suggested by Jason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:10 -07:00
Kenneth Graunke	50b9bb9421	nir: Split nir_lower_io's input/output/atomic handling into helpers. The original function was becoming a bit hard to read, with the details of creating and filling out load/store/atomic atomics all in one function. This patch makes helpers for creating each type of intrinsic, and also combines them with the *_op() helpers, as they're closely coupled and not too large. v2: Minor style nits from Jason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:10 -07:00
Kenneth Graunke	e12e4af780	nir: Drop bogus nir_var_shader_in case in nir_lower_io's store_op(). This can't happen, the caller asserts that mode is shader_out or shared. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:09 -07:00
Kenneth Graunke	cdfc05ea6e	nir: Share destination rewriting and replacement code in IO lowering. Both loads and atomics had identical code to rewrite destinations, and all cases had the same two lines to replace instructions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:09 -07:00
Kenneth Graunke	349fe79c9b	nir: Share get_io_offset handling in nir_lower_io. The load/store/atomic cases all duplicated the get_io_offset code, with a few tiny differences: stores didn't bother checking for per-vertex inputs, because they can't be stored to, and atomics didn't check at all, since shared variables aren't per-vertex. However, it's harmless to check, and allows us to share more code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:09 -07:00
Kenneth Graunke	7171a9a87d	nir: Make a 'var' temporary in nir_lower_io. Less typing and word wrapping issues than intrin->variables[0]->var. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-15 17:17:09 -07:00
Eric Anholt	c93f6938d5	nir: Add optimization for (a || True == True) This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering producing some constants that were getting compared. total instructions in shared programs: 112276 -> 112198 (-0.07%) instructions in affected programs: 2239 -> 2161 (-3.48%) total estimated cycles in shared programs: 283102 -> 283038 (-0.02%) estimated cycles in affected programs: 2365 -> 2301 (-2.71%) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-12 15:46:09 -07:00
Timothy Arceri	448adfbc67	nir: use the same driver location for packed varyings Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-07 10:26:43 +10:00
Timothy Arceri	0eea6b3297	nir: add new intrinsic field for storing component offset This offset is used for packing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-07 10:26:43 +10:00
Eric Anholt	d20b89e928	nir: Fix copy_prop_src when src is an indirect access on a reg. The intent was to continue down the indirect chain, not to call ourselves with unchanged input arguments. Found by code inspection, and comparison to copy_prop_alu_src(). We haven't hit this because callers of NIR's copy prop are doing so in SSA, before indirect variable dereferences have been lowered to registers. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-26 15:38:09 -07:00
Jason Ekstrand	81978c6feb	nir: Add a NIR_VALIDATE environment variable It defaults to true so default behavior doesn't change but it allows you to do NIR_VALIDATE=false if you don't want validation. Disabling validation can substantially speed up shader compiles so you frequently want to turn it off if compiler invariants aren't in question. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-06-25 07:34:20 -04:00
Giuseppe Bilotta	60a27ad122	Remove wrongly repeated words in comments Clean up misrepetitions ('if if', 'the the' etc) found throughout the comments. This has been done manually, after grepping case-insensitively for duplicate if, is, the, then, do, for, an, plus a few other typos corrected in fly-by v2: * proper commit message and non-joke title; * replace two 'as is' followed by 'is' to 'as-is'. v3: * 'a integer' => 'an integer' and similar (originally spotted by Jason Ekstrand, I fixed a few other similar ones while at it) Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-06-23 13:55:03 -07:00
Jason Ekstrand	bec07b7292	nir/alu_to_scalar: Respect the exact ALU operation qualifier Just setting builder->exact isn't sufficient because that only applies to instructions that are built with the builder but instructions created manually and only inserted using the builder are left alone. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-06-20 12:02:55 -07:00
Jason Ekstrand	202751fbb7	nir: Add a pass for propagating invariant decorations This pass is similar to propagate_invariance in the GLSL compiler. The real "output" of this pass is that any algebraic operations which are eventually consumed by an invariant variable get marked as "exact". Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-06-20 12:02:45 -07:00
Jason Ekstrand	68e308d853	nir/algebraic: Remove imprecise flog2 optimizations While mathematically correct, these two optimizations result in an expression with substantially lower precision than the original. For any positive finite floating-point value, log2(x) is well-defined and finite. More precisely, it is in the range [-150, 150] so any sum of logarithms log2(a) + log2(b) is also well-defined and finite as long as a and b are both positive and finite. However, if a and b are either very small or very large, their product may get flushed to infinity or zero causing log2(a * b) to be nowhere close to log2(a) + log2(b). This imprecision was causing incorrect rendering in Talos Principal because part of its HDR rendering process involves doing 8 texture operations, clamping the result to [0, 65000], taking a dot-product with a constant, and then taking the log2. This is done 6 or 8 times and summed to produce the final result which is written to a red texture. In cases where you have a region of the screen that is very dark, it can end up getting a result value of -inf which is not what is intended. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425 Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>	2016-06-20 11:56:57 -07:00
Jason Ekstrand	4d3b8318a7	nir/info: Get rid of uses_interp_var_at_offset We were using this briefly in the i965 driver to trigger recompiles but we haven't been using it since we switched to the NIR y-transform lowering pass. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-06-03 19:29:28 -07:00
Jason Ekstrand	45542f554c	nir/lower_indirect_derefs: Use the direct array deref for recursion This fixes about 100 of the new Vulkan CTS tests. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-06-03 19:29:28 -07:00
Rob Clark	dfbae7d64f	nir/algebraic: support for power-of-two optimizations Some optimizations, like converting integer multiply/divide into left/right shifts, have additional constraints on the search expression. Like requiring that a variable is a constant power of two. Support these cases by allowing a fxn name to be appended to the search var expression (ie. "a#32(is_power_of_two)"). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-03 16:05:03 -04:00
Jordan Justen	8f48d23e0f	i965: Add nir channel_num system value v2: * simd16/32 fixes (curro) Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-01 19:29:02 -07:00
Jordan Justen	6f316c9d86	nir: Make lowering gl_LocalInvocationIndex optional Cc: "12.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-01 19:29:02 -07:00
Ilia Mirkin	ca135a2612	nir: allow sat on all float destination types With the introduction of fp64 and fp16 to nir, there are now a bunch of float types running around. A F1 2015 shader ends up with an i2f.sat operation, which has a nir_type_float32 destination. Allow sat on all the float destination types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-06-01 10:44:40 -04:00
Jason Ekstrand	0482efdc93	nir/inline: Also rewrite param derefs for texture instructions Without this, samplers get left hanging as derefs to variables that don't actually exist. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-27 10:28:27 -07:00
Jason Ekstrand	2522180845	nir/inline: Break the guts of rewrite_param-derefs into a helper Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-27 10:28:27 -07:00
Jason Ekstrand	d19c406395	nir/inline: Make the rewrite_param_derefs helper work on instructions Now that we have the better nir_foreach_block macro, there's no reason to use the archaic block version for everything. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-27 10:28:27 -07:00
Jason Ekstrand	2fcba404f8	nir/inline: Don't use foreach_instr_safe unless we need to Suggested-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-27 10:28:27 -07:00
Jason Ekstrand	15e553daf0	nir: Make nir_const_value a union There's no good reason for it to be a struct of an anonymous union. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221 Tested-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-26 16:03:44 -07:00
Jason Ekstrand	32210dea8e	compiler: Move glsl_to_nir to libglsl.la Right now libglsl.la depends on libnir.la so putting it in libnir.la adds a dependency on libglsl.la that goes the wrong direction. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>	2016-05-26 14:13:38 -07:00
Matt Turner	4a5e92ac70	nir: Strengthen assertion that 'out' is nonnull. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2016-05-25 12:44:34 -07:00
Kristian Høgsberg Kristensen	595224f714	mesa: Add .gitignore entries for make check binaries Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>	2016-05-25 09:41:44 -07:00
Kristian Høgsberg Kristensen	a41b57679f	nir: Add a lowering pass for YUV textures This lowers sampling from YUV textures to 1) one or more texture instructions to sample each plane and 2) color space conversion to RGB. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen	50c24c3ff3	nir: Handle NULL in nir_copy_deref() Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen	29921ee987	nir: Add new 'plane' texture source type This will be used to select the plane to sample from for planar textures. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-24 10:14:56 -07:00
Jason Ekstrand	66e137ecf1	nir/lower_samplers: Protect against sampler index overflow Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-23 19:12:34 -07:00
Rob Clark	5245d845b6	nir/validate: fix null deref coverity warning CID 1265536 (#1 of 2): Explicit null dereferenced (FORWARD_NULL)6. var_deref_op: Dereferencing null pointer parent. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-23 10:14:50 -04:00
Iago Toral Quiroga	38b719d624	nir: handle double-precision in fsign, fsat, fnot and frcp I think these are not strictly necessary since the floats in them should be automatically promoted to doubles when operated with double sources, but it makes things more explicit at least. Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-23 08:54:37 +02:00
Iago Toral Quiroga	3f73039ade	nir: handle double-precision in fabs, frsq and fsqrt Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-23 08:54:28 +02:00
Kenneth Graunke	f7eb95a526	nir: Fix crash in nir_lower_wpos_center(). Otherwise we rewrote the fadd to use itself, causing crashes in validation. Instead, start after the last use like we should. A brown paper bag fix. Fixes crashes in several Vulkan tests. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-20 16:33:24 -07:00
Kenneth Graunke	6e5d86c07a	nir: Add a simple nir_lower_wpos_center() pass for Vulkan drivers. nir_lower_wpos_ytransform() is great for OpenGL, which allows applications to choose whether their coordinate system's origin is upper left/lower left, and whether the pixel center should be on integer/half-integer boundaries. Vulkan, however, has much simpler requirements: the pixel center is always half-integer, and the origin is always upper left. No coordinate transform is needed - we just need to add <0.5, 0.5>. This means that we can avoid using (and setting up) a uniform. I thought about adding more options to nir_lower_wpos_ytransform(), but making a new pass that never even touched uniforms seemed simpler. v2: Use normal iterator rather than _safe variant (noticed by Matt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:30:00 -07:00
Kenneth Graunke	12ab7fc6ac	nir: Don't use ffma in nir_lower_wpos_ytransform(). ffma is an explicitly fused multiply add with higher precision. The optimizer will take care of promoting mul/add to fma when it's beneficial to do so. This fixes failures on Gen4-5 when using this pass, as those platforms don't actually implement fma(). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	b8b1b1c34c	nir: Handle fddy_fine and fddy_coarse in nir_lower_wpos_ytransform. These also need flipping! Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	4b7577fad8	nir: Make lower_wpos_ytransform_block a void function. The return value was used for the old nir_foreach_block callback system, but at this point it no longer means anything. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	88ea960aa7	nir: Make nir_lower_wpos_ytransform() match FragCoord by location. gl_FragCoord is a shader input with location == VARYING_SLOT_POS. ARB_fragment_programs have an equivalent input at VARYING_SLOT_POS, but it isn't called gl_FragCoord. We do want to transform it. Matching by location guarantees we catch both. Fixes several fp tests on a branch which uses this pass on i965. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	c9192fcbd2	nir: Add interp_var_at_offset flipping. The Y-offset needs flipping as well, similar to ddy. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	287f099db1	nir: Fix fddy swizzles in nir_lower_wpos_ytransform(). The original value might have been swizzled. That's taken care of in the fmul source - we don't want to reswizzle it again. Fixes validation failures in glsl-derivs-varyings on a branch of mine which uses this pass in i965. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:29:04 -07:00
Kenneth Graunke	7fe9a19302	nir: Fix wpos_ytransform lowering state_slot swizzle. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2016-05-20 14:28:30 -07:00
Rob Clark	df361fc58c	nir/validate: assume() that hashtable entry exists At this point, it would require a logic error in nir_validate to not have already populated this hashtable entry, but coverity doesn't realize that: CID 1265547 (#1 of 1): Dereference null return value (NULL_RETURNS)3. dereference: Dereferencing a null pointer entry. CID 1271039 (#1 of 1): Dereference null return value (NULL_RETURNS)3. dereference: Dereferencing a null pointer entry. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-20 11:13:50 -04:00
Rob Clark	fcd6b3f42b	nir: coverity uninitialized pointer read Not sure how coverity arrives at the conclusion that we can read comp[j] uninitialized (around line 204), other than not being aware that ncomp is greater than 1 so it won't underflow in the 'if (tex->is_array)' case. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-20 11:13:50 -04:00
Rob Clark	53c48feae0	nir: coverity sign-extension fix Not 100% sure, but I think being an unsigned literal will help: CID 1358505 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)sign_extension: Suspicious implicit sign extension: load1->def.num_components with type unsigned char (8 bits, unsigned) is promoted in load1->def.num_components * (load1->def.bit_size / 8) to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If load1->def.num_components * (load1->def.bit_size / 8) is greater than 0x7FFFFFFF, the upper bits of the result will all be 1. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-20 11:13:50 -04:00
Rob Clark	bb993da795	nir/glsl_to_nir: quell some uninit_member coverity errors Signed-off-by: Rob Clark <robclark@freedesktop.org> Acked-by: Matt Turner <mattst88@gmail.com>	2016-05-20 11:13:50 -04:00
Rob Clark	e8beffb1b3	nir/validate: dump annotated shader with error msgs Log all the errors, and at the end dump the shader w/ error annotations to make it easier to see where the problems are. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-17 10:05:20 -04:00
Rob Clark	54ecfcc162	nir/validate: assert() -> validate_assert() Prep work for next patch. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-17 10:05:20 -04:00
Rob Clark	a0ef26c1c2	nir/print: add support for print annotations Caller can pass a hashtable mapping NIR object (currently instr or var, but I guess others could be added as needed) to annotation msg to print inline with the shader dump. As the annotation msg is printed, it is removed from the hashtable to give the caller a way to know about any unassociated msgs. This is used in the next patch, for nir_validate to try to associate error msgs to nir_print dump. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-05-17 10:05:20 -04:00
Juan A. Suarez Romero	80535873bb	nir: add double input bitmap This bitmap tracks which input attributes are double-precision. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-17 09:05:54 +02:00
Matt Turner	4191551262	nir: Mark nir_start_block()/nir_impl_last_block() with returns_nonnull. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-16 11:06:15 -07:00
Kenneth Graunke	6d65b0c6dc	nir: Add a nir->info.uses_interp_var_at_offset flag. I've added this to nir_gather_info(), but also to glsl_to_nir() as a temporary measure, since the i965 GL driver today doesn't use nir_gather_info() yet. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-15 23:50:28 -07:00
Rob Clark	f06343d6ea	nir: forward-declare 'struct gl_shader_program' Drop extra #include which is otherwise unneeded (and makes this header difficult to include from outside of src/mesa). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-15 17:25:48 -04:00
Rob Clark	79d6409a14	nir: return progress from lower_idiv With algebraic-opt support for lowering div to shift, the driver would like to be able to run this pass after the main opt-loop, and then conditionally re-run the opt-loop if this pass actually lowered something. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-15 17:25:48 -04:00
Rob Clark	8b24f7b440	nir: fix comment typo about f2d/d2f Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-15 17:25:47 -04:00
Jason Ekstrand	f47faa4316	nir: Add texture opcodes and source types for multisample compression Intel hardware does a form of multisample compression that involves an auxiliary surface called the MCS. When an MCS is in use, you have to first sample from the MCS with a special opcode and then pass the result of that operation into the next sample instruction. Normally, we just do this ourselves in the back-end, but we want to expose that functionality to NIR so that we can use MCS values directly in NIR-based blorp. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-14 13:34:44 -07:00
Jason Ekstrand	87a41e862b	nir/builder: Add a helper for grabbing multiple channels from an ssa def This is similar to nir_channel except that it lets you grab more than one channel by providing a mask. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-14 13:34:40 -07:00
Jason Ekstrand	fc58cb543f	nir/builder: Generate the alu helpers directly in python There's no reason for having a macro and a python generator. We can easily just do the whole thing in python. This has the advantage that we are no longer defining ALU# macros which conflict with the ones in brw_fs_builder.h. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-14 13:34:38 -07:00
Jason Ekstrand	a2f50d87b6	nir: Add an info bit for uses_sample_qualifier Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-14 13:33:52 -07:00
Ian Romanick	8f05a0a4c0	nir: Remove empty visit_call_src and visit_load_const_src functions The guts were removed in `dfb3abba`. It has been almost exactly a year, so I don't think we're going to "decide we want [predication] back." Silences several "unused parameter" warnings: nir/nir.c: In function ‘visit_call_src’: nir/nir.c:1052:32: warning: unused parameter ‘instr’ [-Wunused-parameter] visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state) ^ nir/nir.c:1052:58: warning: unused parameter ‘cb’ [-Wunused-parameter] visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state) ^ nir/nir.c:1052:68: warning: unused parameter ‘state’ [-Wunused-parameter] visit_call_src(nir_call_instr *instr, nir_foreach_src_cb cb, void *state) ^ nir/nir.c: In function ‘visit_load_const_src’: nir/nir.c:1058:44: warning: unused parameter ‘instr’ [-Wunused-parameter] visit_load_const_src(nir_load_const_instr *instr, nir_foreach_src_cb cb, ^ nir/nir.c:1058:70: warning: unused parameter ‘cb’ [-Wunused-parameter] visit_load_const_src(nir_load_const_instr *instr, nir_foreach_src_cb cb, ^ nir/nir.c:1059:28: warning: unused parameter ‘state’ [-Wunused-parameter] void *state) ^ v2: Add some comments in nir_foreach_src suggested by Jason. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Connor Abbott <cwabbott0@gmail.com>	2016-05-12 16:47:14 -07:00
Ian Romanick	098166e1bc	nir: Silence unused parameter warnings These cases had the parameter removed: nir/nir_lower_vec_to_movs.c: In function ‘try_coalesce’: nir/nir_lower_vec_to_movs.c:124:66: warning: unused parameter ‘shader’ [-Wunused-parameter] try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader) ^ nir/nir_lower_io.c: In function ‘load_op’: nir/nir_lower_io.c:147:32: warning: unused parameter ‘state’ [-Wunused-parameter] load_op(struct lower_io_state *state, ^ These cases had the parameter (void) silenced because the parameter was necessary for an interface: nir/glsl_to_nir.cpp:1900:32: warning: unused parameter 'ir' [-Wunused-parameter] nir_visitor::visit(ir_barrier *ir) ^ nir/nir.c: In function ‘remove_use_cb’: nir/nir.c:802:35: warning: unused parameter ‘state’ [-Wunused-parameter] remove_use_cb(nir_src *src, void *state) ^ nir/nir.c: In function ‘remove_def_cb’: nir/nir.c:811:37: warning: unused parameter ‘state’ [-Wunused-parameter] remove_def_cb(nir_dest *dest, void *state) ^ Number of total warnings in my build reduced from 2543 to 2538 (reduction of 5). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-12 16:46:41 -07:00
Rob Clark	9d3cc80b75	nir: glsl_get_bit_size() should take glsl_type It's what all the call-sites once, so gets rid of a bunch of inlined glsl_get_base_type() at the call-sites. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-12 13:39:40 -04:00
Jason Ekstrand	1b72c31e1f	nir/algebraic: Separate ffma lowering from fusing The i965 driver has its own pass for fusing mul+add combinations that's much smarter than what nir_opt_algebraic can do so we don't want to get the nir_opt_algebraic one just because we didn't set lower_ffma. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-11 11:44:35 -07:00
Rob Clark	dfbabc6bad	nir/lower-io: add support for lowering inputs Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-11 12:20:11 -04:00
Rob Clark	595f9d5476	nir/lower-io: split out some helper fxns Prep work to reduce the noise in the next patch. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-11 12:20:11 -04:00
Rob Clark	b085016f94	nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries Since it will gain support to lower inputs, give it a more generic name. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-11 12:20:11 -04:00
Rob Clark	47fcef9a20	nir: move callsite of lower_outputs_to_temporaries Going to convert this pass to parameterized lower_io_to_temporaries, and we want the user to be able to specify whether to lower outputs or inputs or both. The restriction of running this pass before validate to avoid output reads no longer applies. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-11 12:20:11 -04:00
Rob Clark	5261947260	nir: lower-io-types pass A pass to lower complex (struct/array/mat) inputs/outputs to primitive types. This allows, for example, linking that removes unused components of a larger type which is not indirectly accessed. In the near term, it is needed for gallium (mesa/st) support for NIR, since only used components of a type are assigned VBO slots, and we otherwise have no way to represent that to the driver backend. But it should be useful for doing shader linking in NIR. v2: use glsl_count_attribute_slots() rather than passing a type_size fxn pointer Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-11 12:20:11 -04:00
Rob Clark	b10cc24519	nir: passthrough-edgeflags support Handled by tgsi_emulate for glsl->tgsi case. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-05-11 12:20:11 -04:00
Rob Clark	3a939d034e	nir: add lowering pass for glBitmap Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-05-11 12:20:11 -04:00
Rob Clark	12c18ce476	nir: add lowering pass for glDrawPixels Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-05-11 12:20:11 -04:00
Rob Clark	b26645a00f	nir: add lowering pass for y-transform Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2016-05-11 12:20:11 -04:00
Jose Fonseca	94e8653a3b	Revert "nir: Try to warn when C99 extensions are used in nir headers." This reverts commit `99474dc29b`. -Wpedantic is too verbose, even when applied to just a few includes. We'll just have to deal with the issues as they come. Reviewed-by: Brian Paul <brianp@vmware.com>	2016-05-10 03:29:24 -07:00
Eduardo Lima Mitev	60a5d02416	nir/print: Print memory qualifiers in a variable declaration Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-10 06:22:05 +02:00
Rob Clark	f096096b77	nir/search: fix typo Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-09 12:46:24 -04:00
Jose Fonseca	8ae78f7d28	nir: Remove spurious return from void function. Left over from `450c061362`. Trivial. Built locally with clang and gcc. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95296	2016-05-06 12:03:34 +01:00
Connor Abbott	4fab8dd5ea	nir: remove now-unused nir_foreach_block*_call() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:42 -07:00
Jason Ekstrand	31fc4a2528	nir/lower_double_ops: fixup for new nir_foreach_block() Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Jason Ekstrand	450c061362	nir/lower_double_pack: fixup for new nir_foreach_block() Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Jason Ekstrand	8c807cc2a6	nir/gather_info: fixup for new foreach_block() Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Connor Abbott	331b9f73a2	nir/lower_two_sided_color: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Connor Abbott	d40fbbc27e	nir/lower_tex: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Connor Abbott	8a7fe634d2	nir/lower_outputs_to_temporaries: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-05 16:19:41 -07:00
Kenneth Graunke	bc0062c54a	nir: Optimize out stores of undefs. There are a couple of cycle count changes in shader-db, but it's basically a wash. However, with the Broadwell scalar TCS backend enabled, many Shadow of Mordor shaders benefit from this patch. Because we don't batch up output writes for TCS, vec4 outputs might not have all components defined. Many output writes have a value of undef, which is useless. With scalar TCS, stats for tessellation shaders on Broadwell: total instructions in shared programs: 1283000 -> 1280444 (-0.20%) instructions in affected programs: 34302 -> 31746 (-7.45%) helped: 71 HURT: 0 total cycles in shared programs: 10798768 -> 10780682 (-0.17%) cycles in affected programs: 158004 -> 139918 (-11.45%) helped: 71 HURT: 0 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-05 14:24:00 -07:00
Kenneth Graunke	c7a8b32700	nir: Replace vecN(undef, undef, ...) with a single undef. shader-db statistics on Broadwell: total instructions in shared programs: 8963409 -> 8962455 (-0.01%) instructions in affected programs: 60858 -> 59904 (-1.57%) helped: 318 HURT: 0 total cycles in shared programs: 71408022 -> 71406276 (-0.00%) cycles in affected programs: 398416 -> 396670 (-0.44%) helped: 199 HURT: 51 GAINED: 1 The only shaders affected were in Dota 2 Reborn. It also sets up for the next optimization. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-05 14:24:00 -07:00
Kenneth Graunke	49ea7454a1	nir: Rename opt_undef_alu to opt_undef_csel; update comments. This better reflects what it does. I plan to add other ALU optimizations as well, so the old name would be confusing. In preparation for that, also move the file comments about csels above the opt_undef_csel function, and delete the ones about there not being other optimizations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-05-05 14:24:00 -07:00
Thomas Hindoe Paaboel Andersen	8698194313	nir: fix assert for wildcard pairs The assert was null checking dest_arr_parent twice. The intention seems to be to check both dest_ and src_. Added in `d3636da9` Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2016-05-05 09:33:02 +02:00
Samuel Iglesias Gonsálvez	2ab2d2e588	nir: Separate 32 and 64-bit fmod lowering Split 32-bit and 64-bit fmod lowering as the drivers might need to lower them separately inside NIR depending on the HW support. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-05-04 08:07:49 +02:00
Samuel Iglesias Gonsálvez	b902377a56	nir/lower_double_ops: lower mod() There are rounding errors with the division in i965 that affect the mod(x,y) result when x = N * y. Instead of returning '0' it was returning 'y'. This lowering pass fixes those cases. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-05-04 08:07:49 +02:00
Rob Clark	dcf8c4425a	nir: make lower_clamp_color pass work after lower i/o Kinda important to work with tgsi_to_nir, which generates nir which already has i/o lowered. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-02 14:25:38 -04:00
Thomas Hindoe Paaboel Andersen	cbcd7b60f5	nir/lower_double_ops: fix indentation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-30 12:16:32 -07:00
Thomas Hindoe Paaboel Andersen	21424e019d	nir/opt_dead_cf: fix indentation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-30 12:16:29 -07:00
Thomas Hindoe Paaboel Andersen	6935726197	nir/opt_dead_cf: correction of side effect check Parentheses are needed here as ! takes precedence over the &. The check had the opposite effect than intended. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-30 12:16:22 -07:00
Rob Clark	64abf6d404	nir: clamp-color-output support Handled by tgsi_emulate for glsl->tgsi case. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2016-04-30 14:56:19 -04:00
Jason Ekstrand	6d4a426745	nir/algebraic: Support lowering for both 64 and 32-bit ldexp Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-04-28 21:36:52 -07:00
Jason Ekstrand	f0af5b87ec	nir/opcodes: Make ldexp take an explicitly 32-bit int There is no sense in having the double version of ldexp take a 64-bit integer. Instead, let's just take a 32-bit int all the time. This also matches what GLSL does where both variants of ldexp take a regular integer for the exponent argument. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-04-28 21:36:52 -07:00
Jason Ekstrand	bee40dd730	nir/opcodes: Simplify the expressions for [un]pack_double The new expressions are more explicit in terms of where the bits go so it's a little easier to tell what's going on. This is the way GLSL specifies things so it's a bit easier to verify too. It also has the benefit that the new expressions easily vectorize so we can constant-fold vector forms of the _split versions correctly. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2016-04-28 21:36:52 -07:00
Jason Ekstrand	70f89dd75e	nir: Switch the arguments to nir_foreach_def This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_def(\([^,]*\),\s*\([^,]*\))/nir_foreach_def(\2, \1)/ Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-04-28 15:54:48 -07:00
Jason Ekstrand	5015260a05	nir: Switch the arguments to nir_foreach_use and friends This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/ and similar expressions for nir_foreach_use_safe, etc. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-04-28 15:54:48 -07:00
Jason Ekstrand	9464d8c498	nir: Switch the arguments to nir_foreach_function This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/ Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-04-28 15:54:48 -07:00
Jason Ekstrand	e63766fb4b	nir: Switch the arguments to nir_foreach_parallel_copy_entry This matches the "foreach x in container" pattern found in many other programming languages. Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-04-28 15:54:48 -07:00
Jason Ekstrand	8564916d01	nir: Switch the arguments to nir_foreach_phi_src This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/ and a similar expression for nir_foreach_phi_src_safe. Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2016-04-28 15:54:48 -07:00
Jason Ekstrand	707e72f13b	nir: Switch the arguments to nir_foreach_instr This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/ and similar expressions for nir_foreach_instr_safe etc. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-04-28 15:54:48 -07:00
Connor Abbott	3a8688fb41	nir/algebraic: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 15:52:17 -07:00
Connor Abbott	1f8c100614	nir/validate: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 15:52:17 -07:00
Connor Abbott	a471c161b1	nir/nir_worklist: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 15:52:17 -07:00
Connor Abbott	db35177772	nir/remove_dead_variables: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 15:52:17 -07:00
Connor Abbott	b3aaae398e	nir/split_var_copies: fixup for new foreach_block() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-04-28 15:52:17 -07:00

608 Commits
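
Commit `6935726197` above fixes a check in which `!` was applied before `&`. The C sketch below illustrates that class of bug in isolation; it is not the actual nir/opt_dead_cf code. The flag name and both function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit, for illustration only (not a Mesa definition). */
#define FLAG_SIDE_EFFECTS 0x4u

/* Buggy form: unary `!` binds tighter than binary `&`, so this parses as
 * (!flags) & FLAG_SIDE_EFFECTS. `!flags` is 0 or 1, and neither value has
 * bit 0x4 set, so with this flag the result is always false. */
bool no_side_effects_buggy(unsigned flags)
{
   return !flags & FLAG_SIDE_EFFECTS;
}

/* Intended form: mask first, then negate. */
bool no_side_effects_fixed(unsigned flags)
{
   return !(flags & FLAG_SIDE_EFFECTS);
}

/* Small self-check showing how the two forms differ. */
void check_precedence_example(void)
{
   assert(no_side_effects_fixed(0u));                 /* flag clear */
   assert(!no_side_effects_fixed(FLAG_SIDE_EFFECTS)); /* flag set */
   assert(!no_side_effects_buggy(0u));                /* wrong answer */
}
```

How the unparenthesized form misbehaves depends on which bit is tested; with the bit chosen here it never reports "no side effects", even when the flag is clear.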