Commit Graph

665 Commits

Author SHA1 Message Date
Iago Toral Quiroga 9bb7d9ecf8 nir: Implement __intrinsic_store_ssbo
v2 (Connor):
 - Make the STORE() macro take arguments for the extra sources (and their
   size) and any extra indices required.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:22 +02:00
Samuel Iglesias Gonsalvez 003ce30e36 nir: Implement ir_unop_get_buffer_size
This is how backends provide the buffer size required to compute
the size of unsized arrays in the previous patch

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-25 08:39:21 +02:00
Kenneth Graunke 542d40d698 nir: Add new GS intrinsics that maintain a count of emitted vertices.
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones.  See the comments above that for the
rationale behind the new intrinsics.

This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.

v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
  Ekstrand - I'd mistakenly used nir_after_block when rebasing this
  code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
  the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
  Michael Schellenberger Costa).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke 0a040975ec nir: Add unit tests for control flow graphs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke fbaa1b19d7 nir/cf: Fix dominance metadata in the dead control flow pass.
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on.  Preserving
information about block dominance is hard (and probably not worthwhile).

This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.

We then make the dead control flow pass compute dominance information
right before it uses it.  This is necessary because earlier work by the
pass may have invalidated it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke 6560838703 nir/cf: Fix unlink_block_successors to actually unlink the second one.
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successor[0].  So the successors[1] != NULL check will
always fail.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 11:00:00 -07:00
Kenneth Graunke 024e5ec977 nir/cf: Alter block successors before adding a fake link.
Consider the case of "while (...) { break }".  Or in NIR:

        block block_0 (0x7ab640):
        ...
        /* succs: block_1 */
        loop {
                block block_1:
                /* preds: block_0 */
                break
                /* succs: block_2 */
        }
        block block_2:

Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.

Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set.  This is illegal: a block cannot specify the
same successor twice.  In particular, adding the predecessor would have
no effect, as it was already present in the set.

We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set.  It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.

The fix appears to be simple: simply unlink the block's successors and
recreate them to point at the correct blocks first.  Then, add the fake
link.  In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), so adding
the fake link won't cause a duplicate successor.

v2: Add comments (requested by Connor Abbott) and fix commit message.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke 0991b2eb35 nir/cf: Conditionally do block_add_normal_succs() in unlink_jump();
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges.  In order to
fix that, I need everything in one routine.

However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke 9674c76c0e nir/cf: Don't break outer-block successors in split_block_beginning().
Consider the following NIR:

   block block_0;
   /* succs: block_1 block_2 */
   if (...) {
      block block_1;
      ...
   } else {
      block block_2;
   }

Calling split_block_beginning() on block_1 would break block_0's
successors:  link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succ: new_block */.  This is invalid: the
block before an if statement must have two successors.

Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successor[1] to successor[0], it may not preserve the original
order.  NIR maintains a convention that successor[0] must point to the
"then" block, while successor[1] points to the "else" block, so we need
to take care to preserve this ordering.

This patch creates a new function that swaps out one successor for
another, preserving the ordering.  It then uses this to fix the issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke e2637db618 nir/cf: Make a helper function for removing a predecessor.
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Kenneth Graunke 6a67ede6b3 nir: Validate that a block doesn't have two identical successors.
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-23 10:59:59 -07:00
Jason Ekstrand 8dcbca5957 nir/lower_vec_to_movs: Don't emit unneeded movs
It's possible that, if a vecN operation is involved in a phi node, that we
could end up moving from a register to itself.  If swizzling is involved,
we need to emit the move but.  However, if there is no swizzling, then the
mov is a no-op and we might as well not bother emitting it.

Shader-db results on Haswell:

   total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
   instructions in affected programs:     184780 -> 181802 (-1.61%)
   helped:                                838
   HURT:                                  0

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Jason Ekstrand 65e80ce5b5 nir/lower_vec_to_movs: Properly handle source modifiers on vecN ops
I don't know of any piglit tests that are currently broken.  However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-23 10:12:39 -07:00
Jason Ekstrand 999ff3c77d nir/lower_alu_to_scalar: Add support for nir_op_fdph
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand e5a9346d00 nir: Add fdph and fdph_replicated opcodes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand 0f9bf64770 nir/lower_alu_to_scalar: Return after lower_reduction
We don't use any of the code after the switch anyway.  Since we check for
num_components == 1 and early-return, it doesn't get executed so
everything's ok.  However, it makes it much clearer what's going on if we
simply do an early return.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Jason Ekstrand 2b79db2c02 nir/lower_alu_to_scalar: Use the builder
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-22 20:37:35 -07:00
Kenneth Graunke 5cede90f62 nir: Report progress from nir_normalize_cubemap_coords().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:54:34 -07:00
Kenneth Graunke d7ffd90ecb nir: Add braces around multi-line loop.
This was correct but not our usual style.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:01 -07:00
Kenneth Graunke 0a1adaf11d nir: Report progress from nir_lower_system_values().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:47:00 -07:00
Kenneth Graunke dc18b9357b nir: Report progress from nir_split_var_copies().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:59 -07:00
Kenneth Graunke cfae0f8a3a nir: Report progress from nir_lower_locals_to_regs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:57 -07:00
Kenneth Graunke 1adde5b87e nir: Report progress from nir_remove_dead_variables().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:55 -07:00
Jason Ekstrand 9f5e7ae9d8 nir: Report progress from lower_vec_to_movs().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:54 -07:00
Kenneth Graunke 967a5ddb88 nir: Report progress from nir_lower_globals_vars_to_local().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-21 13:46:45 -07:00
Jason Ekstrand 46362db4a6 nir/builder: Don't use designated initializers
Designated initializers are not allowed in C++ (not even C++11).  Since
nir_lower_samplers is now using nir_builder, and nir_lower_samplers is in
C++, this breaks the build on some compilers.  Aparently, GCC 5 allows it
in some limited extent because mesa still builds on my system without this
patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92052
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 10:41:43 -07:00
Jason Ekstrand d513388c8a nir: Move system value -> intrinsic mapping into nir.c
This way they're right next to the map going the other direction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-21 09:49:40 -07:00
Emil Velikov de7ffdb383 nir: rename nir_lower_samplers.c{pp,}
With the only C++ function having its own wrapper we can 'demote' this
file to a normal C one. This allows us to get rid of extern C { #include
<foo.h> } 'hacks'. Plus some of the headers may use C99 initializers,
which are not supported by the ISO standard.

This may cause build issue on incremental builds. If so run the
following:

sed -i -e 's|samplers\.cpp|samplers.c|' src/glsl/nir/.deps/nir_lower_samplers.Plo

Fixes: ef8eebc6ad5(nir: support indirect indexing samplers in struct arrays)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Gottfried Haider <gottfried.haider@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:02:06 +01:00
Emil Velikov d130cda453 nir: add C wrapper around glsl_type::record_location_offset
This will allow us to convert nir_lower_sampler.cpp to C.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:56 +01:00
Emil Velikov bdb1faf44e nir: move stdio.h inclusion before extern C
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-21 17:01:32 +01:00
Rob Clark b65f91dd32 nir/print: fix coverity error
Not something actually hit in real life (now state is never non-null,
but only case state->syms is null is if nir_print_instr() path).  But it
was something I overlooked the first time, so might as well fix it.

    *** CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    /src/glsl/nir/nir_print.c: 299 in print_var_decl()
    293
    294           fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
    295        }
    296
    297        fprintf(fp, "\n");
    298
    >>>     CID 1324642:  Null pointer dereferences  (REVERSE_INULL)
    >>>     Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    299        if (state) {
    300           _mesa_set_add(state->syms, name);
    301           _mesa_hash_table_insert(state->ht, var, name);
    302        }
    303     }
    304

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-20 14:04:06 -04:00
Rob Clark e13ed3ffb4 nir: add two-sided-color lowering pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-18 21:07:50 -04:00
Rob Clark e4dfcdcbec nir/build: add nir_vec() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-18 21:07:50 -04:00
Rob Clark 3745c38425 nir/lower_tex: add support to clamp texture coords
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP.  This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.

v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark 1ce8060c25 nir/lower_tex: support for lowering RECT textures
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark faf5f174dd nir/lower_tex: support projector lowering per sampler type
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types.  In this case we want more fine grained control over
which texture projectors get lowered.

v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark f83ba7bc41 nir/lower_tex: split out project_src() helper
Split this out to reduce noise in later patches.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark d9b9ff76f1 nir: rename nir_lower_tex_projector
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-18 21:07:49 -04:00
Rob Clark 2e4ab489b5 nir/builder: fix c++11 compiler warning
Fixes:

   In file included from nir/nir_lower_samplers.cpp:27:0:
   nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
   nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
       unsigned swizzle[4] = {c, c, c, c};

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:08:25 -04:00
Rob Clark 7c72f593ad nir: really actually fix comment this time
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-17 21:06:11 -04:00
Rob Clark 5305603b9d nir/print: print variable names
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:26:12 -04:00
Rob Clark ba78260b0f nir: some comment fixups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-09-17 20:25:33 -04:00
Rob Clark 509e0c4505 nir: add lowering stage for user-clip-planes / clipdist
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).

Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them).  But it's better than no UCP
support at all for drivers that don't have this in hw.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-17 19:57:21 -04:00
Rob Clark 53671a3723 nir: add sysval for user-clip-planes
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-09-17 19:55:43 -04:00
Jason Ekstrand a6c467d6c5 nir: Add a pass to rewrite uses of vecN sources to the vecN destination
v2 (Jason Ekstrand):
 - Handle non-SSA sources and destinations

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-17 08:19:48 -07:00
Jason Ekstrand ddffe30f40 nir: Add comments to nir_index_instrs and nir_index_ssa_defs
The provided indices have the very nice property that if A dominates B then
A->index <= B->index.  We should document that somewhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Jason Ekstrand 8ecaef967d nir: Add a generic instruction index
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-17 08:16:01 -07:00
Timothy Arceri ef8eebc6ad nir: support indirect indexing samplers in struct arrays
As a bonus we get indirect support for arrays of arrays for free.

V5: couple of small clean-ups suggested by Jason.

V4: fix struct member location caclulation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason

V3: Use nir_instr_rewrite_src() with empty src rather then clearing
the use_link list directly for the old indirects as suggested by Jason

V2: Fixed validation error in debug build

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:34 +10:00
Timothy Arceri dcd9cd0383 glsl: store uniform slot id in var location field
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.

V3: remove line wrap change from this patch

V2: store slot number for all non-UBO uniforms to make code more
consitent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-17 11:28:14 +10:00
Rob Clark aecbc93f2d nir/print: print symbolic names from shader-enum
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 10:15:35 -04:00
Rob Clark 840df72f93 nir/print: bit of state refactoring
Rename print_var_state to print_state, and stuff FILE ptr into the state
object.  This avoids passing around an extra parameter everywhere.

v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-09-16 10:15:17 -04:00
Rob Clark d9efe40dc9 nir: add lowering for ffract
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-16 08:27:36 -04:00
Jason Ekstrand cb503c3227 nir/builder: Use a normal temporary array in nir_channel
C++ gets cranky if we take references of temporaries.  This isn't a problem
yet in master because nir_builder is never used from C++.  However, it will
be in the future so we should fix it now.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-09-15 14:51:05 -07:00
Jason Ekstrand 29348631fe nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions.  We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction.  With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
   instructions in affected programs:     143274 -> 141801 (-1.03%)
   helped:                                667
   HURT:                                  0

It turns out that dot-products matter...

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand 47739c7df4 nir: Add a fdot instruction that replicates the result to a vec4
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:48 -07:00
Jason Ekstrand 2458ea95c5 nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed.  This adds code to try and
coalesce into the destination of the instruction providing the value.

Shader-db results for vec4 shaders on Haswell:

   total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
   instructions in affected programs:     231230 -> 224563 (-2.88%)
   helped:                                1017
   HURT:                                  2

This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>.  Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 12:38:07 -07:00
Jason Ekstrand 2b2f1f16a0 nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable.  I think this was a relic
of the way that GLSL IR implements writemasks.  In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing.  In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone.  We can simplify things a good deal (and
make them correct) by removing this "compacting" step.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-09-15 11:13:48 -07:00
Jason Ekstrand c3f8cde964 nir/lower_vec_to_movs: Handle partially SSA shaders
v2 (Jason Ekstrand):
 - Use nir_instr_rewrite_dest
 - Pass the impl directly into lower_vec_to_movs_block

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:45 -07:00
Jason Ekstrand b7eeced3c7 nir/lower_vec_to_movs: Pass the shader around directly
Previously, we were passing the shader around, we were just calling it
"mem_ctx".  However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-15 11:13:40 -07:00
Jordan Justen 4f178f0d8b nir: Add gl_WorkGroupID system variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Jordan Justen 62e011d593 nir: Add gl_LocalInvocationID variable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2015-09-13 09:53:16 -07:00
Rob Clark b88aeff4f5 nir: add nir_channel() to get at single components of vec's
Rather than make yet another copy of channel(), let's move it into nir.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-13 11:08:27 -04:00
Jason Ekstrand ca11c3c0a4 nir/from_ssa: Use instr_rewrite_dest
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand cee29220e3 nir: Add a function for rewriting instruction destinations
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand 106a3b2cc3 nir: Only unlink sources that are actually valid
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand a4aa25be1e nir: Remove the mem_ctx parameter from ssa_def_rewrite_uses
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:20 -07:00
Jason Ekstrand 8c8fc5f833 nir: Fix a bunch of ralloc parenting errors
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader.  In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.

We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2015-09-11 09:21:04 -07:00
Jason Ekstrand 794355e771 nir/lower_outputs_to_temporaries: Reparent the output name
We copy the output, make the old output the temporary, and give the
temporary a new name.  The copy keeps the pointer to the old name.  This
works just fine up until the point where we lower things to SSA and delete
the old variable and, with it, the name.  Instead, we should re-parent to
the copy.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2015-09-11 08:55:51 -07:00
Kenneth Graunke b811085b79 nir: Store some geometry shader data in nir_shader.
This makes it possible for NIR shaders to know the number of output
vertices and the number of invocations.  Drivers could also access
these directly without going through gl_program.

We should probably add InputType and OutputType here too, but currently
those are stored as GL_* enums, and I wanted to avoid using those in
NIR, as I suspect Vulkan/SPIR-V will use different enums.  (We should
probably make our own.)

We could add VerticesIn, but it's easily computable from the input
topology, so I'm not sure whether it's worth it.  It's also currently
not stored in gl_shader (only gl_shader_program), which would require
changes to the glsl_to_nir interface or require us to store it there.

This is a bit of duplication of data...ideally, we would factor these
substructs out of gl_program, gl_shader_program, and nir_shader, creating
a gl_geometry_info class...but it would need to go in a new place (in
src/glsl?) that isn't mtypes.h nor nir.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:05:09 -07:00
Kenneth Graunke cb2b118e40 nir/builder: Add nir_load_var() and nir_store_var() helpers.
These provide a convenient way to do simple variable loads and stores.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-11 00:04:17 -07:00
Ilia Mirkin 56238305e5 nir: convert glsl imageSamples into a new intrinsic
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:52 -04:00
Ilia Mirkin 1807a08e4f nir: add nir_texop_texture_samples and convert from glsl
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-10 17:38:33 -04:00
Rhys Kidd 32cdb49fe2 glsl: Resolve GCC sign-compare warning.
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:63:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = 0; i < tex->num_srcs; i++) {
                         ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:114:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (int i = proj_index + 1; i < tex->num_srcs; i++) {
                                      ^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:53:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       for (proj_index = 0; proj_index < tex->num_srcs; proj_index++) {
                                       ^
mesa/src/glsl/nir/nir_lower_tex_projector.c:57:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (proj_index == tex->num_srcs)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (int i = 0; i < num_components; ++i)
                      ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:110:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          for (int i = 0; i < num_components; ++i) {
                            ^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:139:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             if (i < num_components)
                   ^
mesa/src/glsl/nir/nir_opt_peephole_ffma.c: In function 'get_mul_for_src':
mesa/src/glsl/nir/nir_opt_peephole_ffma.c:130:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
    for (unsigned i = 0; i < num_components; i++)
                           ^

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-10 14:56:41 +01:00
Jason Ekstrand b828f7a27b nir/glsl: Use lower_outputs_to_temporaries instead of relying on GLSL IR
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:38 -07:00
Jason Ekstrand 1dbe4af9c9 nir: Add a pass to lower outputs to temporary variables
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:29:21 -07:00
Jason Ekstrand f5e08ab6b1 nir/cursor: Add a constructor for the end of a block but before the jump
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-09-09 12:28:51 -07:00
Kenneth Graunke d5d74d0b86 nir: Add a nir_system_value_from_intrinsic() function.
This converts NIR intrinsics that load system values into Mesa's
SYSTEM_VALUE_* enumerations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-09-08 18:02:08 -07:00
Iago Toral Quiroga 205ff843ff nir: UBO loads no longer use const_index[1]
Commit 2126c68e5c killed the array elements parameter on load/store
intrinsics that was stored in const_index[1]. It looks like that
patch missed to remove this assignment in the UBO path.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-09-08 09:06:34 +02:00
Connor Abbott aec6744501 nir/dead_cf: add support for removing useless loops
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
    checking for values live after the loop to catch const_load
    instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-09-01 00:58:17 -07:00
Connor Abbott 019eea1c4f nir: add a helper for iterating over blocks in a cf node
We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott 89dc0626bd nir: add nir_block_get_following_loop() helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott f649afc9dd nir/dead_cf: delete code that's unreachable due to jumps
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Connor Abbott 1e6ad4b027 nir: add an optimization for removing dead control flow
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-09-01 00:58:17 -07:00
Jason Ekstrand e16531fbe3 nir/builder: Use nir_after_instr to advance the cursor
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node.  Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node.  By just resetting to after_instr, we
avoid all these problems.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-31 18:17:07 -07:00
Kenneth Graunke 0a913a9d85 nir: Convert the builder to use the new NIR cursor API.
The NIR cursor API is exactly what we want for the builder's insertion
point.  This simplifies the API, the implementation, and is actually
more flexible as well.

This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.

v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
2015-08-27 13:36:57 -07:00
Kenneth Graunke 3e3cb77901 nir: Convert the NIR instruction insertion API to use cursors.
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point.  It then reworks the existing API to
simply be a wrapper around that for compatibility.

This largely involves moving the existing code into a new function.

Suggested by Connor Abbott.

v2: Make the legacy functions static inline in nir.h (requested by
    Connor Abbott).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke f90c6b1ce0 nir: Move nir_cursor to nir.h.
We want to use this for normal instruction insertion too, not just
control flow.  Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke c44d507752 nir: Strengthen "no jumps" assertions in instruction insertion API.
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.

Previously, we only checked this when the new instruction being inserted
was a jump.  This is a red herring - inserting *any* kind of instruction
after a jump is illegal.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-27 13:36:57 -07:00
Kenneth Graunke 5f14c417c8 nir: Use nir_shader::stage rather than passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Kenneth Graunke d4d5b430a5 nir: Store gl_shader_stage in nir_shader.
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.

Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-25 11:12:35 -07:00
Jason Ekstrand c999a58f50 nir/lower_io: Remove assign_var_locations_direct_first
This is no longer used so we might as well get rid of it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand ce5e9139aa nir/lower_io: Separate driver_location and base offset for uniforms
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Jason Ekstrand 0db8e87b4a nir/intrinsics: Add a second const index to load_uniform
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path.  In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load.  Previously, it was completely flattened by
nir_lower_io which made things difficult.  This adds more information to
the intrinsic to make this easier for us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke 6c33d6bbf9 nir: Pass a type_size() function pointer into nir_lower_io().
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.

In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play.  The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.

This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer.  This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.

v2 (Jason Ekstrand):
 - One side-effect of passing in a function pointer is that nir_lower_io is
   now aware of and properly allocates space for image uniforms, allowing
   us to drop hacks in the backend

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-25 10:18:27 -07:00
Kenneth Graunke 4f2cdd8497 nir: Use !block_ends_in_jump() in a few places rather than open-coding.
Connor introduced this helper recently; we should use it here too.

I had to move the function earlier in the file for it to be available.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-24 15:10:55 -07:00
Connor Abbott d7971b41ce nir/cf: reimplement nir_cf_node_remove() using the new API
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott fc7f2d2364 nir/cf: add new control modification API's
These will help us do a number of things, including:

- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:

if (foo) {
    ...
}
if (!foo) {
    ...
}

with:

if (foo) {
    ...
} else {
    ...
}

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 476eb5e4a1 nir/cf: use a cursor for inserting control flow
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott d356f84d4c nir/cf: add split_block_cursor()
This is a helper that will be shared between the new control flow
insertion and modification code.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 58a360c6b8 nir/cf: add split_block_before_instr()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 6e47a34b29 nir/cf: add a cursor structure
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().

In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 6f5c81f86f nir/cf: fix link_blocks() when there are no successors
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 6d028749ac nir/cf: clean up jumps when cleaning up CF nodes
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 211c79515d nir/cf: remove uses of SSA definitions that are being deleted
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 633cbbc068 nir/cf: handle jumps better in stitch_blocks()
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 940873bf22 nir/cf: handle jumps in split_block_end()
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott f596e4021c nir/cf: add block_ends_in_jump()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 788d45cb47 nir/cf: handle phi nodes better in split_block_beginning()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 747ddc3cdd nir/cf: split up and improve nir_handle_remove_jumps()
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 13482111d0 nir/cf: add remove_phi_src() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott f41e108d8b nir: add nir_foreach_phi_src_safe()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 762ae436ea nir/cf: add insert_phi_undef() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott b49371b8ed nir: move control flow modification to its own file
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 1c53f89696 nir: make cleanup_cf_node() not use remove_defs_uses()
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 9d5944053c nir: inline block_add_pred() a few places
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott c7df141c71 nir/validate: check successors/predecessors more carefully
We should be checking almost everything now.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Kenneth Graunke 8e0d4ef341 nir: Delete the nir_function_impl::start_block field.
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly.  Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.

Thanks to Connor Abbott for suggesting this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-24 13:31:41 -07:00
Martin Peres 80b1707e26 nir: convert the glsl intrinsic image_size to nir_intrinsic_image_size
v2, review from Francisco Jerez:
 - make the destination variable as large as what the nir instrinsic
   defines (4) instead of the size of the return variable of glsl. This
   is still safe for the already existing code because all the intrinsics
   affected returned the same amount of components as expected by glsl IR.
   In the case of image_size, it is not possible to do so because the
   returned number of component depends on the image type and this case
   is not well handled by nir.

v3:
- Style fix

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-08-20 14:07:46 +03:00
Kenneth Graunke ab83be590d nir: Use nir_builder in nir_lower_io's get_io_offset().
Much more readable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-19 19:29:39 -07:00
Kenneth Graunke ed2afec3fc nir: Pull nir_lower_io's load_op selection into a helper function.
Makes the function a bit smaller.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-19 19:29:22 -07:00
Thomas Helland 49d0a36bd6 nir: Simplify feq(fneg(a), a)) -> feq(a, 0.0)
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Thomas Helland a39167d594 nir: Simplify fne(fneg(a), a) -> fne(a, 0.0)
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.

Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs:     15785 -> 15585 (-1.27%)
helped:                                36
HURT:                                  0
GAINED:                                0
LOST:                                  0

Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26

Make that:
vec1 ssa_27 = fne ssa_21, 0.0f

This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Kenneth Graunke afccbd7256 nir: Add a glsl_uint_type() wrapper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-16 21:44:19 -07:00
Eric Anholt a6e75e3cd7 nir: Add support for CSE on textures.
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs:     455 -> 438 (-3.74%)

One in yofrankie, two in tropics.  Apparently i965 had also optimized all
of these out anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-14 11:39:18 -07:00
Eric Anholt fb2425a641 nir: Zero out texture instructions when creating them.
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same.  By zeroing, we can avoid that fragility.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-14 11:39:18 -07:00
Eric Anholt d50c182671 nir: Don't try to scalarize unpack ops.
Avoids regressions in vc4 when trying to do our blending in NIR.

v2: Add the other unpack ops I meant to when writing the original commit
    message.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-14 11:39:18 -07:00
Eric Anholt 9e6dc5b64d nir: Add a nir_opt_undef() to handle csels with undef.
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening.  vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.

total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs:     37 -> 23 (-37.84%)

v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
    Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
    over to src[0], too (by anholt).

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
2015-08-14 11:39:18 -07:00
Timothy Arceri 2c61d583f8 nir: add missing type to type_size_vec4()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-08-05 21:16:45 +10:00
Eric Anholt 6c28ee2041 nir: Add a nir_lower_load_const_to_scalar() pass.
This is useful to increase the CSE opportunities for a scalar backend.  It
avoids regressions when dropping vc4's custom CSE implementation.

v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-04 20:03:10 -07:00
Eric Anholt a70f63ab20 nir: Add algebraic opt for no-op iand.
I lazily generated some of these in VC4 NIR lowering.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-08-04 17:19:25 -07:00
Eric Anholt eae9c3286e Revert "nir: Use a single bit for the dual-source blend index"
This reverts commit ab5b7a0fe6.  We use more
than one bit of value in tgsi_to_nir.
2015-08-04 17:19:01 -07:00
Samuel Iglesias Gonsalvez 418c004f80 nir: Fix output swizzle in get_mul_for_src
Avoid copying an overwritten swizzle, use the original values.

Example:

   Former swizzle[] = xyzw
   src->swizzle[] = zyxx

The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-03 09:40:50 -07:00
Iago Toral Quiroga 01f6235020 nir/nir_lower_io: Add vec4 support
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.

This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-03 09:40:47 -07:00
Timothy Arceri ab5b7a0fe6 nir: Use a single bit for the dual-source blend index
The only values allowed are 0 and 1, and the value is checked before
assigning.

This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-03 21:36:50 +10:00
Matt Turner 4251ccb47b nir: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-07-29 09:34:51 -07:00
Matt Turner 5c7fd67045 glsl: Remove MSVC implementations of copysign and isnormal.
Non-Gallium parts of Mesa require MSVC 2013 which provides these.
2015-07-29 09:34:51 -07:00
Dave Airlie 80511d176a i965: add support for ARB_shader_subroutine
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-07-24 10:25:08 +10:00
Dave Airlie 57f24299b7 glsl/types: add new subroutine type (v3.2)
This type will be used to store the name of subroutine types

as in subroutine void myfunc(void);
will store myfunc into a subroutine type.

This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.

Also add contains_subroutine method.

v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>

tests: fix warnings
2015-07-23 17:25:25 +10:00
Connor Abbott eaf799ddff nir: add nir_foreach_instr_safe_reverse()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
2015-07-17 09:49:53 -07:00
Connor Abbott 8eea091747 nir: add nir_instr_is_first() and nir_instr_is_last() helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
2015-07-17 09:47:22 -07:00
Iago Toral Quiroga 6b09598d63 nir: add nir_var_shader_storage
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-07-14 07:04:03 +02:00
Kenneth Graunke efb36271a9 nir: Fix comment above nir_convert_from_ssa() prototype.
Connor renamed the parameter, inverting the sense.
Update the comment accordingly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-08 11:28:08 -07:00
Rob Clark 959b47262b nir/lower_phis_to_scalar: undef is trivially scalarizable
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-03 08:56:09 -04:00
Jason Ekstrand 89bd5ee64c nir: Don't allow copying SSA destinations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-02 15:42:33 -07:00
Connor Abbott aa7d4cecec nir: remove parent_instr from nir_register
It's no longer used.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Connor Abbott f49e51ef44 nir: remove nir_src_get_parent_instr()
It's now unused.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Connor Abbott 2b1a1d8b12 nir/from_ssa: add a flag to not convert everything from SSA
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.

The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.

v2: rename and flip meaning of flag (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Rob Clark dc7e6463d3 nir: cleanup open-coded instruction casts
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-06-30 12:13:44 -04:00
Kenneth Graunke 6026f7e8fb nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs:     11928 -> 11670 (-2.16%)
helped:                                64
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-06-25 02:12:32 -07:00
Kenneth Graunke 147cdb53ec nir: Use a switch statement for detecting move-like operations.
Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-06-24 10:35:04 -07:00
Kenneth Graunke 1762568fd3 nir: Allow vec2/vec3/vec4 instructions in the select peephole pass.
These are basically just moves, so they should be safe as well.

When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:

        if ssa_21 {
                block block_1:
                /* preds: block_0 */
                vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */
        vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2

Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass.  But with
the vec4 operation, they were not.  We want to keep getting selects.

Normal i965 on Broadwell:
instructions in affected programs:     200 -> 176 (-12.00%)
helped:                                4

With brw_fs_channel_expressions() disabled:
instructions in affected programs:     1832 -> 1646 (-10.15%)
helped:                                30

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-22 14:08:36 -07:00
Jordan Justen 2867f2e8cd nir: Add barrier intrinsic function
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-12 15:12:40 -07:00
Chris Forbes e7f628c2fc glsl: Add ir node for barrier
v2:
 * Changes suggested by mattst88

[jordan.l.justen@intel.com: Add nir support]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-12 15:12:39 -07:00
Timothy Arceri 86a74e9b6b nir: use src for ssa helper
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-03 06:50:39 +10:00
Timothy Arceri 5f7b8fa481 nir: remove extra semicolon
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-03 06:50:33 +10:00
Eduardo Lima Mitev 5b226a1242 nir: prevent use-after-free condition in should_lower_phi()
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.

This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.

The crash can be reproduced ~20% of times with the dEQP test:

dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-02 20:21:49 +02:00
Iago Toral Quiroga 2231cf0ba3 nir: Fix output swizzle in get_mul_for_src
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-28 18:25:37 +02:00
Matt Turner 5614bcc416 nir: Remove sRGB colorspace conversion round-trip.
Some shaders in Civilization V and Beyond Earth do

   pow(pow(x, 2.2), 0.454545)

which is converting to and from sRGB colorspace.

A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.

instructions in affected programs:     934 -> 886 (-5.14%)
helped:                                16
2015-05-22 11:26:36 -07:00
Jason Ekstrand 2126c68e5c nir: Get rid of the array elements parameter on load/store intrinsics
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics.  However, this set to 1
by every pass that ever creates a load/store intrinsic.  Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1.  Let's just delete it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-05-20 09:28:06 -07:00
Francisco Jerez d91d6b3f03 nir: Translate memory barrier intrinsics from GLSL IR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez f8f8b31847 nir: Translate image load, store and atomic intrinsics from GLSL IR.
v2: Undefine coordinate components not applicable to the target.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez 6de78e6b0c nir: Fix indexing of atomic counter arrays with a constant value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez f1269a3e01 nir: Add memory barrier intrinsic.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez d9e930997f nir: Define image load, store and atomic intrinsics.
v2: Undefine coordinate components not applicable to the target.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Tapani Pälli 95774ca258 nir: fix sampler lowering pass for arrays
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.

I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:

	ES3-CTS.shaders.struct.uniform.sampler_array_fragment

v2: remove unnecessary comment (Topi)
    simplify changes and the overall code (Jason)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
2015-05-12 14:28:16 +03:00
Kenneth Graunke d6fb155f30 nir: Fix aggressive typos in nir_from_ssa.c.
s/agressive/aggressive/g

Trivial.
2015-05-08 19:38:14 -07:00
Jason Ekstrand fb5f411248 nir/search: Save/restore the variables_seen bitmask when matching
Shader-db results on Broadwell:

   total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
   instructions in affected programs:     1330548 -> 1315224 (-1.15%)
   helped:                                5797
   HURT:                                  76
   GAINED:                                0
   LOST:                                  8

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:15 -07:00
Jason Ekstrand e0cfe59c37 nir/search: Assert that variable id's are in range
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:15 -07:00
Jason Ekstrand 13facfbd5b nir/search: handle explicitly sized sources in match_value
Previously, this case was being handled in match_expression prior to
calling match_value.  However, there is really no good reason for this
given that match_value has all of the information it needs.  Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:14 -07:00
Jason Ekstrand f752effa08 nir/nir: Use a linked list instead of a hash set for use/def sets
This commit switches us from the current setup of using hash sets for
use/def sets to using linked lists.  Doing so should save us quite a bit of
memory because we aren't carrying around 3 hash sets per register and 2 per
SSA value.  It should also save us CPU time because adding/removing things
from use/def sets is 4 pointer manipulations instead of a hash lookup.

Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists:

   GLSL IR Only:        586.4 +/- 1.653833
   NIR with hash sets:  675.4 +/- 2.502108
   NIR + use/def lists: 641.2 +/- 1.557043

I also ran a memory usage experiment with Ken's patch to delete GLSL IR and
keep NIR.  This patch cuts an aditional 42.9 MiB of ralloc'd memory over
and above what we gained by deleting the GLSL IR on the same dota trace.

On the code complexity side of things, some things are now much easier and
others are a bit harder.  One of the operations we perform constantly in
optimization passes is to replace one source with another.  Due to the fact
that an instruction can use the same SSA value multiple times, we had to
iterate through the sources of the instruction and determine if the use we
were replacing was the only one before removing it from the set of uses.
With this patch, uses are per-source not per-instruction so we can just
remove it safely.  On the other hand, trying to iterate over all of the
instructions that use a given value is more difficult.  Fortunately, the
two places we do that are the ffma peephole where it doesn't matter and GCM
where we already gracefully handle duplicates visits to an instruction.

Another aspect here is that using linked lists in this way can be tricky to
get right.  With sets, things were quite forgiving and the worst that
happened if you didn't properly remove a use was that it would get caught
in the validator.  With linked lists, it can lead to linked list corruption
which can be harder to track.  However, we do just as much validation of
the linked lists as we did of the sets so the validator should still catch
these problems.  While working on this series, the vast majority of the
bugs I had to fix were caught by assertions.  I don't think the lists are
going to be that much worse than the sets.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand ecc2cfc8b6 nir: Use nir_instr_rewrite_src in copy propagation
We were rolling our own rewrite_src variant in copy-propagation.  Let's
stop doing that and use the ones in core NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand f72a8d1cf0 nir: Add a function for rewriting the condition of an if statement
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 300d729436 nir: Add and use initializer #defines for nir_src and nir_dest
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 6702ebce57 nir: Modernize the out-of-SSA pass
The out-of-SSA pass was one of the first passes written when getting SSA
up-and-going (for obvious reasons).  As such, it came before a lot of the
nifty SSA-based helpers were introduced.  This commit modernizes it so that
we're no longer doing nearly as much manual banging on use/def sets.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 7ee0216e2d nir/validate: Validate SSA def parent instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Ian Romanick 3bdbc1e436 nir: Delete all traces of nir_op_flog
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick ad51f9b421 nir: Don't produce nir_op_flog from GLSL IR
All paths that produce GLSL IR for NIR lower ir_unop_log.  All paths
that consume NIR will explode if they geta nir_op_flog.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick e0a17f6e31 nir: Delete all traces of nir_op_fexp
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick a45d55f17c nir: Don't produce nir_op_fexp from GLSL IR
All paths that produce GLSL IR for NIR lower ir_unop_exp.  All paths
that consume NIR will explode if they geta nir_op_fexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Matt Turner 8e029105c2 nir: Allow feq/fne/ieq/ine to be optimized with inot.
instructions in affected programs:     380 -> 376 (-1.05%)
helped:                                2

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner f5cf74d8ba nir: Recognize (a < c || b < c) as min(a, b) < c.
... and (a >= c) || (b >= c) as max(a, b) >= c.

Similar to commit 97e6c1b9.

total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs:     6400 -> 6304 (-1.50%)
helped:                                68
HURT:                                  4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner ceb8b739ce nir: Recognize trivial min/max.
No changes, but does prevent some regressions in the next commit.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 8ae559971a nir: Recognize i2b(b2i(x)) as x.
Helps the same set of programs as the previous commit.

instructions in affected programs:     4490 -> 4346 (-3.21%)
helped:                                8

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 74697e2844 nir: Recognize imul(b2i(a), b2i(b)) as a logical AND.
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.

instructions in affected programs:     2353 -> 2245 (-4.59%)
helped:                                4
GAINED:                                4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Zoë Blade 05e7f7f438 Fix a few typos
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Matt Turner f251ea393b nir: Transform pow(x, 4) into (x*x)*(x*x). 2015-04-24 11:39:01 -07:00
Jason Ekstrand 125574d1ef nir/lower_source_mods: Don't propagate register sources
The nir_lower_source_mods pass does a weak form of copy propagation to
clean up all of the mov-with-negate's that get generated.  However, we
weren't properly checking that the sources were SSA and so we could end up
moving a register read which is not, in general, valid.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 296131f467 nir: Rewrite instr_rewrite_src
The old code wasn't correctly handling the case where the new value of the
source contains an indirect.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand d61bd972d8 nir/locals_to_regs: Hanadle indirect accesses of length-1 arrays
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 06f3c98b9d nir/locals_to_regs: Initialize registers with constant initializers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 4e9b376594 nir/locals_to_regs: Pass around the nir_shader rather than a void * mem_ctx
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand f50f59d3d9 nir: Add a simple growing array data structure
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 8b900e7405 nir/types: Make glsl_get_length smarter
Previously, this function returned the number of elements for structures
and arrays and 0 for everything else.  In NIR, this is almost never what
you want because we also treat matricies as arrays so you have to
special-case constantly.  This commit  glsl_get_length treat matrices
as an array of columns by returning the number of columns instead of 0

This also fixes a bug in locals_to_regs caused by not checking for the
matrix case in one place.

v2: Only special-case for matrices and return a length of 0 for vectors as
    we did before.  This was needed to not break the TGSI-based drivers and
    doesn't really affect NIR at the moment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 7e1d21edbf nir: Move get_const_initializer_load from vars_to_ssa to NIR core
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand ba88760202 nir/lower_vars_to_ssa: Pass around the nir_shader instead of a void mem_ctx
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand e79120afdc nir/print: Print the closing paren on load_const instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 02f03fc0f1 nir/tex: Use the correct return size for query_levels and lod
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 94669cb534 nir: Refactor tex_instr_dest_size to use a switch statement
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 73cc76362d nir/lower_vars_to_ssa: Actually look for indirects when determining aliasing
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:39 -07:00
Matt Turner 4dacb212fd nir: Allow abs/neg in select peephole pass.
total instructions in shared programs: 4314531 -> 4308949 (-0.13%)
instructions in affected programs:     429085 -> 423503 (-1.30%)
helped:                                1680
HURT:                                  0
GAINED:                                0
LOST:                                  111

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-17 11:01:34 -07:00