Commit Graph

665 Commits

Author SHA1 Message Date
Connor Abbott 6e47a34b29 nir/cf: add a cursor structure
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().

In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 6f5c81f86f nir/cf: fix link_blocks() when there are no successors
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 6d028749ac nir/cf: clean up jumps when cleaning up CF nodes
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 211c79515d nir/cf: remove uses of SSA definitions that are being deleted
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 633cbbc068 nir/cf: handle jumps better in stitch_blocks()
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 940873bf22 nir/cf: handle jumps in split_block_end()
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott f596e4021c nir/cf: add block_ends_in_jump()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 788d45cb47 nir/cf: handle phi nodes better in split_block_beginning()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 747ddc3cdd nir/cf: split up and improve nir_handle_remove_jumps()
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:42 -07:00
Connor Abbott 13482111d0 nir/cf: add remove_phi_src() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott f41e108d8b nir: add nir_foreach_phi_src_safe()
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 762ae436ea nir/cf: add insert_phi_undef() helper
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott b49371b8ed nir: move control flow modification to its own file
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 1c53f89696 nir: make cleanup_cf_node() not use remove_defs_uses()
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott 9d5944053c nir: inline block_add_pred() a few places
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Connor Abbott c7df141c71 nir/validate: check successors/predecessors more carefully
We should be checking almost everything now.

Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-08-24 13:31:41 -07:00
Kenneth Graunke 8e0d4ef341 nir: Delete the nir_function_impl::start_block field.
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly.  Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.

Thanks to Connor Abbott for suggesting this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-08-24 13:31:41 -07:00
Martin Peres 80b1707e26 nir: convert the glsl intrinsic image_size to nir_intrinsic_image_size
v2, review from Francisco Jerez:
 - make the destination variable as large as what the nir instrinsic
   defines (4) instead of the size of the return variable of glsl. This
   is still safe for the already existing code because all the intrinsics
   affected returned the same amount of components as expected by glsl IR.
   In the case of image_size, it is not possible to do so because the
   returned number of component depends on the image type and this case
   is not well handled by nir.

v3:
- Style fix

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-08-20 14:07:46 +03:00
Kenneth Graunke ab83be590d nir: Use nir_builder in nir_lower_io's get_io_offset().
Much more readable.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-19 19:29:39 -07:00
Kenneth Graunke ed2afec3fc nir: Pull nir_lower_io's load_op selection into a helper function.
Makes the function a bit smaller.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-19 19:29:22 -07:00
Thomas Helland 49d0a36bd6 nir: Simplify feq(fneg(a), a)) -> feq(a, 0.0)
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Thomas Helland a39167d594 nir: Simplify fne(fneg(a), a) -> fne(a, 0.0)
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.

Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs:     15785 -> 15585 (-1.27%)
helped:                                36
HURT:                                  0
GAINED:                                0
LOST:                                  0

Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26

Make that:
vec1 ssa_27 = fne ssa_21, 0.0f

This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8

Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-18 11:34:44 -07:00
Kenneth Graunke afccbd7256 nir: Add a glsl_uint_type() wrapper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
2015-08-16 21:44:19 -07:00
Eric Anholt a6e75e3cd7 nir: Add support for CSE on textures.
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs:     455 -> 438 (-3.74%)

One in yofrankie, two in tropics.  Apparently i965 had also optimized all
of these out anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-14 11:39:18 -07:00
Eric Anholt fb2425a641 nir: Zero out texture instructions when creating them.
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same.  By zeroing, we can avoid that fragility.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-14 11:39:18 -07:00
Eric Anholt d50c182671 nir: Don't try to scalarize unpack ops.
Avoids regressions in vc4 when trying to do our blending in NIR.

v2: Add the other unpack ops I meant to when writing the original commit
    message.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-14 11:39:18 -07:00
Eric Anholt 9e6dc5b64d nir: Add a nir_opt_undef() to handle csels with undef.
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening.  vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.

total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs:     37 -> 23 (-37.84%)

v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
    Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
    over to src[0], too (by anholt).

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
2015-08-14 11:39:18 -07:00
Timothy Arceri 2c61d583f8 nir: add missing type to type_size_vec4()
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-08-05 21:16:45 +10:00
Eric Anholt 6c28ee2041 nir: Add a nir_lower_load_const_to_scalar() pass.
This is useful to increase the CSE opportunities for a scalar backend.  It
avoids regressions when dropping vc4's custom CSE implementation.

v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-04 20:03:10 -07:00
Eric Anholt a70f63ab20 nir: Add algebraic opt for no-op iand.
I lazily generated some of these in VC4 NIR lowering.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-08-04 17:19:25 -07:00
Eric Anholt eae9c3286e Revert "nir: Use a single bit for the dual-source blend index"
This reverts commit ab5b7a0fe6.  We use more
than one bit of value in tgsi_to_nir.
2015-08-04 17:19:01 -07:00
Samuel Iglesias Gonsalvez 418c004f80 nir: Fix output swizzle in get_mul_for_src
Avoid copying an overwritten swizzle, use the original values.

Example:

   Former swizzle[] = xyzw
   src->swizzle[] = zyxx

The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-03 09:40:50 -07:00
Iago Toral Quiroga 01f6235020 nir/nir_lower_io: Add vec4 support
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.

This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-08-03 09:40:47 -07:00
Timothy Arceri ab5b7a0fe6 nir: Use a single bit for the dual-source blend index
The only values allowed are 0 and 1, and the value is checked before
assigning.

This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-08-03 21:36:50 +10:00
Matt Turner 4251ccb47b nir: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-07-29 09:34:51 -07:00
Matt Turner 5c7fd67045 glsl: Remove MSVC implementations of copysign and isnormal.
Non-Gallium parts of Mesa require MSVC 2013 which provides these.
2015-07-29 09:34:51 -07:00
Dave Airlie 80511d176a i965: add support for ARB_shader_subroutine
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-07-24 10:25:08 +10:00
Dave Airlie 57f24299b7 glsl/types: add new subroutine type (v3.2)
This type will be used to store the name of subroutine types

as in subroutine void myfunc(void);
will store myfunc into a subroutine type.

This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.

Also add contains_subroutine method.

v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>

tests: fix warnings
2015-07-23 17:25:25 +10:00
Connor Abbott eaf799ddff nir: add nir_foreach_instr_safe_reverse()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
2015-07-17 09:49:53 -07:00
Connor Abbott 8eea091747 nir: add nir_instr_is_first() and nir_instr_is_last() helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
2015-07-17 09:47:22 -07:00
Iago Toral Quiroga 6b09598d63 nir: add nir_var_shader_storage
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-07-14 07:04:03 +02:00
Kenneth Graunke efb36271a9 nir: Fix comment above nir_convert_from_ssa() prototype.
Connor renamed the parameter, inverting the sense.
Update the comment accordingly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-08 11:28:08 -07:00
Rob Clark 959b47262b nir/lower_phis_to_scalar: undef is trivially scalarizable
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-03 08:56:09 -04:00
Jason Ekstrand 89bd5ee64c nir: Don't allow copying SSA destinations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-07-02 15:42:33 -07:00
Connor Abbott aa7d4cecec nir: remove parent_instr from nir_register
It's no longer used.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Connor Abbott f49e51ef44 nir: remove nir_src_get_parent_instr()
It's now unused.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Connor Abbott 2b1a1d8b12 nir/from_ssa: add a flag to not convert everything from SSA
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.

The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.

v2: rename and flip meaning of flag (Jason)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-30 11:18:27 -07:00
Rob Clark dc7e6463d3 nir: cleanup open-coded instruction casts
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2015-06-30 12:13:44 -04:00
Kenneth Graunke 6026f7e8fb nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs:     11928 -> 11670 (-2.16%)
helped:                                64
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-06-25 02:12:32 -07:00
Kenneth Graunke 147cdb53ec nir: Use a switch statement for detecting move-like operations.
Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-06-24 10:35:04 -07:00
Kenneth Graunke 1762568fd3 nir: Allow vec2/vec3/vec4 instructions in the select peephole pass.
These are basically just moves, so they should be safe as well.

When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:

        if ssa_21 {
                block block_1:
                /* preds: block_0 */
                vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */
        vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2

Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass.  But with
the vec4 operation, they were not.  We want to keep getting selects.

Normal i965 on Broadwell:
instructions in affected programs:     200 -> 176 (-12.00%)
helped:                                4

With brw_fs_channel_expressions() disabled:
instructions in affected programs:     1832 -> 1646 (-10.15%)
helped:                                30

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-22 14:08:36 -07:00
Jordan Justen 2867f2e8cd nir: Add barrier intrinsic function
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-12 15:12:40 -07:00
Chris Forbes e7f628c2fc glsl: Add ir node for barrier
v2:
 * Changes suggested by mattst88

[jordan.l.justen@intel.com: Add nir support]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-12 15:12:39 -07:00
Timothy Arceri 86a74e9b6b nir: use src for ssa helper
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-03 06:50:39 +10:00
Timothy Arceri 5f7b8fa481 nir: remove extra semicolon
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-06-03 06:50:33 +10:00
Eduardo Lima Mitev 5b226a1242 nir: prevent use-after-free condition in should_lower_phi()
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.

This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.

The crash can be reproduced ~20% of times with the dEQP test:

dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-06-02 20:21:49 +02:00
Iago Toral Quiroga 2231cf0ba3 nir: Fix output swizzle in get_mul_for_src
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-28 18:25:37 +02:00
Matt Turner 5614bcc416 nir: Remove sRGB colorspace conversion round-trip.
Some shaders in Civilization V and Beyond Earth do

   pow(pow(x, 2.2), 0.454545)

which is converting to and from sRGB colorspace.

A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.

instructions in affected programs:     934 -> 886 (-5.14%)
helped:                                16
2015-05-22 11:26:36 -07:00
Jason Ekstrand 2126c68e5c nir: Get rid of the array elements parameter on load/store intrinsics
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics.  However, this set to 1
by every pass that ever creates a load/store intrinsic.  Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1.  Let's just delete it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-05-20 09:28:06 -07:00
Francisco Jerez d91d6b3f03 nir: Translate memory barrier intrinsics from GLSL IR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez f8f8b31847 nir: Translate image load, store and atomic intrinsics from GLSL IR.
v2: Undefine coordinate components not applicable to the target.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez 6de78e6b0c nir: Fix indexing of atomic counter arrays with a constant value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez f1269a3e01 nir: Add memory barrier intrinsic.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Francisco Jerez d9e930997f nir: Define image load, store and atomic intrinsics.
v2: Undefine coordinate components not applicable to the target.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 15:47:57 +03:00
Tapani Pälli 95774ca258 nir: fix sampler lowering pass for arrays
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.

I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:

	ES3-CTS.shaders.struct.uniform.sampler_array_fragment

v2: remove unnecessary comment (Topi)
    simplify changes and the overall code (Jason)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
2015-05-12 14:28:16 +03:00
Kenneth Graunke d6fb155f30 nir: Fix aggressive typos in nir_from_ssa.c.
s/agressive/aggressive/g

Trivial.
2015-05-08 19:38:14 -07:00
Jason Ekstrand fb5f411248 nir/search: Save/restore the variables_seen bitmask when matching
Shader-db results on Broadwell:

   total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
   instructions in affected programs:     1330548 -> 1315224 (-1.15%)
   helped:                                5797
   HURT:                                  76
   GAINED:                                0
   LOST:                                  8

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:15 -07:00
Jason Ekstrand e0cfe59c37 nir/search: Assert that variable id's are in range
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:15 -07:00
Jason Ekstrand 13facfbd5b nir/search: handle explicitly sized sources in match_value
Previously, this case was being handled in match_expression prior to
calling match_value.  However, there is really no good reason for this
given that match_value has all of the information it needs.  Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:29:14 -07:00
Jason Ekstrand f752effa08 nir/nir: Use a linked list instead of a hash set for use/def sets
This commit switches us from the current setup of using hash sets for
use/def sets to using linked lists.  Doing so should save us quite a bit of
memory because we aren't carrying around 3 hash sets per register and 2 per
SSA value.  It should also save us CPU time because adding/removing things
from use/def sets is 4 pointer manipulations instead of a hash lookup.

Running shader-db 50 times with USE_NIR=0, NIR, and NIR + use/def lists:

   GLSL IR Only:        586.4 +/- 1.653833
   NIR with hash sets:  675.4 +/- 2.502108
   NIR + use/def lists: 641.2 +/- 1.557043

I also ran a memory usage experiment with Ken's patch to delete GLSL IR and
keep NIR.  This patch cuts an aditional 42.9 MiB of ralloc'd memory over
and above what we gained by deleting the GLSL IR on the same dota trace.

On the code complexity side of things, some things are now much easier and
others are a bit harder.  One of the operations we perform constantly in
optimization passes is to replace one source with another.  Due to the fact
that an instruction can use the same SSA value multiple times, we had to
iterate through the sources of the instruction and determine if the use we
were replacing was the only one before removing it from the set of uses.
With this patch, uses are per-source not per-instruction so we can just
remove it safely.  On the other hand, trying to iterate over all of the
instructions that use a given value is more difficult.  Fortunately, the
two places we do that are the ffma peephole where it doesn't matter and GCM
where we already gracefully handle duplicates visits to an instruction.

Another aspect here is that using linked lists in this way can be tricky to
get right.  With sets, things were quite forgiving and the worst that
happened if you didn't properly remove a use was that it would get caught
in the validator.  With linked lists, it can lead to linked list corruption
which can be harder to track.  However, we do just as much validation of
the linked lists as we did of the sets so the validator should still catch
these problems.  While working on this series, the vast majority of the
bugs I had to fix were caught by assertions.  I don't think the lists are
going to be that much worse than the sets.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand ecc2cfc8b6 nir: Use nir_instr_rewrite_src in copy propagation
We were rolling our own rewrite_src variant in copy-propagation.  Let's
stop doing that and use the ones in core NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand f72a8d1cf0 nir: Add a function for rewriting the condition of an if statement
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 300d729436 nir: Add and use initializer #defines for nir_src and nir_dest
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 6702ebce57 nir: Modernize the out-of-SSA pass
The out-of-SSA pass was one of the first passes written when getting SSA
up-and-going (for obvious reasons).  As such, it came before a lot of the
nifty SSA-based helpers were introduced.  This commit modernizes it so that
we're no longer doing nearly as much manual banging on use/def sets.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Jason Ekstrand 7ee0216e2d nir/validate: Validate SSA def parent instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-05-08 17:16:13 -07:00
Ian Romanick 3bdbc1e436 nir: Delete all traces of nir_op_flog
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick ad51f9b421 nir: Don't produce nir_op_flog from GLSL IR
All paths that produce GLSL IR for NIR lower ir_unop_log.  All paths
that consume NIR will explode if they geta nir_op_flog.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick e0a17f6e31 nir: Delete all traces of nir_op_fexp
Nothing produces it, and nothing can consume it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Ian Romanick a45d55f17c nir: Don't produce nir_op_fexp from GLSL IR
All paths that produce GLSL IR for NIR lower ir_unop_exp.  All paths
that consume NIR will explode if they geta nir_op_fexp.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-05-08 12:12:54 -07:00
Matt Turner 8e029105c2 nir: Allow feq/fne/ieq/ine to be optimized with inot.
instructions in affected programs:     380 -> 376 (-1.05%)
helped:                                2

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner f5cf74d8ba nir: Recognize (a < c || b < c) as min(a, b) < c.
... and (a >= c) || (b >= c) as max(a, b) >= c.

Similar to commit 97e6c1b9.

total instructions in shared programs: 6182276 -> 6182180 (-0.00%)
instructions in affected programs:     6400 -> 6304 (-1.50%)
helped:                                68
HURT:                                  4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner ceb8b739ce nir: Recognize trivial min/max.
No changes, but does prevent some regressions in the next commit.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 8ae559971a nir: Recognize i2b(b2i(x)) as x.
Helps the same set of programs as the previous commit.

instructions in affected programs:     4490 -> 4346 (-3.21%)
helped:                                8

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Matt Turner 74697e2844 nir: Recognize imul(b2i(a), b2i(b)) as a logical AND.
Four shaders in Unreal 4's Sun Temple are helped, and gain SIMD16
because we avoid an integer multiplication.

instructions in affected programs:     2353 -> 2245 (-4.59%)
helped:                                4
GAINED:                                4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-05-07 10:51:05 -07:00
Zoë Blade 05e7f7f438 Fix a few typos
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Matt Turner f251ea393b nir: Transform pow(x, 4) into (x*x)*(x*x). 2015-04-24 11:39:01 -07:00
Jason Ekstrand 125574d1ef nir/lower_source_mods: Don't propagate register sources
The nir_lower_source_mods pass does a weak form of copy propagation to
clean up all of the mov-with-negate's that get generated.  However, we
weren't properly checking that the sources were SSA and so we could end up
moving a register read which is not, in general, valid.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 296131f467 nir: Rewrite instr_rewrite_src
The old code wasn't correctly handling the case where the new value of the
source contains an indirect.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand d61bd972d8 nir/locals_to_regs: Hanadle indirect accesses of length-1 arrays
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 06f3c98b9d nir/locals_to_regs: Initialize registers with constant initializers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 4e9b376594 nir/locals_to_regs: Pass around the nir_shader rather than a void * mem_ctx
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand f50f59d3d9 nir: Add a simple growing array data structure
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:41 -07:00
Jason Ekstrand 8b900e7405 nir/types: Make glsl_get_length smarter
Previously, this function returned the number of elements for structures
and arrays and 0 for everything else.  In NIR, this is almost never what
you want because we also treat matricies as arrays so you have to
special-case constantly.  This commit  glsl_get_length treat matrices
as an array of columns by returning the number of columns instead of 0

This also fixes a bug in locals_to_regs caused by not checking for the
matrix case in one place.

v2: Only special-case for matrices and return a length of 0 for vectors as
    we did before.  This was needed to not break the TGSI-based drivers and
    doesn't really affect NIR at the moment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 7e1d21edbf nir: Move get_const_initializer_load from vars_to_ssa to NIR core
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand ba88760202 nir/lower_vars_to_ssa: Pass around the nir_shader instead of a void mem_ctx
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand e79120afdc nir/print: Print the closing paren on load_const instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 02f03fc0f1 nir/tex: Use the correct return size for query_levels and lod
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 94669cb534 nir: Refactor tex_instr_dest_size to use a switch statement
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:40 -07:00
Jason Ekstrand 73cc76362d nir/lower_vars_to_ssa: Actually look for indirects when determining aliasing
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-22 18:10:39 -07:00
Matt Turner 4dacb212fd nir: Allow abs/neg in select peephole pass.
total instructions in shared programs: 4314531 -> 4308949 (-0.13%)
instructions in affected programs:     429085 -> 423503 (-1.30%)
helped:                                1680
HURT:                                  0
GAINED:                                0
LOST:                                  111

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-17 11:01:34 -07:00
Rob Clark e14af4c067 nir/builder: add nir_builder_insert_after_instr()
For lowering if/else, I need a way to insert at the end of the previous
block.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:34:15 -04:00
Ian Romanick 94aab6cde6 nir: Convert the if-test for num_inputs == 2 to an assertion
Suggested by Jason on a different patch after some comments /
questions by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabott0@gmail.com>
2015-04-16 09:56:49 -07:00
Ian Romanick 4cf5ca5ca5 nir: Try commutative sources in CSE
Shader-db results:

GM45 NIR:
total instructions in shared programs: 4082044 -> 4081919 (-0.00%)
instructions in affected programs:     27609 -> 27484 (-0.45%)
helped:                                44

Iron Lake NIR:
total instructions in shared programs: 5678776 -> 5678646 (-0.00%)
instructions in affected programs:     27406 -> 27276 (-0.47%)
helped:                                45

Sandy Bridge NIR:
total instructions in shared programs: 7329995 -> 7329096 (-0.01%)
instructions in affected programs:     142035 -> 141136 (-0.63%)
helped:                                406
HURT:                                  19

Ivy Bridge NIR:
total instructions in shared programs: 6769314 -> 6768359 (-0.01%)
instructions in affected programs:     140820 -> 139865 (-0.68%)
helped:                                423
HURT:                                  2

Haswell NIR:
total instructions in shared programs: 6183693 -> 6183298 (-0.01%)
instructions in affected programs:     96538 -> 96143 (-0.41%)
helped:                                303
HURT:                                  4

Broadwell NIR:
total instructions in shared programs: 7501711 -> 7498170 (-0.05%)
instructions in affected programs:     266403 -> 262862 (-1.33%)
helped:                                705
HURT:                                  5
GAINED:                                4

v2: Rebase on top of Connor's fix.

v3: Convert the if-test for num_inputs == 2 to an assertion.  Suggested
by Jason after some comments / questions by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Connor Abbott <cwabbott0@gmail.com>
2015-04-15 18:15:59 -07:00
Ian Romanick bc672e261c nir: Fix typo in "ushr by 0" algebraic replacement
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "10.5" <mesa-stable@lists.freedestkop.org>
2015-04-14 16:41:04 -07:00
Ian Romanick 67a8610caf nir: Silence unused parameter warnings
nir/nir.h: In function 'nir_validate_shader':
nir/nir.h:1567:56: warning: unused parameter 'shader' [-Wunused-parameter]
 static inline void nir_validate_shader(nir_shader *shader) { }
                                                        ^
nir/nir_opt_cse.c: In function 'src_is_ssa':
nir/nir_opt_cse.c:165:32: warning: unused parameter 'data' [-Wunused-parameter]
 src_is_ssa(nir_src *src, void *data)
                                ^
nir/nir_opt_cse.c: In function 'dest_is_ssa':
nir/nir_opt_cse.c:171:35: warning: unused parameter 'data' [-Wunused-parameter]
 dest_is_ssa(nir_dest *dest, void *data)
                                   ^

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-14 16:41:04 -07:00
Connor Abbott 47a1b4841d nir/cse: fix bug with comparing non-per-component sources
We weren't comparing the right number of components when checking
swizzles. Use nir_ssa_alu_instr_num_src_components() to do the right
thing.

No piglit regressions, and no fixes either.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-14 19:07:44 -04:00
Kenneth Graunke b3e286c457 nir: Store num_direct_uniforms in the nir_shader.
Storing this here is pretty sketchy - I don't know if any driver other
than i965 will want to use it.  But this will make it a lot easier to
generate NIR code at link time.  We'll probably rework it anyway.

(Ian suggested making nir_assign_var_locations_scalar_direct_first
 simply modify the nir_shader's fields, rather than passing pointers
 to them.  If this stays long term, we should do that.  But Jason and
 I suspect we'll be reworking this area again in the near future.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-11 11:39:48 -07:00
Rob Clark f596135616 nir: fix bit of cargo-culting in lower_idiv
I guess I was looking too much at how lower_system_values worked when
writing lower_idiv.

Since ttn wasn't emitting load_var for sysvals and the only drivers
using lower_idiv were using ttn, I think nothing was broken as a result.
But might as well fix this before it becomes a problem.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-11 10:43:16 -04:00
Rob Clark 58add76791 nir: split out lower_sub from lower_negate
Originally you had to have one or the other.  But actually I don't want
either.  (Or rather I want whatever is the minimum # of instructions.)

TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 10:43:16 -04:00
Kenneth Graunke 500da98e0b nir: Constify nir_lower_sampler's gl_shader_program pointer.
Now that we're not generating linker errors, we don't actually modify
this.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-04-10 02:16:33 -07:00
Kenneth Graunke 709b88ccd8 nir: Remove linker_error calls from nir_lower_samplers().
These should never happen.  Plus, NIR passes really shouldn't be
reporting linker errors - this is past link time.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-04-10 02:16:31 -07:00
Kenneth Graunke 99264b7f37 nir: Make nir_lower_samplers take a gl_shader_stage, not a gl_program *.
We don't actually need a gl_program struct.  We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX).  We may as well just pass that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-04-10 02:16:29 -07:00
Kenneth Graunke 4b27391cad nir: Move gl_shader_stage enum from mtypes.h to shader_enums.h.
I want to use this in some code that doesn't currently include mtypes.h.
It seems like a better place for it anyway.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-04-10 02:16:27 -07:00
Jason Ekstrand 11694737fc nir: Make nir_*_instr_create take a nir_shader instead of a void * context
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-04-07 14:34:21 -07:00
Kenneth Graunke a10d493715 nir: Implement a nir_sweep() pass.
This pass performs a mark and sweep pass over a nir_shader's associated
memory - anything still connected to the program will be kept, and any
dead memory we dropped on the floor will be freed.

The expectation is that this will be called when finished building and
optimizing the shader.  However, it's also fine to call it earlier, and
many times, to free up memory earlier.

v2: (feedback from Jason Ekstrand)
- Skip sweeping impl->start_block, as it's already in the CF list.
- Don't sweep SSA defs (they're owned by their defining instruction)
- Don't steal phi sources (they're owned by nir_phi_instr).
- Don't steal tex->src (it's owned by the tex_inst itself)
- Don't sweep dereference chains (top-level dereferences are owned by
  the instruction; sub-dereferences are owned by the parent deref).
- Don't sweep sources and destinations (SSA defs are handled as part of
  the defining instruction, and registers are handled as part of
  function implementations).
- Just steal instructions; don't walk them (no longer required).

v3: (feedback from Jason Ekstrand)
- Steal indirect sources from nir_src/nir_dest.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:14 -07:00
Kenneth Graunke de2014cf1e nir: Allocate dereferences out of their parent instruction or deref.
Jason pointed out that variable dereferences in NIR are really part of
their parent instruction, and should have the same lifetime.

Unlike in GLSL IR, they're not used very often - just for intrinsic
variables, call parameters & return, and indirect samplers for
texturing.  Also, nir_deref_var is the top-level concept, and
nir_deref_array/nir_deref_record are child nodes.

This patch attempts to allocate nir_deref_vars out of their parent
instruction, and any sub-dereferences out of their parent deref.
It enforces these restrictions in the validator as well.

This means that freeing an instruction should free its associated
dereference chain as well.  The memory sweeper pass can also happily
ignore them.

v2: Rename make_deref to evaluate_deref and make it take a nir_instr *
    instead of void *.  This involves adding &instr->instr everywhere.
    (Requested by Jason Ekstrand.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:14 -07:00
Kenneth Graunke 4f4b04b7c7 nir: Allocate nir_ssa_def::uses/if_uses out of the instruction.
We can't allocate them out of the nir_ssa_def itself, because it may not
be ralloc'd (for example, nir_dest embeds a nir_ssa_def).

However, allocating them out of the instruction should work.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:13 -07:00
Kenneth Graunke 900498bd11 nir: Allocate nir_phi_src values out of the nir_phi_instr.
Phi sources are part of the phi instruction and should have the same
lifetime.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:13 -07:00
Kenneth Graunke b05d53404c nir: Allocate nir_call_instr::params out of the nir_call itself.
The lifetime of the params array needs to be match the nir_call_instr
itself.  So, allocate it using the instruction itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-07 14:34:13 -07:00
Jason Ekstrand 2e3b35a1cb nir/lower_tex_projector: Don't use designated initializers
These don't work in MSVC or in older versions of GCC

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89899
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-07 11:49:39 -07:00
Matt Turner d131630c08 nir: Remove fsin_reduced/fcos_reduced.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-06 10:13:22 -07:00
Matt Turner 5c71cf8531 glsl: Remove never used sin_reduced/cos_reduced.
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-06 10:13:22 -07:00
Rob Clark f2ecc95e44 nir: add lowering for idiv/udiv/umod
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).

A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:

  http://people.freedesktop.org/~robclark/div-lowering.py

I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
2015-04-05 09:20:35 -04:00
Rob Clark 7880bea2fb nir: fix typo for f2b/i2b/b2i expressions (v2)
v2: discovered that i2b/b2i are also confused

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-05 08:56:24 -04:00
Rob Clark 6829d76e02 nir: add option to lower slt/sge/seq/sne
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0.  But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).

v2: use b2f rather than u2f

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-05 08:56:24 -04:00
Jason Ekstrand 9c53e80b9b nir/lower_samplers: Use the right memory context for realloc'ing tex sources
As of da5ec2a, we allocate instruction sources out of the instruction
itself.  When we realloc the texture sources we need to use the right
memory context or ralloc will get angry and assert-fail

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-04-03 17:02:20 -07:00
Jason Ekstrand 52e718097f nir: Add a cubemap normalizing pass
This commit adds a pass to L1-normalize cube-map coordinates.  Some hardware
such as i965 requires that largest cube-map coordinate is +-1.  We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

v2 (Suggested by Eric):
 - Do a vector fabs and split into components later
 - Move to core NIR

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-03 14:12:49 -07:00
Jason Ekstrand dccc57eaba nir/from_ssa: Don't set reg->parent_instr for ssa_undef instructions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-04-03 14:04:31 -07:00
Jason Ekstrand 7bdba4a245 nir: Add a src_get_parent_instr function
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-04-03 14:04:12 -07:00
Eric Anholt ea811b7868 nir: Add a lowering pass for texture projectors.
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:24 -07:00
Eric Anholt 64bdfc698d nir: Add an interface to turn a nir_src into a nir_ssa_def.
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:22 -07:00
Eric Anholt ec02970205 nir: Add an interface for the builder to insert instructions before.
So far we'd only used nir_builder to build brand new programs.  But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-03 11:50:18 -07:00
Kenneth Graunke da5ec2ac0b nir: Allocate nir_tex_instr::sources out of the instruction itself.
The lifetime of the sources array needs to be match the nir_tex_instr
itself.  So, allocate it using the instruction itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:03 -07:00
Kenneth Graunke 7380c641b1 nir: Allocate predecessor and dominance frontier sets from block itself.
These sets are part of the block, and their lifetime needs to match the
block itself.  So, allocate them using the block itself as the context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:02 -07:00
Kenneth Graunke 131444e1c5 nir: Allocate register fields out of the register itself.
The lifetime of each register's use/def/if_use sets needs to match the
register itself.  So, allocate them using the register itself as the
context.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:01 -07:00
Kenneth Graunke 587b3a20a1 nir: Make nir_create_function() strdup the function name.
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.

We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:20:00 -07:00
Kenneth Graunke f61b6c3e48 nir: Free dead variables when removing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:19:58 -07:00
Kenneth Graunke f4e4491080 nir: Combine remove_dead_local_vars() and remove_dead_global_vars().
We can just pass a pointer to the list of variables, and reuse the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-02 14:19:56 -07:00
Jason Ekstrand ca3b4d6d17 nir/opt_peephole_ffma: Fix a couple typos in a comment
Acked-by: Matt Turner <mattst88@gmail.com>
2015-04-02 11:09:37 -07:00
Jason Ekstrand 0573d0e484 nir/print: Correctly print swizzles for explicitly sized alu sources
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-02 10:21:18 -07:00
Matt Turner 781badee7a nir: Remove useless ftrunc inside f2i/f2u.
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 97e6c1b957 nir: Recognize (a < b || a < c) as a < max(b, c).
Doesn't work for analogous && cases, because of NaNs.

total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs:     42000 -> 41117 (-2.10%)
helped:                                403

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner a2b6e908cf nir: Add addition/multiplication identities of exp/log.
instructions in affected programs:     2858 -> 2808 (-1.75%)
helped:                                12

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 099c729b4c nir: Add identities for the log function.
The rcp(log(x)) pattern affects instruction counts.

instructions in affected programs:     144 -> 138 (-4.17%)
helped:                                6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 8a6ae384b2 nir: Add identities for the exponential function.
No changes in shader-db.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e26783d445 nir: Recognize another open coded lrp.
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs:     4876 -> 4720 (-3.20%)
helped:                                58
HURT:                                  10

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e82437e141 nir: Recognize open coded lrp.
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs:     34773 -> 33083 (-4.86%)
helped:                                147
HURT:                                  6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Jason Ekstrand 7f344721b1 nir/peephole_ffma: Be less agressive about fusing multiply-adds
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs:     355876 -> 349811 (-1.70%)
helped:                                1455
HURT:                                  14
GAINED:                                5
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand a8c8b3b872 nir: Add a dedicated ffma peephole optimization
i965/nir: Use the dedicated ffma peephole

total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs:     1292790 -> 1268660 (-1.87%)
helped:                                5999
HURT:                                  457
GAINED:                                4
LOST:                                  9

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand e06a3d0282 nir: Move the compare-with-zero optimizations to the late section
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs:     4230 -> 4286 (1.32%)
helped:                                0
HURT:                                  12

While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand da294f9b2f nir/algebraic: Add a seperate section for "late" optimizations
i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 1779dc060f nir/algebraic: Remove a duplicate optimization
This optimization is repeated verbatim above

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 22ee7eeb4e nir/algebraic: #define around structure definitions
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions.  To solve this, we play the
age-old header file trick and just #define around it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 793a94d6b5 nir/print: Don't print extra swizzzle components
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw.  This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.

Reviewed-by: Kenneth Grunke <kenneth@whitecape.org>
2015-04-01 12:49:49 -07:00
Eric Anholt 15b03b7964 nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value.  Only, in NIR we want a proper
bool once again, so we compare with 0.  This is a lot of pointless extra
instructions.

total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs:     1342 -> 1309 (-2.46%)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Eric Anholt 6e8d4a2f80 nir: Recognize a pattern for doing b2f without the opcode.
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand.  This is common when generating NIR from TGSI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Kenneth Graunke 72b06fb08e nir: Fix copy and pasted error message in nir_validate.
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.

v2: Do it right (thanks Ilia).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-28 09:36:46 -07:00
Kenneth Graunke bf2c3bc316 nir: Lower subtraction to add with negation when !lower_negate.
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.

Suggested by Connor Abbott.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:34 -07:00
Kenneth Graunke 06f7bea96a nir: Add builder helpers for MOVs with ALU sources and swizzling MOVs.
These will be useful for prog->nir and tgsi->nir.

v2: Don't forget to mark nir_swizzle as inline (Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Kenneth Graunke 75c922e0fe nir: Add nir_builder helpers for creating load_const intrinsics.
Both prog->nir and tgsi->nir will want to use these.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-27 21:16:33 -07:00
Eric Anholt afa9fc1561 nir: Add optional lowering of flrp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-27 13:29:48 -07:00
Kenneth Graunke 3120345f40 nir: Add glsl_float_type() wrapper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-03-25 16:17:19 -07:00
Matt Turner babd0fa3e2 nir: Fix typo. 2015-03-24 19:14:40 -07:00
Matt Turner 3fb56805f0 nir: Recognize sat(add(b2f(a), b2f(b))) as a logical OR.
Transform this into b2f(or(a, b)).

instructions in affected programs:     432 -> 430 (-0.46%)
helped:                                2

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-24 14:43:37 -07:00
Matt Turner c31158d2cb nir: Recognize mul(b2f(a), b2f(b)) as a logical AND.
Transform this into b2f(and(a, b)).

total instructions in shared programs: 6205448 -> 6204391 (-0.02%)
instructions in affected programs:     284030 -> 282973 (-0.37%)
helped:                                903
HURT:                                  6

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-24 14:43:37 -07:00
Matt Turner 95729d2458 nir: Handle mixed scalar/vector arguments to logical and/or/xor.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-24 14:43:37 -07:00
Jason Ekstrand 8a33f95b7a nir/lower_io: Add a assign_locations function that sorts by [in]direct use
v2: Delete the set of indirectly accessed variables when we're done with it
v3: Rename from _packed to _scalar

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-19 13:18:39 -07:00
Jason Ekstrand 25db44a845 nir/lower_io: Make variable location assignment a manual operation
Previously, we just assigned variable locations in nir_lower_io.  Now, we
force the user to assign variable locations for us.  This gives the backend
a bit more control over where variables are placed.

v2: Rename from _packed to _scalar

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-19 13:18:39 -07:00
Jason Ekstrand 639115123e nir: Use a list instead of a hash_table for inputs, outputs, and uniforms
We never did a single hash table lookup in the entire NIR code base that I
found so there was no real benifit to doing it that way.  I suppose that
for linking, we'll probably want to be able to lookup by name but we can
leave building that hash table to the linker.  In the mean time this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc.  This was causing massive rendering isues in
the unreal4 Sun Temple demo.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-19 13:18:38 -07:00
Matt Turner dd0d3a2c0f mesa: Replace _mesa_round_to_even() with _mesa_roundeven().
Eric's initial patch adding constant expression evaluation for
ir_unop_round_even used nearbyint. The open-coded _mesa_round_to_even
implementation came about without much explanation after a reviewer
asked whether nearbyint depended on the application not modifying the
rounding mode. Of course (as Eric commented) we rely on the application
not changing the rounding mode from its default (round-to-nearest) in
many other places, including the IROUND function used by
_mesa_round_to_even!

Worse, IROUND() is implemented using the trunc(x + 0.5) trick which
fails for x = nextafterf(0.5, 0.0).

Still worse, _mesa_round_to_even unexpectedly returns an int. I suspect
that could cause problems when rounding large integral values not
representable as an int in ir_constant_expression.cpp's
ir_unop_round_even evaluation. Its use of _mesa_round_to_even is clearly
broken for doubles (as noted during review).

The constant expression evaluation code for the packing built-in
functions also mistakenly assumed that _mesa_round_to_even returned a
float, as can be seen by the cast through a signed integer type to an
unsigned (since negative float -> unsigned conversions are undefined).

rint() and nearbyint() implement the round-half-to-even behavior we want
when the rounding mode is set to the default round-to-nearest. The only
difference between them is that nearbyint() raises the inexact
exception.

This patch implements _mesa_roundeven{f,}, a function similar to the
roundeven function added by a yet unimplemented technical specification
(ISO/IEC TS 18661-1:2014), with a small difference in behavior -- we
don't bother raising the inexact exception, which I don't think we care
about anyway.

At least recent Intel CPUs can quickly change a subset of the bits in
the x87 floating-point control register, but the exception mask bits are
not included. rint() does not need to change these bits, but nearbyint()
does (twice: save old, set new, and restore old) in order to raise the
inexact exception, which would incur some penalty.

Reviewed-by: Carl Worth <cworth@cworth.org>
2015-03-18 21:06:26 -07:00
Jason Ekstrand 27bf37ba05 nir/peephole_select: Allow uniform/input loads and load_const
Shader-db results on HSW:

total instructions in shared programs: 4174156 -> 4157291 (-0.40%)
instructions in affected programs:     145397 -> 128532 (-11.60%)
helped:                                383
HURT:                                  0
GAINED:                                20
LOST:                                  22

There are two more tests lost than gained.  However, comparing this with
GLSL IR vs. NIR results, the overall delta is reduced from 85/44
gained/lost on current master to 71/32 with this commit.  Therefore, I
think it's probably a boon since we are getting "closer" to where we were
before.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-17 17:11:05 -07:00
Jason Ekstrand 1be862c0c4 nir/peephole_select: Copy instructions into the block before the if
Previously we tried to do poor-man's copy propagation as we created the
select instructions.  Instead, this commit just moves the instructions from
the blocks inside the if into the block before.  Copy propagation will take
care of making sure we don't have any extra mov's in there for us.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-17 17:11:05 -07:00
Jason Ekstrand 8cf40ed05d nir/peephole_select: Rename are_all_move_to_phi and use a switch
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-17 17:11:05 -07:00
Kenneth Graunke f3e4b2c9d2 nir: Fix non-determinism in nir_lower_vars_to_ssa().
Previously, we stored derefs in a hash table, using the malloc'd pointer
as the key.  Then, we walked through the hash table and generated code,
based on the order of the hash table's elements.

Memory addresses returned by malloc are pretty much random, which meant
that the hash was random, and the hash table's elements would be walked
in some random order.  This led to successive compiles of the same
shader using different variable names and slightly different orderings
of phi-nodes.  Code could not be diff'd, and the final assembly would
sometimes change slightly too.

It turns out the only point of the hash table was to avoid inserting
the same node multiple times for different dereferences.  We never
actually searched the hash table!  This patch uses an intrusive
linked list instead.  Since exec_list uses head and tail sentinels,
checking prev or next against NULL will tell us whether the node is
already in the list.

Pair programming with Jason Ekstrand.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-12 13:25:39 -07:00
Kenneth Graunke 2c79f6f9c3 nir: Add intrinsics for SYSTEM_VALUE_BASE_VERTEX and VERTEX_ID_ZERO_BASE
Ian and I added these around the time Connor was developing NIR.  Now
that both exist, we should make them work together!

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-12 08:29:48 -07:00
Jason Ekstrand 90e50908d7 nir/worklist: Don't change the start index when computing the tail index
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-03-11 15:18:16 -07:00
Thomas Helland 8fb8fe46fa nir: Optimize a + neg(a)
Shader-db i965 instructions:
total instructions in shared programs: 1711180 -> 1711159 (-0.00%)
instructions in affected programs:     825 -> 804 (-2.55%)
helped:                                9
HURT:                                  0
GAINED:                                3
LOST:                                  3

Shader-db NIR instructions:
total instructions in shared programs: 606187 -> 606179 (-0.00%)
instructions in affected programs:     298 -> 290 (-2.68%)
helped:                                4
HURT:                                  0
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
2015-03-11 14:21:05 -07:00
Thomas Helland 0525f2e851 nir: Optimize (a*b)+(a*c) -> a*(b+c)
Shader-db i965 instructions:
total instructions in shared programs: 1715894 -> 1710802 (-0.30%)
instructions in affected programs:     443080 -> 437988 (-1.15%)
helped:                                1502
HURT:                                  13
GAINED:                                4
LOST:                                  4

Shader-db NIR instructions:
total instructions in shared programs: 607710 -> 606187 (-0.25%)
instructions in affected programs:     208285 -> 206762 (-0.73%)
helped:                                769
HURT:                                  8
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
2015-03-11 14:21:05 -07:00
Kenneth Graunke b9c2fa15e3 nir: Make the printer include nir_variable::location too.
Being able to see both location and driver_location can be useful when
debugging IO mistakes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-03-09 01:34:03 -07:00
Kenneth Graunke 8dcc1f2c10 nir: Only do gl_FrontFacing workaround in glsl_to_nir for the FS.
Vertex shaders can have shader inputs where location happens to be
VARYING_SLOT_FACE.  Without predicating this on the shader stage,
we suddenly end up with load_front_face intrinsics in vertex shaders,
which is nonsensical.

Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-08 20:04:02 -07:00
Kenneth Graunke c6f2abe67e nir: Plumb the shader stage into glsl_to_nir().
The next commit needs to know the shader stage in glsl_to_nir().
To facilitate that, we pass the gl_shader rather than the raw exec_list
of instructions.  This has both the exec_list and the stage.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-08 20:04:01 -07:00
Kenneth Graunke b200cbb0a4 nir: Add native_integers to nir_shader_compiler_options.
glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers.  Presumably other passes may as well.

Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-08 20:03:57 -07:00
Kenneth Graunke a55da73be4 nir: Try to make sense of the nir_shader_compiler_options code.
The code in glsl_to_nir is entirely dead, as we translate from GLSL to
NIR at link time, when there isn't a _mesa_glsl_parse_state to pass,
so every caller passes NULL.

glsl_to_nir seems like the wrong place to try and create the shader
compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other
translators all would have to duplicate that code.  The driver should
set this up once with whatever settings it wants, and pass it in.

Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[]
and left a comment saying: "The memory for the options is expected to be
kept in a single static copy by the driver."  This suggests the plan was
to do exactly that.  That pointer was not marked const, however, and the
dead code used a mix of static structures and ralloced ones.

This patch deletes the dead code in glsl_to_nir, instead making it take
the shader compiler options as a mandatory argument.  It creates an
(empty) options struct in the i965 driver, and makes NirOptions point
to that.  It marks the pointer const so that we can actually do so
without generating "discards const qualifier" compiler warnings.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-03-08 20:03:46 -07:00
Kenneth Graunke 2561aea6b3 nir: Delete nir_shader::user_structures and num_user_structures.
Nothing actually uses these, and the only caller of glsl_to_nir()
(brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state
pointer, meaning they'll always be NULL and 0, respectively.

Just delete them.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-08 20:03:44 -07:00
Jose Fonseca 40a4797384 nir: Use helper macros for dealing with VLAs.
v2:
- Single statement, by using memset return value as suggested by Ian
Romanick.
- No internal declaration, as suggested by Jason Ekstrand.
- Move macros to a header.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-03-04 10:52:02 +00:00
Jose Fonseca f320ecf218 nir: Use alloca instead of variable length arrays.
This is to enable the code to build with -Werror=vla in the short term,
and enable the code to build with MSVC2013 soon after.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-27 14:30:36 +00:00
Kenneth Graunke 8e62bd52f8 nir: Introduce nir_intrinsic_discard_if.
This is a conditional discard, which takes a boolean source.

Note that we don't generate ir_discard::condition today, so this
shouldn't break drivers (since none implement this intrinsic yet).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-02-24 15:24:52 -08:00
Jason Ekstrand c750ecaa12 nir/register: Add a parent_instr field
This adds a parent_instr field similar to the one for ssa_def.  The
difference here is that the parent_instr field on a nir_register can be
NULL if the register does not have a unique definition or if that
definition does not dominate all its uses.  We set this field in the
out-of-SSA pass so that backends can get SSA-like information even after
they have gone out of SSA.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-02-24 14:08:04 -08:00
Jason Ekstrand 9b9ef2aeee nir/gcm: Add some missing break statements
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-23 13:20:13 -08:00
Jason Ekstrand cb4b2ad44a nir: Copy-propagate vecN operations that are actually moves
We were already do this for ALU operations but we haven't for non-ALU
operations.  This changes that.

total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%)
NIR instructions in affected programs:     1768850 -> 1751305 (-0.99%)
helped:                                    14244
HURT:                                      124

total FS instructions in shared programs: 4083960 -> 4084036 (0.00%)
FS instructions in affected programs:     7302 -> 7378 (1.04%)
helped:                                   12
HURT:                                     51

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-23 13:19:05 -08:00
Eric Anholt 4359954d84 nir: Generalize the optimization of subs of subs from 0.
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful.  In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.

No shader-db effects.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt 345c2b288a nir: Collapse repeated bcsels on the same argument.
vc4 results:
total instructions in shared programs: 39881 -> 39794 (-0.22%)
instructions in affected programs:     6302 -> 6215 (-1.38%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt a38038ca5e nir: When faced with a csel on !condition, just flip the arguments.
total NIR instructions in shared programs: 39426 -> 39411 (-0.04%)
NIR instructions in affected programs:     3748 -> 3733 (-0.40%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt 8e1152cb33 nir: Allow nir_opt_algebraic to see booleanness through &&, ||, ^, !.
We have some useful optimizations to drop things like 'ine a, 0' on a
boolean argument, but if 'a' came from logical operations on bools, it
couldn't tell.  These kinds of constructs appear as a result of TGSI->NIR
quite frequently (at least with if flattening), so being a little more
aggressive in detecting booleans can pay off.

v2: Add ixor as a booleanness-preserving op (Suggestion by Connor).

vc4 results:
total instructions in shared programs: 40207 -> 39881 (-0.81%)
instructions in affected programs:     6677 -> 6351 (-4.88%)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Eric Anholt dc982f4a85 nir: Add a couple of simplifications of csel operations.
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-21 14:57:14 -08:00
Kenneth Graunke b6393d7040 nir: Fix the Mesa build without -DDEBUG.
With -DDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which
doesn't exist, breaking the build:

assert(state->states[index].index < state->states[index].stack_size);

Switch it to ifndef NDEBUG, so the field will exist if the assertion
actually generates code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-02-20 13:43:44 -08:00
Eric Anholt bef38f62e0 nir: Drop dependency on mtypes.h for core NIR.
One less new directory necessary for gallium code that wants to interact
with NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-02-20 11:36:34 -08:00
Eric Anholt b53d035825 util: Move Mesa's bitset.h to util/.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-02-20 11:36:34 -08:00
Jason Ekstrand c7002fad90 nir/GCM: Pull unpinned instructions out of blocks while pinning
This lets us be slightly more efficient by not walking the CFG extra times.
Also, it may make it easier to ensure that GVN happens on only unpinned
instructions.

Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 8dfe6f672f nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 190073c737 nir: Add a global code motion (GCM) pass
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Use nir_dominance_lca for computing least common anscestors
 - Use the block index for comparing dominance tree depths
 - Pin things that do partial derivatives

Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand a52a4b5223 nir/instr: Change "live" to a more generic "pass_flags" field
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 3d25afc51c nir: Make nir_[cf_node/instr]_[prev/next] return null if at the end
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 902b0ccc9a nir/from_ssa: Don't try to read an invalid instruction
Right now, the nir_instr_prev function function blindly looks up the
previous element in the exec list and casts it to an instruction even if
it's the tail sentinel.  The next commit will change this to return null if
it's the first instruction.  Making this change first avoids getting a
segfault between commits.  The only reason we never noticed is that, thanks
to the way things are laid out in nir_block, the casted instruction's type
was never parallal_copy.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 0281fd0786 nir/validate: Validate SSA defs the same way we do for registers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 34952b5671 nir/validate: Validate if_uses on registers
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 98ecb25f89 nir: Properly clean up CF nodes when we remove them
Previously, if you remved a CF node that still had instructions in it, none
of the use/def information from those instructions would get cleaned up.
Also, we weren't removing if statements from the if_uses of the
corresponding register or SSA def.  This commit fixes both of these
problems

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand e025943134 nir: use nir_foreach_ssa_def for indexing ssa defs
This is both simpler and more correct.  The old code didn't properly index
load_const instructions.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand 0167c38cac nir/from_ssa: Use the nir_block_dominance function instead of our own
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand f481a9425c nir/dominance: Add a constant-time mechanism for comparing blocks
This is mostly thanks to Connor.  The idea is to do a depth-first search
that computes pre and post indices for all the blocks.  We can then figure
out if one block dominates another in constant time by two simple
comparison operations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:17 -08:00
Jason Ekstrand b4c5489c8a nir/dominance: Expose the dominance intersection function
Being able to find the least common anscestor in the dominance tree is a
useful thing that we may want to do in other passes.  In particular, we
need it for GCM.

v2: Handle NULL inputs by returning the other block

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-19 17:06:16 -08:00
Brian Paul 2f5597787c nir: add missing GLSL_TYPE_DOUBLE case in type_size()
To silence compiler warning about unhandled switch case.
v2: move GLSL_TYPE_DOUBLE to the "not reached" section, per Ilia.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-19 15:36:59 -07:00
Eric Anholt 2a135c470e nir: Add an ALU op builder kind of like ir_builder.h
v2: Rebase on the nir_opcodes.h python code generation support.
v3: Use SSA values, and set an appropriate writemask on dot products.
v4: Make the arguments be SSA references as well.  This lets you stack up
    expressions in the arguments of other expressions, at the cost of
    having to insert a fmov/imov if you want to swizzle.  Also, add
    the generated file to NIR_GENERATED_FILES.
v5: Use more pythonish style for iterating the list.
v6: Infer the size of the dest from the size of the srcs, and auto-swizzle
    a single small src out to the appropriate size.
v7: Add little helpers for initializing the struct, add a typedef for the
    struct like other nir types have.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v6)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v7)
2015-02-18 22:28:42 -08:00
Eric Anholt 6eadde51bb nir: Recognize and reduce duplicated fsats.
No effect on vc4 shader-db.

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:51 -08:00
Eric Anholt 1907a3a7ee nir: Add a flag for lowering fsat.
vc4 cse/algebraic-disabled stats:
total instructions in shared programs: 44356 -> 44354 (-0.00%)
instructions in affected programs:     55 -> 53 (-3.64%)

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:51 -08:00
Eric Anholt e5ecf8e427 nir: Add a flag for lowering ffma.
vc4 cse/algebraic-disabled stats:
total uniforms in shared programs: 13966 -> 13791 (-1.25%)
uniforms in affected programs:     435 -> 260 (-40.23%)
total instructions in shared programs: 44732 -> 44356 (-0.84%)
instructions in affected programs:     9599 -> 9223 (-3.92%)

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:51 -08:00
Eric Anholt 42a8ace66e nir: Add a flag for lowering fneg/ineg.
vc4 cse/algebraic-disabled stats:
total instructions in shared programs: 44911 -> 44732 (-0.40%)
instructions in affected programs:     11371 -> 11192 (-1.57%)

v2: Fix broken iabs(isub(0, a)) transformation.
v3: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:51 -08:00
Eric Anholt cb95a228e8 nir: Add a flag for lowering fsqrt(x) to frcp(frsqrt(x)).
vc4 cse/algebraic-disabled stats:
total uniforms in shared programs: 13972 -> 13966 (-0.04%)
uniforms in affected programs:     408 -> 402 (-1.47%)
total instructions in shared programs: 44973 -> 44911 (-0.14%)
instructions in affected programs:     1551 -> 1489 (-4.00%)

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:50 -08:00
Eric Anholt ccf14bca4b nir: Add lowering of POW instructions if the lower flag is set.
This could be done in a separate pass like we do in GLSL IR, but it seems
to me like having the definitions of the transformations in the two
directions next to each other makes a lot of sense.

v2: Reorder the comment about the transformation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-18 14:47:50 -08:00
Eric Anholt 8e9dbfff17 nir: Conditionalize the POW reconstruction on shader compiler options.
Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic
and other passes should try and generate certain types of opcodes or
patterns.  Extend that to NIR by defining our own struct, which is
automatically generated from the Mesa struct in glsl_to_nir and provided
directly by the driver in TGSI-to-NIR.

v2: Split out the previous two prep patches.
v3: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
2015-02-18 14:47:50 -08:00
Eric Anholt 955a6bb57d nir: Add an optional expression controlling nir_algebraic xforms.
This will be used so that we can customize the transforms for the target
GPU, so we don't un-lower expressions that had already been lowered (or
introduce new lowering transformations that not all GPUs want)

v2: Drop the complication of having the condition->index dictionary, since
    we don't actually expect there to be many different conditions (change
    by Kenneth).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-18 14:47:50 -08:00
Eric Anholt f90bb54734 nir: Add a nir_shader_compiler_options struct pointed to by the shaders.
This will be used to give the optimization passes a chance to customize
behavior for the particular target device.

v2: Rebase to master (no TGSI->NIR present)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2015-02-18 14:47:50 -08:00
Jason Ekstrand dd110cdfd8 nir: Make gl_FrontFacing a system_value
GLSL IR labels gl_FrontFacing as an input variable and not a system value.
This commit makes NIR silently translate gl_FrontFacing to a system value
so that it properly gets translated into a load_system_value intrinsic.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-02-14 13:47:16 -08:00
Jason Ekstrand 929f43851e nir/lower_phis_to_scalar: Fix some logic in is_phi_scalarizable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-14 13:46:59 -08:00
Matt Turner 4c42e1116b nir: Recognize open-coded fmin/fmax.
And unfortunately other shaders do the same thing but with >=/<= which
we can't apply this optimization to because of NaNs.

instructions in affected programs:     23309 -> 22938 (-1.59%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-11 13:50:19 -08:00
Eric Anholt 56e21647e2 nir: Add algebraic opt for int comparisons with identical operands.
No change on shader-db on i965.

v2: Reword the comment due to feedback from Erik Faye-Lund

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
2015-02-11 11:52:38 -08:00
Eric Anholt 2919bdf466 nir: Fix load_const comparisons for CSE.
We want the size of a float per component, not the size of a whole vec4.

NIR instructions on i965:
total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
instructions in affected programs:     114 -> 106 (-7.02%)

Looking at one of these examples (tesseract), it's from vec4 load_consts
for a MRT solid fill, which do get CSEed now that we don't memcmp off the
end of the const value and into the SSA def.  For the 1-component loads
that are common in i965, we were only memcmping off into the rest of the
usually zero-filled const_value.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-11 11:52:38 -08:00
Matt Turner a9065cef48 nir: Remove casts from void*.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-10 17:48:42 -08:00
Matt Turner bb1e007157 nir: Replace assert(0) with unreachable().
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-10 17:48:31 -08:00
Matt Turner 942b56ad05 nir: Remove unused has_indirect variable.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-10 17:48:16 -08:00
Kenneth Graunke 480ee1f0b4 nir: Mark nir_print_instr's instr pointer as const.
Printing instructions doesn't modify them, so we can mark the parameter
const.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-10 03:37:55 -08:00
Eric Anholt bff4cbdafa nir: Fix broken fsat recognizer.
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-06 15:57:55 -08:00
Eric Anholt 6706537dd4 nir: Slightly simplify algebraic code generation by reusing a struct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-02-06 15:57:55 -08:00
Connor Abbott a135f34080 nir: add an optimization to remove useless phi nodes
This removes phi nodes whose sources all point to the same thing.

Shader-db results:

total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs:     126564 -> 122480 (-3.23%)
helped:                                615
HURT:                                  0

total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs:     24622 -> 23174 (-5.88%)
helped:                                138
HURT:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 16:00:13 -05:00
Jason Ekstrand 572d1f6e41 nir/validate: Ensure that phi sources are SSA-only
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:52:42 -08:00
Jason Ekstrand 5420774510 nir/validate: Validate that only float ALU outputs are saturated
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:46:55 -08:00
Jason Ekstrand c0df85cca4 nir/lower_source_mods: Don't lower saturate for non-float outputs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-03 12:46:38 -08:00
Jason Ekstrand f2adcd36cb nir: Add a pass to lower vector phi nodes to scalar phi nodes
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Add better comments
 - Use nir_ssa_dest_init and nir_src_for_ssa more places
 - Fix some void * casts

v3 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Rework the way we determine whether or not to sccalarize a phi node to
   make the recursion non-bogus
 - Treat load_const instructions as scalarizable

v4 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Allow uniform and input loads to be scalarizable

v5 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Also consider loads of inputs (varying, uniform, or ubo) to be
   scalarizable.  We were already doing this for load_var on uniforms and
   inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-03 12:33:11 -08:00
Jason Ekstrand 604ae33c8b nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs: 5998190 -> 5997603 (-0.01%)
instructions in affected programs:     54276 -> 53689 (-1.08%)
helped:                                293

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:13 -08:00
Jason Ekstrand 7f19cd5a56 nir/opt_algebraic: Add some boolean simplifications
total instructions in shared programs: 5998321 -> 5998287 (-0.00%)
instructions in affected programs:     4520 -> 4486 (-0.75%)
helped:                                8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:11:10 -08:00
Jason Ekstrand 70273c5cd5 nir/algebraic: Support specifying variable as constant or by type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand 81f77e4f3a nir/algebraic: Fail to compile of a variable is used in a replace but not the search
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand 026b5cc792 nir/search: Allow for matching variables based on types
This allows you to match on an unknown value but only if it is of a given
type.  90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.

nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand d8999bcdce nir/search: Add support for matching unknown constants
There are some algebraic transformations that we want to do but only if
certain things are constants.  For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.

nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Jason Ekstrand 5ab1489ae6 nir: Add an invalid type
This allows us to indicate a concept of an invalid type.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-29 17:07:45 -08:00
Eric Anholt fc884eadf1 nir: Add variants of some of the comparison simplifications.
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not.  vc4
results:

total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs:     4253 -> 3960 (-6.89%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-29 11:44:06 -08:00
Eric Anholt 9a3a60cb13 nir: Don't try to to-SSA ALU instructions that are already SSA.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:43:33 -08:00
Eric Anholt 68d476167c nir: Fix a bit of broken indentation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:42:08 -08:00
Eric Anholt 36c604c824 nir: Add a couple of helpers for glsl types.
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.

v2: Add a missing space.

Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-29 11:41:17 -08:00
Eric Anholt dd4d9a4e62 nir: Make vec-to-movs handle src/dest aliasing.
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends.  This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.

Fixes fs-swap-problem with my scalarizing patches.

v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
2015-01-28 16:33:34 -08:00
Jason Ekstrand bb26ebac13 nir/opcodes: Use a return type of tfloat for ldexp
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:21:40 -08:00
Jason Ekstrand f0340ff625 Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"
This reverts commit d7d340fb2f.

We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:19:47 -08:00
Jason Ekstrand d7d340fb2f nir/opcodes: Use fpclassify() instead of isnormal() for ldexp
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Connor Abbott f1a9252def nir: fix a bug with constant folding non-per-component instructions
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Connor Abbott 816f0515a2 nir: add a helper function for getting the number of source components
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Jason Ekstrand dd74369a0a nir/opcodes: Don't go through doubles when constant-folding iabs
Previously, we called the abs() function in math.h.  However, this involves
unnecessarily going through double.  This commit changes it to use integers
directly with a ternary.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-26 11:25:02 -08:00
Jason Ekstrand 9bd28fe3a3 nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:25:02 -08:00
Jason Ekstrand 27c6e3e4ca nir: Use pointers for nir_src_copy and nir_dest_copy
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:24:58 -08:00
Connor Abbott 0aa31bf9c3 nir/constant_folding: use the new constant folding infrastructure
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-24 21:35:35 -08:00
Jason Ekstrand 89285e4d47 nir: add new constant folding infrastructure
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - Use mako to generate one function per opcode instead of doing piles of
   string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - More comments and better indentation in the mako
 - Add a description of the constant expression language in nir_opcodes.py
 - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-24 21:35:35 -08:00
Connor Abbott fa4bc6c130 nir: use Python to autogenerate opcode information
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before.  In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
 - make Opcode derive from object (Dylan)
 - don't use assert like it's a function (Dylan)
 - style fixes for fnoise, use xrange (Dylan)
 - use iterkeys() in nir_opcodes_h.py (Dylan)
 - use pydoc-style comments (Jason)
 - don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

v3 Jason Ekstrand <jason.ekstrand@intel.com>
 - Alphabetize source file lists
 - Generate nir_opcodes.h in the builddir instead of the source dir
 - Include $(builddir)/src/glsl/nir in the i965 build
 - Rework nir_opcodes.h generation so it generates a complete header file
   instead of one that has to be embedded inside an enum declaration
2015-01-24 21:33:56 -08:00
Eric Anholt 0680d170d1 nir: Expose nir_print_instr() for debug prints
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.

v2: Just pass a NULL state, now that it won't crash when you do.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 17:30:11 -08:00
Eric Anholt 6445a40520 nir: When asked to print with a NULL state, just use bare variable names.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 17:30:01 -08:00
Eric Anholt 447ddfc137 nir: Add nir_lower_alu_to_scalar.
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.

v2: Use the nir_src_for_ssa() helper, and another instance of
    nir_alu_src_copy().
v3: Drop the non-SSA support.  All intended callers will have SSA-only ALU
    ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
    unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
    about weird input_sizes[].

Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
2015-01-23 16:37:23 -08:00
Eric Anholt b200127816 nir: Make some helpers for copying ALU src/dests.
There aren't many users yet, but I wanted to do this from my scalarizing
pass.

v2: Constify the src arguments.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 16:37:16 -08:00
Kenneth Graunke 15063d2ad0 nir: Add algebraic optimizations for division and reciprocal.
These also exist in opt_algebraic.cpp.

total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%)
NIR instructions in affected programs:     42221 -> 42002 (-0.52%)
helped:                                    198

total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%)
i965 instructions in affected programs:     84322 -> 83885 (-0.52%)
helped:                                     394
HURT:                                       1 (by 1 instruction)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke bbd60f6d79 nir: Add algebraic optimizations for exponential/logarithmic functions.
Most of these exist in the GLSL IR algebraic pass already.  However,
SSA allows us to find more instances of the patterns.

total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs:     124189 -> 120026 (-3.35%)
helped:                                    604

total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs:     261295 -> 254507 (-2.60%)
helped:                                     1295
HURT:                                       3

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke 391fb32bbe nir: Add algebraic optimizations for simplifying comparisons.
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.

The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.

total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs:     411143 -> 405922 (-1.27%)
helped:                                    2233
HURT:                                      214

A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x.  We can always clean that up later.

total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs:     784980 -> 775093 (-1.26%)
helped:                                     4508
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke 551a752a59 nir: Add algebraic optimizations for pointless shifts.
The GLSL IR optimization pass contained these; we may as well include
them too.

v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).

No change in the number of NIR instructions on a shader-db run.

total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs:     542 -> 537 (-0.92%)
helped:                                     2 (in glamor)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke 3e56572c49 nir: Add a bunch of algebraic optimizations on logic/bit operations.
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that.  While there,
I decided to add a bunch more of them.

v2: Delete bogus fand/for optimizations (caught by Jason).

total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs:     149634 -> 146937 (-1.80%)
helped:                                    1032

total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs:     537 -> 542 (0.93%)
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke 978b0a9cda nir: Implement CSE on intrinsics that can be eliminated and reordered.
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value.  This
made ALU operations on those values appear distinct as well.

Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.

Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered.  I didn't implement
support for variables for the time being.

v2: Assert that info->num_variables == 0 (requested by Jason).

total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs:     2413496 -> 2001071 (-17.09%)
helped:                                    16872

total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs:     640654 -> 620094 (-3.21%)
helped:                                     2071
HURT:                                       585
GAINED:                                     14
LOST:                                       25

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 14:53:26 -08:00
Kenneth Graunke cbdd623f13 nir: Pull nir_instr_can_cse()'s SSA checks out of the switch.
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.

The next patch will introduce another case that requires SSA.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-23 14:53:26 -08:00
Connor Abbott 68a9d0b36f nir: add generated file to .gitignore
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-23 10:20:46 -08:00
Eric Anholt fc6938d23e nir: Fix setup of constant bool initializers.
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc of in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value.  But once we start doing vector bools in a few commits,
we end up getting bad values.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-22 13:52:19 -08:00
Eric Anholt 534a4ec82f nir: Make an easier helper for setting up SSA defs.
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do.  Just provide the
simpler helper, instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-22 13:52:19 -08:00
Matt Turner 28b7c6b285 nir: Replace assert(0) with unreachable().
Fixes a couple of warnings in the process.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-21 21:06:37 -08:00
Jason Ekstrand f88c6a4997 nir: Stop using designated initializers
Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6.  With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbot <cwabbott0@gmail.com>
2015-01-21 19:55:02 -08:00
Jason Ekstrand 7da60eca4f nir: Add src and dest constructors
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-21 12:21:10 -08:00
Jason Ekstrand 194f6235b3 nir: Add a nir_foreach_phi_src helper macro
Reviewed-by: Connor Abbott <cwabbott02gmail.com>
2015-01-20 16:53:29 -08:00
Vinson Lee 10a4f1e77a nir: s/malloc.h/stdlib.h/
Fix build error on Mac OS X.

  CC       nir_to_ssa.lo
nir_to_ssa.c:29:10: fatal error: 'malloc.h' file not found
         ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88478
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2015-01-16 16:14:51 -08:00
Jason Ekstrand bc6e57e019 nir/live_variables: Use a worklist
This is a rework of the liveness algorithm using a worklist as suggested by
Connor.  Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop.  Also, the stuff after the last loop
in the funciton will only ever get visited once.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 16:54:21 -08:00
Jason Ekstrand 4839d1aed1 nir: Add a worklist helper structure
A worklist is a common concept in optimizations.  This adds a structure
that we can reuse for many different types of optimizations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 16:54:21 -08:00
Brian Paul 0aaaa13ec9 nir: fix incorrect argument passed to validate_src() in validate_tex_instr()
Silences a compiler warning.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 17:41:42 -07:00
Brian Paul aa479a69d6 nir: silence compiler warning from visit_src() call
v2: use proper argument

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 17:09:02 -07:00
Jason Ekstrand 153b8b3525 util/hash_set: Rework the API to know about hashing
Previously, the set API required the user to do all of the hashing of keys
as it passed them in.  Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it.  Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names.  This is
especially bad when the standard call looks something like

_mesa_set_add(ht, _mesa_pointer_hash(key), key);

In the above case, there is no reason why the hash set shouldn't do the
hashing for you.  We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed.  Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function.  This should make it harder to mess up
your hashing.

This is analygous to 94303a0750 where we did this for hash_table

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 13:21:27 -08:00
Jason Ekstrand 4c99e3ae78 util: Move main/set to util/hash_set
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 13:21:27 -08:00
Jason Ekstrand 8ed5305d28 hash_table: Rename insert_with_hash to insert_pre_hashed
We already have search_pre_hashed.  This makes the APIs match better.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-15 13:21:27 -08:00
Jason Ekstrand 0d05d1226e nir/algebraic: Only replace an instruction once
Without the break, it was possible that an instruction would match multiple
expressions.  If this happened, you could end up trying to replace it
multiple times and get a segfault.  This makes it so that, after a
successful replacement, it moves on to the next instruction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand 0f85310975 nir/vars_to_ssa: Use the copy lowering from lower_var_copies
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand d3636da902 nir: Add a pass for lowering copy instructions
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand 700ba5daaf nir/vars_to_ssa: Refactor get_deref_node
This refactor allows you to more easily get the deref node associated with
a given variable.  We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand 55b5058e69 nir: Rename lower_variables to lower_vars_to_ssa
The original name wasn't particularly descriptive.  This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand 4aa6162f6e nir/tex_instr: Add a nir_tex_src struct and dynamically allocate the src array
This solves a number of problems.  First is the ability to change the
number of sources that a texture instruction has.  Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand dcb1acdea0 nir/validate: Only build in debug mode
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:24 -08:00
Jason Ekstrand 347ab2bf24 nir/lower_variables: Improve documentation
Additional description was added to a variety of places.  Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00
Jason Ekstrand 8016fa39e1 nir/lower_variables: Use a for loop for get_deref_node
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00
Jason Ekstrand 0c0ca8b6ae nir: Use the actual FNV-1a hash for hashing derefs
We also switch to using loops rather than recursion.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00
Jason Ekstrand e4115ca9d8 nir: Make intrinsic flags into an enum
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00
Jason Ekstrand ed13f4e716 nir: Use static inlines instead of macros for list getters
This should make debugging a lot easier as GDB handles static inlines much
better than macros.  Also, static inlines are typesafe.

Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00
Jason Ekstrand b95fae034f nir/variable: Remove the constant_value field
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-15 07:20:23 -08:00