Commit Graph

1510 Commits

Author SHA1 Message Date
Jason Ekstrand b50f7f0011 nir: Add a better comment for INTRINSIC_RANGE 2016-03-25 14:04:05 -07:00
Jason Ekstrand add8c837b5 nir/glsl: Stop carying a pointer to the nir_shader in the visitor 2016-03-25 14:04:05 -07:00
Jason Ekstrand 2c3f95d6aa Merge remote-tracking branch 'public/master' into vulkan 2016-03-24 17:30:14 -07:00
Jason Ekstrand 22b343a8ec nir: Add a pass to inline functions
This commit adds a new NIR pass that lowers all function calls away by
inlining the functions.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand debf23ec68 nir/builder: Add helpers for easily inserting copy_var intrinsics
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 79dec93ead nir: Add return lowering pass
This commit adds a NIR pass for lowering away returns in functions.  If the
return is in a loop, it is lowered to a break.  If it is not in a loop,
it's lowered away by moving/deleting code as needed.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 8d61d72524 nir: Add a cursor helper for getting a cursor after any phi nodes
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 18b0166749 nir/builder: Add a helper for inserting jump instructions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 97b663481c nir/cf: Make extracting or re-inserting nothing a no-op
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 7022a673cd nir: Add a function for comparing cursors
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 124f229ece nir/cf: Handle relinking top-level blocks
This can happen if a function ends in a return instruction and you remove
the return.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 364212f1ed nir: Add a pass to repair SSA form
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand ea98d415e4 nir/vars_to_ssa: Use the new nir_phi_builder helper
The efficiency should be approximately the same.  We do a little more work
per phi node because we have to sort the predecessors.  However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.

As a side-benifit, the phi builder actually handles unreachable blocks
correctly.  The original vars_to_ssa code, because of the way it iterated
the blocks and added phi sources, didn't add sources corresponding to
predecessors of unreachable blocks.  The new strategy employed by the phi
builder creates a phi source for each predecessor and should correctly
handle unreachable blocks by setting those sources to SSA undefs.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 42ddfc611f nir/dominance: Handle unreachable blocks
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:

loop {
   if (...) {
      break;
   } else {
      break;
   }
}

In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this.  First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable.  Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand e4dc82cfcf nir: Add a phi node placement helper
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis.  Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.

v2: Add better documentation

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Rob Clark 0bea0e7141 nir: fix dangling ssadef->name ptrs
In many places, the convention is to pass an existing ssadef name ptr
when construction/initializing a new nir_ssa_def.  But that goes badly
(as noticed by garbage in nir_print output) when the original string
gets freed.

Just use ralloc_strdup() instead, and add ralloc_free() in the two
places that would care (not that the strings wouldn't eventually get
freed anyways).

Also fixup the nir_search code which was directly setting ssadef->name
to use the parent instruction as memctx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-24 08:30:04 -04:00
Jason Ekstrand a984e44abd nir/glsl: Propagate invariant into NIR alu ops
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:07 -07:00
Jason Ekstrand 91d6272c2b nir/alu_to_scalar: Propagate the "exact" bit
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand 5f39e3e165 nir/cse: Properly handle nir_ssa_def.exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand 0dbda153aa nir/algebraic: Flag inexact optimizations
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe.  This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact".  It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:02 -07:00
Jason Ekstrand ed3a029e80 nir/algebraic: Fix fmin detection to match the spec
The previous transformation got the arguments to fmin backwards.  When NaNs
are involved, the GLSL min/max aren't commutative so it matters.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:00 -07:00
Jason Ekstrand 89545b1314 nir/algebraic: Get rid of an invlid fxor optimization
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:58 -07:00
Jason Ekstrand 3a7cb6534c nir/algebraic: Allow for flagging operations as being inexact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:55 -07:00
Jason Ekstrand a6f25fa7d7 nir/search: Propagate exactness into newly created expressions
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:52 -07:00
Jason Ekstrand ded3133d47 nir/builder: Add a flag for setting exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand 4ff89377d9 nir: Add an "exact" bit to nir_alu_instr
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand f849f53990 nir/clone: Export nir_variable_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:11 -07:00
Jason Ekstrand 5fe8959912 nir/clone: Expose nir_constant_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:08 -07:00
Jason Ekstrand c4c373f156 nir: Fix whitespace
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:25:53 -07:00
Ian Romanick d7a25a9def nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 2bb006af68 nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.

G4X
total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0

total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
cycles in affected programs: 1767232 -> 1764954 (-0.13%)
helped: 274
HURT: 81

Ironlake
total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0

total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
cycles in affected programs: 2867452 -> 2864174 (-0.11%)
helped: 422
HURT: 137

Sandy Bridge
total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0

total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
cycles in affected programs: 1239166 -> 1230988 (-0.66%)
helped: 369
HURT: 150

Ivy Bridge
total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0

total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
cycles in affected programs: 1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141

Haswell
total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0

total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149

LOST:   2
GAINED: 0

Broadwell
total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0

total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111

LOST:   2
GAINED: 0

Skylake
total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0

total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116

LOST:   2
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 348e5a71d8 nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0

total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0

Broadwell / Skylake
total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0

total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0

G45 and Ironlake showed no change.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:47:56 -07:00
Ian Romanick 564a8b8a26 nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.

Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0

total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0

No changes on earlier platforms.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:44:57 -07:00
Ian Romanick bf0d60aa11 nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:

                 vec1 ssa_200 = flt ssa_199, ssa_194
                 vec1 ssa_201 = b2i ssa_200
                 vec1 ssa_202 = i2b -ssa_201

shader-db results:

Sandy Bridge
total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0

total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1

Ivy Bridge
total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0

total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2

Haswell
total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0

total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1

Broadwell / Skylake
total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0

total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6

G45 and Ironlake showed no change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:43:28 -07:00
Ian Romanick a4079f1cb2 nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse.  On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1

total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53

Unsurprisingly, no changes on later platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Connor Abbott 58fe7837b8 nir: propagate bitsize information in nir_search
When we replace an expresion we have to compute bitsize information for the
replacement. We do this in two passes to validate that bitsize information
is consistent and correct: first we propagate bitsize from child nodes to
parent, then we do it the other way around, starting from the original's
instruction destination bitsize.

v2 (Iago):
- Always use nir_type_bool32 instead of nir_type_bool when generating
  algebraic optimizations. Before we used nir_type_bool32 with constants
  and nir_type_bool with variables.
- Fix bool comparisons in nir_search.c to account for bitsized types.

v3 (Sam):
- Unpack the double constant value as unsigned long long (8 bytes) in
nir_algrebraic.py.

v4 (Sam):
- Use helpers to get type size and base type from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Connor Abbott 3124ce699b nir: add a bit_size parameter to nir_ssa_dest_init
v2: Squash multiple commits addressing the new parameter in different
    files so we don't break the build (Iago)

v3: Fix tgsi (Samuel)

v4: Fix nir_clone.c (Samuel)

v5: Fix vc4 and freedreno (Iago)

v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Iago Toral Quiroga 084b24f558 nir: rename nir_const_value fields to include bitsize information
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-17 11:16:33 +01:00
Connor Abbott 9076c4e289 nir: update opcode definitions for different bit sizes
Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.

v2: fix output type for u2f (Iago)

v3: do not change vecN opcodes to be float. The next commit will add
    infrastructure to enable 64-bit integer constant folding so this is isn't
    really necessary. Also, that created problems with source modifiers in
    some cases (Iago)

v4 (Jason):
  - do not change bcsel to work in terms of floats
  - leave ldexp generic

Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.

v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)

Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Connor Abbott 6700d7e423 nir: add nir_{src,dest}_bit_size() helpers
v2: use a ternary (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jason Ekstrand e172dbe5d2 nir: Add a bit_size to nir_register and nir_ssa_def
This really hacky commit adds a bit size to registers and SSA values.  It
also adds rules in the validator to validate that they do the right things.

It's still an open question as to whether or not we want a bit_size in
nir_alu_instr or if we just want to let it inherit from the destination.
I'm inclined to just let it inherit from the destination.  A similar
question needs to be asked about intrinsics.

v2 (Connor):
  - Relax validation: comparisons have explicit destination sizes
    and implicit source sizes.

v3 (Sam):
- Use helpers to get size and base types of nir_alu_type enum.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jason Ekstrand 78f1919429 nir: Add explicitly sized types
v2: Fix size/type mask to properly handle 8-bit types.

v3: Add helpers to get the bitsize and base type of a
nir_alu_type enum.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:16:33 +01:00
Jordan Justen 3fd308a357 Merge remote-tracking branch 'origin/master' into vulkan 2016-03-17 01:44:07 -07:00
Jordan Justen b1e7cdfdcf nir: Lower shared var atomics during nir_lower_io
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen e3cbb9d37c nir: Add support for lowering load/stores of shared variables
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen 683c359c54 nir: Add atomic operations on variables
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen 3c807607df nir: Add compute shader shared variable storage class
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jordan Justen 26f8262698 nir/print: Add space after shader_storage var mode
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-17 01:23:40 -07:00
Jason Ekstrand 7f6a0cb29c Merge remote-tracking branch 'public/master' into vulkan 2016-03-15 14:09:50 -07:00
Jason Ekstrand 98d58e7320 nir/clone: Add support for cloning a single function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 036b209484 nir/validate: Better function validation
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand f86f3c90aa nir/print: Better function argument printing
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration.  This changes
nir_print to print the type along with each parameter or return variable.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 13969565f9 nir/print: Factor variable name lookup into a helper
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand e4bebe8a02 nir: Create function parameters in function_impl_create
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 066d3c115e nir: Add a helper for creating a "bare" nir_function_impl
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 2ef4754a20 nir: Add a new "param" variable mode for parameters and return variables
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 41ae553fda nir/glsl: Remove dead function parameter handling code
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning.  Let's just get rid
of it for now.  We can always put it back in if we decide to use NIR for
function inlining at some point in the future.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-12 15:48:36 -08:00
Jason Ekstrand 14b18aba89 nir: Add a pass for lower indirect variable dereferences
This new pass lowers load/store_var intrinsics that act on indirect derefs
to if-ladder of direct load/store_var intrinsics.  The if-ladders perform a
simple binary search on the indirect.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-08 10:41:54 -08:00
Matt Turner 905ff86198 nir: Recognize open-coded extract_u16.
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner 76289fbfa8 nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >>  8u) & 0xffu)
   float((val >>  0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)          g15<1>UD        g25<8,8,1>UD    0x00000010UD
   and(8)          g16<1>UD        g15<8,8,1>UD    0x000000ffUD
   mov(8)          g17<1>F         g16<8,8,1>UD

   shr(8)          g18<1>UD        g25<8,8,1>UD    0x00000008UD
   and(8)          g19<1>UD        g18<8,8,1>UD    0x000000ffUD
   mov(8)          g20<1>F         g19<8,8,1>UD

   and(8)          g21<1>UD        g25<8,8,1>UD    0x000000ffUD
   mov(8)          g22<1>F         g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)          g17<1>F         g25.2<32,8,4>UB
   mov(8)          g20<1>F         g25.1<32,8,4>UB
   mov(8)          g22<1>F         g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.

   instructions in affected programs: 28568 -> 27450 (-3.91%)
   helped: 7

   cycles in affected programs: 210076 -> 203144 (-3.30%)
   helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4520 -> 4374 instructions (-3.23%)
 #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Kristian Høgsberg Kristensen b00b42d99b nir/spirv: Use the new bare sampler type 2016-02-28 11:24:05 -08:00
Jason Ekstrand c9564fd598 nir/spirv: Allow but warn for a few capabilities
Unfortunately, glslang gives us cull/clip distance and GS streams even if
the shader doesn't use it whenever a shader is declared as version 450.
This is a glslang bug, but we can easily enough ignore it for now.
2016-02-23 22:07:25 -08:00
Jason Ekstrand 040355b688 nir/spirv: Add more capabilities 2016-02-23 21:01:00 -08:00
Jason Ekstrand f49ba0f7d8 nir/spirv: Add support for multisampled textures 2016-02-21 22:02:38 -08:00
Jason Ekstrand 79c0781f44 nir/gather_info: Count textures and images 2016-02-18 11:42:36 -08:00
Jason Ekstrand 581e4468f9 nir/spirv: Add some more capabilities 2016-02-17 18:04:39 -08:00
Jason Ekstrand 979732fafc nir: Add a helper for getting the one function from a shader 2016-02-17 18:04:39 -08:00
Jason Ekstrand 8c05b44bbb nir: Add a nir_foreach_variable_safe helper 2016-02-17 18:04:39 -08:00
Kristian Høgsberg Kristensen b8da261dc7 spirv: Fix SpvOpFwidth, SpvOpFwidthFine and SpvOpFwidthCoarse
"Result is the same as computing the sum of the absolute values of
    OpDPdx and OpDPdy on P."

We were doing sum of absolute values of OpDPdx of P and OpDPdx of NULL.
2016-02-17 15:28:52 -08:00
Jason Ekstrand 88042b9f10 nir: Get rid of the C++ NIR_SRC/DEST_INIT macros
These were originally added to reduce compiler warnings but aren't really
needed.  Getting rid of them reduces the diff between the Vulkan branch and
master, so we might as well.
2016-02-12 21:35:02 -08:00
Jason Ekstrand 3c8dc1afd1 nir/spirv/glsl: Clean up the row-skipping swizzle logic a bit 2016-02-12 10:40:39 -08:00
Jason Ekstrand 4016619931 nir/spirv: Allow the clip distance capability. 2016-02-11 15:14:46 -08:00
Jason Ekstrand f710f3ca37 Merge remote-tracking branch 'mesa-public/master' into vulkan
This also reverts commit 1d65abfa58 because
now NIR handles texture offsets in a much more sane way.
2016-02-10 17:12:11 -08:00
Jason Ekstrand 8750299a42 nir: Remove the const_offset from nir_tex_instr
When NIR was originally drafted, there was no easy way to determine if
something was constant or not.  The result was that we had lots of
special-casing for constant values such as this.  Now that load_const
instructions are SSA-only, it's really easy to find constants and this
isn't really needed anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-02-10 16:33:50 -08:00
Jason Ekstrand 70dff4a55e nir/lower_vec_to_movs: Better report channels handled by insert_mov
This fixes two issues.  First, we had a use-after-free in the case where
the instruction got deleted and we tried to return mov->dest.write_mask.
Second, in the case where we are doing a self-mov of a register, we delete
those channels that are moved to themselves from the write-mask.  This
means that those channels aren't reported as being handled even though they
are.  We now stash off the write-mask before remove unneeded channels so
that they still get reported as handled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-02-10 16:33:14 -08:00
Jason Ekstrand 9be5a4bc29 nir/spirv: Fix handling of OpGroupMemberDecorate
We were pulling the member index from the wrong dword
2016-02-10 15:36:42 -08:00
Jason Ekstrand ac04c6de2c nir/spirv: Assert that struct member ids are in-bounds 2016-02-10 15:36:41 -08:00
Mark Janes 8179834030 nir/spirv: fix build_mat_subdet stack smasher
The sub-determinate implementation pattern fixed by
6a7e2904e0 has a second instance in the
same file.

With the previous algorithm, when row and j are both 3, the index
overruns the array.  This only impacts the stack on 32 bit builds.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-10 14:43:03 -08:00
Jason Ekstrand 09b3e30dc6 anv: Fix up spirv for new texture/sampler split stuff 2016-02-09 16:48:36 -08:00
Jason Ekstrand b14f4c1fd3 Merge remote-tracking branch 'mesa-public/master' into vulkan
This pulls in the separate texture/sampler stuff from upstream
2016-02-09 16:47:37 -08:00
Jason Ekstrand e01dd59b73 vtn: Use const_index helpers 2016-02-09 16:32:38 -08:00
Jason Ekstrand 768bd7f272 Merge commit '8b0fb1c152fe191768953aa8c77b89034a377f83' into vulkan
This pulls in Rob Clark's const_index changes for NIR
2016-02-09 15:30:39 -08:00
Jason Ekstrand 5ec456375e nir: Separate texture from sampler in nir_tex_instr
This commit adds the capability to NIR to support separate textures and
samplers.  As it currently stands, glsl_to_nir only sets the texture deref
and leaves the sampler deref alone as it did before and nir_lower_samplers
assumes this.  Backends can still assume that they are combined and only
look at only at the texture index.  Or, if they wish, they can assume that
they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir
all set both texture and sampler index whenever a sampler is required (the
two indices are the same in this case).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand ee85014b90 nir/tex_instr: Rename sampler to texture
We're about to separate the two concepts.  When we do, the sampler will
become optional.  Doing a rename first makes the separation a bit more
safe because drivers that depend on GLSL or TGSI behaviour will be fine to
just use the texture index all the time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 15:00:17 -08:00
Jason Ekstrand 3f42184994 nir: Add some braces around loops and ifs 2016-02-09 15:00:17 -08:00
Rob Clark ced8d3e773 nir: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark b6cf98bc82 gtn: use const_index helpers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Rob Clark 1df3ecc1b8 nir: const_index helpers
Direct access to intr->const_index[n], where different slots have
different meanings, is somewhat confusing.

Instead, let's put some extra info in nir_intrinsic_infos[] about which
slots map to what, and add some get/set helpers.  The helpers validate
that the field being accessed (base/writemask/etc) is applicable for the
intrinsic opc, for some extra safety.  And nir_print can use this to
dump out decoded const_index fields.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-02-09 17:30:33 -05:00
Jason Ekstrand 1d65abfa58 nir/spirv: Better handle constant offsets in texture lookups 2016-02-09 10:29:05 -08:00
Jason Ekstrand 209820739b nir/spirv: Set the vtn_mode and interface type for sampler parameters 2016-02-09 10:29:05 -08:00
Jason Ekstrand de6c9c5f2e nir/inline_functions: Don't shadown variables when it isn't needed
Previously, in order to get things working, we just always shadowed
variables.  Now, we rewrite derefs whenever it's safe to do so and only
shadow if we have an in or out variable that we write or read to
respectively.
2016-02-09 10:29:05 -08:00
Jason Ekstrand b6c00bfb03 nir: Rework function parameters 2016-02-09 10:29:05 -08:00
Timothy Arceri 1aae5e8ced nir: remove unused nir_variable fields
These are used in GLSL IR to removed unused varyings and match
transform feedback variables. There is no need to use these in NIR.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-02-09 22:49:06 +11:00
Matt Turner 371c4b3c48 nir: Recognize open-coded bitfield_reverse.
Helps 11 shaders in UnrealEngine4 demos.

I seriously hope they would have given us bitfieldReverse() if we
exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
anyway?).

instructions in affected programs: 4875 -> 4633 (-4.96%)
cycles in affected programs: 270516 -> 244516 (-9.61%)

I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
handling of this. We'd actually like to match, e.g., step2 by matching
step1 once and then doing a pointer comparison for the second instance
of step1, but unfortunately we generate an enormous tuple for instead.

The .text size increases by 6.5% and the .data by 17.5%.

   text     data  bss    dec    hex  filename
  22957    45224    0  68181  10a55  nir_libnir_la-nir_opt_algebraic.o
  24461    53160    0  77621  12f35  nir_libnir_la-nir_opt_algebraic.o

I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
in a GL 4.0 context once we expose GL 4.0.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 21:20:58 -08:00
Matt Turner 2d0d9755da nir: Handle large unsigned values in opt_algebraic.
The next patch adds an algebraic rule that uses the constant 0xff00ff00.

Without this change, the build fails with

   return hex(struct.unpack('I', struct.pack('i', self.value))[0])
   struct.error: 'i' format requires -2147483648 <= number <= 2147483647

The hex() function handles integers of any size, and assigning a
negative value to an unsigned does what we want in C. The pack/unpack is
unnecessary (and as we see, buggy).

Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
2016-02-08 20:38:17 -08:00
Matt Turner 7be8d07732 nir: Do opt_algebraic in reverse order.
Walking the SSA definitions in order means that we consider the smallest
algebraic optimizations before larger optimizations. So if a smaller
rule is part of a larger rule, the smaller one will happen first,
preventing the larger one from happening.

instructions in affected programs: 32721 -> 32611 (-0.34%)
helped: 106

In programs whose nir_optimize loop count changes (129 of them):

   before:  1164 optimization loops
   after:   1071 optimization loops

Of the 129 affected, 16 programs' optimization loop counts increased.

Prevents regressions and annoyances in the next commits.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner a8f0960816 nir: Recognize product of open-coded pow()s.
Prevents regressions in the next commit.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Matt Turner 9f02e3ab03 nir: Add opt_algebraic rules for xor with zero.
instructions in affected programs: 668 -> 664 (-0.60%)
helped: 4

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-02-08 20:38:17 -08:00
Francisco Jerez cec6fe2ad8 vtn: Clean up acos implementation.
Parameterize build_asin() on the fit coefficients so the
implementation can be shared while still using different polynomials
for asin and acos.  Also switch back to implementing acos in terms of
asin -- The improvement obtained from cancelling out the pi/2 terms
was negligible compared to the approximation error.
2016-02-08 15:23:43 -08:00
Francisco Jerez f50a651726 nir/spirv: Create integer types of correct signedness.
vtn_handle_type() creates a signed type regardless of the value of the
signedness flag, which usually doesn't make much of a difference
except when the type is used as base sampled type of an image type,
what will cause the base type of the NIR image variable to be
inconsistent with its format and cause an assertion failure in the
back-end (most likely only reproducible on Gen7), and may change the
semantics of the image intrinsic subtly (e.g. UMIN may become IMIN).
2016-02-08 15:23:35 -08:00
Jason Ekstrand 9401516113 Merge remote-tracking branch 'mesa-public/master' into vulkan 2016-02-05 15:21:11 -08:00
Jason Ekstrand 741744f691 Merge commit mesa-public/master into vulkan
This pulls in the patches that move all of the compiler stuff around
2016-02-05 15:03:44 -08:00
Matt Turner 955d052058 nir: Add lowering support for unpacking opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 9b8786eba9 nir: Add lowering support for packing opcodes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 68f8c5730b nir: Add opcodes to extract bytes or words.
The uint versions zero extend while the int versions sign extend.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 8709dc0713 glsl: Remove 2x16 half-precision pack/unpack opcodes.
i965/fs was the only consumer, and we're now doing the lowering in NIR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 9ce901058f nir: Add lowering of nir_op_unpack_half_2x16.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Matt Turner 140a886c41 nir: Make argument order of unop_convert match binop_convert.
Strangely the return and parameter types were reversed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-02-01 10:43:57 -08:00
Emil Velikov eb63640c1d glsl: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:33 +00:00
Emil Velikov a39a8fbbaa nir: move to compiler/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-26 16:08:30 +00:00