Commit Graph

199 Commits

Author SHA1 Message Date
Kenneth Graunke 5297267a1c nir: Add a pass to lower TES patch_vertices intrinsics to a constant.
In Vulkan, we always have both the TCS and TES available in the same
pipeline, so we can simply use the TCS OutputVertices execution mode
value as the TES PatchVertices built-in.

For GLSL, we handle this in the linker.  But we could use this pass
in the case when both TCS and TES are linked together, if we wanted.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-10 13:21:53 -08:00
Samuel Iglesias Gonsálvez 27cf6a369f nir: add nir_type_conversion_op()
This function returns the nir_op corresponding to the conversion between
the given nir_alu_type arguments.

This function lacks support for integer-based types with bit_size != 32
and for float16 conversion ops.

v2:
- Improve readiness of the code and delete cases that don't happen now (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Samuel Iglesias Gonsálvez 3a571fcc43 nir: add nir_get_nir_type_for_glsl_type()
v2 (Jason):
- Refactor nir_get_nir_type_for_glsl_type() to avoid using unneeded helpers (Jason)

v3:
- Use return directly (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-01-09 09:10:13 +01:00
Jason Ekstrand 62332d139c nir: Add a local variable-based copy propagation pass
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-06 16:44:28 -08:00
Jason Ekstrand 47b54a6f74 nir/lower_var_copies: Use a shader rather than a void *mem_ctx
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand 134a5ad31c nir: Make nir_copy_deref follow the "clone" pattern
We rename it to nir_deref_clone, re-order the sources to match the other
clone functions, and expose nir_deref_var_clone.  This past part, in
particular, lets us get rid of quite a few lines since we no longer have
to call nir_copy_deref and wrap it in deref_as_var.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-12-30 12:38:04 -08:00
Jason Ekstrand baf1aa1334 nir: Add foreach_register helper macros 2016-12-29 16:02:44 -08:00
Jason Ekstrand fb181196de nir: Rename convert_to_ssa lower_regs_to_ssa
This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
2016-12-29 16:02:44 -08:00
Jason Ekstrand 6d9f576b56 nir: Add a pass for moving SPIR-V continue blocks to the ends of loops
When shaders come in from SPIR-V, we handle continue blocks by placing
the contents of the continue inside of a "if (!first_iteration)".  We do
this so that we can properly handle the fact that continues in SPIR-V
jump to the continue block at the end of the loop rather than jumping
directly to the top of the loop like they do in NIR.  In particular, the
increment step of a simple for loop ends up in the continue block.  This
pass looks for this case in loops that don't actually have any continues
and moves the continue contents to the end of the loop instead.  We need
this because loop unrolling doesn't work if the increment is inside of a
condition.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Jason Ekstrand 1111a05f90 nir: Add an optimization pass to remove trivial continues
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-22 16:27:19 -08:00
Timothy Arceri 51daccb289 nir: add a loop unrolling pass
V2:
- tidy ups suggested by Connor.
- tidy up cloning logic and handle copy propagation
 based of suggestion by Connor.
- use nir_ssa_def_rewrite_uses to fix up lcssa phis
  suggested by Connor.
- add support for complex loop unrolling (two terminators)
- handle case were the ssa defs use outside the loop is already a phi
- support unrolling loops with multiple terminators when trip count
  is know for each terminator

V3:
- set correct num_components when creating phi in complex unroll
- rewrite update remap table based on Jasons suggestions.
- remove unrequired extract_loop_body() helper as suggested by Jason.
- simplify the lcssa phi fix up code for simple loops as per Jasons suggestions.
- use mem context to keep track of hash table memory as suggested by Jason.
- move is_{complex,simple}_loop helpers to the unroll code
- require nir_metadata_block_index
- partially rewrote complex unroll to be simpler and easier to follow.

V4:
- use rzalloc() when creating nir_phi_src but not setting pred right away
 fixes regression cause by ralloc() no longer zeroing memory.

V5:
- simplify calling of complex_unroll()
- use new loop terminator fields to get the break/continue from blocks
  and simplify loop unrolling code
- handle slightly less trivial loop terminators. if branches can
  now have instructions but can only contain a single block.
- use nir print type IR snippets in unroll function descriptions
- add better explanation and variable for why we need to clone
  additional times when the second terminator it the limiting
  terminator.
- partially convert out of ssa before unrolling loops (suggested by Jason)

v6:
- remove unused nir_builder
- use Jasons new from ssa helper
- tidy/fixup cursor use
- unroll terminators that contain control flow correctly
- unroll complex loops with control flow before the terminators
  correctly

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Thomas Helland ec8423a4b1 nir: Add a LCSAA-pass
V2: Do a "depth first search" to convert to LCSSA

V3: Small comment fixup

V4: Rebase, adapt to removal of function overloads

V5: Rebase, adapt to relocation of nir to compiler/nir
    Still need to adapt to potential if-uses
    Work around nir_validate issue

V6 (Timothy):
 - tidy lcssa and stop leaking memory
 - dont rewrite the src for the lcssa phi node
 - validate lcssa phi srcs to avoid postvalidate assert
 - don't add new phi if one already exists
 - more lcssa phi validation fixes
 - Rather than marking ssa defs inside a loop just mark blocks inside
   a loop. This is simpler and fixes lcssa for intrinsics which do
   not have a destination.
 - don't create LCSSA phis for loops we won't unroll
 - require loop metadata for lcssa pass
 - handle case were the ssa defs use outside the loop is already a phi

V7: (Timothy)
- pass indirect mask to metadata call

v8: (Timothy)
- make convert to lcssa a helper function rather than a nir pass
- replace inside loop bitset with on the fly block index logic.
- remove lcssa phi validation special cases
- inline code from useless helpers, suggested by Jason.
- always do lcssa on loops, suggested by Jason.
- stop making lcssa phis special. Add as many source as the block
  has predecessors, suggested by Jason.

V9: (Timothy)
- fix regression with the is_lcssa_phi field not being initialised
  to false now that ralloc() doesn't zero out memory.

V10: (Timothy)
- remove extra braces in SSA example, pointed out by Topi

V11: (Timothy)
- add missing support for LCSSA phis in if conditions.

V12: (Timothy)
- small tidy up suggested by Jason.
- always create lcssa phi even if it just points to an lcssa
  phi from an inner loop

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Thomas Helland 6772a17acc nir: Add a loop analysis pass
This pass detects induction variables and calculates the
trip count of loops to be used for loop unrolling.

V2: Rebase, adapt to removal of function overloads

V3: (Timothy Arceri)
 - don't try to find trip count if loop terminator conditional is a phi
 - fix trip count for do-while loops
 - replace conditional type != alu assert with return
 - disable unrolling of loops with continues
 - multiple fixes to memory allocation, stop leaking and don't destroy
   structs we want to use for unrolling.
 - fix iteration count bugs when induction var not on RHS of condition
 - add FIXME for && conditions
 - calculate trip count for unsigned induction/limit vars

V4: (Timothy Arceri)
- count instructions in a loop
- set the limiting_terminator even if we can't find the trip count for
 all terminators. This is needed for complex unrolling where we handle
 2 terminators and the trip count is unknown for one of them.
- restruct structs so we don't keep information not required after
 analysis and remove dead fields.
- force unrolling in some cases as per the rules in the GLSL IR pass

V5: (Timothy Arceri)
- fix metadata mask value 0x10 vs 0x16

V6: (Timothy Arceri)
- merge loop_variable and nir_loop_variable structs and lists suggested by Jason
- remove induction var hash table and store pointer to induction information in
  the loop_variable suggested by Jason.
- use lowercase list_addtail() suggested by Jason.
- tidy up init_loop_block() as per Jasons suggestions.
- replace switch with nir_op_infos[alu->op].num_inputs == 2 in
  is_var_basic_induction_var() as suggested by Jason.
- use nir_block_last_instr() in and rename foreach_cf_node_ex_loop() as suggested
  by Jason.
- fix else check for is_trivial_loop_terminator() as per Connors suggetions.
- simplify offset for induction valiables incremented before the exit conditions is
  checked.
- replace nir_op_isub check with assert() as it should have been lowered away.

V7: (Timothy Arceri)
- use rzalloc() on nir_loop struct creation. Worked previously because ralloc()
  was broken and always zeroed the struct.
- fix cf_node_find_loop_jumps() to find jumps when loops contain
  nested if statements. Code is tidier as a result.

V8: (Timothy Arceri)
- move is_trivial_loop_terminator() to nir.h so we can use it to assert is
  the loop unroll pass
- fix analysis to not bail when looking for terminator when the break is in the else
  rather then the if
- added new loop terminator fields: break_block, continue_from_block and
  continue_from_then so we don't have to gather these when doing unrolling.
- get correct array length when forcing unrolling of variables
  indexed arrays that are the same size as the iteration count
- add support for induction variables of type float
- update trival loop terminator check to allow an if containing
  instructions as long as both branches contain only a single
  block.

V9: (Timothy)
 - bunch of tidy ups and simplifications suggested by Jason.
 - rewrote trivial terminator detection, now the only restriction is there
   must be no nested jumps, anything else goes.
 - rewrote the iteration test to use nir_eval_const_opcode().
 - count instruction properly even when forcing an unroll.
 - bunch of other tidy ups and simplifications.

V10: (Timothy)
 - some trivial tidy ups suggested by Jason.
 - conditional fix for break inside continue branch by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-23 10:15:36 +11:00
Jason Ekstrand a620f66872 nir: Add a couple quick-and-dirty out-of-SSA helpers
These are designed for use within an optimization pass when SSA becomes
more pain than it's worth.  They're very naive and don't generate
anything close to optimal register-based NIR.  Also, they may result in
shaders which do not validate because of, for instance, registers in phi
sources.  However, the register-based into-SSA pass should be pretty
efficient at cleaning up the mess.

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-12-23 10:15:35 +11:00
Iago Toral Quiroga 5be2e785b1 nir/lower_tex: add lowering for texture gradient on shadow samplers
This is ported from the Intel lowering pass that we use with GLSL IR.
This takes care of lowering texture gradients on shadow samplers other
than cube maps. Intel hardware requires this for gen < 8.

v2 (Ken):
 - Use the helper function to retrieve ddx/ddy
 - Swizzle away size components we are not interested in

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:52 +01:00
Iago Toral Quiroga a8e740c354 nir/lower_tex: add lowering for texture gradient on cube maps
This is ported from the Intel lowering pass that we use with GLSL IR.
The NIR pass only handles cube maps, not shadow samplers, which are
also lowered for gen < 8 on Intel hardware. We will add support for
that in a later patch, at which point we should be able to remove
the GLSL IR lowering pass.

v2:
- added a helper to retrieve ddx/ddy parameters (Ken)
- No need to make size.z=1.0, we are only using component x anyway (Iago)

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- When emitting the textureLod operation, copy all texture parameters
  from the original textureGrad() (except for ddx/ddy) using a loop
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:00 +01:00
Ilia Mirkin fd249c803e treewide: s/comparitor/comparator/
git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-12 22:13:07 -05:00
Jason Ekstrand 7db009b59e nir: Remove some unused fields from nir_variable
All of these are happily set from glsl_to_nir or spirv_to_nir but their
values are never used for anything.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:10 -08:00
Jason Ekstrand 50e0b0bee3 nir: Delete most of the constant_initializer support
Constant initializers have been a constant (ha!) pain for quite some time.
While they're useful from a language perspective, people writing passes or
backends really don't want deal with them most of the time.  This commit
removes most of the constant initializer support from NIR.  It is expected
that you call nir_lower_constant_initializers VERY EARLY to ensure that
they're gone before you do anything interesting.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand f5232db9e5 nir: Add a pass for lowering away constant initializers
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-05 15:40:09 -08:00
Jason Ekstrand 19a541f496 nir: Get rid of nir_constant_data
This has bothered me for about as long as NIR has been around.  Why do we
have two different unions for constants?  No good reason other than one of
them is a direct port from GLSL IR.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-12-02 10:53:32 -08:00
Kenneth Graunke 9a179f2db0 nir: add a pass to compact clip/cull distances.
v2: Use nir_is_per_vertex_io() rather than is_arrays_of_arrays().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Kenneth Graunke 663b2e9a92 nir: Add a "compact array" flag and IO lowering code.
Certain built-in arrays, such as gl_ClipDistance[], gl_CullDistance[],
gl_TessLevelInner[], and gl_TessLevelOuter[] are specified as scalar
arrays.  Normal scalar arrays are sparse - each array element usually
occupies a whole vec4 slot.  However, most hardware assumes these
built-in arrays are tightly packed.

The new var->data.compact flag indicates that a scalar array should
be tightly packed, so a float[4] array would take up a single vec4
slot, and a float[8] array would take up two slots.

They are still arrays, not vec4s, however.  nir_lower_io will generate
intrinsics using ARB_enhanced_layouts style component qualifiers.

v2: Add nir_validate code to enforce type restrictions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-22 00:29:23 -08:00
Kenneth Graunke ad9d4a4f8d nir: Generalize the "is per-vertex variable?" helpers and export them.
I want this function for nir_gather_info(), and realized it's basically
the same as the ones in nir_lower_io().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-11-11 09:17:07 +11:00
Dave Airlie b16dff2d88 nir: add conditional discard optimisation (v4)
This is ported from GLSL and converts

if (cond)
	discard;

into
discard_if(cond);

This removes a block, but also is needed by radv
to workaround a bug in the LLVM backend.

v2: handle if (a) discard_if(b) (nha)
cleanup and drop pointless loop (Matt)
make sure there are no dependent phis (Eric)
v3: make sure only one instruction in the then block.
v4: remove sneaky tabs, add cursor init (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-11-10 05:46:33 +10:00
Timothy Arceri 2e423ca147 nir: stop adjusting driver location for varying packing
As of 59864e8e02 we just use the location assigned by the front-end and
no longer need this for i965.

Since there were some issues in the logic with assigning arrays the same
driver location if they didn't start at the same location just remove it
and let other drivers implement a solution if needed when they add
ARB_enhanced_layouts support.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-10-26 14:29:36 +11:00
Timothy Arceri e1af20f18a nir/i965/anv/radv/gallium: make shader info a pointer
When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Timothy Arceri 094fe3a959 nir: move nir_shader_info to a common compiler header
This will allow use to stop copying values between structs and
will also simplify handling handling these values in the shader cache.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Jason Ekstrand ae032e5ea6 nir: Remove some no longer needed asserts
Now that the NIR casting functions have type assertions, we have a bunch of
assertions that aren't needed anymore.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:39 -07:00
Jason Ekstrand 2ed17d46de nir: Make nir_foo_first/last_cf_node return a block instead
One of NIR's invariants is that control flow lists always start and end
with blocks.  There's no good reason why we should return a cf_node from
these functions since we know that it's always a block.  Making it a block
lets us remove a bunch of code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:37 -07:00
Jason Ekstrand 7a3bcadf4e nir: Add asserts to the casting functions
This makes calling nir_foo_as_bar a bit safer because we're no longer 100%
trusting in the caller to ensure that it's safe.  The caller still needs to
do the right thing but this ensures that we catch invalid casts with an
assert rather than by reading garbage data.  The one downside is that we do
use the casts a bit in nir_validate and it's not a validate_assert.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:24 -07:00
Eric Anholt 36f0f03182 nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.

This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise.  However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.

For now, just turn on aggressive flattening on vc4.  i965 will need some
tuning to avoid regressions.  It does looks like this may be useful to
replace freedreno code.

Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.

vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs:     17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs:     3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs:     26085 -> 24649 (-5.51%)

v2: Update shader-db output.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2016-09-22 11:10:21 +03:00
Dave Airlie 7bf76563e2 glsl: add subpass image type (v2)
SPIR-V/Vulkan have a special image type for input attachments
called the subpass type. It has different characteristics than
other images types.

The main one being it can only be an input image to fragment
shaders and loads from it are relative to the frag coord.

This adds support for it to the GLSL types. Unfortunately
we've run out of space in the sampler dim in types, so we
need to use another bit.

v2: Fixup subpass input name (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-16 15:16:31 +10:00
Jason Ekstrand ed65e6ef49 nir: Add a flag to lower_io to force "sample" interpolation
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-15 13:31:43 -07:00
Kenneth Graunke 2d8a3fa7ea nir: Report progress from nir_lower_phis_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:51 -07:00
Kenneth Graunke 32630e211e nir: Report progress from nir_lower_alu_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:49 -07:00
Rob Clark 1a8424ceba nir: move tex_instr_remove_src
I want to re-use this in a different pass, so move to nir.h

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-14 13:45:32 -04:00
Jason Ekstrand 88a2a2e053 nir/gcm: Add global value numbering support
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other.  For instance, in
you have

if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   ssa_2 = ssa_0 + 7;
   /* use ssa_2 */
}

Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to

if (...) {
   ssa_1 = ssa_0 + 7;
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}

Obviously, we just broke SSA form which is bad.  Global code motion,
however, will repair this for us by turning this into

ssa_1 = ssa_0 + 7;
if (...) {
   /* use ssa_1 */
} else {
   /* use ssa_1 */
}

This intended to eventually mostly replace CSE.  However, conventional CSE
may still be useful because it's less of a scorched-earth approach and
doesn't require GCM.  This makes it a bit more appropriate for use as a
clean-up in a late optimization run.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-09-08 20:53:01 -07:00
Connor Abbott 356d101af3 nir: remove some fields from nir_shader_compiler_options
I accidentally added these with 0dc4cab. Oops!
2016-09-03 00:49:58 -04:00
Connor Abbott 0dc4cabee2 nir: add nir_after_phis() cursor helper
And re-implement nir_after_cf_node_and_phis() using it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-09-03 00:37:48 -04:00
Kenneth Graunke 93bfa1d7a2 nir: Change nir_shader_get_entrypoint to return an impl.
Jason suggested adding an assert(function->impl) here.  All callers
of this function actually want ->impl, so I decided just to change
the API.

We also change the nir_lower_io_to_temporaries API here.  All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally.  Folding this change in
avoids the need to change it and change it back.

v2: Fix one call I missed in ir3_compiler (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 19:18:24 -07:00
Francisco Jerez 97ac3eba58 nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics.  The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:33:29 -07:00
Eric Anholt 9f1411d1ec nir: Add an IO scalarizing pass using the intrinsic's first_component.
vc4 wants to have per-scalar IO load/stores so that dead code elimination
can happen on a more granular basis, which it has been doing in the
backend using a multiplication by 4 of the intrinsic's driver_location.
We can represent it properly in the NIR using the first_component field,
though.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt 24728637e2 nir: Move the undef of nir_intrinsics.h macros to the .h.
I wanted to include this from nir_builder as well, so it also needed the
undefs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Kenneth Graunke 7603b4d3a1 nir: Make nir_alu_srcs_equal non-static.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-08-04 00:41:07 -07:00
Matt Turner d1f6f65697 glsl: Separate overlapping sentinel nodes in exec_list.
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.

No difference in OglBatch7 (n=20).

Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Jason Ekstrand d9156efc52 nir/lower_tex: Add support for lowering coordinate offsets
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures.  Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:48:53 -07:00
Jason Ekstrand 09135cd55a nir: Add a helper for determining the type of a texture source
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
2016-07-22 16:27:35 -07:00
Jason Ekstrand dc9f2436c3 nir: Add a nir_deref_foreach_leaf helper
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-20 15:29:55 -07:00
Kenneth Graunke 707ca00fce nir: Add nir_load_interpolated_input lowering code.
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.

flat inputs continue using regular load_input.

v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
    passing (in response to review feedback from Chris Forbes).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:01:00 -07:00
Kenneth Graunke 2496462479 nir: Add new intrinsics for fragment shader input interpolation.
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.

One exception is fragment shader inputs.  load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location.  This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).

With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location.  The intrinsics make this easy:
simply load N components from location <loc, component>.  However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.

Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.

We represent this by introducing five new load_barycentric_* intrinsics:

- load_barycentric_pixel     (ordinary variable)
- load_barycentric_centroid  (centroid qualified variable)
- load_barycentric_sample    (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())

Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2.  The last two also take a sample
or offset source.

We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.

The intention is that flat inputs will still use regular load_input
intrinsics.  This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.

This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.

v2: Document the interp_mode const_index value, define a new
    BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
    some of them (requested by Jason Ekstrand).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 11:00:45 -07:00
Kenneth Graunke ac1181ffbe compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.
Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input.  Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
  -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
  -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-07-17 19:26:48 -07:00
Timothy Arceri 448adfbc67 nir: use the same driver location for packed varyings
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Timothy Arceri 0eea6b3297 nir: add new intrinsic field for storing component offset
This offset is used for packing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-07 10:26:43 +10:00
Giuseppe Bilotta 60a27ad122 Remove wrongly repeated words in comments
Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by

v2:
    * proper commit message and non-joke title;
    * replace two 'as is' followed by 'is' to 'as-is'.
v3:
    * 'a integer' => 'an integer' and similar (originally spotted by
      Jason Ekstrand, I fixed a few other similar ones while at it)

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2016-06-23 13:55:03 -07:00
Jason Ekstrand 202751fbb7 nir: Add a pass for propagating invariant decorations
This pass is similar to propagate_invariance in the GLSL compiler.  The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-20 12:02:45 -07:00
Jason Ekstrand 4d3b8318a7 nir/info: Get rid of uses_interp_var_at_offset
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-06-03 19:29:28 -07:00
Rob Clark dfbae7d64f nir/algebraic: support for power-of-two optimizations
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two.  Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-03 16:05:03 -04:00
Jordan Justen 6f316c9d86 nir: Make lowering gl_LocalInvocationIndex optional
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-06-01 19:29:02 -07:00
Jason Ekstrand 15e553daf0 nir: Make nir_const_value a union
There's no good reason for it to be a struct of an anonymous union.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-05-26 16:03:44 -07:00
Kristian Høgsberg Kristensen a41b57679f nir: Add a lowering pass for YUV textures
This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-24 10:14:56 -07:00
Kristian Høgsberg Kristensen 29921ee987 nir: Add new 'plane' texture source type
This will be used to select the plane to sample from for planar
textures.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-24 10:14:56 -07:00
Kenneth Graunke 6e5d86c07a nir: Add a simple nir_lower_wpos_center() pass for Vulkan drivers.
nir_lower_wpos_ytransform() is great for OpenGL, which allows
applications to choose whether their coordinate system's origin is
upper left/lower left, and whether the pixel center should be on
integer/half-integer boundaries.

Vulkan, however, has much simpler requirements: the pixel center
is always half-integer, and the origin is always upper left.  No
coordinate transform is needed - we just need to add <0.5, 0.5>.
This means that we can avoid using (and setting up) a uniform.

I thought about adding more options to nir_lower_wpos_ytransform(),
but making a new pass that never even touched uniforms seemed simpler.

v2: Use normal iterator rather than _safe variant (noticed by Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
2016-05-20 14:30:00 -07:00
Rob Clark a0ef26c1c2 nir/print: add support for print annotations
Caller can pass a hashtable mapping NIR object (currently instr or var,
but I guess others could be added as needed) to annotation msg to print
inline with the shader dump.  As the annotation msg is printed, it is
removed from the hashtable to give the caller a way to know about any
unassociated msgs.

This is used in the next patch, for nir_validate to try to associate
error msgs to nir_print dump.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-05-17 10:05:20 -04:00
Juan A. Suarez Romero 80535873bb nir: add double input bitmap
This bitmap tracks which input attributes are double-precision.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-17 09:05:54 +02:00
Matt Turner 4191551262 nir: Mark nir_start_block()/nir_impl_last_block() with returns_nonnull.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-16 11:06:15 -07:00
Kenneth Graunke 6d65b0c6dc nir: Add a nir->info.uses_interp_var_at_offset flag.
I've added this to nir_gather_info(), but also to glsl_to_nir() as a
temporary measure, since the i965 GL driver today doesn't use
nir_gather_info() yet.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-15 23:50:28 -07:00
Rob Clark 79d6409a14 nir: return progress from lower_idiv
With algebraic-opt support for lowering div to shift, the driver would
like to be able to run this pass *after* the main opt-loop, and then
conditionally re-run the opt-loop if this pass actually lowered some-
thing.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-15 17:25:48 -04:00
Jason Ekstrand f47faa4316 nir: Add texture opcodes and source types for multisample compression
Intel hardware does a form of multisample compression that involves an
auxilary surface called the MCS.  When an MCS is in use, you have to first
sample from the MCS with a special opcode and then pass the result of that
operation into the next sample instrucion.  Normally, we just do this
ourselves in the back-end, but we want to expose that functionality to NIR
so that we can use MCS values directly in NIR-based blorp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:34:44 -07:00
Jason Ekstrand a2f50d87b6 nir: Add an info bit for uses_sample_qualifier
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-14 13:33:52 -07:00
Jason Ekstrand 1b72c31e1f nir/algebraic: Separate ffma lowering from fusing
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-11 11:44:35 -07:00
Rob Clark dfbabc6bad nir/lower-io: add support for lowering inputs
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark b085016f94 nir: rename lower_outputs_to_temporaries -> lower_io_to_temporaries
Since it will gain support to lower inputs, give it a more generic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-11 12:20:11 -04:00
Rob Clark 5261947260 nir: lower-io-types pass
A pass to lower complex (struct/array/mat) inputs/outputs to primitive
types.  This allows, for example, linking that removes unused components
of a larger type which is not indirectly accessed.

In the near term, it is needed for gallium (mesa/st) support for NIR,
since only used components of a type are assigned VBO slots, and we
otherwise have no way to represent that to the driver backend.  But it
should be useful for doing shader linking in NIR.

v2: use glsl_count_attribute_slots() rather than passing a type_size
    fxn pointer

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-11 12:20:11 -04:00
Rob Clark b10cc24519 nir: passthrough-edgeflags support
Handled by tgsi_emulate for glsl->tgsi case.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark 3a939d034e nir: add lowering pass for glBitmap
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark 12c18ce476 nir: add lowering pass for glDrawPixels
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Rob Clark b26645a00f nir: add lowering pass for y-transform
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-05-11 12:20:11 -04:00
Connor Abbott 4fab8dd5ea nir: remove now-unused nir_foreach_block*_call()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-05-05 16:19:42 -07:00
Samuel Iglesias Gonsálvez 2ab2d2e588 nir: Separate 32 and 64-bit fmod lowering
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Samuel Iglesias Gonsálvez b902377a56 nir/lower_double_ops: lower mod()
There are rounding errors with the division in i965 that affect
the mod(x,y) result when x = N * y. Instead of returning '0' it
was returning 'y'.

This lowering pass fixes those cases.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-04 08:07:49 +02:00
Rob Clark 64abf6d404 nir: clamp-color-output support
Handled by tgsi_emulate for glsl->tgsi case.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-04-30 14:56:19 -04:00
Jason Ekstrand 70f89dd75e nir: Switch the arguments to nir_foreach_def
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_def(\([^,]*\),\s*\([^,]*\))/nir_foreach_def(\2, \1)/

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand 5015260a05 nir: Switch the arguments to nir_foreach_use and friends
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/

and similar expressions for nir_foreach_use_safe, etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand 9464d8c498 nir: Switch the arguments to nir_foreach_function
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand e63766fb4b nir: Switch the arguments to nir_foreach_parallel_copy_entry
This matches the "foreach x in container" pattern found in many other
programming languages.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand 8564916d01 nir: Switch the arguments to nir_foreach_phi_src
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/

and a similar expression for nir_foreach_phi_src_safe.

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-04-28 15:54:48 -07:00
Jason Ekstrand 707e72f13b nir: Switch the arguments to nir_foreach_instr
This matches the "foreach x in container" pattern found in many other
programming languages.  Generated by the following regular expression:

s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/

and similar expressions for nir_foreach_instr_safe etc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-28 15:54:48 -07:00
Samuel Iglesias Gonsálvez db07b46f2c nir: Add lrp lowering for doubles in opt_algebraic
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.

Add 'lower_flrp64' flag to lower this instruction in that case.

v2:
   - Rename lower_flrp_double to lower_flrp64 (Jason)
   - Fix typo (Jason)
   - Adapt the code to define bit_size information in the opcodes.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Samuel Iglesias Gonsálvez 443600d51e nir: rename lower_flrp to lower_flrp32
A later patch will add lower_flrp64 option to NIR.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga 072613b3f3 nir/lower_double_ops: lower round_even()
At least i965 hardware does not have native support for round_even() on doubles.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga bf91df7f7f nir/lower_double_ops: lower fract()
At least i965 hardware does not have native support for fract() on doubles.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:40 +02:00
Iago Toral Quiroga 126a1ac03f nir/lower_double_ops: lower ceil()
At least i965 hardware does not have native support for ceil on doubles.

v2 (Sam):
   - Improve the lowering pass to remove one bcsel (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 12:01:36 +02:00
Iago Toral Quiroga 29541ec531 nir/lower_double_ops: lower floor()
At least i965 hardware does not have native support for floor on doubles.

v2 (Sam):
  - Improve the lowering pass to remove one bcsel (Jason)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:35 +02:00
Iago Toral Quiroga 5fab3d178b nir/lower_double_ops: lower trunc()
At least i965 hardware does not have native support for truncating doubles.

v2:
  - Simplified the implementation significantly.
  - Fixed the else branch, that was not doing what we wanted.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Connor Abbott 2ea3649c63 nir: add a pass to lower some double operations
v2: Move to compiler/nir (Iago)
v3: Use nir_imm_int() to load the constants (Sam)
v4 (Sam):
  - Undo line-wrap (Jason).
  - Fix comment (Jason).
  - Improve generated code for get_signed_inf() function (Connor).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-28 11:58:13 +02:00
Connor Abbott 122d27e998 nir: rewrite nir_foreach_block and friends
Previously, these were functions which took a callback. This meant that
the per-block code had to be in a separate function, and all the data
that you wanted to pass in had to be a single void *. They walked the
control flow tree recursively, doing a depth-first search, and called
the callback in a preorder, matching the order of the original source
code. But since each node in the control flow tree has a pointer to its
parent, we can implement a "get-next" and "get-previous" method that
does the same thing that the recursive function did with no state at
all. This lets us rewrite nir_foreach_block() as a simple for loop,
which lets us greatly simplify its users in some cases. This does
require us to rewrite every user, although the transformation from the
old nir_foreach_block() to the new nir_foreach_block() is mostly
trivial.

One subtlety, though, is that the new nir_foreach_block() won't handle
the case where the current block is deleted, which the old one could.
There's a new nir_foreach_block_safe() which implements the standard
trick for solving this. Most users don't modify control flow, though, so
they won't need it. Right now, only opt_select_peephole needs it.

The old functions are reimplemented in terms of the new macros, although
they'll go away after everything is converted.

v2: keep an implementation of the old functions around
v3 (Jason Ekstrand): A small cosmetic change and a bugfix in the loop
   handling of nir_cf_node_cf_tree_last().
v4 (Jason Ekstrand): Use the _safe macro in foreach_block_reverse_call

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-27 15:05:40 -07:00
Jason Ekstrand d800b7daa5 nir: Add a helper for figuring out what channels of an SSA def are read
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-04-26 19:55:04 -07:00
Connor Abbott b6dc940ec2 nir: rename nir_foreach_block*() to nir_foreach_block*_call()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-20 09:47:05 -07:00
Rob Clark eddfc97709 nir/lower-tex: add srgb->linear lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-19 17:13:50 -04:00