Storing this here is pretty sketchy - I don't know if any driver other
than i965 will want to use it. But this will make it a lot easier to
generate NIR code at link time. We'll probably rework it anyway.
(Ian suggested making nir_assign_var_locations_scalar_direct_first
simply modify the nir_shader's fields, rather than passing pointers
to them. If this stays long term, we should do that. But Jason and
I suspect we'll be reworking this area again in the near future.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Originally you had to have one or the other. But actually I don't want
either. (Or rather I want whatever is the minimum # of instructions.)
TODO: not sure where the best place to insert a check that driver hasn't
set *both* lower_negate and lower_sub?
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we're not generating linker errors, we don't actually modify
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't actually need a gl_program struct. We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX). We may as well just pass that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I want to use this in some code that doesn't currently include mtypes.h.
It seems like a better place for it anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This pass performs a mark and sweep pass over a nir_shader's associated
memory - anything still connected to the program will be kept, and any
dead memory we dropped on the floor will be freed.
The expectation is that this will be called when finished building and
optimizing the shader. However, it's also fine to call it earlier, and
many times, to free up memory earlier.
v2: (feedback from Jason Ekstrand)
- Skip sweeping impl->start_block, as it's already in the CF list.
- Don't sweep SSA defs (they're owned by their defining instruction)
- Don't steal phi sources (they're owned by nir_phi_instr).
- Don't steal tex->src (it's owned by the tex_inst itself)
- Don't sweep dereference chains (top-level dereferences are owned by
the instruction; sub-dereferences are owned by the parent deref).
- Don't sweep sources and destinations (SSA defs are handled as part of
the defining instruction, and registers are handled as part of
function implementations).
- Just steal instructions; don't walk them (no longer required).
v3: (feedback from Jason Ekstrand)
- Steal indirect sources from nir_src/nir_dest.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).
A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:
http://people.freedesktop.org/~robclark/div-lowering.py
I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit adds a pass to L1-normalize cube-map coordinates. Some hardware
such as i965 requires that largest cube-map coordinate is +-1. We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2 (Suggested by Eric):
- Do a vector fabs and split into components later
- Move to core NIR
Reviewed-by: Eric Anholt <eric@anholt.net>
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Delete the set of indirectly accessed variables when we're done with it
v3: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we just assigned variable locations in nir_lower_io. Now, we
force the user to assign variable locations for us. This gives the backend
a bit more control over where variables are placed.
v2: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We never did a single hash table lookup in the entire NIR code base that I
found so there was no real benifit to doing it that way. I suppose that
for linking, we'll probably want to be able to lookup by name but we can
leave building that hash table to the linker. In the mean time this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc. This was causing massive rendering isues in
the unreal4 Sun Temple demo.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers. Presumably other passes may as well.
Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nothing actually uses these, and the only caller of glsl_to_nir()
(brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state
pointer, meaning they'll always be NULL and 0, respectively.
Just delete them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This adds a parent_instr field similar to the one for ssa_def. The
difference here is that the parent_instr field on a nir_register can be
NULL if the register does not have a unique definition or if that
definition does not dominate all its uses. We set this field in the
out-of-SSA pass so that backends can get SSA-like information even after
they have gone out of SSA.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
One less new directory necessary for gallium code that wants to interact
with NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use nir_dominance_lca for computing least common anscestors
- Use the block index for comparing dominance tree depths
- Pin things that do partial derivatives
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is mostly thanks to Connor. The idea is to do a depth-first search
that computes pre and post indices for all the blocks. We can then figure
out if one block dominates another in constant time by two simple
comparison operations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Being able to find the least common anscestor in the dominance tree is a
useful thing that we may want to do in other passes. In particular, we
need it for GCM.
v2: Handle NULL inputs by returning the other block
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic
and other passes should try and generate certain types of opcodes or
patterns. Extend that to NIR by defining our own struct, which is
automatically generated from the Mesa struct in glsl_to_nir and provided
directly by the driver in TGSI-to-NIR.
v2: Split out the previous two prep patches.
v3: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will be used to give the optimization passes a chance to customize
behavior for the particular target device.
v2: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Printing instructions doesn't modify them, so we can mark the parameter
const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to sccalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do. Just provide the
simpler helper, instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should make debugging a lot easier as GDB handles static inlines much
better than macros. Also, static inlines are typesafe.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
parallel_copy_copy was a silly name. Also, things were getting long and
annoying, so I added a foreach macro. For historical reasons, several of
the original iterations over parallel copy entries in from_ssa used the
_safe variants of the loop. However, all of these no longer ever remove an
entry so it's ok to make them all use the normal iterator.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were doing a lazy creation of the parallel copy
instructions. This is confusing, hard to get right, and involves some
extra state tracking of the copies. This commit adds an extra walk over
the basic blocks to add the block-end parallel copies up front. This
should be much less confusing and, consequently, easier to get right. This
commit also adds more comments about parallel copies to help explain what
all is going on.
As a consequence of these changes, we can now remove the at_end parameter
from nir_parallel_copy_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Originally, this field was intended for determining if the given
instruction acted per-component or if it had mismatching source and
destination sizes that would have to be interpreted specially. However, we
can easily derive this from output_size == 0, so it's not really that
useful. Also, the values we were setting in nir_opcodes.h for this field
were completely bogus and it was never used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit adds some algebraic properties to the metadata of each opcode
in NIR. In particular, you now know, just from the metadata, if a given
opcode is commutative or associative. This will be useful for algebraic
transformation passes that want to be able to match a + b as well as b + a
in one go.
v2: Make algebraic properties all caps. This was more consistent with the
intrinsics flags and seems better for flags in general.
Also, the enums are now declared with (1 << n) rather then hex values.
v3: fmin and fmax technically aren't commutative or associative. Things
get funny when one of the arguments is a NaN.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There are some functions whose destinations are SSA-only and so aren't a
nir_dest. This provides a function that is capable of iterating over the
SSA definitions defined by those functions. If you want registers, you
should use the old iterator.
v2: Kenneth Graunke <kenneth@whitecape.org>:
- Fix nir_foreach_ssa_def's return value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
nir_metadata_dirty was a terrible name because the parameter it takes is
the metadata to be preserved. This is really confusing because it looks
like it's doing the opposite of what it is actually doing. Now it's named
sensibly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the casting operations were macros. While this is usually
fine, the casting macro used the input parameter twice leading to strange
behavior when you passed the result of another function into it. Since we
know the source and destination types explicitly, we don't loose anything
by making it a function.
Also, this gives us a nice little macro for creating cast function that
will hopefully prevent mistyping.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We used to have the number of components built into the intrinsic. This
meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
variants. This lead to piles of switch statements to generate the correct
intrinsic names, and introspection to figure out the number of components.
We can make things much nicer by allowing "vectorized" intrinsics.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass analizes all of the load/store operations and, when a variable is
never aliased (potentially used by an indirect operation), it is lowered
directly to an SSA value. This pass translates to SSA directly and does
not require any fixup by the original to-SSA pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead, we give SSA definitions a temporary index of 0xFFFFFFFF if the
instruction does not have a block and a proper index when it actually gets
added to the list.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used a string name. It was nice for translating out of GLSL
IR (which also does that) but cumbersome the rest of the time.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is still fairly basic. It only handles ALU operations, constant
loads, and phi nodes. No texture ops or intrinsics yet.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we don't actually have an "if" instruction, this is a very common
pattern when iterating over instructions. This adds a helper function for
it to make things a little less painful.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is kind of stupidly implemented but it should be enough to get us
up and going. We probably want something better that doesn't generate all
of the redundant moves eventually. However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
whitespace and automake fixes
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Include ralloc and hash_table from the util directory
whitespace fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-By glenn.kennard <glenn.kennard@gmail.com>