Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by
v2:
* proper commit message and non-joke title;
* replace two 'as is' followed by 'is' to 'as-is'.
v3:
* 'a integer' => 'an integer' and similar (originally spotted by
Jason Ekstrand, I fixed a few other similar ones while at it)
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This pass is similar to propagate_invariance in the GLSL compiler. The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original. For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite. However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).
This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2. This is done 6 or 8 times and summed to produce
the final result which is written to a red texture. In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes about 100 of the new Vulkan CTS tests.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two. Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we have the better nir_foreach_block macro, there's no reason to
use the archaic block version for everything.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There's no good reason for it to be a struct of an anonymous union.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96221
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now libglsl.la depends on libnir.la so putting it in libnir.la
adds a dependency on libglsl.la that goes the wrong direction.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to select the plane to sample from for planar
textures.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I think these are not strictly necessary since the floats in them
should be automatically promoted to doubles when operated with
double sources, but it makes things more explicit at least.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise we rewrote the fadd to use itself, causing crashes in
validation. Instead, start after the last use like we should.
A brown paper bag fix. Fixes crashes in several Vulkan tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
nir_lower_wpos_ytransform() is great for OpenGL, which allows
applications to choose whether their coordinate system's origin is
upper left/lower left, and whether the pixel center should be on
integer/half-integer boundaries.
Vulkan, however, has much simpler requirements: the pixel center
is always half-integer, and the origin is always upper left. No
coordinate transform is needed - we just need to add <0.5, 0.5>.
This means that we can avoid using (and setting up) a uniform.
I thought about adding more options to nir_lower_wpos_ytransform(),
but making a new pass that never even touched uniforms seemed simpler.
v2: Use normal iterator rather than _safe variant (noticed by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
ffma is an explicitly fused multiply add with higher precision.
The optimizer will take care of promoting mul/add to fma when
it's beneficial to do so.
This fixes failures on Gen4-5 when using this pass, as those platforms
don't actually implement fma().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The return value was used for the old nir_foreach_block callback system,
but at this point it no longer means anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
gl_FragCoord is a shader input with location == VARYING_SLOT_POS.
ARB_fragment_programs have an equivalent input at VARYING_SLOT_POS,
but it isn't called gl_FragCoord. We do want to transform it.
Matching by location guarantees we catch both.
Fixes several fp tests on a branch which uses this pass on i965.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The Y-offset needs flipping as well, similar to ddy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The original value might have been swizzled. That's taken care of in
the fmul source - we don't want to reswizzle it again.
Fixes validation failures in glsl-derivs-varyings on a branch of mine
which uses this pass in i965.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
At this point, it would require a logic error in nir_validate to not
have already populated this hashtable entry, but coverity doesn't
realize that:
CID 1265547 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.
CID 1271039 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not sure how coverity arrives at the conclusion that we can read comp[j]
unitialized (around line 204), other than not being aware that ncomp is
greater than 1 so it won't underflow in the 'if (tex->is_array)' case.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not 100% sure, but I think being an unsigned literal will help:
CID 1358505 (#1 of 1): Unintended sign extension
(SIGN_EXTENSION)sign_extension: Suspicious implicit sign extension:
load1->def.num_components with type unsigned char (8 bits, unsigned) is
promoted in load1->def.num_components * (load1->def.bit_size / 8) to
type int (32 bits, signed), then sign-extended to type unsigned long (64
bits, unsigned). If load1->def.num_components * (load1->def.bit_size /
8) is greater than 0x7FFFFFFF, the upper bits of the result will all be
1.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Log all the errors, and at the end dump the shader w/ error annotations
to make it easier to see where the problems are.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Caller can pass a hashtable mapping NIR object (currently instr or var,
but I guess others could be added as needed) to annotation msg to print
inline with the shader dump. As the annotation msg is printed, it is
removed from the hashtable to give the caller a way to know about any
unassociated msgs.
This is used in the next patch, for nir_validate to try to associate
error msgs to nir_print dump.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
I've added this to nir_gather_info(), but also to glsl_to_nir() as a
temporary measure, since the i965 GL driver today doesn't use
nir_gather_info() yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Drop extra #include which is otherwise unneeded (and makes this header
difficult to include from outside of src/mesa).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With algebraic-opt support for lowering div to shift, the driver would
like to be able to run this pass *after* the main opt-loop, and then
conditionally re-run the opt-loop if this pass actually lowered some-
thing.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Intel hardware does a form of multisample compression that involves an
auxilary surface called the MCS. When an MCS is in use, you have to first
sample from the MCS with a special opcode and then pass the result of that
operation into the next sample instrucion. Normally, we just do this
ourselves in the back-end, but we want to expose that functionality to NIR
so that we can use MCS values directly in NIR-based blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to nir_channel except that it lets you grab more than one
channel by providing a mask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no reason for having a macro *and* a python generator. We can
easily just do the whole thing in python. This has the advantage that we
are no longer definining ALU# macros which conflict with the ones in
brw_fs_builder.h.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These cases had the parameter removed:
nir/nir_lower_vec_to_movs.c: In function ‘try_coalesce’:
nir/nir_lower_vec_to_movs.c:124:66: warning: unused parameter ‘shader’ [-Wunused-parameter]
try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
^
nir/nir_lower_io.c: In function ‘load_op’:
nir/nir_lower_io.c:147:32: warning: unused parameter ‘state’ [-Wunused-parameter]
load_op(struct lower_io_state *state,
^
These cases had the parameter (void) silenced because the parameter was
necessary for an interface:
nir/glsl_to_nir.cpp:1900:32: warning: unused parameter 'ir' [-Wunused-parameter]
nir_visitor::visit(ir_barrier *ir)
^
nir/nir.c: In function ‘remove_use_cb’:
nir/nir.c:802:35: warning: unused parameter ‘state’ [-Wunused-parameter]
remove_use_cb(nir_src *src, void *state)
^
nir/nir.c: In function ‘remove_def_cb’:
nir/nir.c:811:37: warning: unused parameter ‘state’ [-Wunused-parameter]
remove_def_cb(nir_dest *dest, void *state)
^
Number of total warnings in my build reduced from 2543 to 2538
(reduction of 5).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's what all the call-sites once, so gets rid of a bunch of inlined
glsl_get_base_type() at the call-sites.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prep work to reduce the noise in the next patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Going to convert this pass to parameterized lower_io_to_temporaries, and
we want the user to be able to specify whether to lower outputs or
inputs or both. The restriction of running this pass before validate
to avoid output reads no longer applies.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A pass to lower complex (struct/array/mat) inputs/outputs to primitive
types. This allows, for example, linking that removes unused components
of a larger type which is not indirectly accessed.
In the near term, it is needed for gallium (mesa/st) support for NIR,
since only used components of a type are assigned VBO slots, and we
otherwise have no way to represent that to the driver backend. But it
should be useful for doing shader linking in NIR.
v2: use glsl_count_attribute_slots() rather than passing a type_size
fxn pointer
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Handled by tgsi_emulate for glsl->tgsi case.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 99474dc29b.
-Wpedantic is too verbose, even when applied to just a few includes.
We'll just have to deal with the issues as they come.
Reviewed-by: Brian Paul <brianp@vmware.com>
There are a couple of cycle count changes in shader-db, but it's
basically a wash.
However, with the Broadwell scalar TCS backend enabled, many
Shadow of Mordor shaders benefit from this patch. Because we don't
batch up output writes for TCS, vec4 outputs might not have all
components defined. Many output writes have a value of undef,
which is useless.
With scalar TCS, stats for tessellation shaders on Broadwell:
total instructions in shared programs: 1283000 -> 1280444 (-0.20%)
instructions in affected programs: 34302 -> 31746 (-7.45%)
helped: 71
HURT: 0
total cycles in shared programs: 10798768 -> 10780682 (-0.17%)
cycles in affected programs: 158004 -> 139918 (-11.45%)
helped: 71
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
shader-db statistics on Broadwell:
total instructions in shared programs: 8963409 -> 8962455 (-0.01%)
instructions in affected programs: 60858 -> 59904 (-1.57%)
helped: 318
HURT: 0
total cycles in shared programs: 71408022 -> 71406276 (-0.00%)
cycles in affected programs: 398416 -> 396670 (-0.44%)
helped: 199
HURT: 51
GAINED: 1
The only shaders affected were in Dota 2 Reborn.
It also sets up for the next optimization.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This better reflects what it does. I plan to add other ALU
optimizations as well, so the old name would be confusing.
In preparation for that, also move the file comments about csels
above the opt_undef_csel function, and delete the ones about there
not being other optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The assert was null checking dest_arr_parent twice. The intention
seems to be to check both dest_ and src_.
Added in d3636da9
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are rounding errors with the division in i965 that affect
the mod(x,y) result when x = N * y. Instead of returning '0' it
was returning 'y'.
This lowering pass fixes those cases.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Parenthesis are needed here as ! takes precedence over the &. The
check had the opposite effect than intended.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There is no sense in having the double version of ldexp take a 64-bit
integer. Instead, let's just take a 32-bit int all the time. This also
matches what GLSL does where both variants of ldexp take a regular integer
for the exponent argument.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The new expressions are more explicit in terms of where the bits go so it's
a little easier to tell what's going on. This is the way GLSL specifies
things so it's a bit easier to verify too. It also has the benifit that
the new expressions easily vectorize so we can constant-fold vector forms
of the _split versions correctly.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_def(\([^,]*\),\s*\([^,]*\))/nir_foreach_def(\2, \1)/
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/
and similar expressions for nir_foreach_use_safe, etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/
and a similar expression for nir_foreach_phi_src_safe.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
and similar expressions for nir_foreach_instr_safe etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ideally we'd have nir.h being included with -Wpedantic too, but it fails
with:
src/compiler/nir/nir.h:754:20: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
nir_alu_src src[];
^
In file included from src/compiler/nir/glsl_to_nir.cpp:42:0:
src/compiler/nir/nir.h:919:16: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
nir_src src[];
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As they are not standard C++ and are not supported by MSVC C++ compiler.
Just have nir_imm_double match nir_imm_float above.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.
Add 'lower_flrp64' flag to lower this instruction in that case.
v2:
- Rename lower_flrp_double to lower_flrp64 (Jason)
- Fix typo (Jason)
- Adapt the code to define bit_size information in the opcodes.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A later patch will add lower_flrp64 option to NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for ceil on doubles.
v2 (Sam):
- Improve the lowering pass to remove one bcsel (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for floor on doubles.
v2 (Sam):
- Improve the lowering pass to remove one bcsel (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for truncating doubles.
v2:
- Simplified the implementation significantly.
- Fixed the else branch, that was not doing what we wanted.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Move to compiler/nir (Iago)
v3: Use nir_imm_int() to load the constants (Sam)
v4 (Sam):
- Undo line-wrap (Jason).
- Fix comment (Jason).
- Improve generated code for get_signed_inf() function (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Group num_components and bit_size together (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, these were functions which took a callback. This meant that
the per-block code had to be in a separate function, and all the data
that you wanted to pass in had to be a single void *. They walked the
control flow tree recursively, doing a depth-first search, and called
the callback in a preorder, matching the order of the original source
code. But since each node in the control flow tree has a pointer to its
parent, we can implement a "get-next" and "get-previous" method that
does the same thing that the recursive function did with no state at
all. This lets us rewrite nir_foreach_block() as a simple for loop,
which lets us greatly simplify its users in some cases. This does
require us to rewrite every user, although the transformation from the
old nir_foreach_block() to the new nir_foreach_block() is mostly
trivial.
One subtlety, though, is that the new nir_foreach_block() won't handle
the case where the current block is deleted, which the old one could.
There's a new nir_foreach_block_safe() which implements the standard
trick for solving this. Most users don't modify control flow, though, so
they won't need it. Right now, only opt_select_peephole needs it.
The old functions are reimplemented in terms of the new macros, although
they'll go away after everything is converted.
v2: keep an implementation of the old functions around
v3 (Jason Ekstrand): A small cosmetic change and a bugfix in the loop
handling of nir_cf_node_cf_tree_last().
v4 (Jason Ekstrand): Use the _safe macro in foreach_block_reverse_call
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit adds a validator that ensures that all expressions passed
through nir_algebraic are 100% non-ambiguous as far as bit-sizes are
concerned. This way it's a compile-time error rather than a hard-to-trace
C exception some time later.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is consistent with the rename done for the rest of NIR. Currently,
"bool" is the only type specifier used in nir_opt_algebraic.py so this is
really a no-op.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, if an exception was encountered anywhere, nir_algebraic would
just die in a fire with no indication whatsoever as to where the actual bug
is. This commit makes it print out the particular search-and-replace
expression that is causing problems along with the exception. Also, it
will now report all of the errors it finds and then exit at the end like a
standard C compiler would do.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There seemed to be missing one break in nested switchcases.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No need for it not to be const, and lets caller declare it const if
desired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
nir_variable_mode is currently a bitflag enum, while
nir_print::print_var_decl() assumes is still a numbered list.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is used to facilitate the Vulkan binding model where each resource is
described by a (descriptor set, binding, array index) tuple.
Reviewed-by: Rob Clark <robdclark@gmail.com>
While it does rely on NIR, it's not really part of the NIR core. At the
moment, it still builds as part of libnir but that can be changed later if
desired.
Not supported by MSVC, and completely unnecessary -- inline functions
work just as well.
NIR_SRC_INIT/NIR_DEST_INIT could and probably should be replaced by the
inline functions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It doesn't seem needed, and is not available on MSVC.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Not supported by MSVC and consistent through NIR.
[Emil Velikov: rebase]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today. Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are several passes where we need to specify some set of variable
modes that the pass needs top operate on. This lets us easily do that.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Dolphin uses them a lot. Range tracking would be better in the long term,
but this two lines works fine for now.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some passes may not refer to options->..., at which point the compiler
will warn about an unused variable. Just cast to void unconditionally
to shut it up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Many shaders contain expression trees of the form:
const_1 * (value * const_2)
Reorganizing these to
(const_1 * const_2) * value
will allow constant folding to combine the constants. Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether. Other
times, it can create more immediate constants, which can actually hurt.
Finding a good balance here is tricky. While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.
shader-db results on Broadwell:
total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245
total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No longer used as of last commit.
v2: Rebase.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Squash the printing doubles related patches into one patch (Sam).
v3:
- Print using PRIx64 format: long is 32-bit on some 32-bit platforms but long
long is basically always 64-bit (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Make the users to give the right bit_sizes as arguments (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
%ld and %lu aren't the right format specifiers for int64_t and uint64_t
on 32-bit (x86) systems. They're %zu on Linux and %Iu on Windows.
Use the standard C99 macros in hopes that they work everywhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In the first pass of implementing exact handling, I made a mistake with
search-and-replace. In particular, we only reallly handled exact/inexact
on the root of the tree. Instead, we need to check every node in the tree
for an exact/inexact match. As an example of this, consider the following
GLSL code
precise float a = b + c;
if (a < 0) {
do_stuff();
}
In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results. The solution is to simply bail if any of the
values are exact when matching an inexact expression.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It used to be in nir_gather_info.c until I moved it out to nir.h so it
could be re-used with some linking code that never got merged. We'll move
it back out if and when we have real code to share it with.
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement. This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.
shader-db results:
Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0
total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0
Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0
total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1
No change on G45 or Ironlake.
I also tried these expressions, but none of them affected any shaders in
shader-db:
(('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
(('bcsel', a, 'b@bool', False), ('iand', a, b)),
(('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SPIR-V spec says geometry shaders are supposed to have one invocation
by default. The execution mode is only required if there are multiple
invocations.
Previously, the pass assumed that the entrypoint would be whatever function
happened to have the name "main". We really shouldn't trust in the
function names.
Reviewed-by: Rob Clark <robdclark@gmail.com>
In the first pass of implementing exact handling, I made a mistake with
search-and-replace. In particular, we only reallly handled exact/inexact
on the root of the tree. Instead, we need to check every node in the tree
for an exact/inexact match. As an example of this, consider the following
GLSL code
precise float a = b + c;
if (a < 0) {
do_stuff();
}
In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results. The solution is to simply bail if any of the
values are exact when matching an inexact expression.
This commit adds a NIR pass for lowering away returns in functions. If the
return is in a loop, it is lowered to a break. If it is not in a loop,
it's lowered away by moving/deleting code as needed.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This can happen if a function ends in a return instruction and you remove
the return.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The efficiency should be approximately the same. We do a little more work
per phi node because we have to sort the predecessors. However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
As a side-benifit, the phi builder actually handles unreachable blocks
correctly. The original vars_to_ssa code, because of the way it iterated
the blocks and added phi sources, didn't add sources corresponding to
predecessors of unreachable blocks. The new strategy employed by the phi
builder creates a phi source for each predecessor and should correctly
handle unreachable blocks by setting those sources to SSA undefs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:
loop {
if (...) {
break;
} else {
break;
}
}
In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this. First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable. Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis. Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
v2: Add better documentation
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In many places, the convention is to pass an existing ssadef name ptr
when construction/initializing a new nir_ssa_def. But that goes badly
(as noticed by garbage in nir_print output) when the original string
gets freed.
Just use ralloc_strdup() instead, and add ralloc_free() in the two
places that would care (not that the strings wouldn't eventually get
freed anyways).
Also fixup the nir_search code which was directly setting ssadef->name
to use the parent instruction as memctx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe. This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact". It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The previous transformation got the arguments to fmin backwards. When NaNs
are involved, the GLSL min/max aren't commutative so it matters.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
No shader-db changes, but this is symmetric with the previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This also prevented some regressions with other patches in my local
tree.
Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0
total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0
No changes on earlier platforms.
v2: Modify the comments to look more like a proof.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse. On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.
shader-db results:
G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1
total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53
Unsurprisingly, no changes on later platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we replace an expresion we have to compute bitsize information for the
replacement. We do this in two passes to validate that bitsize information
is consistent and correct: first we propagate bitsize from child nodes to
parent, then we do it the other way around, starting from the original's
instruction destination bitsize.
v2 (Iago):
- Always use nir_type_bool32 instead of nir_type_bool when generating
algebraic optimizations. Before we used nir_type_bool32 with constants
and nir_type_bool with variables.
- Fix bool comparisons in nir_search.c to account for bitsized types.
v3 (Sam):
- Unpack the double constant value as unsigned long long (8 bytes) in
nir_algrebraic.py.
v4 (Sam):
- Use helpers to get type size and base type from nir_alu_type.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Squash multiple commits addressing the new parameter in different
files so we don't break the build (Iago)
v3: Fix tgsi (Samuel)
v4: Fix nir_clone.c (Samuel)
v5: Fix vc4 and freedreno (Iago)
v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.
v2: fix output type for u2f (Iago)
v3: do not change vecN opcodes to be float. The next commit will add
infrastructure to enable 64-bit integer constant folding so this is isn't
really necessary. Also, that created problems with source modifiers in
some cases (Iago)
v4 (Jason):
- do not change bcsel to work in terms of floats
- leave ldexp generic
Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.
v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)
Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: use a ternary (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This really hacky commit adds a bit size to registers and SSA values. It
also adds rules in the validator to validate that they do the right things.
It's still an open question as to whether or not we want a bit_size in
nir_alu_instr or if we just want to let it inherit from the destination.
I'm inclined to just let it inherit from the destination. A similar
question needs to be asked about intrinsics.
v2 (Connor):
- Relax validation: comparisons have explicit destination sizes
and implicit source sizes.
v3 (Sam):
- Use helpers to get size and base types of nir_alu_type enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Fix size/type mask to properly handle 8-bit types.
v3: Add helpers to get the bitsize and base type of a
nir_alu_type enum.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration. This changes
nir_print to print the type along with each parameter or return variable.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>