No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.
v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow us to make use of the selection control support in
spirv and the GL support provided by EXT_control_flow_attributes.
Note this only supports if-statements as we dont support switches
in NIR.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
This will allow us to make use of the loop control support in
spirv and the GL support provided by EXT_control_flow_attributes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two. This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2 (Topi):
- Make bit-size handling order be 16-bit, 32-bit, 64-bit
- Clamp lower exponent range at -28 instead of -30.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With vkpipelinedb Samuel discovered a regression since we stopped
stripping types at the spir-v level.
This adds a check to the var splitting for the case where it
asserts the type hasn't changed, when it has just created a bare
type, and it's different than the original type which has an explicit
stride.
This also removes a pointless assert that also triggers.
Fixes: 3b3653c4cf (nir/spirv: don't use bare types, remove assert in split vars for testing)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.
While we are at it, factorize a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only the exponent needs to be 32-bit signed integer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’:
nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer
nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer
nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer
nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’)
Fixes: 96c32d7776 ("nir/copy_prop_vars: handle load/store of vector elements")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Rather than skipping code that looked like this:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
break;
}
do_work_2();
}
Previously we would turn this into:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
do_work_2();
break;
}
}
This was clearly wrong. This change checks for this case and makes
sure we now leave it for nir_opt_dead_cf() to clean up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In some cases, we can end up with varying structs that aren't split to
their member variables. nir_compact_varyings attempted to record these
as unmovable, so it would leave them be. Unfortunately, it didn't do
it right for non-vector/scalar types. It set the mask to:
((1 << (elements * dmul)) - 1) << var->data.location_frac
where elements is the number of vector elements. For structures and
other non-vector/scalars, elements is 0...so the whole mask became 0.
This caused nir_compact_varyings to assign other varyings on top of
the structure varying's location (as it appeared to take up no space).
To combat this, we just set elements to 4 for non-vector/scalar types,
so that the entire slot gets marked as unmovable.
Fixes KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in on iris.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Users of this function expect alu to be a supported comparision
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparision before
overriding the alu instruction.
Fixes an unreachable in inverse_comparison() with the game
Assasins Creed Odyssey.
Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.
Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Not sure how ptr_stride should be taken into account if at all here
v2: reorder check to avoid src walking (Jason)
v3: remove is_cast_cast checks, keep going afterwards (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For OpenCL we never want to strip the info from the types, and it makes
type comparisons easier in later stages. We might later need a nir pass to
strip this for GLSL, but so far the only regression is the assert and Jason
said removing that is fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit db57db5317. When
building IR, nothing is really immutable and, since C has no concept of
constness propagating beyond the first pointer, we have to be vary
careful with how we use it. To just throw const into a function like
this is a lie.
Instead, we should just drop the unneeded const in spirv_to_nir which
this commit does along with the revert.
local memory is too small to require 64 bit pointers, so cast the array index
to a 32 bit value to save up on 64 bit operations.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This pass was originally written for lowering TCS output reads and
writes but it is also applicable just about anything including UBOs,
SSBOs, and shared variables.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This one's a tiny bit better than what we had in spirv_to_nir because it
emits a binary tree rather than a linear walk. It also doesn't leave
around unneeded bcsel instructions for a constant index and returns an
undef for constant OOB access.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When varyings was added we moved to use to dynamycally allocated
pointers, instead of allocating just one block for everything. That
breaks some assumptions of some vulkan drivers (like anv), that make
serialization and copying easier. And at the same time, varyings are
not needed for vulkan.
So this commit moves them out. Although it seems a little an overkill,
fixing the anv side would require a similar, or more, changes, so in
the end it is about to decide where do we want to put our effort.
v2: (from Jason review)
* Don't use a temp variable on the _create methods, just return
result of rzalloc_size
* Wrap some lines too long.
Fixes: cf0b2ad486 ("nir/xfb: adding varyings on nir_xfb_info and gather_info")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's just a 32-bit index and offset. We're going to want to use it in
GL as well so stop talking about Vulkan.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we get to two deref_var paths with different variables, we usually
know they don't alias. However, if both of the paths are marked
coherent, we don't have to worry about it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values. Make the validator a little less strict so that we can do a
32-bit load/store on boolean types. We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans. We want that to be legal.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow. Instead, of requiring the types to exactly match, only require
the bare types to match.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
instructions in affected programs: 43524 -> 40676 (-6.54%)
helped: 203
HURT: 0
Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
These are ir3 specific versions of SSBO intrinsics that add an
extra source to hold the element offset (dword), which is what the
backend instructions need.
The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.
Reviewed-by: Rob Clark <robdclark@gmail.com>
v2: (all from Jason)
Reuse existing function for the end of the block combinations.
Check the SSA values are coming from the right place in tests.
Document the case when the store to array_deref is reused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The previous code was completely broken when it came to constructing the
undef values. I'm not sure how it ever worked. For the case of a copy
that reads an undefined value, we can just delete the copy because the
destination is a valid undefined value. This saves us the effort of
trying to construct a value for an arbitrary copy_deref intrinsic.
Fixes: e8a8937a04 "nir: add partial loop unrolling support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worse offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>