This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO. Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.
text data bss total filename
before:
18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so
9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so
28687772 13310765 1168952 43167489 (TOTALS)
after:
15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so
6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so
22288716 13309549 1168952 36767217 (TOTALS)
No statistically significant performance difference on iris shader-db, n=8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
For drivers that use this in fragment shaders, load_input is going to
produce incorrect results (flat-shaded values).
Fixes clipping tests on a4xx.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
Drivers expect to know the number of clip distances irrespective of
whether compact arrays are used or not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
This fixes dEQP-VK.graphicsfuzz.cov-condition-bitfield-extract-integer.
For example, nir_ibitfield_extract(3, 1, 2) should return 1.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13791>
Prevent nir_compact_varyings from putting per-vertex and per-primitive
output components in the same slot.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
Add this as an option to nir_lower_compute_system_values_options
instead of just relying on the shader's options.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
The nir_tex_instr_src_size helper already sorts this out correctly, no
need to do it twice, and validate_src takes care of it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13781>
Pattern match the point coord sysval and support lowering it as well.
This is required to handle flipped framebuffers on Bifrost. However,
what this pass normalizes to is the opposite of the hardware mode we
used on Bifrost before, so we need to swap modes at the same time to
prevent regressions.
Fixes Piglit glsl-fs-pointcoord and glsl-fs-pointcoord_gles2
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13073>
If you gonna view context of function parse_atomic_op,
then you gonna know that index for array (data_src)
can be unitialized. Imho this approach is cleaner
than doing stuff inside parse_atomic_op.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12995>
nir_lower_blend was written against the OpenGL ES 3.2 specification,
which does not support blending SNORM render targets. The ES spec
says that non-floating point buffers get clamped to [0, 1] before
blending. The story is not so simple: SNORM buffers are blendable in
OpenGL and must clamped to [-1, 1] rather than [0, 1]. Handle this case.
NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but
it is only implemented in Panfrost, and this pass is in common code.
Open code it instead. Panfrost optimizes the open coded version, so this
is good enough.
Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>
We are close to the limit of 512 intrinsics, make more space to
be able to support up to 1024 intrinsics.
Take one bit from packed_const_indices, they shouldn't suffer in
a common case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13456>
Less artifacts and less time running linker. The
load_store_vectorizer test is still split since we need to update
gitlab-ci scripts to skip certain tests in certain builds. Added a
TODO with the concrete suggestion.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13414>
Instead of using gsamplerND types for sampled images, use the new
gtextureND types for sampled images and reserve gsamplerND for combined
image+samplers. Combined image+sampler bindings still get a gsamplerND
type.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
This is separate from images and samplers. It's a texture (not a
storage image) without a sampler. We also add C-visible helpers to
convert between sampler and image types.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
With nir_var_image, we've now run out of bits in our packed blob for
deref instructions. We could revert to an unpacked blob or we could be
a bit more clever about how we encode deref modes and pack them into 5
bits.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
With the `gtest` protocol meson will add some extra arguments to the
test to generate better junit results, which may be useful. This
protocol is only available in meson 0.55.0+, so keep using the default
`exitcode` protocol for meson older than that.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>
This is broken for bindless images declared as local variables. It
turns out nir_variable::data::bindless is only used for uniforms and we
already assume anything in nir_var_function_temp or similar is bindless.
We could try to make a tricky assert but now that we have everything
else passing but now that we've got everyone converted the extra
validation probably isn't necessary.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13384>
Storage images will start using nir_var_mem_image but sampled images
still use nir_var_uniform. If we're going to rewrite types, we need to
rewrite the modes as well. Otherwise, nir_validate will get grumpy and
drivers might get confused.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Two carchase compute shaders (shader-db) and two Fallout 4 fragment
shaders (fossil-db) were helped. Based on the NIR of the shaders, all
four had structures like
for (i = 0; i < 1; i++) {
...
for (...) {
...
}
}
All HSW+ platforms had similar results. (Ice Lake shown)
total loops in shared programs: 6033 -> 6031 (-0.03%)
loops in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 143692018 -> 143692006 (-0.0%)
SENDs in all programs: 6947154 -> 6947154 (+0.0%)
Loops in all programs: 38285 -> 38283 (-0.0%)
Cycles in all programs: 8434822225 -> 8434476815 (-0.0%)
Spills in all programs: 191665 -> 191665 (+0.0%)
Fills in all programs: 298822 -> 298822 (+0.0%)
In the presense of loop unrolling like this, the change in cycles is not
accurate.
v2: Rearrange the logic in the if-condition to read a little better.
Suggested by Tim.
Closes: #5089
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13323>
To align with replace_varying_input_by_uniform_load and better
describe what it does.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Varying assigned from uniform won't change after interpolation,
so move uniform load to fragment shader to eliminate the varying.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Deeper in chain of calls, function "src_has_indirect" is used (which
reads "is_ssa" and "reg.indirect").
Fixes: d1eae6f36b ("nir: Properly clean up nir_src/dest indirects")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13317>
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.
Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Passes generally shouldn't use nir_metadata_all unless they don't change
the program in any significant way. Some of these passes insert new
instructions so they should definitely not be preserving most of it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13261>
Make u_vector_init a wrapper to u_vector_init_pot. Let both take
(element_count, element_size) as parameters.
Motivated by eed0fc4caf ("vulkan/wsi/wayland: fix an invalid
u_vector_init call")
v2: rename u_vector_init_pot to u_vector_init_pow2
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13201>
Allow backends to turn some sysvals into input varyings so the frontend
(in our case spirv_to_nir()) doesn't have to bother selecting which
one is expected.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
... just like other-size constants are.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13223>
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:
fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))
The sign of their result is undefined.
On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
Make sure the new and old sources have the same number of components,
otherwise the NIR validation pass complains.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
SCALED formats are interpreted as floats, but not in the usual [0, 1]
or [-1, 1] range, meaning that the blend lowering logic can't directly
apply to those. Assert that the format being passed is not a scaled
format.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The caller doesn't necessarily want to lower blend operations on all
render targets since some of them might be supported natively (panvk
will be in that case). Let's just skip RTs that have a format set to
PIPE_FORMAT_NONE to allow that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
nir_ssa_for_src() is not supposed to pad the src vector if
dst->num_components > src->num_components. Let's pad things explicitly
with nir_pad_vector().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The NIR validation complains if the swizzle accesses a component that's
not present in the source.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
That way we can get the address to the entry, which is needed for
some nir builtins because extra data in the entry can be used as
shader input.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
These are I/O variables which are not going to be removed anyway.
However, get_variable_io_mask handles their location incorrectly.
Found using the GCC undefined behavior sanitizer.
Fixes the following error:
runtime error:
shift exponent 4294967258 is too large
for 64-bit type 'long unsigned int'
Closes: #5319
Fixes: cf5f8f55c3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output. Intel hardware is a little different. Gfx4 and Gfx5 hardware
works in the previously described mannter. Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.
This flag signals only that the vertex shader input is needed. It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging. Not having that code causes other passes to want to
eliminate that shader input.
v2: Convert conditional to assertion. This pass is only called for
vertex shaders. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Now that they're no longer ralloc'd, we have to be much more careful
about indirects. We have to make sure every time a source or
destination is overwritten, its indirect (if any) is freed. We also
have to choose a memory ownership convention for the rewrite functions.
Assuming that they will be called with the source from some other
instruction, we choose to always make a copy of the indirect (if any).
It's the responsibility of the caller to ensure its copy of the indirect
is freed.
Unfortunately, all this extra logic is going to make
nir_instr_rewrite/move_src/dest more expensive because they now have
all the logic of nir_src/dest_copy instead of a simple struct
assignment. Fortunately, the vast majority of rewrite calls are done by
nir_ssa_def_rewrite_uses which is an SSA-only fast-path.
Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
By replacing the 48-byte ralloc header with our exec_node gc_node (16
bytes), runtime of shader-db on my system across this series drops
-4.21738% +/- 1.47757% (n=5).
Inspired by discussion on #5034.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Right now we're using ralloc to GC our NIR instructions, but ralloc has
significant overhead for its recursive nature so it would be nice to use a
simpler mechanism for GCing instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
We were using the ralloc parent in some places, which should work out to
be the shader I think, but to de-ralloc the instrs we should just pass the
existing shader pointer in.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
This code was being tricky with passing a mem_ctx instead of the shader,
then freeing the mem_ctx when the pass was done and all the parallel
copies had been removed from the shader. Use the right type for instr
creation and do a bit of manual list management to prepare the way for
non-ralloc NIR instrs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Calling this lower pass twice in a row would cause spurious
set_vertex_and_primitive_count(0, undef) intrinsics after the proper
set_vertex_and_primitive_count intrinsic. This pretty much turns any
geometry shader into garbage.
Fix this by treating nir_intrinsic_emit_vertex_with_counter and
nir_intrinsic_end_primitive_with_counter just like the non-_with_counter
versions. If no blocks would need set_vertex_and_primitive_count
intrinsics added, exit the pass before doing any work. This prevents
the need for DCE to do extra clean up later.
Since this pass is potentially called multiple times via multiple
invocations of a finalize_nir callback, it is (hypothetically?) possible
that control flow could be changed to add new blocks that need this
intrinsic. The check implemented in this commit should be robust
against that possibility.
v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by
Timur.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: e76ae39ae2 ("nir: add support for user defined select control")
Fixes: b56451f82c ("nir: add support for user defined loop control")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12778>
Driver like radeonsi load varying in a scalar manner, so prefer to pack
varying with different interpolation qualifier into same slot to save
space.
But driver like panfrost/bifrost can load varying in vector manner,
so prefer to pack varying with same interpolation qualifier.
Driver can add interpolation qualifiers which are able to be
packed into same varying slot to pack_varying_options nir option.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
nir_foreach_src() bails after cb returns false for any src. Which isn't
the behavior we were looking for. Move progress flag to state struct
instead, so we don't skip visiting some sources.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>
Normal UBOs have explicit strides on them, make our lowered one behave the
same.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
Without this, there's no way to match the UBO nir_variable declarations to
the load_ubo intrinsics referencing their data.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
These won't work since a workgroup can span more than one thread, and
the temporaries are not shared memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Mesh shader outputs are either:
- non-array builtins
- array builtins that are either per-primitive or per-vertex
- user-defined outputs that must be either per-primitive or per-vertex
So we can identify any array output as "arrayed" for the purposes of
I/O lowering.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Per-primitive is similar to per-vertex attributes, but applies to all
fragments of the primitive without any interpolation involved.
Because they are regular input and outputs, keep track in shader_info
of which I/O is per-primitive so we can distinguish them after deref
lowering. These fields can be used combined with the regular
`inputs_read`, `outputs_written` and `outputs_read`.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.
This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.
v2: Move prev block ending check earlier in the function (Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
v2: Fix copy-and-paste bugs in lowering patterns.
v3: Add has_sudot_4x8 flag. Requested by Rhys.
v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Add and modify patterns to let constant folding do better.
v3: Remove '(is_not_zero)' from the patterns that try to combine
addends. I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in. Noticed by
Alyssa. Remover patterns that detect open-coded udot_4x8. Suggested by
Alyssa and Jason. Add missing sudot_4x8 patterns.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat. These
represent the combinations of integer dot-product and add that operate
on packed source vectors. That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.
Some hardware may prefer to operate on unpacked byte vectors. When such
hardware comes to Mesa, we'll have to figure out how to name things.
v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes
are not 2-source commutative.
v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur. Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts. All suggested by Jason.
v4: Fix typo previously noticed by Alyssa but missed in v2.
v5: Add has_sudot_4x8 flag. Requested by Rhys.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Without this, lowered saturating ALU instructions would only clamp to
the range of the new type instead of the range of the old type.
v2: Use nir_iclamp. Suggested by Jason. Use new
u_{int,uint}N_{min,max}() helpers.
Fixes: 090e282407 ("nir: Add a saturated unsigned integer add opcode")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
Be able to inline uniforms in loop for unrolling it.
Nested loop/if is also supported.
Some example:
for (i = 0; i < count; i++)
...
uniform "count" will be inlined. But note this does not
make sure the loop will be unrolled (ie. count = 1000).
for (i = 0; i < count; i++)
for (j = init; j < 10; j++)
if (type == 2)
...
uniform "count", "init" and "type" will be inlined.
It is intentional to not be too aggressive to add uniforms
to avoid false positive case while be able to support most
common usage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Instead of fail in trip count calculation, just don't mark such
kind of variable as induction from the beginning.
Don't bother inline uniform to deal with such kind of variable
either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Collect per vector component dependency and lower vector uniform
load to scalar if any component need to be inlined.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Unless all uniforms in the condition can be inlined we can
lower the if/loop. So we rollback added uniforms when one
of uniforms in a if condition fail to be added.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR. The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:
c = a.x ^ b.x;
d = a.y ^ b.y;
e = a.z ^ b.z;
After this change, it looks more like:
uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
c = extract_u8(t, 0);
d = extract_u8(t, 1);
e = extract_u8(t, 2);
On Ice Lake, this results in:
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
This eliminates some spurious, size-converting moves. For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Enabling GVN uncovered a bug where we would crash if the pass
thinking about pushing something into a loop.
Fixes: 6538b3e566 ("nir: add heuristic for instructions in loops with GCM")
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>
This shouldn't do anything but will make testing a later patch easier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
This reverts commit ad363913e6.
This code was added to be able to compare the output file while porting
the script from python2 to python3, but this has long been finished and
the extra complexity is not needed anymore.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
If a loop is followed by a barrier, the ordering between a load inside
the loop and other memory operations after the barrier may have to be
preserved depending on the type of memory involved. This is relevant
when the memory is writeable by other invocations. In such case, it
is not valid to completely eliminate the loop.
This commit doesn't attempt to precisely catch the barrier case, as
analysis could become too complex. It simply assumes it can't drop
the loops that contain certain types of loads unless those are known
to be safe to reorder (via the access flag).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938>
Without this stitch_blocks complains about ending in a jump with a
non-empty block after the inserted body.
I hit this with CTS raytracing tests where we tried to inline a
function that basically ended up being something like
{
ignore_ray_intersection
halt
}
I kept the nop path when possible as that does not leave a mess
for the optimization loop to optimize.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12163>
Need to avoid lowering temps when they are used by other instructions,
like the rt instructions (some of the shader call parameters get
converted to temp variables and we will lower them later with
the explicit io lowering pass as we need to guarantee they will
end up in scratch).
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12162>
gl_FragDepth default value is gl_FragCoord.z so if a shader does:
gl_FragDepth = gl_FragCoord.z
we can drop this assignment.
v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
- handle var = NULL (Rhys)
v4: refactoring (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false. Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.
All other hardware mentioned in that thread seems to behave the same
way.
The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.
This reverts commit ad05920258.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
Otherwise drivers that don't use 16-bit slots for varyings will get
confused and have their driver_locations scribbled over. This has caused
multiple problems for both Panfrost and Asahi this week. Given the only
other user of the pass for varyings is radeonsi, which needs both
together, I think this is the least controversial fix.
Fixes: fb29cef8dd ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>
Both nir_opt_algebraic and nir_opt_idiv_const have optimizations for
umod/imod/irem by constants.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
This lowering is smaller and -INT64_MIN is probably UB (signed overflow).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
ineg(INT_MIN)/iabs(INT_MIN) won't work as expected.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:
- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
This means we have to add another register to our VS->TCS ABI. I
decided to put PrimID in r0.z, after the TCS header and RelPatchID,
because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
dword of the tess factor BO, and it is read by the fixed-function
tessellator and accessed in the TES via the newly-uncovered DSPRIMID
field. If we have tess and GS, the TES passes this value through to
the GS in the same way as the VS does. PrimID passthrough for reading
it in the FS when there's tess but no GS also "just works" once we
start storing it in the TCS. In particular this fixes
dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
The liveness information will be a superset of real liveness so it's
unlikely something will explode if it tries to use it. However, it is
out-of-date and should be re-run if someone really wants it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version. There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.
v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin. Noticed by CI. Lol. Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time. Change the implementation of u_intN_min to use the same
pattern as stdint.h. This avoids the integer division. Noticed by
Jason.
v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.
Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%
No Fossil DB changed with NGGC off.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
For TGSI, we need the coordinate, comparator, bias, and LOD all together
in the first two vec4 args, and by doing it in the backend we were
generating extra MOVs.
softpipe shader-db results:
total instructions in shared programs: 2985416 -> 2953625 (-1.06%)
instructions in affected programs: 499937 -> 468146 (-6.36%)
total temps in shared programs: 544769 -> 565869 (3.87%)
temps in affected programs: 105469 -> 126569 (20.01%)
i915g shader-db:
total instructions in shared programs: 371625 -> 369594 (-0.55%)
instructions in affected programs: 24903 -> 22872 (-8.16%)
total tex_indirect in shared programs: 11381 -> 11365 (-0.14%)
tex_indirect in affected programs: 43 -> 27 (-37.21%)
LOST: 7
GAINED: 16
The temps increase is the pre-existing issue that we never release temps
for NIR regs, which doesn't matter much for softpipe (just memory/cache
footprint) but does for i915g as seen by shaders that no longer compile
(though overall we seem to win).
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>
This patch allows to shrink vecN instructions where
one or more components at any position are unused.
Stat changes for softpipe:
total instructions in shared programs: 2986101 -> 2985416 (-0.02%)
instructions in affected programs: 51216 -> 50531 (-1.34%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
ALU instructions of which not all components are read,
can be shrunk to the number of read components.
Previously, this would only remove trailing components.
This patch enables to remove components from any position.
Stat changes for softpipe:
total instructions in shared programs: 3001291 -> 2984698 (-0.55%)
instructions in affected programs: 225585 -> 208992 (-7.36%)
total loops in shared programs: 1389 -> 1358 (-2.23%)
loops in affected programs: 36 -> 5 (-86.11%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
Only fragment and some compute shaders support implicit derivatives.
They're totally meaningless without helper invocations and some
understanding of the dispatch pattern. We've got code to lower
nir_texop_tex in these shader stages to use an explicit derivative of 0
but it was pretty badly broken:
1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod.
2. It didn't take min_lod into account
3. It was conflated with adding a missing LOD parameter to opcodes
which expect one such as nir_texop_txf. While not really a bug,
this does make it way harder to reason about the code.
4. Unless you set a flag (which most drivers don't), it left the
opcode nir_texop_tex instead of nir_texop_txl which it should have
been.
This reworks it to go through roughly the same path as other LOD
lowering only with a constant lod of 0 instead of calling out to
nir_texop_lod. We also get rid of the lower_tex_without_implicit_lod
flag because most drivers set it and those that don't are probably
subtly broken. If someone really wants to get nir_texop_tex in their
vertex shaders, they can write a new patch to add the flag back in.
Fixes: e382890e25 "nir: set default lod to texture opcodes that..."
Fixes: d5ac5d6e83 "nir: Add option to lower tex to txl when..."
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
It's intel-specific, used to get at MSAA compression information.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
This is required for Zink where the API ballot type is a uint64_t and
the "hardware" ballot type is uvec4.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11989>
This changes the pass to extract pinned instructions and not just unpinned
instructions when rescheduling instructions. This stops pinned instructions
from being bunched together when instructions are reinserted into the blocks
which can result in regressions with regards to cycles and instruction
counts on i965 and register use/Max Waves on AMD hardware.
In order to do this we also throw away the post-order depth-first
search linearization algorithm used to re-insert the instructions, which
itself causes possible regressions when instructions are reinserted into
a less than ideal new order (of which the bunched together pinned
instructions is one example). Instead we simply insert instructions in the
reverse order they were extracted. This will simply place instructions
that were scheduled earlier onto the end of their new block and
instructions that were scheduled later to the start of their new block.
With this everything should remain in order without the need to run
over uses.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
With this pass enabled in Intel drivers, running shader-db on
shaders/unity/38.shader_test resulted in
Program received signal SIGSEGV, Segmentation fault.
gcm_schedule_early_src (src=0x555555d45348, void_state=0x7fffffffba40) at ../../SOURCE/master/src/compiler/nir/nir_opt_gcm.c:297
297 if (info->early_block->index < src_info->early_block->index)
(gdb) print src_info->early_block
$1 = (nir_block *) 0x0
I tracked this down to an early exit from gcm_schedule_early_instr on
the parent instruction because instr->pass_flags was 0x1c. That
should be an impossible value for this pass, so I inferred that
pass_flags must have dirt left from some previous pass.
Fixes: 8dfe6f672f ("nir/GCM: Use pass_flags instead of bitsets for tracking visited/pinned")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/597>
The rules here are the same as for texture instructions. The bits on
the intrinsic are the ground truth and are allowed to vary from the
deref a bit as-needed. If the intrinsic says PIPE_FORMAT_NONE, then we
can look at the variable, if visible, to get format information. This
means that we need to be careful when we rewrite intrinsics based on the
deref to only override the format from the _deref intrinsic from the
image variable unless the intrinsic is PIPE_FORMAT_NONE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
Semantically, -1 means "Unknown; don't validate" but it's really only
used for derefs because they often need to be flexible. We don't really
need that flexibility for image intrinsics but this makes it more
consistent. More immediately useful is that this gives us the ability
to tell _deref forms of these intrinsics apart from the lowered ones.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11849>
This patch also adds has_iadd3 bit to give more control if backend
supports ternary add instruction or not.
v2:
- Add patterns in late optimization (Connor Abbott)
Suggested-by: Alyssa/Jason
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
It's a particularly relevant place for NIR bugs to occur, and if you make
a mistake in this code it gets caught in your debug build in something
like mesa/st's call to nir_split_var_copies() during finalization, which is
rather misleading.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11860>
The new intrinsics fall into the following categories:
1. New viewport intrinsics:
For missing components that we need.
RADV will emit new SGPR arguments which will contain the
viewport information for culling shaders. These are used to
compute the screen space coordinates for small primitive culling.
2. load_cull_xxx:
Load the culling settings in runtime.
These will be a new SGPR argument in RADV.
3. overwrite_xxx:
These are needed because system values such as vertex and
instance ID are not writeable, but we need to change them
after repacking shader invocations of VS and TES.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to. In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one. We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.
Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them. You can see in the diffstat of this commit
exactly what passes attempt to preserve names. Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.
These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one). I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful. If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.
iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertizes a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
On qualcomm, we have shared registers similar to SGPR's on AMD. However,
there is no readlane or readfirstlane primitive. shared registers can
only be written to when just one lane is active. This means that we have
to lower readInvocation(val, id) to something like:
if (gl_SubgroupInvocation == id) {
scalar_reg = val;
}
return scalar_reg;
However it's a bit difficult to actually get the value of
gl_SubgroupInvocation in the backend, because for compute it requires
some calculations and we don't have any CSE support in the backend. This
intrinsic lets us turn it into
"readInvocationCond(val, id == gl_SubgroupInvocation)" in NIR at which
point the backend code generation is a lot easier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Qualcomm has a mode with a subgroup size of 128, so just emitting larger
integer operations and then lowering them later isn't an option. This
makes the pass able to handle the lowering itself, so that we don't have
to go down to 64-thread wavefronts when ballots are used.
(The GLSL and legacy SPIR-V extensions only support a maximum of 64
threads, but I guess we'll cross that bridge when we come to it...)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Lower it to a vote instead of a ballot. This was only used for AMD, and
in that case they're pretty much the same. However Qualcomm has a vote
builtin, which we want to use instead of ballots.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Reduces the work that other shader passes have to do to look at dead code,
and possibly extra rounds around the optimization loop if dce wasn't the
last pass in it.
shader-db runtime -1.12919% +/- 0.264337% (n=49) on SKL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
ifind_msb_rev was introduced in a5747f8ab3.
ifind_msb_rev guards against src0 being both 0 or -1 at the same time.
That is always true. This patch changes it to check for those values
individually.
Spotted from a compile warning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: a5747f8ab3 (\"nir: add opcodes for *find_msb_rev and lowering\")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11630>
For images, variable data includes the format. For samplers, variable
data is used for OpenCL inline samplers. When converting a variable
from one to the other, zero out the data so we don't accidentally
interpret a converted image as an inline sampler.
Fixes: fa677c86 ("nir_lower_readonly_images_to_tex: Support non-CL semantics")
Acked-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11674>
This should be a subtract, not an add. The comment's proof is correct,
but the (wrong) expression we actually use isn't what it's in the
comment! Correct the discrepancy.
The lowering in nir_opt_algebraic was correctly typed.
Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11671>
In addition to register pressure benefits from getting more fp16/int16,
this avoids i2imp's from standing in the way of loop unrolling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11545>
Compiling with clang warns about an unused variable in
nir_lower_packing.
Tracking progress was added to nir_lower_packing in
adb157ddfd but the function
will ignore the progress from impl calls and always return
false.
This patch changes it to return the progress. It fixes the
warning and should enable validation calls in NIR_PASS when
progress is made.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: adb157ddfd "nir: Return progress from nir_lower_64bit_pack()"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11615>
Static struct bitset was renamed to brw_bitset as a struct bitset
is defined in sys/_bitset.h included by pthread_np.h on FreeBSD that
is indirectly included by src/intel/compiler/brw_nir_lower_shader_calls.c
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
Sometimes you might want to find a constant source without going through
all the copy prop and constant folding to make your source be a
load_const.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11613>
We could do better if we knew the nir_address_format to obtain
addition_bits, but the only affected driver (Turnip) probably won't
benefit because it doesn't vectorize across vec4.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 2e7bceb220 ("nir/load_store_vectorizer: fix check_for_robustness() with indirect loads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4922
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11382>
Some infinite loop cases were already covered by other
restrictions (e.g. if the loop had a body), but the case with a single
block in the loop body wasn't yet.
This prevents an infinite loop when optimizing the shader in
dEQP-VK.reconvergence.subgroup_uniform_control_flow_ballot.compute.nesting2.3.2
and various others reconvergence tests.
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11476>
This is now 100% equivalent to the new rt_resume intrinsic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Really copying Jason's pass.
Changes:
- Instead of all the intel lowering introduce rt_{execute_callable,trace_ray,resume}
- Add the ability to use scratch intrinsics directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10339>
About half or more of the text here is actually from Connor Abbot. I've
edited it a bit to bring it up-to-date and make a few things more clear.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
Long ago, the semantics of bcsel were such that it took a single boolean
value and selected between whole vectors. These days, it takes a vector
boolean with the assumption that if you want the old behavior you can
just use a .xxxx swizzle. There currently are no opcodes which use a
output_size of 0 but have a scalar or fixed-vector input. Let's
disallow it for now to force us to think through the semantics again if
this ever comes up as something someone actually wants.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11438>
For debug on Android, it's useful to be able to print shaders to the
android log interface, since you don't usually have stdout/stderr.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
limit==0 is the signal for "don't peephole anything but a move that will
be optimized aways." limit > 0 is "up to N alu instructions may be moved
out." nir-to-tgsi uses ~0 as the indicator of "No, we really need to
eliminate all if instructions" on hardware like i915 that doesn't have
control flow.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
We need to make get it updated after we may have nir_instr_remove()d an
instruction, and when we cross blocks. This didn't really matter before
because the only builder usage was idiv, which other users of
lower_int_to_float were probably never hitting.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11329>
Use a single set and ensure dominance by checking after a equivalent
instruction is found.
Besides removing the need to copy a set, this also lets us resize the set
at the start of the pass in the next commit.
ministat (CSE only):
Difference at 95.0% confidence
-984.956 +/- 28.8559
-6.90075% +/- 0.190231%
(Student's t, pooled s = 26.9052)
ministat (entire run):
Difference at 95.0% confidence
-1246.1 +/- 257.253
-0.998972% +/- 0.205094%
(Student's t, pooled s = 239.863)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6390>
Now that there's a common NIR pass, there's no point in us doing this in
the back-end anymore. In order to use this pass in i965, we do have to
make one tiny change. Gallium runs the pass after assigning input and
output locations and so needs the pass to respect those locations and
num_inputs. i965, however, runs it before any location assignment or
I/O lowering so we don't care. We do, however, need the pass to succeed
with num_inputs == 0 because we set that later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
In theory you can rerun the info gather pass, but in practice that
doesn't always end well. Be consistent inside this pass and update the
info.
While we're here, change the inputs read to use VERT_BIT_EDGEFLAG.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
These map directly to v_perm_b32 and v_permlane_b32.
Unfortunately there is no corresponding NIR opcode or
intrinsics, and it's too tedious to puzzle these things
together from the existing NIR instructions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
NIR currently doesn't have any intrinsics for a horizontal packed add,
so this one is modeled after AMD's v_sad_u8.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072>
The helpers will be reused for per-primitive variables that are also
arrayed, so use a more general name.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11252>
At best, this is an extra instruction for NIR to optimize out. At worst,
depending on pass ordering nir_load_output could sneak into the final
NIR, even on drivers that don't support fbfetch.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11255>
if we have
if ... {
return;
} else {
// block X
}
// block Y
phi(X: ...)
then nir_lower_returns tries to move block Y into the else body,
except nir_cf_extract doesn't move the phi. As the return is removed
in the then-body the phi suddenly has the wrong number of arguments
(and the phi doesn't dominate its uses anymore).
In this case we know that the phi has to be single arg, so we can just
rewrite the users of the phis and drop them.
Hit this in my RT adventures, not sure if this is actually reachable
right now, as single arg phis tend to be kind of exceptional outside
of CSSA and we typically call nir_lower_returns pretty early.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11207>
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
On AGX, the special register for front facing is inverted from its meaning in
APIs. We need to lower load_front_face to inot(load_back_face). Doing this in
the backend is trivial, but then we would miss out on algebraic optimizations
for the inot.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11199>
These are similar to AYUV, but the channel ordering is different... in
such a way that there's no RGBA format that will make the channels line
up right.
v2: Rebase on bc438c91d9 ("nir/lower_tex: ignore texture_index if
tex_instr has deref src")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
The flt version could have been added in 56e21647e2, but our
collective understanding of NaN and comparisons was poor in 2015. The
new "is_a_number" predicate makes the others possible.
All of the helped shaders in shader-db are either from Mad Max or Skia.
Some of the Skia shaders just get decimated by this change:
instructions helped: shaders/skia/580-4.shader_test FS SIMD8: 81 -> 29 (-64.20%) (scheduled: top-down)
I looked at a couple of those shaders, and they had sequences like:
vec1 32 ssa_44 = flt32 ssa_32, ssa_32
vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0
vec1 32 ssa_46 = fge32 ssa_32, ssa_32
vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45
vec1 32 ssa_48 = iand ssa_46, ssa_44
vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0
ssa_44 is replaced with False. Then ssa_47 selects between ssa_0 and
ssa_0, so ssa_47 and ssa_46 are eliminated. ssa_48 is (False && don't
care), so ssa_48 and ssa_49 are eliminated. After that, many
calculations now involve constants of zero, so they are optimized down
too. So it continues until there's not much left!
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21072238 -> 21071386 (<.01%)
instructions in affected programs: 33722 -> 32870 (-2.53%)
helped: 146
HURT: 1
helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2
helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20%
95% mean confidence interval for instructions value: -7.94 -3.65
95% mean confidence interval for instructions %-change: -5.87% -2.25%
Instructions are helped.
total cycles in shared programs: 856203326 -> 856192238 (<.01%)
cycles in affected programs: 749966 -> 738878 (-1.48%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18
helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46%
95% mean confidence interval for cycles value: -104.82 -45.02
95% mean confidence interval for cycles %-change: -4.01% -1.37%
Cycles are helped.
LOST: 4
GAINED: 0
Fossil-db results:
Tiger Lake
Instructions in all programs: 160915223 -> 160898354 (-0.0%)
SENDs in all programs: 6812780 -> 6812780 (+0.0%)
Loops in all programs: 38340 -> 38340 (+0.0%)
Cycles in all programs: 7434144207 -> 7433978462 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304537 -> 304537 (+0.0%)
Ice Lake
Instructions in all programs: 145296298 -> 145279531 (-0.0%)
SENDs in all programs: 6863692 -> 6863692 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800257014 -> 8800088384 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)
Skylake
Instructions in all programs: 135891664 -> 135874910 (-0.0%)
SENDs in all programs: 6802946 -> 6802946 (+0.0%)
Loops in all programs: 38331 -> 38331 (+0.0%)
Cycles in all programs: 8444273433 -> 8444130932 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301114 -> 301114 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
If the values are known to be numbers, the the replacements are exact.
This is only applied to the patterns with constants. Constants should
always be numbers, and shaders with NaN constants should be handled in a
different way.
No shader-db or fossil-db changes on any Intel platform. The intention
is to make these patterns more future proof.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Haswell and later Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049056 -> 21048939 (<.01%)
instructions in affected programs: 4716 -> 4599 (-2.48%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51%
95% mean confidence interval for instructions value: -3.46 -2.54
95% mean confidence interval for instructions %-change: -3.22% -2.38%
Instructions are helped.
total cycles in shared programs: 855141411 -> 855141159 (<.01%)
cycles in affected programs: 54491 -> 54239 (-0.46%)
helped: 28
HURT: 5
helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12
helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75%
HURT stats (abs) min: 2 max: 52 x̄: 21.40 x̃: 6
HURT stats (rel) min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56%
95% mean confidence interval for cycles value: -13.72 -1.55
95% mean confidence interval for cycles %-change: -1.01% -0.31%
Cycles are helped.
Tiger Lake
Instructions in all programs: 160902191 -> 160899554 (-0.0%)
SENDs in all programs: 6812435 -> 6812435 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7428581420 -> 7428555881 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
A lot of fragment shaders in Shadow of the Tomb Raider were helped, and
a bunch of vertex shaders in Octopath Traveler were hurt.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs) min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel) min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.
total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs) min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel) min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.
No fossil-db changes on any Intel platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
When most of these patterns were created, we believed, incorrectly, that
fsat(NaN) was NaN. We have since realized that fsat(NaN) is zero.
Originally, this changed the patterns to use is_a_number. This didn't
help any shaders, so it's easier to just drop the optimizations.
This commit crossed paths with 4c3ad4d065 ("nir/algebraic: mark more
optimization with fsat(NaN) as inexact") and bc123c396a
("nir/algebraic: mark some optimizations with fsat(NaN) as inexact").
Given that these don't impact very many shaders, it seems safer to just
remove them.
As discussed in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried
modifying these patterns to use !(b cmp a). Unfortunately, on Intel
GPUs, the results were much worse than just removing the patterns
altogether.
Some other related patterns will be addressed in later commits.
There are still a number of patterns that use the identity fsat(1-X) ==
1 - fsat(X). If X is NaN, the former is zero while the latter is 1.0.
I haven't evaluted these patterns yet. If changes are needed in these
patterns, it should be a separate commit anyway.
v2: Replace arrow `=>` with `->` in comments because the `=>` looks a
lot like `<=` comparison. Suggested by Rhys.
Fixes: 92b75c126b ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]")
Fixes: a7f0c57673 ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel hardware had similar results. (Ice Lake shown)
total instructions in shared programs: 20029060 -> 20029670 (<.01%)
instructions in affected programs: 69236 -> 69846 (0.88%)
helped: 0
HURT: 263
HURT stats (abs) min: 1 max: 20 x̄: 2.32 x̃: 1
HURT stats (rel) min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98%
95% mean confidence interval for instructions value: 1.86 2.78
95% mean confidence interval for instructions %-change: 1.18% 1.52%
Instructions are HURT.
total cycles in shared programs: 979821278 -> 979834425 (<.01%)
cycles in affected programs: 1476848 -> 1489995 (0.89%)
helped: 49
HURT: 204
helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20
helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52%
HURT stats (abs) min: 2 max: 2600 x̄: 89.02 x̃: 16
HURT stats (rel) min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72%
95% mean confidence interval for cycles value: 13.18 90.75
95% mean confidence interval for cycles %-change: 0.29% 1.25%
Cycles are HURT.
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
Many fragment shaders do a discard using relatively little information
but still put the discard fairly far down in the shader for no good
reason. If the discard is moved higher up, we can possibly avoid doing
some or almost all of the work in the shader. When this lets us skip
texturing operations, it's an especially high win.
One of the biggest offenders here is DXVK. The D3D APIs have different
rules for discards than OpenGL and Vulkan. One effective way (which is
what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
wait until the very end of the shader to discard. This ends up in the
pessimal case where we always do all of the work before discarding.
This pass helps some DXVK shaders significantly.
v2 (Jason Ekstrand):
- Fix a couple of typos (Grazvydas, Ian)
- Use the new nir_instr_move helper
- Find all movable discards before moving anything so we don't
accidentally re-order anything and break dependencies
v3 (Pierre-Eric): remove the call to nir_opt_conditional_discard based
on Daniel Schürmann comment.
v4 (Pierre-Eric):
- handle demote intrinsics and drop derivatives_safe_after_discard
- add early return if discards/demotes aren't used
v5 (Pierre-Eric):
- use pass_flags instead of instr set (Daniel Schürmann)
v6 (Daniel Schürmann):
- cleanup and fix pass_flags handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
Removes an instruction from one place and inserts it at another while
working around a weird cursor corner-case.
v2: change return value to bool (Daniel Schürmann)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> (v1)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
It's perfectly legal to declare multiple SSBOs that point to the same
binding/descriptor_set with different access mask. Currently, it will
always get the first one in the list that matches binding/desc_set
regardless of the access mask, but other variables might have different
access mask.
Fix this by being conservative if another variable uses the same
binding/desc_set because we can't get it reliably without adding
a new field to vulkan_resource_index.
This fixes rendering issues in Resident Evil Village with vkd3d-proton.
This bug has been uncovered by ("spirv: Don't remove variables used by
resource indexing intrinsics") because variables are no longer removed
No fossils-db changes.
Cc: 21.1 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10692>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
Backends that don't handle IO component precision can pack more varyings
into one slot if the linker ignores the precision. If the IO is vectorized
then this can save IO instructions.
Related: 165a69d2f7
nir: handle mediump varyings in varying compaction helpers
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10722>
VS outputs are "per vertex" but not the kind of I/O we want to match
with this helper. Change to a name that covers the "arrayness"
required by the type.
Name inspired by the GLSL spec definition of arrayed I/O.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10493>
Otherwise the lowering pass might try to lower any other load from
a deref if its data.location value happens to be zero.
Fixes: 418c4c0d7d
compiler/nir: extend lower_fragcoord_wtrans to support VARYING_SLOT_POS
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10577>
These intrinsics represent what the hardware can actually do.
Lowering our shaders to use these intrinsics will allow us to
deal with mapping the classic VS, TES, GS (and the future MS)
stages to the hardware capabilities using NIR, which makes our
backend compilers simpler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
The lowered NIR code of NGG VS shaders uses this intrinsic
when the VS has to export the primitive ID.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
These allow us to generate slightly better code in some cases,
eg. multiplications in ACO.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
These intrinsics will be used when lowering NGG shaders, including
currently supported stages like VS, TES, GS and also by mesh shaders
in the future.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
Fixes issues with upcoming CTS test testing empty structs.
v2: decorate with UNUSED as only used in assert (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10681>
This pass was originally developed for Panfrost, where it passes the
relevant dEQP tests. Upstreaming so it can be extended and then shared
with:
* Asahi, for blending
* Zink, for logic ops
* Lavapipe, for advanced blending
Note that using this with MRT in a fragment shader (as non-panfrost
drivers will) has not yet been tested. Logic ops with integer
framebuffers are probably todo. It's been enough for Panfrost, will
suffice for ES2 on Asahi, and provides an upstream base for kusma's work
on advanced blending, so overall the merge is a net benefit.
v2: Remove bogus assert that the format layout is PLAIN. We need to
render R11G11B10, which Mesa reports as layout OTHER. The code is still
correct.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10601>
These are equivalent to the 32bit opcodes if there are no more efficient
24bit opcodes available, but inputs are guaranteed to already be 24bit,
so the 24bit opcodes can be used instead if they exist and are efficient.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>
Move it out of the "cs" sub-struct, since the bit can be used for
other shader stages in the future.
This also removes a subtle issue in spirv_to_nir:
info.cs.shared_memory_explicit_layout was used without checking for
the CS shader stage. It ended up being "harmless" since the effects
also depended on presence of shared variables.
Fixes: 5de6c5973a ("spirv: Implement SPV_KHR_workgroup_memory_explicit_layout")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10529>
The V3D hardware allows us to pack multiple workgroups together to avoid
wasting execution lanes in shader cores.
For example, if we dispatch 16 workgroups with a local size of 1 element, we
can pack all 16 workgroups in a single 16-wide dispatch where each lane
executes a different workgroup, instead of 16 1-wide dispatches.
When we do this, we don't have a uniform workgroup id any more.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
The mode one was used before 0bc5a829dd ("nir: Remove shared support from
lower_io").
The others were used before 5f7c7c9a7f ("nir: add src and dest types
to all IO loads and stores for mediump").
All conditions now are always true, so drop them.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10533>
For unsigned comparisons with zero these ops can be eliminated.
v2: Add comparison optimizations with -1 (Rhys Perry)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10583>
Used to split up the fsin/fcos lowering for AGX between NIR and the
backend, to permit algebraic optimizations without polluting NIR with
too many hardware details. The backend NIR lowering produces an
fmul/ffma of the input so we can optimize code like sin(2*x).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10582>
For non-CL, intrinsic access isn't set, because the image type doesn't
have access qualifier. Instead, the access qualifier is set on the variable.
So, add a mode to this pass which can chase back to the variable in addition
to the intrinsic access. Also, update the variable type and the deref chain
types so everything is consistent, that the tex is accessing a sampler. Note
we can't do this for CL, because void-typed samplers don't exist.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10356>
Fixes crash in
dEQP-GLES31.functional.shaders.framebuffer_fetch.basic.last_frag_data
when using this pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10411>
To vectorize to vec8/16 or vec4 (without vec3), we can't incrementally add
components to a load/store. This patch loops vectorization so that two new
vec2/4/8 operations can be combined into a larger operation.
fossil-db (GFX10.3):
Totals from 22 (0.02% of 139391) affected shaders:
SpillVGPRs: 1749 -> 1771 (+1.26%)
CodeSize: 901212 -> 892532 (-0.96%); split: -1.19%, +0.22%
Scratch: 178176 -> 184320 (+3.45%)
Instrs: 159358 -> 158027 (-0.84%); split: -0.99%, +0.16%
Cycles: 37046772 -> 36738544 (-0.83%); split: -1.00%, +0.17%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10384>
This was renamed when I was in high school. I remember updating the
Midgard compiler while sitting in AP Physics.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10296>
Some hardware doesn't have a way to check if invocation was demoted,
in such case we have to track it ourselves.
OpIsHelperInvocationEXT is specified as:
"An invocation is currently a helper invocation if it was originally
invoked as a helper invocation or if it has been demoted to a helper
invocation by OpDemoteToHelperInvocationEXT."
Therefore we:
- Set gl_IsHelperInvocationEXT = gl_HelperInvocation
- Add "gl_IsHelperInvocationEXT = true" right before each demote
- Add "gl_IsHelperInvocationEXT = gl_IsHelperInvocationEXT || condition"
right before each demote_if
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
If a has been lowered to float16 here, then we end up trying to
construct a vector of mixed precision, which the validator asserts
about.
So let's make sure we use the same type for all arguments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10201>
The prog_to_nir->NIR-to-TGSI change ended up causing regressions on r300,
and svga against r300-class hardware, because nir_lower_uniforms_to_ubo()
introduced shifts that nir_lower_ubo_vec4() tried to reverse, but that NIR
couldn't prove are no-ops (since shifting up and back down may drop bits),
and the hardware can't do the integer ops.
Instead, make it so that nir_lower_uniforms_to_ubo can generate
nir_intrinsic_load_ubo_vec4 directly for !INTEGER hardware.
Fixes: cf3fc79cd0 ("st/mesa: Replace mesa_to_tgsi() with prog_to_nir() and nir_to_tgsi().")
Closes: #4602
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10194>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
Group mediump varyings and don't put 16-bit and 32-bit components
in the same vec4.
... and reply to the comment there.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10224>
It is useful for the precisions of varyings to match across shader
stages at link-time to enable precision lowering optimizations, which
would otherwise require costly draw-time fixups.
The goal is to enable `producer->precision == consumer->precision` to be
an invariant drivers may rely on for linked shaders.
v2: keep transform feedback outputs at mediump - mareko
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
Added:
* a pass that renumbers bases of IO intrinsics
* a pass that converts mediump IO to 16 bits, optionally using the new
packed varying slots
* a pass that sets (forces) mediump in IO intrinsics (for testing)
* a pass that remaps VARYING_SLOT_VAR[0..15]_16BIT to VARYING_SLOT_VAR[0..31]
(if some shader stages don't want packed varyings)
* a pass that folds type conversions around texture opcodes into those
opcodes (e.g. tex(f2f32(coord), ..) is changed into tex accepting f16)
* a pass that changes (legalizes) sampler src and dst types based on specified
hw constraints (e.g. derivatives must be the same type as coordinates)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
This allows mediump inputs and outputs to be trivially lowered into packed
16-bit varyings where 1 slot is occupied by 2 16-bit vec4s, without any
packing instructions in NIR and without any conflicts with 32-bit varyings.
The only thing that is changed is IO semantics in intrinsics to get packed
16-bit varyings.
This simplifies supporting 16-bit types for drivers that have 32-bit slots
everywhere except the fragment shader where they can do 16-bit interpolation
on either the low or high half of each slot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
This breaks CLOn12's handling of CL CTS test_basic vector_creation for char3 (at least).
Removing this cast causes us to try to load from a deref with no alignment info.
Fixes: 99bb2a4d ("nir/opt_deref: Don't remove casts with alignment information")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10165>
load_global only has one source.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: dfe429eb41 ("nir/loop_unroll: unroll more aggressively if it can improve load scheduling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10186>
set_foreach()'s order on a list of nir_block * isn't deterministic, so we
need to sort the predecessor list.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3364>
I can't imagine any reasonable optimization which could break this, but
since it's lowered from an integer instructions, we shouldn't do anything
which could change the result.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
I'd like to use raw shared intrinsics already for some raytracing
stuff before this pass gets called and this was a real pitfall.
This mirrors scratch_size and constant_data_size.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>
The same info is in shader_info. Dedupe.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>
Roundtrip to a larger float and divide there. The extra details for
mod/rem are handled directly in integer space to simplify verification
of rounding details. The one issue is that the mantissa might be
rounded down which will cause issues; adding 1 unconditionally (proposed
by Jonathan Marek) fixes this. The lowerings here were tested
exhaustively on all pairs of 16-bit integers.
v2: Update idiv lowering per Rhys Perry's comment.
v3: Rewrite lowerings.
v4: Remove useless ftrunc, fix 8-bit issue, simplify code.
v5: Remove useless ffloor
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
Helps deduplicate some code between the two lowering paths. In
particular, it ports the missing 32-bit? check to the precise pass. This
does not change anything immediately: drivers depending on this to lower
16-bit did not work before due to type mismatches and will not work now
since it'll refuse to lower. But that means sub-32-bit idiv can be
lowered more efficiently in an algebraic pass.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
Convenient for bitsize independent lowerings, will be used in the idiv
lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
Generalizes nir_convert_to_bit_size, which we implement as a
special-case.
v2: Take a sized dest type but allow unsized or sized source to address
Jason's feedback. Shorten name.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
A RDR2 shader has a undef->texture cast which is eventually optimized out.
Without handling NULL from nir_deref_instr_get_variable(), compiling this
shader will result in a crash.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Fixes: bc438c91d9 ("nir/lower_tex: ignore texture_index if tex_instr has deref src")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10038>
texture_index is meaningless when a tex_instr has deref src.
Use var->data.binding instead.
This fixes the incorrect lowering on radeonsi where the same
lowering steps was applied to all tex_instr based on the needs
of the first one (since texture_index is always 0).
CC: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931>
Avoids some copypaste and makes it easier to see how the different types
relate.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8847>
If we've added the array, then we should update the info. This is the
value that gallium drivers setting !PIPE_CAP_CLIP_PLANES have to use in
place of rasterizer->clip_planes_enabled.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9815>
It would be later used by Turnip in implementation of
VK_KHR_pipeline_executable_properties.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8877>
this is a special version of indirect deref lowering which is used by
mesa/st to remove dynamic indexing from builtin uniforms for the lowering
pass in non-packed uniform case
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9741>
r600 expect the input values to be normalited by divinding by 2 *PI, so
add an opcode to be able to lower this in nir.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
Some backends, like r600 support a fused version of int and float compare
against zero and and csel. Adding these opcodes here makes it possible to
optimize this in nir.
v2: Add rules for float compare + csel
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
Some hardware supports a version of find_msb where the bits are counted
starting at the high bit, and this needs some lowering to obtain the
value that is expected by *find_msb
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
This moves the dxil pass to common code and makes dxil
use the new code.
Acked-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9643>
There's currently an MSVC optimizer bug which causes a stack overflow
in the compiler if it attempts to optimize fsat.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9700>
For nir_address_format_64bit_global_32bit_offset and
nir_address_format_64bit_bounded_global, we use a new intrinsics which
take the base address and offset as separate parameters. For bounds-
checked access, the bound is also included in the intrinsic. This gives
the drive more control over the bounds checking so that UBOs don't
suddenly become massively more expensive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8635>
This is a global address format where you have a 64-bit base pointer and
a 32-bit offset. It's intentionally identical to 64bit_bounded_global
except nir_lower_explicit_io does no bounds checking with it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8635>
This allows us to do bounds checked A64 block load without the it being
counted as control-flow by NIR. This means that NIR optimizations like
CSE will be able to work on these the same as a regular load.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8635>
The offset is already updated with consideration to the base above under
"/* update the offset */".
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9201>
There are only a couple patterns that use is_finite, so the changes
aren't huge. Mostly shaders from Batman Arkham City and a few shaders
from Shadow of the Tomb Raider were affected.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tiger Lake
Instructions in all programs: 160902591 -> 160902489 (-0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7429003266 -> 7428992369 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
Ice Lake
Instructions in all programs: 145301634 -> 145301460 (-0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798589772 -> 8798575869 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)
Skylake
Instructions in all programs: 135892010 -> 135891836 (-0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442597324 -> 8442583202 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
Recall that when either value is NaN, fmax will pick the other value.
This means the result range of the fmax will either be the "ideal"
result range (calculated above) or the range of the non-NaN value.
Previously, something like fmax({gt_zero}, {lt_zero, is_a_number}) would
return a range of gt_zero. However, if the "gt_zero" parameter is NaN,
the actual result will be the "lt_zero" parameter.
This analysis depends on the is_a_number analysis also added in this MR.
Assuming this doesn't cause any unforeseen problems, I believe we should
wait a bit, then nominate a subset of the series for the stable
branches.
This fixes the piglit tests
tests/spec/glsl-1.30/execution/range_analysis_fmax_of_nan.shader_test
tests/spec/glsl-1.30/execution/range_analysis_fmin_of_nan.shader_test
from https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/463.
Even with the added fsat fixes, range_analysis_fsat_of_nan.shader_test
still fails. There are some other issues there that will be addressed
in later commits (in another MR).
v2: Add fsat fixes. Suggested by Rhys.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Shader-db results:
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049290 -> 21049314 (<.01%)
instructions in affected programs: 3175 -> 3199 (0.76%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 3 x̄: 1.41 x̃: 1
HURT stats (rel) min: 0.20% max: 1.89% x̄: 0.97% x̃: 0.92%
95% mean confidence interval for instructions value: 1.09 1.73
95% mean confidence interval for instructions %-change: 0.75% 1.19%
Instructions are HURT.
total cycles in shared programs: 855136176 -> 855136406 (<.01%)
cycles in affected programs: 37579 -> 37809 (0.61%)
helped: 0
HURT: 17
HURT stats (abs) min: 12 max: 20 x̄: 13.53 x̃: 14
HURT stats (rel) min: 0.17% max: 1.13% x̄: 0.79% x̃: 0.91%
95% mean confidence interval for cycles value: 12.53 14.53
95% mean confidence interval for cycles %-change: 0.63% 0.94%
Cycles are HURT.
Fossil-db results:
Tiger Lake
Instructions in all programs: 160901033 -> 160902591 (+0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7430016795 -> 7429003266 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
Ice Lake
Instructions in all programs: 145299102 -> 145301634 (+0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798390846 -> 8798589772 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)
Skylake
Instructions in all programs: 135889478 -> 135892010 (+0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442624166 -> 8442597324 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
This commit is necessary to support "nir/range_analysis: Fix analysis of
fmin and fmax with NaN".
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_a_number.
v3: Don't set is_a_number of integer constants. The bit pattern might
be NaN.
v4: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
The obvious changes to nir_search_helpers.h are in a separate commit to
limit the scope of this change. These additions are really only needed
to support the next commit "nir/range_analysis: Add "is a number" range
analysis tracking". This reduction in scope is intended to increase the
suitability for stable branches.
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_finite.
v3: Split nir_search_helpers.h changes into a separate commit.
v4: Remove assertion intended for the next commit. Update is_finite
comment for fsign. Both noticed by Rhys. Fix is_finite handling for
load_const vectors. If any element is not finite, set the flag to
false. This is the same way is_integral is already handled.
v5: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>