This pass was originally written for d3d12, but is useful for hardware
that lacks sample compare support like some etnaviv GPU models.
Also rename the lowering pass and some surrounding code to
nir_lower_tex_shadow as suggested by Emma.
I'd like to use the pass that's already in tree.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14308>
I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600. While this makes reading CTS
shaders easier, the effect in the real world is pretty limited. From
softpipe shader-db:
total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
Zero isn't really a valid write mask. If it's provided, use a full write
mask.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14455>
For nir_to_tgsi, I want to be able to fold into the base from a vector
load_const, which the ad-hoc scalar chasing couldn't handle.
r300:
total instructions in shared programs: 1278731 -> 1256502 (-1.74%)
instructions in affected programs: 457909 -> 435680 (-4.85%)
total flowcontrol in shared programs: 8316 -> 8313 (-0.04%)
flowcontrol in affected programs: 5 -> 2 (-60.00%)
total temps in shared programs: 213687 -> 213774 (0.04%)
temps in affected programs: 13140 -> 13227 (0.66%)
total consts in shared programs: 952850 -> 949929 (-0.31%)
consts in affected programs: 386352 -> 383431 (-0.76%)
Fixes: #5781
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
Since we don't have 32-bit ints, these checks for 32-bit unsigned wrapping
don't help and just reduce optimization opportunities (particularly for
DX9 addressing math).
Doesn't affect any current consumers.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
This lets nir-to-tgsi fold the constant offset of addressing calculations
into the CONST[] reference, which is important for D3D9-era compatibility:
HW of that age has limited uniform space, and if we do the addressing math
as math in the shader for dynamic indexing, the nir_load_consts end up
taking up uniforms we don't have available.
r300:
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total instructions in shared programs: 1279699 -> 1279167 (-0.04%)
instructions in affected programs: 134796 -> 134264 (-0.39%)
total temps in shared programs: 213912 -> 213736 (-0.08%)
temps in affected programs: 2166 -> 1990 (-8.13%)
total consts in shared programs: 953237 -> 952973 (-0.03%)
consts in affected programs: 45980 -> 45716 (-0.57%)
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14309>
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.
don't store the enum in shader_info as it changes size, and confuses
other things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
This fixes nir_opt_cse miss replace a non-sparse tex instruction
with a sparse tex instruction and fail the nir_validate_shader().
Fixes: 3a7972f72a ("nir,spirv: add sparse texture fetches")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14362>
Doing this for ir3 required adding a struct for limits of how much base to
fold in (which NTT wants as well for its case of shared vars), otherwise
the later work to lower to the 1<<9 word limit would emit more
instructions.
The shader-db results are that sometimes the reduction in NIR instruction
count results in the fewer sampler prefetches due to the shader being
estimated to be shorter (dota2, nexuiz):
total instructions in shared programs: 8996651 -> 8996776 (<.01%)
total cat5 in shared programs: 86561 -> 86577 (0.02%)
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14023>
This optimizations turns
loop {
...
if (cond1) {
if (cond2) {
do_work_1();
break;
} else {
do_work_2();
}
do_work_3();
break;
} else {
...
}
}
into:
loop {
...
if (cond1) {
if (cond2) {
do_work_1();
} else {
do_work_2();
do_work_3();
}
break;
} else {
...
}
}
As this optimizations moves code into the NIF statement,
it re-iterates on the branch legs in case of success.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7587>
This patch is a rewrite of nir_opt_move.
Differently from the previous version, each instruction is checked
if it can be moved downwards and then inserted before the first user
of the definition. The advantage is that less insert operations are
performed, the original order is kept if two movable instructions have
the same first user, and instructions without user in the same block
are moved towards the end.
v2: Only return true if an instruction really changed the position.
Don't care for discards, this will be handled by another MR.
v3: fix self-referring phis and update according to nir_can_move_instr().
v4: use nir_can_move_instr() and nir_instr_ssa_def()
v5: deduplicate some code
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3657>
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer
to correctly declare output variables.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
That's how the TGSI math opcodes work.
This lets lower_vec_to_regs coalesce the DP output into the .yzw channels,
giving an impressive shader-db win on softpipe:
total instructions in shared programs: 2929840 -> 2794036 (-4.64%)
instructions in affected programs: 1651438 -> 1515634 (-8.22%)
total temps in shared programs: 372730 -> 332744 (-10.73%)
temps in affected programs: 118151 -> 78165 (-33.84%)
and a minor one on r300:
total instructions in shared programs: 51238 -> 51149 (-0.17%)
instructions in affected programs: 2621 -> 2532 (-3.40%)
total vinst in shared programs: 15655 -> 15618 (-0.24%)
vinst in affected programs: 468 -> 431 (-7.91%)
total temps in shared programs: 9838 -> 9828 (-0.10%)
temps in affected programs: 59 -> 49 (-16.95%)
and a bigger one on i915g:
total instructions in shared programs: 398064 -> 395901 (-0.54%)
instructions in affected programs: 29271 -> 27108 (-7.39%)
total tex_indirect in shared programs: 12261 -> 12233 (-0.23%)
tex_indirect in affected programs: 98 -> 70 (-28.57%)
LOST: 0
GAINED: 5
The r300 change is less impressive because it does some backend copy-prop,
but also because intermediate storage of DPs now takes a vec4 instead of a
scalar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>
An if that looks like:
if (x) { } else { }
That has no phis following it is dead. Currently these are only
removed by peephole select, but that means that 'x' is considered
used until that pass is run, which can make it difficult to apply
sane lowering in the case where loading 'x' requires complex or
expensive transformations, but 'x' is *really* unused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>
Extend store_combined_output_pan to take a dual source blend input in
addition to colour, depth, and stencil inputs. Use the last source for
this, and represent the type with the DEST_TYPE index. This is a hack
but there is no SRC2_TYPE and NIR doesn't seem to mind as long as we
know what we mean. This allows the backend to emit a combined "blend
render target #0" instruction taking two sources.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13714>
This pattern, found in the FSR upscaling shader,
helps the vectorization efforts by keeping the
chain of vectorized instructions intact.
Radeon can optimize it to per-component fneg modifiers.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>
SPV_EXT_demote_to_helper_invocation added OpDemoteToHelperInvocation
operation to turn an invocation into a helper invocation, but the
value of HelperInvocation (a builtin from Input storage class)
couldn't be modified dynamically without breaking compatibility.
For the extension the operation OpIsHelperInvocation was added to get
the dynamic value.
For SPIR-V 1.6, the demote operation was promoted, but now to get the
dynamic value the shader must issue a load to HelperInvocation with
Volatile memory access semantics.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14209>
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
Some of the tested flags are set for other intrinsics and they are
printed only when set, so there's no point in checking exact intrinsic
name or shader stage.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14222>
This reverts commit 6eb3fe2d4f. The root cause was
a bug in Meson when using the new gtest protocol and a test failed before producing
the XML file expected by it. This was fixed in later versions of Meson, so
we've bumped the required meson version to use that feature. The failure should
now be properly identified, so re-enabling the NIR test.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14204>
For data-race safety, let's use this function to ensure NIR debug is
initialized only once.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>
This envvar is initialized when creating a NIR shader, but it needs to
be used before. So initialize it here.
v2 (Juan):
- Use static variable for first initialization.
Fixes: f77ccdfb4a ("nir: add NIR_DEBUG envvar")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14057>
On AMD, the hardware will return 0 for the raw LOD if the sum of the
absolute values of derivatives is 0 but Vulkan expects the value to
be in the [-inf, -22.0f] range.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14147>
Compute, task, mesh & raytracing stages don't support
ClipDistance/CullDistance as input.
This change is not needed for correctness. Just something I stumbled on.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14149>
debian-vulkan but not any other CI pipeline consistently fails with:
FileNotFoundError: [Errno 2] No such file or directory: 'nir_tests.xml'
I have to assume that either debian-vulkan is broken, or the NIR test
infrastructure is broken. That's not all. I got the same failure when
I wanted to add a new test, which means the CI is preventing us from adding
new NIR tests, which is a very serious problem with the CI or NIR tests.
The python error doesn't imply that it's a test failure, so something else
is broken. If you don't want such commits to happen again, print better
error messages.
See also the discussion in the MR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13966>
We weren't doing much motion in nir-to-tgsi because we considered all our
lowered load-ubos as unmovable.
softpipe shader-db:
total temps in shared programs: 563942 -> 563136 (-0.14%)
temps in affected programs: 9833 -> 9027 (-8.20%)
r300 shader-db:
instructions in affected programs: 22858 -> 23575 (3.14%)
temps in affected programs: 2039 -> 1813 (-11.08%)
(NIR had given r300 -19% instrs for +40% temps, so this feels like a
worthwhile trade back).
Reivewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14138>
This gets our union's size down to 22 bytes (now smaller than any of the
union's types were before we made the union!). Cuts another 48kb off of
the drivers.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each variable assuming
variables mostly don't have conditions or the conditions are mostly
reused). Reduces libvulkan_intel.so size by 49kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each expression
(assuming expressions mostly don't have conditions or the conditions are
mostly reused). Reduces libvulkan_intel.so size by 8.7kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Even with packing all 3 types into a 40-byte union (nir_search_constant
being 24 bytes and nir_search_expression having formerly been 32), and
having a single array of them, this cuts 1.7MB from each of
libvulkan_intel.so and libgallium_dri.so.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
I'm going to be adding some more tables to reduce relocations in the
generated code, so move the current tables to a struct for arg-passing
sanity.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Just remove queries that are never used or proceeded with. The latter
case leading to undefined values.
v2: Don't use nir_shader_instructions_pass() to find variables (Caio)
Simplify replacement (Caio)
v3: Don't track all the queries intrinsic effects (Caio)
Rename things to represent only read queries (Caio)
Use set instead of hash_table (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13718>
fossil-db (Sienna Cichlid):
Totals from 1 (0.00% of 134572) affected shaders:
no stat changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14009>
Move all the NIR related debug environmental variables in a single
NIR_DEBUG one.
Use NIR_DEBUG=help to print all the available options.
v2:
- Use a macro to simplify (Marcin, Jason)
- Remove wrong changes (Marcin)
v3 (Marcin):
- Remove rendundant NIR mentioning in option descriptions.
- Unwrap option descriptions.
- Ensure the constant is unsigned.
- Use extern array to remove switch.
v4:
- Add missing kernel shader (Jason).
- Add unlikely() (Marcin).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13840>
This saves a bunch of generated code to pack up the extra NULLs to get to
4 args, and saves executing the conditions in nir_build_alu() to then skip
those NULLs.
Saves another 27kb on disk.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
I aimed for "things that look like big switch statements, or cases where
the compiler is unlikely to be able to constant-propagate an argument into
something useful."
Saves another 80kb on disk. No perf difference on iris shader-db, n=23.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13916>
Found while running valgrind :
==3583454== Invalid read of size 4
==3583454== at 0xF48336: glsl_get_struct_field_offset (nir_types.cpp:84)
==3583454== by 0xC7CD0D: opt_replace_struct_wrapper_cast (nir_deref.c:1068)
==3583454== by 0xC7CDD9: opt_deref_cast (nir_deref.c:1087)
==3583454== by 0xC7DD8E: nir_opt_deref_impl (nir_deref.c:1369)
==3583454== by 0xC7DF4E: nir_opt_deref (nir_deref.c:1428)
==3583454== by 0xA63F3C: brw_kernel_from_spirv (brw_kernel.c:325)
==3583454== by 0xA3BC2C: main (intel_clc.c:481)
==3583454== Address 0xe4f7e88 is 24 bytes after a block of size 48 in arena "client"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13952>
In some non-trivial cases (the amber script file in the merge
request description) phi instruction has more than 32 elements
in predecessors tree and that isn't recursion, just large tree.
In that case, phis not fully converted into a register or mov,
but successfully removed.
The fix removes the counter and adds container of visited blocks.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3690
Cc: mesa-stable
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13710>
If there is a halt or return instruction right before a loop with a single
continue, we would have taken the fast path intended for loops without
continues.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 71a985d80b ("nir/dce: perform DCE for unlooped instructions in a single pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10284>
gl_ClipDistance most definitely can be read in fragment shaders since
GLSL 1.30. This is also accessible in ES with EXT_clip_cull_distance.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13917>
spirv_to_nir sometimes wraps derefs in vec2 or mov instructions as part of
its texture handling. These get in the way of
nir_rematerialize_derefs_in_use_blocks_impl. Running copy propagation
should get rid of the extra move instructions and get us back to intact
deref chains for everything except variable pointer use-cases.
fossil-db (Sienna Cichlid):
Totals from 6 (0.00% of 134572) affected shaders:
CodeSize: 92656 -> 93088 (+0.47%)
Instrs: 17060 -> 17138 (+0.46%)
Latency: 224408 -> 227539 (+1.40%)
InvThroughput: 37402 -> 37924 (+1.40%)
VClause: 408 -> 402 (-1.47%)
Copies: 1065 -> 1107 (+3.94%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5668
Fixes: 14a12b771d ("spirv: Rework our handling of images and samplers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13924>
Only for a6xx since we don't know the instructions for global
atomics on previous gens. Per Qualcomm's docs in OpenCL atomics
are only supported since a5xx together with Generic memory space.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO. Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.
text data bss total filename
before:
18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so
9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so
28687772 13310765 1168952 43167489 (TOTALS)
after:
15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so
6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so
22288716 13309549 1168952 36767217 (TOTALS)
No statistically significant performance difference on iris shader-db, n=8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
For drivers that use this in fragment shaders, load_input is going to
produce incorrect results (flat-shaded values).
Fixes clipping tests on a4xx.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
Drivers expect to know the number of clip distances irrespective of
whether compact arrays are used or not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
This fixes dEQP-VK.graphicsfuzz.cov-condition-bitfield-extract-integer.
For example, nir_ibitfield_extract(3, 1, 2) should return 1.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13791>
Prevent nir_compact_varyings from putting per-vertex and per-primitive
output components in the same slot.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
Add this as an option to nir_lower_compute_system_values_options
instead of just relying on the shader's options.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
The nir_tex_instr_src_size helper already sorts this out correctly, no
need to do it twice, and validate_src takes care of it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13781>
Pattern match the point coord sysval and support lowering it as well.
This is required to handle flipped framebuffers on Bifrost. However,
what this pass normalizes to is the opposite of the hardware mode we
used on Bifrost before, so we need to swap modes at the same time to
prevent regressions.
Fixes Piglit glsl-fs-pointcoord and glsl-fs-pointcoord_gles2
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13073>
If you gonna view context of function parse_atomic_op,
then you gonna know that index for array (data_src)
can be unitialized. Imho this approach is cleaner
than doing stuff inside parse_atomic_op.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12995>
nir_lower_blend was written against the OpenGL ES 3.2 specification,
which does not support blending SNORM render targets. The ES spec
says that non-floating point buffers get clamped to [0, 1] before
blending. The story is not so simple: SNORM buffers are blendable in
OpenGL and must clamped to [-1, 1] rather than [0, 1]. Handle this case.
NIR does have the fsat_signed_mali instruction to clamp to [-1, 1], but
it is only implemented in Panfrost, and this pass is in common code.
Open code it instead. Panfrost optimizes the open coded version, so this
is good enough.
Fixes SNORM subtests of Piglit arb_texture_view-rendering-formats.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13499>
We are close to the limit of 512 intrinsics, make more space to
be able to support up to 1024 intrinsics.
Take one bit from packed_const_indices, they shouldn't suffer in
a common case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13456>
Less artifacts and less time running linker. The
load_store_vectorizer test is still split since we need to update
gitlab-ci scripts to skip certain tests in certain builds. Added a
TODO with the concrete suggestion.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13414>
Instead of using gsamplerND types for sampled images, use the new
gtextureND types for sampled images and reserve gsamplerND for combined
image+samplers. Combined image+sampler bindings still get a gsamplerND
type.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
This is separate from images and samplers. It's a texture (not a
storage image) without a sampler. We also add C-visible helpers to
convert between sampler and image types.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
With nir_var_image, we've now run out of bits in our packed blob for
deref instructions. We could revert to an unpacked blob or we could be
a bit more clever about how we encode deref modes and pack them into 5
bits.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
With the `gtest` protocol meson will add some extra arguments to the
test to generate better junit results, which may be useful. This
protocol is only available in meson 0.55.0+, so keep using the default
`exitcode` protocol for meson older than that.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8484>
This is broken for bindless images declared as local variables. It
turns out nir_variable::data::bindless is only used for uniforms and we
already assume anything in nir_var_function_temp or similar is bindless.
We could try to make a tricky assert but now that we have everything
else passing but now that we've got everyone converted the extra
validation probably isn't necessary.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13384>
Storage images will start using nir_var_mem_image but sampled images
still use nir_var_uniform. If we're going to rewrite types, we need to
rewrite the modes as well. Otherwise, nir_validate will get grumpy and
drivers might get confused.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
Two carchase compute shaders (shader-db) and two Fallout 4 fragment
shaders (fossil-db) were helped. Based on the NIR of the shaders, all
four had structures like
for (i = 0; i < 1; i++) {
...
for (...) {
...
}
}
All HSW+ platforms had similar results. (Ice Lake shown)
total loops in shared programs: 6033 -> 6031 (-0.03%)
loops in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 143692018 -> 143692006 (-0.0%)
SENDs in all programs: 6947154 -> 6947154 (+0.0%)
Loops in all programs: 38285 -> 38283 (-0.0%)
Cycles in all programs: 8434822225 -> 8434476815 (-0.0%)
Spills in all programs: 191665 -> 191665 (+0.0%)
Fills in all programs: 298822 -> 298822 (+0.0%)
In the presense of loop unrolling like this, the change in cycles is not
accurate.
v2: Rearrange the logic in the if-condition to read a little better.
Suggested by Tim.
Closes: #5089
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13323>
To align with replace_varying_input_by_uniform_load and better
describe what it does.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Varying assigned from uniform won't change after interpolation,
so move uniform load to fragment shader to eliminate the varying.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12613>
Deeper in chain of calls, function "src_has_indirect" is used (which
reads "is_ssa" and "reg.indirect").
Fixes: d1eae6f36b ("nir: Properly clean up nir_src/dest indirects")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13317>
In order for the load to never straddle the load can't extend past 8
bytes, not 16. For example a vec2 load with align_mul = 8 and
align_offset = 4 can straddle.
Fixes assertion failures when we stop pushing UBOs in the preamble on
a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13142>
Passes generally shouldn't use nir_metadata_all unless they don't change
the program in any significant way. Some of these passes insert new
instructions so they should definitely not be preserving most of it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13261>
Make u_vector_init a wrapper to u_vector_init_pot. Let both take
(element_count, element_size) as parameters.
Motivated by eed0fc4caf ("vulkan/wsi/wayland: fix an invalid
u_vector_init call")
v2: rename u_vector_init_pot to u_vector_init_pow2
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13201>
Allow backends to turn some sysvals into input varyings so the frontend
(in our case spirv_to_nir()) doesn't have to bother selecting which
one is expected.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
... just like other-size constants are.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13223>
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
Add derivative opcodes fddx_must_abs_mali/fddy_must_abs_mali satisfying:
fabs(fdd*_must_abs_mali(v)) = fabs(fdd*(v))
The sign of their result is undefined.
On Bifrost and Valhall, these unsigned derivatives can be implemented
more efficiently than the correctly-signed counterparts, since the sign
fixup requires extra ALU instructions. On backends where this is the
case, it is useful to optimize fabs(fdd*(v)) to
fabs(fdd*_must_abs_mali(v)). This pattern comes up with the GLSL builtin
`fwidth`.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12332>
Make sure the new and old sources have the same number of components,
otherwise the NIR validation pass complains.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
SCALED formats are interpreted as floats, but not in the usual [0, 1]
or [-1, 1] range, meaning that the blend lowering logic can't directly
apply to those. Assert that the format being passed is not a scaled
format.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The caller doesn't necessarily want to lower blend operations on all
render targets since some of them might be supported natively (panvk
will be in that case). Let's just skip RTs that have a format set to
PIPE_FORMAT_NONE to allow that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
nir_ssa_for_src() is not supposed to pad the src vector if
dst->num_components > src->num_components. Let's pad things explicitly
with nir_pad_vector().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
The NIR validation complains if the swizzle accesses a component that's
not present in the source.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13060>
That way we can get the address to the entry, which is needed for
some nir builtins because extra data in the entry can be used as
shader input.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12592>
These are I/O variables which are not going to be removed anyway.
However, get_variable_io_mask handles their location incorrectly.
Found using the GCC undefined behavior sanitizer.
Fixes the following error:
runtime error:
shift exponent 4294967258 is too large
for 64-bit type 'long unsigned int'
Closes: #5319
Fixes: cf5f8f55c3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output. Intel hardware is a little different. Gfx4 and Gfx5 hardware
works in the previously described mannter. Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.
This flag signals only that the vertex shader input is needed. It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging. Not having that code causes other passes to want to
eliminate that shader input.
v2: Convert conditional to assertion. This pass is only called for
vertex shaders. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Now that they're no longer ralloc'd, we have to be much more careful
about indirects. We have to make sure every time a source or
destination is overwritten, its indirect (if any) is freed. We also
have to choose a memory ownership convention for the rewrite functions.
Assuming that they will be called with the source from some other
instruction, we choose to always make a copy of the indirect (if any).
It's the responsibility of the caller to ensure its copy of the indirect
is freed.
Unfortunately, all this extra logic is going to make
nir_instr_rewrite/move_src/dest more expensive because they now have
all the logic of nir_src/dest_copy instead of a simple struct
assignment. Fortunately, the vast majority of rewrite calls are done by
nir_ssa_def_rewrite_uses which is an SSA-only fast-path.
Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
By replacing the 48-byte ralloc header with our exec_node gc_node (16
bytes), runtime of shader-db on my system across this series drops
-4.21738% +/- 1.47757% (n=5).
Inspired by discussion on #5034.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Right now we're using ralloc to GC our NIR instructions, but ralloc has
significant overhead for its recursive nature so it would be nice to use a
simpler mechanism for GCing instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
We were using the ralloc parent in some places, which should work out to
be the shader I think, but to de-ralloc the instrs we should just pass the
existing shader pointer in.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
This code was being tricky with passing a mem_ctx instead of the shader,
then freeing the mem_ctx when the pass was done and all the parallel
copies had been removed from the shader. Use the right type for instr
creation and do a bit of manual list management to prepare the way for
non-ralloc NIR instrs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Calling this lower pass twice in a row would cause spurious
set_vertex_and_primitive_count(0, undef) intrinsics after the proper
set_vertex_and_primitive_count intrinsic. This pretty much turns any
geometry shader into garbage.
Fix this by treating nir_intrinsic_emit_vertex_with_counter and
nir_intrinsic_end_primitive_with_counter just like the non-_with_counter
versions. If no blocks would need set_vertex_and_primitive_count
intrinsics added, exit the pass before doing any work. This prevents
the need for DCE to do extra clean up later.
Since this pass is potentially called multiple times via multiple
invocations of a finalize_nir callback, it is (hypothetically?) possible
that control flow could be changed to add new blocks that need this
intrinsic. The check implemented in this commit should be robust
against that possibility.
v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by
Timur.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: e76ae39ae2 ("nir: add support for user defined select control")
Fixes: b56451f82c ("nir: add support for user defined loop control")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12778>
Driver like radeonsi load varying in a scalar manner, so prefer to pack
varying with different interpolation qualifier into same slot to save
space.
But driver like panfrost/bifrost can load varying in vector manner,
so prefer to pack varying with same interpolation qualifier.
Driver can add interpolation qualifiers which are able to be
packed into same varying slot to pack_varying_options nir option.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
nir_foreach_src() bails after cb returns false for any src. Which isn't
the behavior we were looking for. Move progress flag to state struct
instead, so we don't skip visiting some sources.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12732>
Normal UBOs have explicit strides on them, make our lowered one behave the
same.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
Without this, there's no way to match the UBO nir_variable declarations to
the load_ubo intrinsics referencing their data.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
These won't work since a workgroup can span more than one thread, and
the temporaries are not shared memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Mesh shader outputs are either:
- non-array builtins
- array builtins that are either per-primitive or per-vertex
- user-defined outputs that must be either per-primitive or per-vertex
So we can identify any array output as "arrayed" for the purposes of
I/O lowering.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
Per-primitive is similar to per-vertex attributes, but applies to all
fragments of the primitive without any interpolation involved.
Because they are regular input and outputs, keep track in shader_info
of which I/O is per-primitive so we can distinguish them after deref
lowering. These fields can be used combined with the regular
`inputs_read`, `outputs_written` and `outputs_read`.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.
This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.
v2: Move prev block ending check earlier in the function (Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
v2: Fix copy-and-paste bugs in lowering patterns.
v3: Add has_sudot_4x8 flag. Requested by Rhys.
v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Add and modify patterns to let constant folding do better.
v3: Remove '(is_not_zero)' from the patterns that try to combine
addends. I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in. Noticed by
Alyssa. Remover patterns that detect open-coded udot_4x8. Suggested by
Alyssa and Jason. Add missing sudot_4x8 patterns.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat. These
represent the combinations of integer dot-product and add that operate
on packed source vectors. That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.
Some hardware may prefer to operate on unpacked byte vectors. When such
hardware comes to Mesa, we'll have to figure out how to name things.
v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions. These opcodes
are not 2-source commutative.
v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur. Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts. All suggested by Jason.
v4: Fix typo previously noticed by Alyssa but missed in v2.
v5: Add has_sudot_4x8 flag. Requested by Rhys.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Without this, lowered saturating ALU instructions would only clamp to
the range of the new type instead of the range of the old type.
v2: Use nir_iclamp. Suggested by Jason. Use new
u_{int,uint}N_{min,max}() helpers.
Fixes: 090e282407 ("nir: Add a saturated unsigned integer add opcode")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
This creates a single nir_op_vecn instead of a nir_op_vecn and several
copies.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12469>
Be able to inline uniforms in loop for unrolling it.
Nested loop/if is also supported.
Some example:
for (i = 0; i < count; i++)
...
uniform "count" will be inlined. But note this does not
make sure the loop will be unrolled (ie. count = 1000).
for (i = 0; i < count; i++)
for (j = init; j < 10; j++)
if (type == 2)
...
uniform "count", "init" and "type" will be inlined.
It is intentional to not be too aggressive to add uniforms
to avoid false positive case while be able to support most
common usage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Instead of fail in trip count calculation, just don't mark such
kind of variable as induction from the beginning.
Don't bother inline uniform to deal with such kind of variable
either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Collect per vector component dependency and lower vector uniform
load to scalar if any component need to be inlined.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
Unless all uniforms in the condition can be inlined we can
lower the if/loop. So we rollback added uniforms when one
of uniforms in a if condition fail to be added.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR. The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:
c = a.x ^ b.x;
d = a.y ^ b.y;
e = a.z ^ b.z;
After this change, it looks more like:
uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
c = extract_u8(t, 0);
d = extract_u8(t, 1);
e = extract_u8(t, 2);
On Ice Lake, this results in:
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
This eliminates some spurious, size-converting moves. For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Enabling GVN uncovered a bug where we would crash if the pass
thinking about pushing something into a loop.
Fixes: 6538b3e566 ("nir: add heuristic for instructions in loops with GCM")
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12242>
This shouldn't do anything but will make testing a later patch easier.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
This reverts commit ad363913e6.
This code was added to be able to compare the output file while porting
the script from python2 to python3, but this has long been finished and
the extra complexity is not needed anymore.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>
If a loop is followed by a barrier, the ordering between a load inside
the loop and other memory operations after the barrier may have to be
preserved depending on the type of memory involved. This is relevant
when the memory is writeable by other invocations. In such case, it
is not valid to completely eliminate the loop.
This commit doesn't attempt to precisely catch the barrier case, as
analysis could become too complex. It simply assumes it can't drop
the loops that contain certain types of loads unless those are known
to be safe to reorder (via the access flag).
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4475
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938>
Without this stitch_blocks complains about ending in a jump with a
non-empty block after the inserted body.
I hit this with CTS raytracing tests where we tried to inline a
function that basically ended up being something like
{
ignore_ray_intersection
halt
}
I kept the nop path when possible as that does not leave a mess
for the optimization loop to optimize.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12163>
Need to avoid lowering temps when they are used by other instructions,
like the rt instructions (some of the shader call parameters get
converted to temp variables and we will lower them later with
the explicit io lowering pass as we need to guarantee they will
end up in scratch).
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12162>
gl_FragDepth default value is gl_FragCoord.z so if a shader does:
gl_FragDepth = gl_FragCoord.z
we can drop this assignment.
v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
- handle var = NULL (Rhys)
v4: refactoring (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false. Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.
All other hardware mentioned in that thread seems to behave the same
way.
The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.
This reverts commit ad05920258.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
Otherwise drivers that don't use 16-bit slots for varyings will get
confused and have their driver_locations scribbled over. This has caused
multiple problems for both Panfrost and Asahi this week. Given the only
other user of the pass for varyings is radeonsi, which needs both
together, I think this is the least controversial fix.
Fixes: fb29cef8dd ("nir: add many passes that lower and optimize 16-bit input/outputs and samplers")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11732>
Both nir_opt_algebraic and nir_opt_idiv_const have optimizations for
umod/imod/irem by constants.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
This lowering is smaller and -INT64_MIN is probably UB (signed overflow).
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
ineg(INT_MIN)/iabs(INT_MIN) won't work as expected.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:
- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
This means we have to add another register to our VS->TCS ABI. I
decided to put PrimID in r0.z, after the TCS header and RelPatchID,
because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
dword of the tess factor BO, and it is read by the fixed-function
tessellator and accessed in the TES via the newly-uncovered DSPRIMID
field. If we have tess and GS, the TES passes this value through to
the GS in the same way as the VS does. PrimID passthrough for reading
it in the FS when there's tess but no GS also "just works" once we
start storing it in the TCS. In particular this fixes
dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
The liveness information will be a superset of real liveness so it's
unlikely something will explode if it tries to use it. However, it is
out-of-date and should be re-run if someone really wants it.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version. There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.
v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin. Noticed by CI. Lol. Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time. Change the implementation of u_intN_min to use the same
pattern as stdint.h. This avoids the integer division. Noticed by
Jason.
v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.
Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%
No Fossil DB changed with NGGC off.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
For TGSI, we need the coordinate, comparator, bias, and LOD all together
in the first two vec4 args, and by doing it in the backend we were
generating extra MOVs.
softpipe shader-db results:
total instructions in shared programs: 2985416 -> 2953625 (-1.06%)
instructions in affected programs: 499937 -> 468146 (-6.36%)
total temps in shared programs: 544769 -> 565869 (3.87%)
temps in affected programs: 105469 -> 126569 (20.01%)
i915g shader-db:
total instructions in shared programs: 371625 -> 369594 (-0.55%)
instructions in affected programs: 24903 -> 22872 (-8.16%)
total tex_indirect in shared programs: 11381 -> 11365 (-0.14%)
tex_indirect in affected programs: 43 -> 27 (-37.21%)
LOST: 7
GAINED: 16
The temps increase is the pre-existing issue that we never release temps
for NIR regs, which doesn't matter much for softpipe (just memory/cache
footprint) but does for i915g as seen by shaders that no longer compile
(though overall we seem to win).
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>
This patch allows to shrink vecN instructions where
one or more components at any position are unused.
Stat changes for softpipe:
total instructions in shared programs: 2986101 -> 2985416 (-0.02%)
instructions in affected programs: 51216 -> 50531 (-1.34%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
ALU instructions of which not all components are read,
can be shrunk to the number of read components.
Previously, this would only remove trailing components.
This patch enables to remove components from any position.
Stat changes for softpipe:
total instructions in shared programs: 3001291 -> 2984698 (-0.55%)
instructions in affected programs: 225585 -> 208992 (-7.36%)
total loops in shared programs: 1389 -> 1358 (-2.23%)
loops in affected programs: 36 -> 5 (-86.11%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11411>
Only fragment and some compute shaders support implicit derivatives.
They're totally meaningless without helper invocations and some
understanding of the dispatch pattern. We've got code to lower
nir_texop_tex in these shader stages to use an explicit derivative of 0
but it was pretty badly broken:
1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod.
2. It didn't take min_lod into account
3. It was conflated with adding a missing LOD parameter to opcodes
which expect one such as nir_texop_txf. While not really a bug,
this does make it way harder to reason about the code.
4. Unless you set a flag (which most drivers don't), it left the
opcode nir_texop_tex instead of nir_texop_txl which it should have
been.
This reworks it to go through roughly the same path as other LOD
lowering only with a constant lod of 0 instead of calling out to
nir_texop_lod. We also get rid of the lower_tex_without_implicit_lod
flag because most drivers set it and those that don't are probably
subtly broken. If someone really wants to get nir_texop_tex in their
vertex shaders, they can write a new patch to add the flag back in.
Fixes: e382890e25 "nir: set default lod to texture opcodes that..."
Fixes: d5ac5d6e83 "nir: Add option to lower tex to txl when..."
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
It's intel-specific, used to get at MSAA compression information.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>