Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
This pattern, found in the FSR upscaling shader,
helps the vectorization efforts by keeping the
chain of vectorized instructions intact.
Radeon can optimize it to per-component fneg modifiers.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
No shader-db changes on any Intel platform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
v2: Fix copy-and-paste bugs in lowering patterns.
v3: Add has_sudot_4x8 flag. Requested by Rhys.
v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
v2: Add and modify patterns to let constant folding do better.
v3: Remove '(is_not_zero)' from the patterns that try to combine
addends. I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in. Noticed by
Alyssa. Remover patterns that detect open-coded udot_4x8. Suggested by
Alyssa and Jason. Add missing sudot_4x8 patterns.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results
in a lot of shifts and MOVs. When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.
v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR. The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:
c = a.x ^ b.x;
d = a.y ^ b.y;
e = a.z ^ b.z;
After this change, it looks more like:
uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
c = extract_u8(t, 0);
d = extract_u8(t, 1);
e = extract_u8(t, 2);
On Ice Lake, this results in:
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
This eliminates some spurious, size-converting moves. For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666,
the assumption fundamental to this optimization is false. Section
2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation.
The wording of the section is somewhat confusing (because it doesn't
clearly delineate between signed and unsigned integers), but the last
two rows of the table make it clear that F->UD conversion clamps
negative float values to 0.
All other hardware mentioned in that thread seems to behave the same
way.
The real problem is that, with hardware that behaves in this ways,
converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern
that would be produced from 0x80000000 to 0x7fffffff.
This reverts commit ad05920258.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>
If "a" is a multiple of "b", then the result would have been "b" instead
of 0.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0ef5f3552f ("nir: add strength reduction pattern for imod/irem with pow2 divisor.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>
This patch also adds has_iadd3 bit to give more control if backend
supports ternary add instruction or not.
v2:
- Add patterns in late optimization (Connor Abbott)
Suggested-by: Alyssa/Jason
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
The flt version could have been added in 56e21647e2, but our
collective understanding of NaN and comparisons was poor in 2015. The
new "is_a_number" predicate makes the others possible.
All of the helped shaders in shader-db are either from Mad Max or Skia.
Some of the Skia shaders just get decimated by this change:
instructions helped: shaders/skia/580-4.shader_test FS SIMD8: 81 -> 29 (-64.20%) (scheduled: top-down)
I looked at a couple of those shaders, and they had sequences like:
vec1 32 ssa_44 = flt32 ssa_32, ssa_32
vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0
vec1 32 ssa_46 = fge32 ssa_32, ssa_32
vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45
vec1 32 ssa_48 = iand ssa_46, ssa_44
vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0
ssa_44 is replaced with False. Then ssa_47 selects between ssa_0 and
ssa_0, so ssa_47 and ssa_46 are eliminated. ssa_48 is (False && don't
care), so ssa_48 and ssa_49 are eliminated. After that, many
calculations now involve constants of zero, so they are optimized down
too. So it continues until there's not much left!
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21072238 -> 21071386 (<.01%)
instructions in affected programs: 33722 -> 32870 (-2.53%)
helped: 146
HURT: 1
helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2
helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20%
95% mean confidence interval for instructions value: -7.94 -3.65
95% mean confidence interval for instructions %-change: -5.87% -2.25%
Instructions are helped.
total cycles in shared programs: 856203326 -> 856192238 (<.01%)
cycles in affected programs: 749966 -> 738878 (-1.48%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18
helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46%
95% mean confidence interval for cycles value: -104.82 -45.02
95% mean confidence interval for cycles %-change: -4.01% -1.37%
Cycles are helped.
LOST: 4
GAINED: 0
Fossil-db results:
Tiger Lake
Instructions in all programs: 160915223 -> 160898354 (-0.0%)
SENDs in all programs: 6812780 -> 6812780 (+0.0%)
Loops in all programs: 38340 -> 38340 (+0.0%)
Cycles in all programs: 7434144207 -> 7433978462 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304537 -> 304537 (+0.0%)
Ice Lake
Instructions in all programs: 145296298 -> 145279531 (-0.0%)
SENDs in all programs: 6863692 -> 6863692 (+0.0%)
Loops in all programs: 38334 -> 38334 (+0.0%)
Cycles in all programs: 8800257014 -> 8800088384 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334248 -> 334248 (+0.0%)
Skylake
Instructions in all programs: 135891664 -> 135874910 (-0.0%)
SENDs in all programs: 6802946 -> 6802946 (+0.0%)
Loops in all programs: 38331 -> 38331 (+0.0%)
Cycles in all programs: 8444273433 -> 8444130932 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301114 -> 301114 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
If the values are known to be numbers, the the replacements are exact.
This is only applied to the patterns with constants. Constants should
always be numbers, and shaders with NaN constants should be handled in a
different way.
No shader-db or fossil-db changes on any Intel platform. The intention
is to make these patterns more future proof.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Haswell and later Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049056 -> 21048939 (<.01%)
instructions in affected programs: 4716 -> 4599 (-2.48%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51%
95% mean confidence interval for instructions value: -3.46 -2.54
95% mean confidence interval for instructions %-change: -3.22% -2.38%
Instructions are helped.
total cycles in shared programs: 855141411 -> 855141159 (<.01%)
cycles in affected programs: 54491 -> 54239 (-0.46%)
helped: 28
HURT: 5
helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12
helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75%
HURT stats (abs) min: 2 max: 52 x̄: 21.40 x̃: 6
HURT stats (rel) min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56%
95% mean confidence interval for cycles value: -13.72 -1.55
95% mean confidence interval for cycles %-change: -1.01% -0.31%
Cycles are helped.
Tiger Lake
Instructions in all programs: 160902191 -> 160899554 (-0.0%)
SENDs in all programs: 6812435 -> 6812435 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7428581420 -> 7428555881 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
A lot of fragment shaders in Shadow of the Tomb Raider were helped, and
a bunch of vertex shaders in Octopath Traveler were hurt.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
It seems worth the small amount of damage to give an extra cushion of
not having to debug problems later.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043197 -> 21043359 (<.01%)
instructions in affected programs: 4409 -> 4571 (3.67%)
helped: 0
HURT: 25
HURT stats (abs) min: 1 max: 16 x̄: 6.48 x̃: 5
HURT stats (rel) min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40%
95% mean confidence interval for instructions value: 4.37 8.59
95% mean confidence interval for instructions %-change: 2.93% 6.26%
Instructions are HURT.
total cycles in shared programs: 856175986 -> 856176921 (<.01%)
cycles in affected programs: 58908 -> 59843 (1.59%)
helped: 0
HURT: 25
HURT stats (abs) min: 7 max: 70 x̄: 37.40 x̃: 38
HURT stats (rel) min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39%
95% mean confidence interval for cycles value: 31.11 43.69
95% mean confidence interval for cycles %-change: 1.35% 2.39%
Cycles are HURT.
No fossil-db changes on any Intel platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
When most of these patterns were created, we believed, incorrectly, that
fsat(NaN) was NaN. We have since realized that fsat(NaN) is zero.
Originally, this changed the patterns to use is_a_number. This didn't
help any shaders, so it's easier to just drop the optimizations.
This commit crossed paths with 4c3ad4d065 ("nir/algebraic: mark more
optimization with fsat(NaN) as inexact") and bc123c396a
("nir/algebraic: mark some optimizations with fsat(NaN) as inexact").
Given that these don't impact very many shaders, it seems safer to just
remove them.
As discussed in
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried
modifying these patterns to use !(b cmp a). Unfortunately, on Intel
GPUs, the results were much worse than just removing the patterns
altogether.
Some other related patterns will be addressed in later commits.
There are still a number of patterns that use the identity fsat(1-X) ==
1 - fsat(X). If X is NaN, the former is zero while the latter is 1.0.
I haven't evaluted these patterns yet. If changes are needed in these
patterns, it should be a separate commit anyway.
v2: Replace arrow `=>` with `->` in comments because the `=>` looks a
lot like `<=` comparison. Suggested by Rhys.
Fixes: 92b75c126b ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]")
Fixes: a7f0c57673 ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
All Intel hardware had similar results. (Ice Lake shown)
total instructions in shared programs: 20029060 -> 20029670 (<.01%)
instructions in affected programs: 69236 -> 69846 (0.88%)
helped: 0
HURT: 263
HURT stats (abs) min: 1 max: 20 x̄: 2.32 x̃: 1
HURT stats (rel) min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98%
95% mean confidence interval for instructions value: 1.86 2.78
95% mean confidence interval for instructions %-change: 1.18% 1.52%
Instructions are HURT.
total cycles in shared programs: 979821278 -> 979834425 (<.01%)
cycles in affected programs: 1476848 -> 1489995 (0.89%)
helped: 49
HURT: 204
helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20
helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52%
HURT stats (abs) min: 2 max: 2600 x̄: 89.02 x̃: 16
HURT stats (rel) min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72%
95% mean confidence interval for cycles value: 13.18 90.75
95% mean confidence interval for cycles %-change: 0.29% 1.25%
Cycles are HURT.
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
These are equivalent to the 32bit opcodes if there are no more efficient
24bit opcodes available, but inputs are guaranteed to already be 24bit,
so the 24bit opcodes can be used instead if they exist and are efficient.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>
For unsigned comparisons with zero these ops can be eliminated.
v2: Add comparison optimizations with -1 (Rhys Perry)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10583>
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
Some backends, like r600 support a fused version of int and float compare
against zero and and csel. Adding these opcodes here makes it possible to
optimize this in nir.
v2: Add rules for float compare + csel
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>
Some hardware supports a version of find_msb where the bits are counted
starting at the high bit, and this needs some lowering to obtain the
value that is expected by *find_msb
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>