Commit Graph

433 Commits

Author SHA1 Message Date
Emma Anholt f6c5b1d6c6 nir: Split usub_sat lowering flag from uadd_sat.
Intel vec4 would like to do uadd_sat, but use lowering for usub_sat.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>
2022-07-22 17:54:28 +00:00
Georg Lehmann aac8ddae2f nir/opt_algebraic: Optimize [ui](add|sub)_sat with 0.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>
2022-07-13 07:34:09 +00:00
Rhys Perry bc1ea2fda9 nir/algebraic: optimize bcsel(c, fsin/cos_amd(a), fsin/cos_amd(b))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10587>
2022-07-07 22:18:08 +00:00
Ian Romanick a2a2fbc510 nir/algebraic: Fix NaN-unsafe fcsel patterns
For example, the proof for this pattern

    (('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)),

would be

    bcsel(a < 0, b, c)
    bcsel(!(a < 0), c, b)
    bcsel(a >= 0, c, b)
    fcsel_ge(a, c, b)

However, !(a < 0) => (a >= 0) is well known to produce different
results if `a` is NaN.

Instead of that replacement, use this replacement:

    bcsel(a < 0, b, c)
    bcsel(-0 < -a, b, c)
    bcsel(0 < -a, b, c)
    fcsel_gt(-a, b, c)

This is NaN-safe and exact.

Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Fixes: 0f5b3c37c5 ("nir: Add opcodes for fused comp + csel and optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
2022-06-22 19:26:59 +00:00
Georg Lehmann bfc25d6ec9 nir: Add optional lowering for mul_32x16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13895>
2022-06-01 17:09:25 +00:00
Jason Ekstrand 836ff4b586 nir/algebraic: Add two more pack/unpack rules
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16591>
2022-05-23 14:10:54 +00:00
Gert Wollny 3749a6ecd2 nir: honor lower_double options for ffloor and ffract
v2: Don't lower ffloor@64 to ffract@64 when both ops are
    to be lowered. Settle on ffloor in opt_algebraic because
    in can be lowered to other ops in lower_double_ops.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>(v1)
Jason Ekstrand <jason.ekstrand@collabora.com> (v1)

Reviewed-by: Emma Anholt <emma@anholt.net> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16431>
2022-05-16 15:03:05 +00:00
Georg Lehmann bc5c68fc08 nir/opt_algebraic: Optimize Doom Eternal's word extract by LSB.
Foz-db GFX10_3:
Totals from 419 (0.31% of 134913) affected shaders:
CodeSize: 4126032 -> 4121756 (-0.10%)
Instrs: 783608 -> 782541 (-0.14%)
Latency: 7889664 -> 7888521 (-0.01%); split: -0.02%, +0.00%
InvThroughput: 1315690 -> 1314863 (-0.06%); split: -0.06%, +0.00%
VClause: 11826 -> 11830 (+0.03%)
SClause: 27736 -> 27734 (-0.01%)
Copies: 50493 -> 50428 (-0.13%); split: -0.13%, +0.01%
PreSGPRs: 23264 -> 23265 (+0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16436>
2022-05-12 17:10:41 +00:00
Jason Ekstrand df1876f615 nir: Mark negative re-distribution on fadd as imprecise
Otherwise, it would mutate `fneg(fadd(-0, 0))` into `fadd(0, -0)` which
isn't correct since -0 + (+0) = +0 + (-0) = +0.

This fixes the OpenCL contraction tests on Iris.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041>
2022-05-12 00:05:10 +00:00
Georg Lehmann 60c9a45562 nir/opt_algebraic: Simple xor/ishr optimizations.
The first pattern here removes the xor-swap pattern.

Foz-DB GFX10_3:
Totals from 305 (0.23% of 134913) affected shaders:
CodeSize: 1589040 -> 1585164 (-0.24%)
Instrs: 284344 -> 283375 (-0.34%)
Latency: 4205148 -> 4198472 (-0.16%); split: -0.16%, +0.00%
InvThroughput: 708745 -> 708739 (-0.00%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16411>
2022-05-10 19:29:31 +00:00
Georg Lehmann 66e917fff6 nir/opt_algebraic: Fix mask in shift by constant combining.
The comment above is correct, but the code to calculate the mask was broken.

No Foz-db changes outside of noise.

Fixes: 0e6581b87d ("nir/algebraic: Reassociate shift-by-constant of shift-by-constant")
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15990>
2022-05-10 18:47:21 +00:00
Karol Herbst a2c9e1cb50 nir: add 16 and 64 bit fisnormal lowering
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16206>
2022-04-28 18:36:52 +00:00
Gert Wollny 47d3f7c69f nir: Don't optimize to 64 bit fsub if the driver doesn't support it
Fixes: a4840e15ab
  r600: Use nir-to-tgsi instead of TGSI when the NIR debug opt is disabled.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16130>
2022-04-27 00:01:20 +00:00
Jason Ekstrand 1755730362 nir: Lower all bit sizes of usub_borrow
It's not clear why this is restricted to 32-bit besides that being the
only bit size where GLSL has an intrinsic for this.  All drivers that
set this probably want it lowered for all bit sizes as far as I can
tell.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6353
Fixes: 8a3e344180 ("nir/opt_algebraic: Fix some expressions with ambiguous bit sizes")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16146>
2022-04-25 21:27:09 +00:00
Emma Anholt e4aa5f7889 nir: Skip fround_even on already-integral values.
Just like the other make-the-float-an-integer opcodes.  Noticed in a
gallium nine shader run through TGSI-to-NIR, where the array index had
been floored by the user, but got implicitly rounded by DX9 array
indexing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
2022-04-16 13:07:09 -07:00
Emma Anholt 6947016b46 nir: Add lowering for fround_even on r300.
When we put NIR in the compiler stack for r300, indirect addressing broke
for gallium nine.  DX's array indirects round the float value, so the DX
shader gets mapped to a TGSI "ARR ADDR[0] src.x" instruction.  Translating
that to NIR maps to r0[f2i32(fround(src.x))].  While we might hope that in
translation back using nir-to-tgsi after optimization we would recognize
the construct and emit ARR again, that's going to be error prone (think
"what if src.x is in a NIR register?") so we need a fallback plan.  r300
will be able to handle this lowering, so get it in place first to fix the
regression.

Fixes: #6297
Fixes: 7d2ea9b0ed ("r300: Request NIR shaders from mesa/st and use NIR-to-TGSI.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
2022-04-16 13:07:09 -07:00
Georg Lehmann 16be909936 nir: Add an option to lower 64bit iadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 922916bf64 nir: Move lower_usub_sat64 to nir_lower_int64_options.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Ian Romanick 06eb9fb125 nir/algebraic: Optimize some cases of (sXX(a, b) != 0.0)
I noticed the SGE case while looking at the output of
shaders/closed/steam/trine-2/fp-3.shader_test on i915g.  These are
especially bad on i915 that needs two instructions to implement SNE.

An alternative would be to duplicate the sne(sXX(a, b), 0.0) rules in an
algebraic pass that occurs after bool_to_float.  Doing the work earlier
seems preferable.

i915
total instructions in shared programs: 788274 -> 788223 (<.01%)
instructions in affected programs: 666 -> 615 (-7.66%)
helped: 5
HURT: 0
helped stats (abs) min: 9 max: 12 x̄: 10.20 x̃: 9
helped stats (rel) min: 5.00% max: 11.11% x̄: 8.12% x̃: 8.16%
95% mean confidence interval for instructions value: -12.24 -8.16
95% mean confidence interval for instructions %-change: -10.81% -5.43%
Instructions are helped.

LOST:   0
GAINED: 2

The two gained shaders are assembly fragment programs in Euro Truck
Simulator 2.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>
2022-03-03 00:07:58 +00:00
Bas Nieuwenhuizen d1530a3f3b Revert "nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants"
This reverts commit a1af902531.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5423
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14532>
2022-01-21 16:58:11 +00:00
Rhys Perry 495debebad nir/algebraic: optimize expressions using fmulz/ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry f2fbba7920 nir/algebraic: optimize open-coded fmulz/ffmaz
This pattern will be found in future versions of D3D9 DXVK.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry 312a284980 nir/algebraic: add ignore_exact() wrapper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Rhys Perry 7f05ea3793 nir: add nir_op_fmulz and nir_op_ffmaz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>
2022-01-20 22:54:42 +00:00
Connor Abbott 9c9e8c3349 nir: Reorder ffma and fsub combining
It's relatively common to do something like "a * b - c", which on most
GPUs can be implemented in a single instruction. Before
opt_algebraic_late this will be something like
"fadd(fmul(a, b), fneg(c))", and we want to turn it info
"ffma(a, b, fneg(c))". But because the fsub pattern was first we
instead turned it into "fsub(fmul(a, b), c)". Fix this by reordering
them.

Selected shader-db results on freedreno:

total instructions in shared programs: 1561330 -> 1551619 (-0.62%)
instructions in affected programs: 780272 -> 770561 (-1.24%)
helped: 1941
HURT: 491
helped stats (abs) min: 1 max: 147 x̄: 7.98 x̃: 4
helped stats (rel) min: 0.07% max: 30.77% x̄: 4.36% x̃: 3.17%
HURT stats (abs)   min: 1 max: 307 x̄: 11.76 x̃: 5
HURT stats (rel)   min: 0.09% max: 18.71% x̄: 2.26% x̃: 1.38%
95% mean confidence interval for instructions value: -4.57 -3.41
95% mean confidence interval for instructions %-change: -3.21% -2.84%
Instructions are helped.

total nops in shared programs: 358926 -> 356263 (-0.74%)
nops in affected programs: 167116 -> 164453 (-1.59%)
helped: 1395
HURT: 859
helped stats (abs) min: 1 max: 108 x̄: 6.80 x̃: 3
helped stats (rel) min: 0.17% max: 100.00% x̄: 19.15% x̃: 10.57%
HURT stats (abs)   min: 1 max: 307 x̄: 7.95 x̃: 3
HURT stats (rel)   min: 0.00% max: 381.82% x̄: 20.04% x̃: 10.00%
95% mean confidence interval for nops value: -1.77 -0.59
95% mean confidence interval for nops %-change: -5.55% -2.87%
Nops are helped.

total non-nops in shared programs: 1202404 -> 1195356 (-0.59%)
non-nops in affected programs: 496682 -> 489634 (-1.42%)
helped: 1951
HURT: 265
helped stats (abs) min: 1 max: 39 x̄: 4.02 x̃: 3
helped stats (rel) min: 0.07% max: 15.38% x̄: 2.97% x̃: 1.96%
HURT stats (abs)   min: 1 max: 22 x̄: 2.97 x̃: 2
HURT stats (rel)   min: 0.05% max: 10.00% x̄: 1.14% x̃: 0.75%
95% mean confidence interval for non-nops value: -3.38 -2.99
95% mean confidence interval for non-nops %-change: -2.60% -2.36%
Non-nops are helped.

total systall in shared programs: 288317 -> 292975 (1.62%)
systall in affected programs: 87876 -> 92534 (5.30%)
helped: 388
HURT: 431
helped stats (abs) min: 1 max: 214 x̄: 14.39 x̃: 8
helped stats (rel) min: 0.25% max: 100.00% x̄: 22.12% x̃: 11.96%
HURT stats (abs)   min: 1 max: 232 x̄: 23.77 x̃: 7
HURT stats (rel)   min: 0.00% max: 1300.00% x̄: 51.71% x̃: 17.30%
95% mean confidence interval for systall value: 3.07 8.30
95% mean confidence interval for systall %-change: 9.49% 23.97%
Systall are HURT.

(The systall hurt is probably just due to having having fewer
instructions to hide latency with.)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14554>
2022-01-18 17:44:50 +00:00
Danylo Piliaiev b8d486f298 nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
2022-01-10 13:20:39 +02:00
Daniel Schürmann 17ecd0b31a nir/opt_algebraic: lower fneg_hi/lo to fmul
This pattern, found in the FSR upscaling shader,
helps the vectorization efforts by keeping the
chain of vectorized instructions intact.
Radeon can optimize it to per-component fneg modifiers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>
2021-12-21 13:23:37 +01:00
Rhys Perry 403ae3b48e nir/algebraic: optimize more 64-bit imul with constant source
Two 64-bit shifts and an addition are usually faster than the several
multiplications nir_lower_int64 creates.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>
2021-12-17 18:51:24 +00:00
Rhys Perry a2d8c5b26d nir/algebraic: optimize a*#b & -4
fossil-db (Sienna Cichlid):
Totals from 611 (0.47% of 128647) affected shaders:
CodeSize: 3096680 -> 3090976 (-0.18%)
Instrs: 570494 -> 569249 (-0.22%)
Latency: 5765865 -> 5759619 (-0.11%)
InvThroughput: 969840 -> 967608 (-0.23%)
VClause: 9690 -> 9688 (-0.02%)
Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03%
PreVGPRs: 28290 -> 28288 (-0.01%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>
2021-12-03 13:41:07 +00:00
Rhys Perry 12294026d5 nir/algebraic: optimize Cyberpunk 2077's open-coded bitfieldReverse()
fossil-db (Sienna Cichlid):
Totals from 9 (0.01% of 128647) affected shaders:
CodeSize: 29900 -> 28640 (-4.21%)
Instrs: 5677 -> 5443 (-4.12%)
Latency: 96561 -> 95025 (-1.59%)
Copies: 571 -> 544 (-4.73%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13673>
2021-11-05 09:31:04 +00:00
Jason Ekstrand b71bdc3404 nir/algebraic: Add some opts for comparisons of comparisons
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Jason Ekstrand 7abf3955ca nir/algebraic: Add some boolean optimizations
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Jason Ekstrand c8b2be0b95 nir/algebraic: Lower fisfinite
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>
2021-10-07 18:21:11 +00:00
Ian Romanick cb28361642 nir/algebraic: Small optimizations for SpvOpFOrdNotEqual and SpvOpFUnordEqual
No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 144380118 -> 143692823 (-0.5%)
SENDs in all programs: 6920822 -> 6920822 (+0.0%)
Loops in all programs: 38299 -> 38299 (+0.0%)
Cycles in all programs: 8434782176 -> 8423078994 (-0.1%)
Spills in all programs: 206830 -> 204469 (-1.1%)
Fills in all programs: 318737 -> 313660 (-1.6%)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>
2021-10-06 01:53:47 +00:00
Rhys Perry a1af902531 nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants
This allows for more MAD/FMA instructions to be created.

fossil-db (Sienna Cichlid):
Totals from 50134 (33.46% of 149839) affected shaders:
VGPRs: 2436536 -> 2436000 (-0.02%); split: -0.05%, +0.03%
SpillSGPRs: 13136 -> 13135 (-0.01%); split: -0.02%, +0.02%
CodeSize: 206621424 -> 206278292 (-0.17%); split: -0.23%, +0.07%
MaxWaves: 1116804 -> 1117448 (+0.06%); split: +0.07%, -0.01%
Instrs: 38977460 -> 38862886 (-0.29%); split: -0.33%, +0.04%
Latency: 832425389 -> 827432260 (-0.60%); split: -0.63%, +0.03%
InvThroughput: 184193457 -> 183563350 (-0.34%); split: -0.37%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7458>
2021-09-17 17:28:26 +00:00
Rhys Perry 41ecef7855 nir: add sdot_2x16 and udot_2x16 opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Rhys Perry ae00f5af61 nir: separate lower_add_sat
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Samuel Pitoiset cff106c4b6 nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a)
and fmin(-fmax(b, a)) to fmin(-fabs(b), -a).

fossils-db (Sienna Cichlid):
Totals from 34 (0.02% of 150170) affected shaders:
CodeSize: 388540 -> 387748 (-0.20%)
Instrs: 74621 -> 74423 (-0.27%)
Latency: 1039407 -> 1039011 (-0.04%)
InvThroughput: 208364 -> 208150 (-0.10%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>
2021-08-25 07:18:24 +02:00
Ian Romanick a6db40605e nir/algebraic: Add some extract optimizations
These help quite a bit when vectored versions of SpvOpSDotKHR and
friends are emitted as packed versions and then lowered.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 839495efc6 nir/algebraic: Add lowering for dot_4x8 instructions
v2: Fix copy-and-paste bugs in lowering patterns.

v3: Add has_sudot_4x8 flag.  Requested by Rhys.

v4: Since the names of the opcodes changed from dp4 to dot_4x8, also
change the names of the lowering helpers.  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Ian Romanick 806cd2341c nir/algebraic: Basic patterns for dot_4x8
v2: Add and modify patterns to let constant folding do better.

v3: Remove '(is_not_zero)' from the patterns that try to combine
addends.  I honestly don't know why I had it there in the first place,
and nothing in my deep git logs could help clue me in.  Noticed by
Alyssa.  Remover patterns that detect open-coded udot_4x8.  Suggested by
Alyssa and Jason.  Add missing sudot_4x8 patterns.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Daniel Schürmann 2cf164feb9 nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once
Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3)
VGPRs: 13880 -> 13856 (-0.17%)
CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10%
MaxWaves: 3184 -> 3192 (+0.25%)
Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08%
Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19%
InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02%
SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03%
Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55%
Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21%
PreVGPRs: 12507 -> 12427 (-0.64%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>
2021-08-24 16:10:30 +00:00
Samuel Pitoiset f4b858e746 Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)"
This is wrong for negative values.

This reverts commit 07cd30ca29.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>
2021-08-24 08:58:38 +00:00
Samuel Pitoiset 07cd30ca29 nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)
Found with Cyberpunk 2077.

fossils-db (GFX10.3):
Totals from 128 (2.34% of 5465) affected shaders:
CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00%
Instrs: 145748 -> 145229 (-0.36%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>
2021-08-23 17:53:38 +00:00
Ian Romanick f0a8a9816a nir: intel/compiler: Add and use nir_op_pack_32_4x8_split
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO.  This results
in a lot of shifts and MOVs.  When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.

v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick 89f639c0ca nir/algebraic: Remove spurious conversions from inside logic ops
Not only does this eliminate a bunch of unnecessary type converting
MOVs, but it can also enable some SWAR.  The
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does
something about like:

    c = a.x ^ b.x;
    d = a.y ^ b.y;
    e = a.z ^ b.z;

After this change, it looks more like:

    uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b);
    c = extract_u8(t, 0);
    d = extract_u8(t, 1);
    e = extract_u8(t, 2);

On Ice Lake, this results in:

SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick a147717a93 nir/algebraic: Optimize some extract forms resulting from 8-bit lowering
This eliminates some spurious, size-converting moves.  For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:

SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends

v2: Condition two of the patterns on !options->lower_extract_byte.
Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Rhys Perry ed70b256ce nir: add ffma creation helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry 4ec4d862c2 nir/algebraic: add is_used_once to dot product reassociation optimization
This improves register usage.

fossil-db (Sienna Cichlid, on top of !9805):
Totals from 4317 (2.88% of 149839) affected shaders:
VGPRs: 352592 -> 351704 (-0.25%); split: -1.48%, +1.23%
SpillSGPRs: 182 -> 248 (+36.26%)
CodeSize: 31601192 -> 31587624 (-0.04%); split: -0.09%, +0.04%
MaxWaves: 56964 -> 57298 (+0.59%); split: +2.48%, -1.90%
Instrs: 5973557 -> 5974122 (+0.01%); split: -0.05%, +0.06%
Latency: 72088175 -> 72253033 (+0.23%); split: -0.36%, +0.59%
InvThroughput: 14978160 -> 14798919 (-1.20%); split: -1.29%, +0.09%
VClause: 100994 -> 98645 (-2.33%); split: -3.05%, +0.73%
SClause: 278206 -> 276820 (-0.50%); split: -0.54%, +0.04%
Copies: 200264 -> 199556 (-0.35%); split: -1.17%, +0.82%
Branches: 86410 -> 85930 (-0.56%); split: -0.56%, +0.01%
PreSGPRs: 207355 -> 207759 (+0.19%); split: -0.00%, +0.20%
PreVGPRs: 314646 -> 310911 (-1.19%); split: -1.35%, +0.17%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Rhys Perry f95a16be72 nir/algebraic: reassociate add chains for more MAD/FMA-friendly code
fossil-db (GFX10.3):
Totals from 25866 (17.68% of 146267) affected shaders:
VGPRs: 1625456 -> 1644936 (+1.20%); split: -0.05%, +1.24%
SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03%
CodeSize: 161604460 -> 161458052 (-0.09%); split: -0.11%, +0.02%
MaxWaves: 454842 -> 452160 (-0.59%); split: +0.04%, -0.63%
Instrs: 30652596 -> 30456446 (-0.64%); split: -0.65%, +0.01%
Latency: 723098749 -> 722084247 (-0.14%); split: -0.21%, +0.07%
InvThroughput: 166023468 -> 165506875 (-0.31%); split: -0.36%, +0.05%

fossil-db (GFX10):
Totals from 25866 (17.68% of 146267) affected shaders:
VGPRs: 1593576 -> 1611976 (+1.15%); split: -0.09%, +1.25%
SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03%
CodeSize: 162294468 -> 162154456 (-0.09%); split: -0.11%, +0.02%
MaxWaves: 477448 -> 474166 (-0.69%); split: +0.10%, -0.79%
Instrs: 30820164 -> 30625805 (-0.63%); split: -0.65%, +0.02%
Latency: 723190249 -> 722273445 (-0.13%); split: -0.20%, +0.08%
InvThroughput: 163114872 -> 162582966 (-0.33%); split: -0.37%, +0.04%

fossil-db (GFX9):
Totals from 25866 (17.67% of 146401) affected shaders:
SGPRs: 2167808 -> 2169920 (+0.10%); split: -0.09%, +0.19%
VGPRs: 1649404 -> 1667592 (+1.10%); split: -0.43%, +1.53%
CodeSize: 161273556 -> 161281996 (+0.01%); split: -0.07%, +0.08%
MaxWaves: 114910 -> 113519 (-1.21%); split: +0.10%, -1.31%
Instrs: 31557180 -> 31403708 (-0.49%); split: -0.50%, +0.02%
Latency: 899594793 -> 898786283 (-0.09%); split: -0.19%, +0.10%
InvThroughput: 412265691 -> 411551698 (-0.17%); split: -0.28%, +0.11%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00