Commit Graph

329 Commits

Author SHA1 Message Date
Daniel Schürmann 0ef5f3552f nir: add strength reduction pattern for imod/irem with pow2 divisor.
Affected games are Detroit : Become Human and Doom : Eternal.

Totals from 6262 (4.54% of 138013) affected shaders (RAVEN):
SGPRs: 678472 -> 678640 (+0.02%)
VGPRs: 498288 -> 498360 (+0.01%)
CodeSize: 67064196 -> 65926000 (-1.70%)
MaxWaves: 19390 -> 19382 (-0.04%)
Instrs: 13175372 -> 12932517 (-1.84%)
Cycles: 1444043256 -> 1443022576 (-0.07%); split: -0.08%, +0.01%
VMEM: 929560 -> 908726 (-2.24%); split: +0.39%, -2.63%
SMEM: 406207 -> 400062 (-1.51%); split: +0.46%, -1.97%
VClause: 215168 -> 215031 (-0.06%)
SClause: 443312 -> 442324 (-0.22%); split: -0.25%, +0.03%
Copies: 1350793 -> 1344326 (-0.48%); split: -0.52%, +0.04%
Branches: 506432 -> 506370 (-0.01%); split: -0.02%, +0.01%
PreSGPRs: 619652 -> 619619 (-0.01%)
PreVGPRs: 473212 -> 473168 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/175>
2020-11-13 15:59:03 +01:00
Samuel Pitoiset 1aa1c1aec2 nir/algebraic: optimize bitfield_select(a, iand(a, b), c)
fossils-db (Vega10):
Totals from 242 (0.17% of 139517) affected shaders:
CodeSize: 853752 -> 852752 (-0.12%)
Instrs: 165944 -> 165694 (-0.15%)
Cycles: 855720 -> 854528 (-0.14%)
VMEM: 83772 -> 83668 (-0.12%); split: +0.13%, -0.25%
SMEM: 12360 -> 12316 (-0.36%)
SClause: 8222 -> 8238 (+0.19%)

Only helps Control.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7531>
2020-11-11 15:28:01 +01:00
Samuel Pitoiset 1c5271346a nir/algebraic: optimize bitfield_select(a, b, 0) to iand(a, b)
(src0 & src1) | (~src0 & src2) to (src0 & src1).

fossils-db (Polaris10):
Totals from 873 (0.63% of 138014) affected shaders:
SGPRs: 33781 -> 33733 (-0.14%)
VGPRs: 37704 -> 37520 (-0.49%); split: -0.51%, +0.02%
CodeSize: 3861460 -> 3853424 (-0.21%); split: -0.21%, +0.00%
MaxWaves: 5306 -> 5305 (-0.02%)
Instrs: 743798 -> 743486 (-0.04%); split: -0.04%, +0.00%
Cycles: 10962244 -> 10960936 (-0.01%); split: -0.01%, +0.00%
VMEM: 128309 -> 128350 (+0.03%); split: +0.33%, -0.30%
SMEM: 44797 -> 44113 (-1.53%); split: +0.02%, -1.54%
Copies: 71875 -> 71674 (-0.28%); split: -0.31%, +0.03%
PreSGPRs: 23484 -> 23479 (-0.02%)
PreVGPRs: 34582 -> 34529 (-0.15%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7479>
2020-11-09 19:51:27 +00:00
Samuel Pitoiset 77d6fda0f5 nir/algebraic: distribute imul(iadd(a, b), c) when b and c are constants
This distributes imul(iadd(a, b), c) to iadd(imul(a, c), b * c)
when both b and c are constants. This might allow some compiler
backends to create more MADs.

For ACO, this allows to combine more DS additions.

fossilds-db (Vega10):
Totals from 673 (0.49% of 136546) affected shaders:
VGPRs: 44548 -> 44516 (-0.07%); split: -0.11%, +0.04%
CodeSize: 8301552 -> 8286220 (-0.18%); split: -0.19%, +0.01%
MaxWaves: 2731 -> 2735 (+0.15%); split: +0.26%, -0.11%
Instrs: 1642684 -> 1638725 (-0.24%); split: -0.24%, +0.00%
Cycles: 20846156 -> 20793444 (-0.25%); split: -0.25%, +0.00%
VMEM: 108870 -> 108106 (-0.70%); split: +0.03%, -0.73%
SMEM: 35718 -> 35674 (-0.12%); split: +0.22%, -0.34%
VClause: 20603 -> 20622 (+0.09%); split: -0.01%, +0.10%
SClause: 48527 -> 48539 (+0.02%)
Copies: 156735 -> 156742 (+0.00%); split: -0.05%, +0.05%
PreSGPRs: 43169 -> 43166 (-0.01%); split: -0.02%, +0.02%
PreVGPRs: 41369 -> 41330 (-0.09%)

shader-db results on Intel:
Ice Lake
total instructions in shared programs: 20027588 -> 20027446 (<.01%)
instructions in affected programs: 71766 -> 71624 (-0.20%)
helped: 70
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.03 x̃: 1
helped stats (rel) min: 0.10% max: 2.50% x̄: 0.29% x̃: 0.15%
95% mean confidence interval for instructions value: -2.42 -1.64
95% mean confidence interval for instructions %-change: -0.38% -0.20%
Instructions are helped.

total cycles in shared programs: 977525222 -> 977494323 (<.01%)
cycles in affected programs: 8884593 -> 8853694 (-0.35%)
helped: 56
HURT: 16
helped stats (abs) min: 2 max: 7852 x̄: 681.29 x̃: 400
helped stats (rel) min: <.01% max: 19.84% x̄: 2.79% x̃: 0.41%
HURT stats (abs)   min: 2 max: 1212 x̄: 453.31 x̃: 120
HURT stats (rel)   min: 0.05% max: 1.09% x̄: 0.32% x̃: 0.11%
95% mean confidence interval for cycles value: -802.75 -55.56
95% mean confidence interval for cycles %-change: -3.19% -1.01%
Cycles are helped.

total sends in shared programs: 1032273 -> 1032272 (<.01%)
sends in affected programs: 41 -> 40 (-2.44%)
helped: 1
HURT: 0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7445>
2020-11-06 07:49:02 +00:00
Rhys Perry 89c4bba8bc nir/algebraic: better propagate constants up fadd chains
Make the optimization create more mad-friendly code if the order of the
fadd's operands is unlucky.

fossil-db (Navi):
Totals from 9259 (8.07% of 114665) affected shaders:
SGPRs: 615991 -> 616191 (+0.03%); split: -0.05%, +0.08%
VGPRs: 442184 -> 443568 (+0.31%); split: -0.10%, +0.41%
CodeSize: 32674876 -> 32625572 (-0.15%); split: -0.17%, +0.02%
MaxWaves: 108560 -> 108152 (-0.38%); split: +0.07%, -0.44%
Instrs: 6126473 -> 6120463 (-0.10%); split: -0.13%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
2020-11-03 14:56:00 +00:00
Ian Romanick 67956689bb nir: Rename replicated-result dot-product instructions
All these instructions replicate the result of a N-component dot-product
to a vec4.  Naming them fdot_replicatedN gives the impression that are
some sort of abstract dot-product that replicates the result to a vecN.
They also deviate from fdph_replicated... which nobody would reasonably
consider naming fdot_replicatedh.

Naming these opcodes fdotN_replicated more closely matches what they
are, and it matches the pattern of fdph_replicated.

I believe that the only reason these opcodes were named this way was
because it simplified the implementation of the binop_reduce function in
nir_opcodes.py.  I made some fairly simple changes to that function, and
I think the end result is ok.

The bulk of the changes come from the sed rename:

    sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \
        $(grep -r 'fdot_replicated[234]' src/)

v2: Use a named parameter to binop_reduce instead of using
isinstance(name, str).  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>
2020-10-22 18:00:19 +00:00
Daniel Schürmann f503699e10 nir/opt_algebraic: optimize unpack_half_2x16_split_x(ushr, a, 16)
Same as extract_u16(a, 1)

Totals from 2021 (1.48% of 136546) affected shaders (RAVEN):
VGPRs: 129516 -> 129524 (+0.01%); split: -0.00%, +0.01%
CodeSize: 12485704 -> 12486600 (+0.01%); split: -0.00%, +0.01%
Instrs: 2435041 -> 2434999 (-0.00%); split: -0.00%, +0.00%
Cycles: 20952552 -> 20952624 (+0.00%); split: -0.00%, +0.00%
VMEM: 374492 -> 374212 (-0.07%); split: +0.01%, -0.08%
SMEM: 123309 -> 123291 (-0.01%); split: +0.00%, -0.02%
VClause: 64156 -> 64164 (+0.01%)
Copies: 191620 -> 191616 (-0.00%); split: -0.03%, +0.03%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>
2020-10-14 15:31:38 +00:00
Jose Maria Casanova Crespo e7127b3468 nir/algebraic: optimize iand/ior of (n)eq zero when umax/umin not available
Before 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero") this
optimization didn't need the use of umax/umin. VC4 HW supports only signed
integer max/min operations.

lower_umin and lower_umax are added to allow enabling previous optimizations
behaviour for this cases.

Fixes: 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7083>
2020-10-10 13:16:37 +02:00
Marek Olšák 1e7d82c881 nir/algebraic: always lower idiv to shifts if bitops are allowed
why would you want anything else

The only platform significantly affected by this is Intel where `lower_idiv`
is not set today but neither is `lower_bitops`.  There it seems to still be
a boon over-all.

Shader-db results on Ice Lake:

    total instructions in shared programs: 19719051 -> 19735766 (0.08%)
    instructions in affected programs: 106992 -> 123707 (15.62%)
    helped: 0
    HURT: 445
    HURT stats (abs)   min: 3 max: 295 x̄: 37.56 x̃: 44
    HURT stats (rel)   min: 0.16% max: 33.33% x̄: 19.60% x̃: 19.38%
    95% mean confidence interval for instructions value: 33.60 41.53
    95% mean confidence interval for instructions %-change: 18.97% 20.23%
    Instructions are HURT.

    total loops in shared programs: 5973 -> 5973 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 489405810 -> 486917482 (-0.51%)
    cycles in affected programs: 4759097 -> 2270769 (-52.29%)
    helped: 406
    HURT: 34
    helped stats (abs) min: 2 max: 64661 x̄: 6291.95 x̃: 3126
    helped stats (rel) min: 0.02% max: 79.42% x̄: 43.32% x̃: 55.83%
    HURT stats (abs)   min: 2 max: 29376 x̄: 1947.12 x̃: 30
    HURT stats (rel)   min: 0.04% max: 23.82% x̄: 4.66% x̃: 1.33%
    95% mean confidence interval for cycles value: -6753.06 -4557.52
    95% mean confidence interval for cycles %-change: -42.60% -36.63%
    Cycles are helped.

    total spills in shared programs: 12481 -> 12482 (<.01%)
    spills in affected programs: 47 -> 48 (2.13%)
    helped: 0
    HURT: 1

    total fills in shared programs: 12816 -> 12819 (0.02%)
    fills in affected programs: 71 -> 74 (4.23%)
    helped: 0
    HURT: 1

    total sends in shared programs: 1010124 -> 1010124 (0.00%)
    sends in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST:   1
    GAINED: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6963>
2020-10-07 10:50:53 -04:00
Kenneth Graunke 140f53e646 Revert "nir: replace lower_ffma and fuse_ffma with has_ffma"
This reverts commit 939ddf3f67.

Intel has a separate pass for fusing FFMAs selectively.  We split
these flags in commit 1b72c31e1f and
the reasoning still stands.  The patch being reverted was just a
cleanup, so there should be no issue with reverting it.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6849>
2020-09-24 13:11:50 -07:00
Marek Olšák 939ddf3f67 nir: replace lower_ffma and fuse_ffma with has_ffma
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>
2020-09-24 12:29:11 +00:00
Marek Olšák 771aad3027 nir: split lower_ffma into lower_ffma16/32/64
AMD wants different behavior for each bit size

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>
2020-09-24 12:29:11 +00:00
Marek Olšák 21174dedec nir: split fuse_ffma into fuse_ffma16/32/64
AMD wants different behavior for each bit size

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>
2020-09-24 12:29:11 +00:00
Marek Olšák 57bf4c2028 nir,radeonsi: move ffma fusing to late optimizations for better codegen
The freedreno trace changes were suggested by Rob Clark.

ALU performance is higher, because ffma is used more often, but so is
register usage, because trinary opcodes (such as ffma) usually need
at least 3 live registers.

54793 shaders in 33659 tests
Totals:
SGPRS: 2639746 -> 2642938 (0.12 %)
VGPRS: 1534120 -> 1536392 (0.15 %)
Spilled SGPRs: 3541 -> 3618 (2.17 %)
Spilled VGPRs: 33 -> 44 (33.33 %)
Scratch size: 292 -> 312 (6.85 %) dwords per thread
Code Size: 55639836 -> 55620116 (-0.04 %) bytes
Max Waves: 964785 -> 963977 (-0.08 %)

Totals from affected shaders:
SGPRS: 1105800 -> 1108992 (0.29 %)
VGPRS: 635292 -> 637564 (0.36 %)
Spilled SGPRs: 3193 -> 3270 (2.41 %)
Spilled VGPRs: 33 -> 44 (33.33 %)
Scratch size: 36 -> 56 (55.56 %) dwords per thread
Code Size: 31568708 -> 31548988 (-0.06 %) bytes
Max Waves: 319991 -> 319183 (-0.25 %)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6596>
2020-09-16 02:39:02 +00:00
Italo Nicola 00914e2179 nir/algebraic: fold some nested comparisons with ball and bany
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6604>
2020-09-14 17:47:39 +00:00
Marek Olšák 50d335804f nir/algebraic: add late optimizations that optimize out mediump conversions (v3)
v2: move *2*mp patterns to the end of late_optimizations
v3: remove ftrunc from the optimizations to fix:
    dEQP-GLES3.functional.shaders.builtin_functions.common.modf.vec2_lowp_vertex

Reviewed-by: Rob Clark <robdclark@chromium.org> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák b86305bb57 nir/algebraic: collapse conversion opcodes (many patterns)
mediump inserts a lot of conversions. This cleans up the IR.
All other combinations are covered too.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák cdd498bbe8 nir: add new mediump opcodes f2[ui]mp, i2fmp, u2fmp
Algebraic optimizations will select them.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák 3d3df8dbff nir: remove redundant opcode u2ump
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák 26fc5e1f4a nir/algebraic: expand existing 32-bit patterns to all bit sizes using loops
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák 3c8934a644 nir/algebraic: add flrp patterns for 16 and 64 bits
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6283>
2020-09-10 23:35:13 +00:00
Marek Olšák a7ece63de9 nir/algebraic: add 16-bit versions of a few 32-bit patterns
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>
2020-09-04 17:06:22 +00:00
Marek Olšák 00b28a50b2 nir/algebraic: trivially enable existing 32-bit patterns for all bit sizes
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>
2020-09-04 17:06:22 +00:00
Eric Anholt 479d9c97eb nir: Add simplistic lowering for bany_equal/ball_inequal.
It would be nice if we could do swizzling of an expression on the
replacement side so that we could have a single ieq/ine of the vector
after CSE.  However, if you do want vector operations, nir_opt_vectorize()
does just fine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6567>
2020-09-02 09:58:44 -07:00
Samuel Pitoiset bc123c396a nir/algebraic: mark some optimizations with fsat(NaN) as inexact
If a is Nan, fsat(NaN) is expected to be 0 and some optimizations
should be marked as inexact.

Fixes a GPU hang with Death Stranding and RADV/ACO (RADV/LLVM
isn't affected because it lowers fsat).

No fossils-db change.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3368
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6519>
2020-09-01 11:20:03 +02:00
Jesse Natalie d91f85f16e nir: Remove 32bit restriction for uadd_carry optimization
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6313>
2020-08-27 16:57:42 +00:00
Daniel Schürmann a79dad950b nir,amd: remove trinary_minmax opcodes
These consist of the variations nir_op_{i|u|f}{min|max|med}3 which are either
lowered in the backend (LLVM) anyway or can be recombined by the backend (ACO).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6421>
2020-08-24 20:56:11 +00:00
Erik Faye-Lund 5e841e8b4f nir: add iabs-lowering code
Microsoft's DXIL is based on LLVM, which doesn't have an integer ABS
opcode, but instead needs it lowered to NEG + MAX. We need to do this
with an option, to prevent an already existing optimization rule from
undoing this.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5211>
2020-08-24 10:02:47 +00:00
Karol Herbst e5899c1e88 nir: rename nir_op_fne to nir_op_fneu
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
2020-08-21 17:26:21 +00:00
Boris Brezillon 18e464cfc0 compiler/nir: Add new flags to lower pack/unpack split instructions
And add new rules to do this lowering in nir_opt_algebraic.py.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6309>
2020-08-17 19:46:10 +00:00
Jesse Natalie a1ed83fddd nir: Optimize mask+downcast to just downcast
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6330>
2020-08-17 14:36:18 +00:00
Daniel Schürmann 5f79e4e69a nir/algebraic: fold some nested bcsel
Totals from 14266 (10.62% of 134368) affected shaders (Polaris):
SGPRs: 761756 -> 762732 (+0.13%); split: -0.00%, +0.13%
VGPRs: 430392 -> 430924 (+0.12%); split: -0.05%, +0.17%
SpillSGPRs: 4652 -> 4628 (-0.52%); split: -0.60%, +0.09%
CodeSize: 30133000 -> 29949780 (-0.61%); split: -0.66%, +0.05%
MaxWaves: 102122 -> 102111 (-0.01%); split: +0.00%, -0.01%
Instrs: 5845085 -> 5841668 (-0.06%); split: -0.08%, +0.03%
Cycles: 69033140 -> 68889188 (-0.21%); split: -0.22%, +0.01%
VMEM: 8479021 -> 8474978 (-0.05%); split: +0.03%, -0.08%
SMEM: 831437 -> 830464 (-0.12%); split: +0.06%, -0.18%
VClause: 105411 -> 105410 (-0.00%); split: -0.01%, +0.01%
SClause: 327727 -> 327780 (+0.02%); split: -0.00%, +0.02%
Copies: 372704 -> 373306 (+0.16%); split: -0.16%, +0.32%
Branches: 112260 -> 112269 (+0.01%); split: -0.00%, +0.01%
PreSGPRs: 433308 -> 433631 (+0.07%); split: -0.01%, +0.09%
PreVGPRs: 397888 -> 397905 (+0.00%); split: -0.01%, +0.01%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:46 +00:00
Daniel Schürmann 27244662f2 nir/algebraic: propagate b2i out of ior/iand
Totals from 761 (0.57% of 134368) affected shaders (Polaris):
SGPRs: 29496 -> 29488 (-0.03%)
SpillSGPRs: 41 -> 43 (+4.88%)
CodeSize: 1922036 -> 1882408 (-2.06%); split: -2.08%, +0.02%
Instrs: 366051 -> 360362 (-1.55%); split: -1.57%, +0.02%
Cycles: 7692516 -> 7661216 (-0.41%); split: -0.41%, +0.01%
VMEM: 365175 -> 365172 (-0.00%)
VClause: 15324 -> 15322 (-0.01%)
SClause: 9825 -> 9824 (-0.01%); split: -0.02%, +0.01%
Copies: 41216 -> 41294 (+0.19%); split: -0.01%, +0.20%
Branches: 7020 -> 7033 (+0.19%)
PreSGPRs: 22103 -> 22106 (+0.01%)
PreVGPRs: 26518 -> 26515 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:46 +00:00
Daniel Schürmann baee5a9812 nir/algebraic: add distributive rules for ior/iand
Totals from 581 (0.43% of 134368) affected shaders (Polaris):
CodeSize: 1389560 -> 1386488 (-0.22%)
Instrs: 264488 -> 263984 (-0.19%)
Cycles: 1057952 -> 1055936 (-0.19%)
VMEM: 296016 -> 291613 (-1.49%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:46 +00:00
Daniel Schürmann 70d3efeb88 nir/algebraic: optimize (a < 0.0) ? -a : a -> fabs(a)
Totals from affected shaders: (VEGA)
SGPRS: 13920 -> 13920 (0.00 %)
VGPRS: 10252 -> 10252 (0.00 %)
Spilled SGPRs: 62 -> 62 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 587648 -> 587224 (-0.07 %) bytes
LDS: 5 -> 5 (0.00 %) blocks
Max Waves: 1489 -> 1489 (0.00 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:46 +00:00
Daniel Schürmann 9d22c5ed71 nir/algebraic: optimize fmul(x, bcsel(c, -1.0, 1.0)) -> bcsel(c, -x, x)
Totals from affected shaders: (VEGA)
SGPRS: 545712 -> 545712 (0.00 %)
VGPRS: 413092 -> 413116 (0.01 %)
Spilled SGPRs: 10616 -> 10616 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 37031684 -> 36984248 (-0.13 %) bytes
LDS: 427 -> 427 (0.00 %) blocks
Max Waves: 54350 -> 54340 (-0.02 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:46 +00:00
Daniel Schürmann 56ec814b56 nir/algebraic: add some more unop + bcsel optimizations
Totals from affected shaders: (VEGA)
SGPRS: 284392 -> 284400 (0.00 %)
VGPRS: 261080 -> 261076 (-0.00 %)
Spilled SGPRs: 105 -> 105 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 24698596 -> 24277788 (-1.70 %) bytes
LDS: 196 -> 196 (0.00 %) blocks
Max Waves: 10101 -> 10105 (0.04 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:45 +00:00
Daniel Schürmann 2fca183910 nir/algebraic: add optimizations for fsign/isign
This just reverts fsign/isign lowering.

Totals from affected shaders:
SGPRS: 257496 -> 256672 (-0.32 %)
VGPRS: 181800 -> 178864 (-1.61 %)
Spilled SGPRs: 105 -> 105 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 11355852 -> 11141840 (-1.88 %) bytes
LDS: 3789 -> 3789 (0.00 %) blocks
Max Waves: 30453 -> 30951 (1.64 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:45 +00:00
Daniel Schürmann 8e1b75b330 nir/algebraic: optimize iand/ior of (n)eq zero
Found in some Detroit: Become Human shaders.

Totals from affected shaders:
SGPRS: 700256 -> 700256 (0.00 %)
VGPRS: 507208 -> 507212 (0.00 %)
Spilled SGPRs: 142531 -> 142531 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 76404616 -> 76301768 (-0.13 %) bytes
LDS: 43 -> 43 (0.00 %) blocks
Max Waves: 21438 -> 21438 (0.00 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:45 +00:00
Daniel Schürmann de0ebaf09d nir/algebraic: optimize bcsel(a, 0, 1) to b2i
This avoids combination with other bcsel operations,
and as b2i is often a no-op (when used for iadd and such),
the resulting pattern is preferable.

Totals from affected shaders: (VEGA)
SGPRS: 598448 -> 598448 (0.00 %)
VGPRS: 457940 -> 457352 (-0.13 %)
Spilled SGPRs: 127154 -> 127154 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 64836352 -> 64802728 (-0.05 %) bytes
LDS: 781 -> 781 (0.00 %) blocks
Max Waves: 22931 -> 22931 (0.00 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4830>
2020-07-20 15:56:45 +00:00
Ian Romanick 8591adea38 nir/algebraic: Don't distrubte absolute-value into dot-products
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.

Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions.  In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on...  I'm really
shocked that there were no test regressions!

All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.

total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs)   min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel)   min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
2020-07-02 14:05:33 -07:00
Alyssa Rosenzweig 54d7907c27 nir: Propagate *2*16 conversions into vectors
If we have code like:

   ('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))

We would like to eliminate the conversions, but the existing rules can't
see into the the (heterogenous) vector. So instead of trying to
eliminate in one pass, we add opts to propagate the f2f16 into the
vector. Even if nothing further happens, this is often a win since then
the created vector is smaller (half2 instead of float2). Hence the above
gets transformed to

   ('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))

Then the existing f2f16(f2f32) rule will kick in for the first component
and constant folding will for the second and we'll be left with

   ('vec2', 'a@16', '#b@16')

...eliminating all conversions.

v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.

v3: Extend for integers as well and add a comment explaining the
transforms.

Results on Midgard (unfortunately a true SIMD architecture):

total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.

total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.

total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.

total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.

total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].

total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
2020-06-30 16:21:33 +00:00
Boris Brezillon cff418cc4c nir: Add new rules to optimize NOOP pack/unpack pairs
nir_load_store_vectorize_test.ssbo_load_adjacent_32_32_64_64 expectations
need to be fixed accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5589>
2020-06-29 09:18:26 +02:00
Marek Olšák f798513f91 nir: add i2imp and u2ump opcodes for conversions to mediump
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5002>
2020-06-02 20:01:18 +00:00
Alyssa Rosenzweig f3310cb3e1 nir: Fold f2f16(b2f32(x)) to b2f16(x)
By definition.

This reduces register pressure on freedreno so that the noubo expected
failure goes away.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5002>
2020-06-02 20:01:18 +00:00
Ian Romanick 412e29c277 nir/algebraic: Eliminate useless extract before unpack
The shader helped for spills and fills is the big compute shader in Dirt
Showdown.  One of the shaders hurt for spills and fills on Broadwell is
the big compute shader in Bioshock Infinite, but combined with the
previous commit, it's still an impovement.

Tiger Lake
total instructions in shared programs: 21833218 -> 21832449 (<.01%)
instructions in affected programs: 66104 -> 65335 (-1.16%)
helped: 106
HURT: 14
helped stats (abs) min: 1 max: 67 x̄: 7.87 x̃: 5
helped stats (rel) min: 0.19% max: 5.76% x̄: 1.27% x̃: 0.95%
HURT stats (abs)   min: 1 max: 14 x̄: 4.64 x̃: 1
HURT stats (rel)   min: 0.19% max: 4.12% x̄: 1.41% x̃: 0.19%
95% mean confidence interval for instructions value: -8.51 -4.30
95% mean confidence interval for instructions %-change: -1.23% -0.69%
Instructions are helped.

total cycles in shared programs: 506180109 -> 506196314 (<.01%)
cycles in affected programs: 1671429 -> 1687634 (0.97%)
helped: 37
HURT: 84
helped stats (abs) min: 1 max: 490 x̄: 73.27 x̃: 24
helped stats (rel) min: 0.02% max: 7.98% x̄: 1.25% x̃: 0.41%
HURT stats (abs)   min: 1 max: 5000 x̄: 225.19 x̃: 8
HURT stats (rel)   min: 0.03% max: 10.22% x̄: 1.22% x̃: 0.42%
95% mean confidence interval for cycles value: 2.85 265.00
95% mean confidence interval for cycles %-change: 0.04% 0.88%
Cycles are HURT.

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19961317 -> 19960543 (<.01%)
instructions in affected programs: 30268 -> 29494 (-2.56%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 142 x̄: 19.85 x̃: 7
helped stats (rel) min: 0.19% max: 7.87% x̄: 2.33% x̃: 2.31%
95% mean confidence interval for instructions value: -29.46 -10.23
95% mean confidence interval for instructions %-change: -2.95% -1.71%
Instructions are helped.

total cycles in shared programs: 498863755 -> 498865843 (<.01%)
cycles in affected programs: 1831136 -> 1833224 (0.11%)
helped: 57
HURT: 65
helped stats (abs) min: 1 max: 1400 x̄: 128.93 x̃: 25
helped stats (rel) min: 0.05% max: 3.49% x̄: 0.89% x̃: 0.71%
HURT stats (abs)   min: 1 max: 1887 x̄: 145.18 x̃: 15
HURT stats (rel)   min: 0.02% max: 9.88% x̄: 1.83% x̃: 0.73%
95% mean confidence interval for cycles value: -58.30 92.53
95% mean confidence interval for cycles %-change: 0.16% 0.97%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8774 -> 8773 (-0.01%)
spills in affected programs: 20 -> 19 (-5.00%)
helped: 1
HURT: 0

total fills in shared programs: 9496 -> 9494 (-0.02%)
fills in affected programs: 40 -> 38 (-5.00%)
helped: 1
HURT: 0

Broadwell
total instructions in shared programs: 17859373 -> 17858548 (<.01%)
instructions in affected programs: 38452 -> 37627 (-2.15%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 143 x̄: 26.61 x̃: 10
helped stats (rel) min: 0.19% max: 7.87% x̄: 2.57% x̃: 2.69%
95% mean confidence interval for instructions value: -39.79 -13.44
95% mean confidence interval for instructions %-change: -3.25% -1.89%
Instructions are helped.

total cycles in shared programs: 525858109 -> 525869236 (<.01%)
cycles in affected programs: 2058597 -> 2069724 (0.54%)
helped: 44
HURT: 75
helped stats (abs) min: 2 max: 1330 x̄: 187.84 x̃: 23
helped stats (rel) min: 0.04% max: 31.31% x̄: 2.13% x̃: 0.85%
HURT stats (abs)   min: 1 max: 3915 x̄: 258.56 x̃: 47
HURT stats (rel)   min: 0.02% max: 10.53% x̄: 2.81% x̃: 2.21%
95% mean confidence interval for cycles value: -26.06 213.07
95% mean confidence interval for cycles %-change: 0.19% 1.78%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 25744 -> 25730 (-0.05%)
spills in affected programs: 1578 -> 1564 (-0.89%)
helped: 4
HURT: 2

total fills in shared programs: 31710 -> 31689 (-0.07%)
fills in affected programs: 4346 -> 4325 (-0.48%)
helped: 3
HURT: 3

Haswell
total instructions in shared programs: 16228399 -> 16227783 (<.01%)
instructions in affected programs: 22201 -> 21585 (-2.77%)
helped: 27
HURT: 0
helped stats (abs) min: 1 max: 68 x̄: 22.81 x̃: 11
helped stats (rel) min: 0.19% max: 7.87% x̄: 2.92% x̃: 2.86%
95% mean confidence interval for instructions value: -31.96 -13.66
95% mean confidence interval for instructions %-change: -3.68% -2.15%
Instructions are helped.

total cycles in shared programs: 538613967 -> 538701354 (0.02%)
cycles in affected programs: 1653044 -> 1740431 (5.29%)
helped: 36
HURT: 81
helped stats (abs) min: 2 max: 708 x̄: 104.50 x̃: 17
helped stats (rel) min: <.01% max: 15.01% x̄: 1.67% x̃: 0.65%
HURT stats (abs)   min: 1 max: 30100 x̄: 1125.30 x̃: 304
HURT stats (rel)   min: 0.02% max: 16.21% x̄: 8.98% x̃: 11.60%
95% mean confidence interval for cycles value: 23.78 1470.01
95% mean confidence interval for cycles %-change: 4.29% 7.12%
Cycles are HURT.

total spills in shared programs: 23418 -> 23409 (-0.04%)
spills in affected programs: 177 -> 168 (-5.08%)
helped: 2
HURT: 0

total fills in shared programs: 25919 -> 25896 (-0.09%)
fills in affected programs: 568 -> 545 (-4.05%)
helped: 3
HURT: 0

Ivy Bridge
total instructions in shared programs: 15265983 -> 15265759 (<.01%)
instructions in affected programs: 8418 -> 8194 (-2.66%)
helped: 5
HURT: 0
helped stats (abs) min: 18 max: 99 x̄: 44.80 x̃: 26
helped stats (rel) min: 1.74% max: 4.26% x̄: 3.12% x̃: 3.00%
95% mean confidence interval for instructions value: -86.29 -3.31
95% mean confidence interval for instructions %-change: -4.43% -1.81%
Instructions are helped.

total cycles in shared programs: 422930336 -> 422929589 (<.01%)
cycles in affected programs: 59347 -> 58600 (-1.26%)
helped: 3
HURT: 2
helped stats (abs) min: 72 max: 1060 x̄: 433.33 x̃: 168
helped stats (rel) min: 1.14% max: 3.48% x̄: 2.23% x̃: 2.06%
HURT stats (abs)   min: 265 max: 288 x̄: 276.50 x̃: 276
HURT stats (rel)   min: 4.79% max: 5.64% x̄: 5.22% x̃: 5.22%
95% mean confidence interval for cycles value: -829.08 530.28
95% mean confidence interval for cycles %-change: -4.43% 5.93%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4953 -> 4946 (-0.14%)
spills in affected programs: 344 -> 337 (-2.03%)
helped: 2
HURT: 0

total fills in shared programs: 5548 -> 5521 (-0.49%)
fills in affected programs: 838 -> 811 (-3.22%)
helped: 2
HURT: 0

No shader-db changes on any earlier Intel platform.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4515>
2020-05-11 12:07:01 -07:00
Ian Romanick bc0bbb8f0b nir/algebraic: Add some half packing optimizations for pack_half_2x16_split
Like 1f72857739 ("nir/algebraic: add some half packing optimizations"),
but for the pack_half_2x16_split variant.

The shader helped for spills and fills is the big compute shader in
Bioshock Infinite.

Tiger Lake
total instructions in shared programs: 21834539 -> 21833218 (<.01%)
instructions in affected programs: 60119 -> 58798 (-2.20%)
helped: 105
HURT: 0
helped stats (abs) min: 5 max: 50 x̄: 12.58 x̃: 9
helped stats (rel) min: 0.86% max: 26.46% x̄: 2.58% x̃: 1.70%
95% mean confidence interval for instructions value: -14.35 -10.81
95% mean confidence interval for instructions %-change: -3.20% -1.97%
Instructions are helped.

total cycles in shared programs: 506215169 -> 506180109 (<.01%)
cycles in affected programs: 1445088 -> 1410028 (-2.43%)
helped: 97
HURT: 8
helped stats (abs) min: 1 max: 16882 x̄: 387.76 x̃: 26
helped stats (rel) min: 0.05% max: 18.31% x̄: 1.77% x̃: 1.34%
HURT stats (abs)   min: 21 max: 635 x̄: 319.12 x̃: 212
HURT stats (rel)   min: 0.39% max: 20.08% x̄: 8.96% x̃: 4.46%
95% mean confidence interval for cycles value: -782.96 115.15
95% mean confidence interval for cycles %-change: -1.74% -0.16%
Inconclusive result (value mean confidence interval includes 0).

Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs: 19962974 -> 19961317 (<.01%)
instructions in affected programs: 63471 -> 61814 (-2.61%)
helped: 105
HURT: 0
helped stats (abs) min: 6 max: 82 x̄: 15.78 x̃: 11
helped stats (rel) min: 1.11% max: 28.65% x̄: 3.17% x̃: 2.16%
95% mean confidence interval for instructions value: -18.38 -13.18
95% mean confidence interval for instructions %-change: -3.86% -2.48%
Instructions are helped.

total cycles in shared programs: 498908953 -> 498863755 (<.01%)
cycles in affected programs: 1566998 -> 1521800 (-2.88%)
helped: 89
HURT: 15
helped stats (abs) min: 2 max: 17502 x̄: 532.19 x̃: 69
helped stats (rel) min: 0.07% max: 18.54% x̄: 4.71% x̃: 3.12%
HURT stats (abs)   min: 3 max: 661 x̄: 144.47 x̃: 16
HURT stats (rel)   min: 0.14% max: 20.57% x̄: 4.29% x̃: 0.30%
95% mean confidence interval for cycles value: -903.93 34.74
95% mean confidence interval for cycles %-change: -4.50% -2.32%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8776 -> 8774 (-0.02%)
spills in affected programs: 25 -> 23 (-8.00%)
helped: 1
HURT: 0

total fills in shared programs: 9500 -> 9496 (-0.04%)
fills in affected programs: 46 -> 42 (-8.70%)
helped: 1
HURT: 0

Haswell
total instructions in shared programs: 16229912 -> 16228399 (<.01%)
instructions in affected programs: 61257 -> 59744 (-2.47%)
helped: 105
HURT: 0
helped stats (abs) min: 6 max: 51 x̄: 14.41 x̃: 11
helped stats (rel) min: 0.77% max: 28.65% x̄: 3.08% x̃: 2.15%
95% mean confidence interval for instructions value: -16.14 -12.68
95% mean confidence interval for instructions %-change: -3.77% -2.40%
Instructions are helped.

total cycles in shared programs: 538654481 -> 538613967 (<.01%)
cycles in affected programs: 1448966 -> 1408452 (-2.80%)
helped: 58
HURT: 47
helped stats (abs) min: 9 max: 22604 x̄: 957.00 x̃: 74
helped stats (rel) min: 0.40% max: 18.81% x̄: 6.22% x̃: 3.03%
HURT stats (abs)   min: 5 max: 3720 x̄: 318.98 x̃: 49
HURT stats (rel)   min: 0.20% max: 34.50% x̄: 5.05% x̃: 2.12%
95% mean confidence interval for cycles value: -999.84 228.14
95% mean confidence interval for cycles %-change: -2.86% 0.51%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 15266086 -> 15265983 (<.01%)
instructions in affected programs: 7272 -> 7169 (-1.42%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 41 x̄: 34.33 x̃: 41
helped stats (rel) min: 0.66% max: 5.43% x̄: 2.44% x̃: 1.23%

total cycles in shared programs: 422930883 -> 422930336 (<.01%)
cycles in affected programs: 49259 -> 48712 (-1.11%)
helped: 3
HURT: 0
helped stats (abs) min: 106 max: 221 x̄: 182.33 x̃: 220
helped stats (rel) min: 0.71% max: 5.95% x̄: 2.46% x̃: 0.72%

No changes on any earilier Intel platforms.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4515>
2020-05-11 12:07:01 -07:00
Ian Romanick a2bf41ec65 nir/algebraic: Optimize ushr of pack_half, not ishr
When a = -1.0, pack_half_2x16(vec2(0x0000, 0xBC00)) will produce
0xBC000000.  The ishr will produce 0xFFFFBC00.  The replacement
pack_half_2x16(vec2(0xBC00, 0x0000)) will produce 0x0000BC00.

Fixes: 1f72857739 ("nir/algebraic: add some half packing optimizations")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4515>
2020-05-11 12:07:01 -07:00
Ian Romanick 3b6449d453 nir/algebraic: Optimize some bfe patterns
v2: Use -x instead of 32-x in shift counts.

Tiger Lake
total instructions in shared programs: 17597691 -> 17597405 (<.01%)
instructions in affected programs: 224557 -> 224271 (-0.13%)
helped: 74
HURT: 17
helped stats (abs) min: 1 max: 71 x̄: 14.36 x̃: 7
helped stats (rel) min: 0.08% max: 1.80% x̄: 0.50% x̃: 0.37%
HURT stats (abs)   min: 1 max: 141 x̄: 45.71 x̃: 40
HURT stats (rel)   min: 0.03% max: 3.55% x̄: 1.20% x̃: 1.14%
95% mean confidence interval for instructions value: -10.53 4.24
95% mean confidence interval for instructions %-change: -0.38% 0.01%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 333595656 -> 333180770 (-0.12%)
cycles in affected programs: 70056467 -> 69641581 (-0.59%)
helped: 91
HURT: 4
helped stats (abs) min: 1 max: 25174 x̄: 4571.40 x̃: 400
helped stats (rel) min: <.01% max: 2.23% x̄: 0.40% x̃: 0.21%
HURT stats (abs)   min: 1 max: 370 x̄: 277.75 x̃: 370
HURT stats (rel)   min: 0.01% max: 0.04% x̄: 0.04% x̃: 0.04%
95% mean confidence interval for cycles value: -5981.55 -2752.89
95% mean confidence interval for cycles %-change: -0.48% -0.29%
Cycles are helped.

Ice Lake, Skylake, Broadwell, and Haswell had similar results. (Ice Lake shown)
total instructions in shared programs: 16117204 -> 16116723 (<.01%)
instructions in affected programs: 207109 -> 206628 (-0.23%)
helped: 100
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 4.81 x̃: 7
helped stats (rel) min: 0.10% max: 1.58% x̄: 0.23% x̃: 0.20%
95% mean confidence interval for instructions value: -5.51 -4.11
95% mean confidence interval for instructions %-change: -0.27% -0.19%
Instructions are helped.

total cycles in shared programs: 330487341 -> 330082421 (-0.12%)
cycles in affected programs: 68037050 -> 67632130 (-0.60%)
helped: 89
HURT: 7
helped stats (abs) min: 2 max: 24610 x̄: 4567.07 x̃: 400
helped stats (rel) min: <.01% max: 1.52% x̄: 0.39% x̃: 0.22%
HURT stats (abs)   min: 1 max: 370 x̄: 221.29 x̃: 170
HURT stats (rel)   min: 0.01% max: 1.66% x̄: 0.58% x̃: 0.04%
95% mean confidence interval for cycles value: -5780.79 -2655.05
95% mean confidence interval for cycles %-change: -0.42% -0.22%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 11873641 -> 11873137 (<.01%)
instructions in affected programs: 147464 -> 146960 (-0.34%)
helped: 54
HURT: 0
helped stats (abs) min: 9 max: 10 x̄: 9.33 x̃: 9
helped stats (rel) min: 0.29% max: 0.41% x̄: 0.34% x̃: 0.34%
95% mean confidence interval for instructions value: -9.46 -9.20
95% mean confidence interval for instructions %-change: -0.35% -0.33%
Instructions are helped.

total cycles in shared programs: 175769085 -> 175549519 (-0.12%)
cycles in affected programs: 60770592 -> 60551026 (-0.36%)
helped: 54
HURT: 0
helped stats (abs) min: 252 max: 13434 x̄: 4066.04 x̃: 1290
helped stats (rel) min: 0.02% max: 0.74% x̄: 0.34% x̃: 0.26%
95% mean confidence interval for cycles value: -5323.59 -2808.48
95% mean confidence interval for cycles %-change: -0.41% -0.27%
Cycles are helped.

No changes on any earlier Intel platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4156>
2020-05-07 10:55:50 -07:00
Ian Romanick f46eabf84e nir/algebraic: Split ibfe and ubfe with two constant sources
I also tried splitting ubfe instructions with one or zero constants,
and zero shaders in shader-db were affected.

The "lost" shader is a compute shader that was promoted from SIMD8 to
SIMD16, so is also counted as the gained shader.

v2: Further restrict bfe splitting.  bfe with multiple constants is
better on at least some Radeon GPUs.  Use -x instead of 32-x in shift
counts.

v3: Fix the outer shift count for ibfe lowering.  Add c=0 optimizations
to prevent bad lowering.  Both suggested by Rhys.  Add shift by -32
optimizations.

Tiger Lake
total instructions in shared programs: 17608764 -> 17596316 (-0.07%)
instructions in affected programs: 303765 -> 291317 (-4.10%)
helped: 113
HURT: 46
helped stats (abs) min: 1 max: 458 x̄: 120.67 x̃: 21
helped stats (rel) min: 0.09% max: 11.23% x̄: 3.47% x̃: 1.39%
HURT stats (abs)   min: 1 max: 201 x̄: 25.83 x̃: 6
HURT stats (rel)   min: 0.23% max: 5.18% x̄: 1.53% x̃: 1.11%
95% mean confidence interval for instructions value: -101.13 -55.45
95% mean confidence interval for instructions %-change: -2.61% -1.44%
Instructions are helped.

total cycles in shared programs: 338390770 -> 333530868 (-1.44%)
cycles in affected programs: 79438330 -> 74578428 (-6.12%)
helped: 112
HURT: 64
helped stats (abs) min: 2 max: 268955 x̄: 44261.93 x̃: 1452
helped stats (rel) min: <.01% max: 29.51% x̄: 4.72% x̃: 2.23%
HURT stats (abs)   min: 2 max: 17618 x̄: 1522.41 x̃: 84
HURT stats (rel)   min: <.01% max: 7.34% x̄: 1.35% x̃: 0.34%
95% mean confidence interval for cycles value: -37232.47 -17993.69
95% mean confidence interval for cycles %-change: -3.37% -1.65%
Cycles are helped.

total spills in shared programs: 8944 -> 8138 (-9.01%)
spills in affected programs: 3240 -> 2434 (-24.88%)
helped: 67
HURT: 0

total fills in shared programs: 9373 -> 7842 (-16.33%)
fills in affected programs: 4736 -> 3205 (-32.33%)
helped: 67
HURT: 0

LOST:   1
GAINED: 2

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 16123288 -> 16116876 (-0.04%)
instructions in affected programs: 241155 -> 234743 (-2.66%)
helped: 126
HURT: 2
helped stats (abs) min: 1 max: 209 x̄: 50.90 x̃: 7
helped stats (rel) min: 0.07% max: 5.94% x̄: 1.76% x̃: 0.65%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.24% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for instructions value: -61.29 -38.89
95% mean confidence interval for instructions %-change: -2.05% -1.42%
Instructions are helped.

total cycles in shared programs: 335419163 -> 330438819 (-1.48%)
cycles in affected programs: 77515502 -> 72535158 (-6.42%)
helped: 139
HURT: 37
helped stats (abs) min: 2 max: 269140 x̄: 36374.19 x̃: 597
helped stats (rel) min: <.01% max: 28.60% x̄: 3.67% x̃: 1.31%
HURT stats (abs)   min: 4 max: 17618 x̄: 2045.08 x̃: 174
HURT stats (rel)   min: 0.02% max: 8.32% x̄: 2.61% x̃: 0.62%
95% mean confidence interval for cycles value: -37799.30 -18795.51
95% mean confidence interval for cycles %-change: -3.13% -1.57%
Cycles are helped.

total spills in shared programs: 8065 -> 7306 (-9.41%)
spills in affected programs: 3153 -> 2394 (-24.07%)
helped: 67
HURT: 0

total fills in shared programs: 8710 -> 7412 (-14.90%)
fills in affected programs: 4466 -> 3168 (-29.06%)
helped: 67
HURT: 0

LOST:   1
GAINED: 1

Broadwell
total instructions in shared programs: 14970538 -> 14965967 (-0.03%)
instructions in affected programs: 227040 -> 222469 (-2.01%)
helped: 126
HURT: 2
helped stats (abs) min: 1 max: 136 x̄: 36.29 x̃: 8
helped stats (rel) min: 0.07% max: 6.02% x̄: 1.47% x̃: 0.89%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.24% x̄: 0.14% x̃: 0.14%
95% mean confidence interval for instructions value: -43.05 -28.37
95% mean confidence interval for instructions %-change: -1.69% -1.19%
Instructions are helped.

total cycles in shared programs: 336237662 -> 333035960 (-0.95%)
cycles in affected programs: 72066394 -> 68864692 (-4.44%)
helped: 134
HURT: 42
helped stats (abs) min: 4 max: 122640 x̄: 24344.54 x̃: 1833
helped stats (rel) min: <.01% max: 26.93% x̄: 4.02% x̃: 2.38%
HURT stats (abs)   min: 1 max: 17205 x̄: 1439.69 x̃: 92
HURT stats (rel)   min: <.01% max: 7.12% x̄: 1.34% x̃: 0.62%
95% mean confidence interval for cycles value: -23753.58 -12629.40
95% mean confidence interval for cycles %-change: -3.50% -1.98%
Cycles are helped.

total spills in shared programs: 21122 -> 20204 (-4.35%)
spills in affected programs: 3644 -> 2726 (-25.19%)
helped: 67
HURT: 0

total fills in shared programs: 24879 -> 23460 (-5.70%)
fills in affected programs: 4883 -> 3464 (-29.06%)
helped: 67
HURT: 0

Haswell
total instructions in shared programs: 13148269 -> 13145444 (-0.02%)
instructions in affected programs: 137046 -> 134221 (-2.06%)
helped: 97
HURT: 3
helped stats (abs) min: 1 max: 137 x̄: 30.58 x̃: 3
helped stats (rel) min: 0.14% max: 4.38% x̄: 1.38% x̃: 0.44%
HURT stats (abs)   min: 1 max: 70 x̄: 47.00 x̃: 70
HURT stats (rel)   min: 0.05% max: 5.82% x̄: 3.90% x̃: 5.82%
95% mean confidence interval for instructions value: -37.15 -19.35
95% mean confidence interval for instructions %-change: -1.56% -0.89%
Instructions are helped.

total cycles in shared programs: 321221834 -> 318333159 (-0.90%)
cycles in affected programs: 54932349 -> 52043674 (-5.26%)
helped: 95
HURT: 53
helped stats (abs) min: 4 max: 123390 x̄: 30648.39 x̃: 702
helped stats (rel) min: <.01% max: 28.87% x̄: 4.27% x̃: 2.87%
HURT stats (abs)   min: 4 max: 2357 x̄: 432.49 x̃: 113
HURT stats (rel)   min: <.01% max: 3.44% x̄: 1.03% x̃: 0.54%
95% mean confidence interval for cycles value: -26154.16 -12881.99
95% mean confidence interval for cycles %-change: -3.20% -1.55%
Cycles are helped.

total spills in shared programs: 19878 -> 19293 (-2.94%)
spills in affected programs: 3020 -> 2435 (-19.37%)
helped: 41
HURT: 2

total fills in shared programs: 20918 -> 19875 (-4.99%)
fills in affected programs: 3968 -> 2925 (-26.29%)
helped: 41
HURT: 2

LOST:   0
GAINED: 1

Ivy Bridge
total instructions in shared programs: 11875585 -> 11873641 (-0.02%)
instructions in affected programs: 78065 -> 76121 (-2.49%)
helped: 27
HURT: 0
helped stats (abs) min: 8 max: 134 x̄: 72.00 x̃: 72
helped stats (rel) min: 0.36% max: 4.23% x̄: 2.42% x̃: 2.42%
95% mean confidence interval for instructions value: -83.68 -60.32
95% mean confidence interval for instructions %-change: -2.78% -2.07%
Instructions are helped.

total cycles in shared programs: 178232734 -> 175769085 (-1.38%)
cycles in affected programs: 50018707 -> 47555058 (-4.93%)
helped: 27
HURT: 0
helped stats (abs) min: 82035 max: 99953 x̄: 91246.26 x̃: 92278
helped stats (rel) min: 4.40% max: 5.69% x̄: 4.93% x̃: 4.95%
95% mean confidence interval for cycles value: -93674.20 -88818.32
95% mean confidence interval for cycles %-change: -5.09% -4.78%
Cycles are helped.

total spills in shared programs: 4182 -> 3739 (-10.59%)
spills in affected programs: 1089 -> 646 (-40.68%)
helped: 27
HURT: 0

total fills in shared programs: 5216 -> 4345 (-16.70%)
fills in affected programs: 1874 -> 1003 (-46.48%)
helped: 27
HURT: 0

No changes on any earlier Intel platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4156>
2020-05-07 10:55:50 -07:00