Commit Graph

296 Commits

Author SHA1 Message Date
Elie Tournier 22c5c54a4f nir/algebraic: sqrt(x)*sqrt(x) -> fabs(x)
total instructions in shared programs: 12840840 -> 12839341 (-0.01%)
instructions in affected programs: 122581 -> 121082 (-1.22%)
helped: 559
HURT: 0

total cycles in shared programs: 302505756 -> 302490031 (<.01%)
cycles in affected programs: 2022900 -> 2007175 (-0.78%)
helped: 1090
HURT: 130

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948>
2020-01-15 00:30:52 +00:00
Elie Tournier 6f394343b1 nir/algebraic: i2f(f2i()) -> trunc()
total instructions in shared programs: 12840968 -> 12840784 (<.01%)
instructions in affected programs: 17886 -> 17702 (-1.03%)
helped: 77
HURT: 0

total cycles in shared programs: 302508917 -> 302505592 (<.01%)
cycles in affected programs: 249964 -> 246639 (-1.33%)
helped: 70
HURT: 7

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/948>
2020-01-15 00:30:52 +00:00
Rhys Perry cc3ef3643a nir/algebraic: a & ~(a >> 31) -> imax(a, 0)
Found in some Doom shaders

Totals from affected shaders:
SGPRS: 30056 -> 30064 (0.03 %)
VGPRS: 28024 -> 28024 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 4278648 -> 4270852 (-0.18 %) bytes
Max Waves: 1476 -> 1476 (0.00 %)
Instructions: 835287 -> 833338 (-0.23 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3089>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3089>
2020-01-14 17:54:40 +00:00
Jonathan Marek 004797002f nir: add option to lower half packing opcodes
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
2019-12-16 19:20:07 -05:00
Ian Romanick fbd5359a0a nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select
Reviewed-by: Matt Turner <mattst88@gmail.com>

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
instructions in affected programs: 316166 -> 309237 (-2.19%)
helped: 905
HURT: 10
helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
95% mean confidence interval for instructions value: -7.91 -7.23
95% mean confidence interval for instructions %-change: -4.46% -3.99%
Instructions are helped.

total cycles in shared programs: 228571646 -> 228549759 (<.01%)
cycles in affected programs: 56239919 -> 56218032 (-0.04%)
helped: 681
HURT: 216
helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
HURT stats (abs)   min: 1 max: 320 x̄: 42.09 x̃: 14
HURT stats (rel)   min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
95% mean confidence interval for cycles value: -41.51 -7.29
95% mean confidence interval for cycles %-change: -0.80% -0.49%
Cycles are helped.

LOST:   1
GAINED: 0
2019-12-02 16:46:20 -08:00
Ian Romanick 780b5c1037 nir/algebraic: Simplify some Inf and NaN avoidance code
Since a is non-negative, neither fsqrt nor frsq should return NaN.  frsq
should only return Inf when fsqrt returns 0.

The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.

An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max.  That's more explicit, but we may also want to
have specific bit encodings of float values later.  I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.

total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.

Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.
2019-12-02 16:46:20 -08:00
Ian Romanick 9be4a422a0 nir/algebraic: Mark other comparison exact when removing a == a
This prevents some additional optimizations that would change the
original result.  This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a.  Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.

This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958.  However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false.  If they had left
the calls to isnan() alone, everything would have just worked.

No shader-db changes on any Intel platform.

I also tried marking the comparison generated by the isnan() function
precise.  The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().

I also considered adding a new ir_unop_isnan opcode that would implement
the functionality.  During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).

This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.

Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-04 14:05:49 -08:00
Ian Romanick ea19f2fb68 nir/algebraic: Add the ability to mark a replacement as exact
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-04 14:05:49 -08:00
Rob Clark 5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Rob Clark 1bdde31392 nir: add address calc related opt rules
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark 6320e37d4b nir: add amul instruction
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size.  For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Daniel Schürmann 239423d234 nir: Remove unnecessary subtraction optimizations
These optimizations are already covered after lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 99848a57b7 nir: recombine nir_op_*sub when lower_sub = false
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.

This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Ian Romanick 317a88b920 nir/algebraic: Additional D3D Boolean optimization
I observed this pattern in several shaders in Hand of Fate 2 while
investigating bugzilla #111490.  This also led to the related
bugzilla #111578.  The shaders from HoF2 are *not* in shader-db.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Skylake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 16222621 -> 16205419 (-0.11%)
instructions in affected programs: 798418 -> 781216 (-2.15%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35
helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09%
95% mean confidence interval for instructions value: -33.22 -29.56
95% mean confidence interval for instructions %-change: -3.11% -2.56%
Instructions are helped.

total cycles in shared programs: 364676209 -> 363345763 (-0.36%)
cycles in affected programs: 112810504 -> 111480058 (-1.18%)
helped: 546
HURT: 7
helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340
helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08%
HURT stats (abs)   min: 2 max: 770 x̄: 238.00 x̃: 43
HURT stats (rel)   min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35%
95% mean confidence interval for cycles value: -2884.33 -1927.41
95% mean confidence interval for cycles %-change: -1.59% -1.21%
Cycles are helped.

total spills in shared programs: 8870 -> 8514 (-4.01%)
spills in affected programs: 1230 -> 874 (-28.94%)
helped: 161
HURT: 0

total fills in shared programs: 21901 -> 21348 (-2.52%)
fills in affected programs: 2120 -> 1567 (-26.08%)
helped: 155
HURT: 5

Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 14994910 -> 14975495 (-0.13%)
instructions in affected programs: 839033 -> 819618 (-2.31%)
helped: 548
HURT: 0
helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49
helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22%
95% mean confidence interval for instructions value: -37.46 -33.40
95% mean confidence interval for instructions %-change: -3.12% -2.70%
Instructions are helped.

total cycles in shared programs: 386032453 -> 384450722 (-0.41%)
cycles in affected programs: 117807357 -> 116225626 (-1.34%)
helped: 547
HURT: 6
helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926
helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31%
HURT stats (abs)   min: 4 max: 60 x̄: 32.83 x̃: 29
HURT stats (rel)   min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65%
95% mean confidence interval for cycles value: -3060.28 -2660.27
95% mean confidence interval for cycles %-change: -1.59% -1.37%
Cycles are helped.

total spills in shared programs: 23372 -> 21869 (-6.43%)
spills in affected programs: 11730 -> 10227 (-12.81%)
helped: 352
HURT: 0

total fills in shared programs: 34747 -> 35351 (1.74%)
fills in affected programs: 11013 -> 11617 (5.48%)
helped: 3
HURT: 347

Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956420 -> 11956126 (<.01%)
instructions in affected programs: 14898 -> 14604 (-1.97%)
helped: 98
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -2.18% -1.98%
Instructions are helped.

total cycles in shared programs: 178791217 -> 178790792 (<.01%)
cycles in affected programs: 149763 -> 149338 (-0.28%)
helped: 91
HURT: 7
helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16
helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18%
HURT stats (abs)   min: 3 max: 322 x̄: 207.43 x̃: 322
HURT stats (rel)   min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41%
95% mean confidence interval for cycles value: -18.94 10.27
95% mean confidence interval for cycles %-change: -1.28% 0.49%
Inconclusive result (value mean confidence interval includes 0).
2019-09-19 14:22:22 -07:00
Ian Romanick 92f70df8c3 nir/algebraic: Do not apply late DPH optimization in vertex processing stages
Some shaders do not use 'invariant' in vertex and (possibly) geometry
shader stages on some outputs that are intended to be invariant.  For
various reasons, this optimization may not be fully applied in all
shaders used for different rendering passes of the same geometry.  This
can result in Z-fighting artifacts (at best).  For now, disable this
optimization in these stages.

In tessellation stages applications seem to use 'precise' when
necessary, so allow the optimization in those stages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16194726 -> 16344745 (0.93%)
instructions in affected programs: 2855172 -> 3005191 (5.25%)
helped: 6
HURT: 20279
helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1
helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44%
HURT stats (abs)   min: 1 max: 32 x̄: 7.40 x̃: 7
HURT stats (rel)   min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56%
95% mean confidence interval for instructions value: 7.34 7.45
95% mean confidence interval for instructions %-change: 8.48% 8.67%
Instructions are HURT.

total cycles in shared programs: 364471296 -> 365014683 (0.15%)
cycles in affected programs: 32421530 -> 32964917 (1.68%)
helped: 2925
HURT: 16144
helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5
helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15%
HURT stats (abs)   min: 1 max: 18471 x̄: 36.99 x̃: 15
HURT stats (rel)   min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87%
95% mean confidence interval for cycles value: 21.58 35.41
95% mean confidence interval for cycles %-change: 4.36% 4.52%
Cycles are HURT.
2019-09-19 14:21:31 -07:00
Andres Gomez 3f782cdd25 nir/algebraic: mark float optimizations returning one parameter as inexact
With the arrival of VK_KHR_shader_float_controls algebraic
optimizations for float types of the form (('fop', a, b), a) become
inexact depending on the execution mode.

For example, if we have activated SHADER_DENORM_FLUSH_TO_ZERO, in case
of a denorm value for the "a" parameter, we cannot return it still as
a denorm, it needs to be flushed to zero. Therefore, we mark now all
those operations as inexact.

Suggested-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-17 23:39:18 +03:00
Ian Romanick e07248d2a8 nir/algebraic: Clean up value range analysis-based optimizations
Fix the a / b ordering in some compares.  Delete duplicate patterns.
Add a table explaining things.  While I was cleaning this up, I managed
to confuse myself.  The table helped sort that out.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-29 13:15:52 -07:00
Ian Romanick ccb236d1bc nir/algebraic: Mark some value range analysis-based optimizations imprecise
This didn't fix bug #111308, but it was found will trying to find the
actual cause of that bug.

Fixes piglit tests (new in piglit!110):

    - fs-fract-of-NaN.shader_test
    - fs-lt-nan-tautology.shader_test
    - fs-ge-nan-tautology.shader_test

No shader-db changes on any Intel platform.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308
Fixes: b77070e293 ("nir/algebraic: Use value range analysis to eliminate tautological compares")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-29 13:15:52 -07:00
Ian Romanick d3fd1c761a nir/algrbraic: Don't optimize open-coded bitfield reverse when lowering is enabled
This caused a problem on Sandybridge where an open-coded
bitfieldReverse() function could be optimized to a
nir_op_bitfield_reverse that would generate an unsupported BFREV
instruction in the backend.  This was encountered in some Unreal4 tech
demos in shader-db.  The bug was not previously noticed because we don't
actually try to run those demos on Sandybridge.

The fixes tag is a bit a lie.  The actual bug was introduced about
26,000 commits earlier in 371c4b3c48 ("nir: Recognize open-coded
bitfield_reverse.").  Without the NIR lowering pass, the flag needed to
avoid the optimization does not exist.  Hopefully nobody will care to
fix this on an earlier Mesa release.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 7afa26d4e3 ("nir: Add lowering for nir_op_bitfield_reverse.")
2019-08-28 11:38:51 -07:00
Daniel Schürmann 7fa1740035 nir/algebraic: some subtraction optimizations
Changes with RADV/ACO:
Totals from affected shaders:
SGPRS: 444087 -> 455543 (2.58 %)
VGPRS: 436468 -> 436768 (0.07 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13448928 -> 13353520 (-0.71 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 68060 -> 67979 (-0.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-21 08:51:49 +00:00
Rhys Perry 0a790c3019 nir/algebraic: add a few masking-before-unpack optimizations
Helps some Dawn of War 3 and F1 2017 shaders with ACO:
Totals from affected shaders:
SGPRS: 2136 -> 2128 (-0.37 %)
VGPRS: 1624 -> 1628 (0.25 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 168068 -> 164332 (-2.22 %) bytes
LDS: 44 -> 44 (0.00 %) blocks
Max Waves: 222 -> 221 (-0.45 %)
Wait states: 0 -> 0 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-16 12:13:01 +01:00
Ian Romanick 0e6581b87d nir/algebraic: Reassociate shift-by-constant of shift-by-constant
v2: After some review discussion with Alyssa, the replacements now
correct account for cases where (b+c) >= bitsize.

v3: Use a temporary to simplify the Python code quite a bit.  Suggested
by Jason.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16251155 -> 16249576 (<.01%)
instructions in affected programs: 232627 -> 231048 (-0.68%)
helped: 547
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -3.12 -2.65
95% mean confidence interval for instructions %-change: -1.20% -1.06%
Instructions are helped.

total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
cycles in affected programs: 59207053 -> 58654764 (-0.93%)
helped: 497
HURT: 34
helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
HURT stats (abs)   min: 2 max: 424 x̄: 101.03 x̃: 63
HURT stats (rel)   min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
95% mean confidence interval for cycles value: -1426.41 -653.77
95% mean confidence interval for cycles %-change: -1.66% -1.15%
Cycles are helped.

total spills in shared programs: 8870 -> 8871 (0.01%)
spills in affected programs: 104 -> 105 (0.96%)
helped: 0
HURT: 1

Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11956236 -> 11955635 (<.01%)
instructions in affected programs: 94110 -> 93509 (-0.64%)
helped: 106
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
95% mean confidence interval for instructions value: -6.62 -4.72
95% mean confidence interval for instructions %-change: -2.27% -1.64%
Instructions are helped.

total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
cycles in affected programs: 51009603 -> 50501307 (-1.00%)
helped: 82
HURT: 7
helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
HURT stats (abs)   min: 2 max: 8 x̄: 3.14 x̃: 2
HURT stats (rel)   min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
95% mean confidence interval for cycles value: -7649.38 -3773.00
95% mean confidence interval for cycles %-change: -2.71% -1.99%
Cycles are helped.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:37 -07:00
Ian Romanick 73aaeac0a3 nir/algebraic: Reassociate add-and-shift to be shift-and-add
A common thing in many shaders:

    uniform vs { vec4 bones[...]; };

    ...

    x = some_calculation(bones[i + 0]);
    y = some_calculation(bones[i + 1]);
    z = some_calculation(bones[i + 2]);

This turns into stuff like

    vec1 32 ssa_12 = iadd ssa_11, ssa_0
    vec1 32 ssa_13 = ishl ssa_12, ssa_3
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_15 = iadd ssa_11, ssa_1
    vec1 32 ssa_16 = ishl ssa_15, ssa_3
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_18 = iadd ssa_11, ssa_2
    vec1 32 ssa_19 = ishl ssa_18, ssa_3
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

By reassociating the shift and the add, we can reduce this to

    vec1 32 ssa_12 = ishl ssa_11, ssa_3
    vec1 32 ssa_13 = iadd ssa_12, ssa_0
    vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
    vec1 32 ssa_16 = iadd ssa_12, ssa_1
    vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
    vec1 32 ssa_19 = iadd ssa_12, ssa_2
    vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

v2: Add some commentary from Rhys Perry's nearly identical patch.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
instructions in affected programs: 1440284 -> 1413230 (-1.88%)
helped: 4920
HURT: 6
helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
HURT stats (abs)   min: 1 max: 12 x̄: 4.50 x̃: 3
HURT stats (rel)   min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
95% mean confidence interval for instructions value: -5.67 -5.31
95% mean confidence interval for instructions %-change: -2.26% -2.16%
Instructions are helped.

total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
cycles in affected programs: 93504145 -> 92280977 (-1.31%)
helped: 2754
HURT: 1269
helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
HURT stats (abs)   min: 1 max: 1500 x̄: 35.85 x̃: 9
HURT stats (rel)   min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
95% mean confidence interval for cycles value: -387.31 -220.78
95% mean confidence interval for cycles %-change: -2.11% -1.68%
Cycles are helped.

LOST:   1
GAINED: 1

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-14 11:15:32 -07:00
Ian Romanick 5544b2cbbd nir/algebraic: Use value range analysis to eliminate useless unary ops
Sandy Bridge is the big winner because it lies at something of a
crossroads.  It supports a fairly high OpenGL version, and it still has
the old style math box.  The high OpenGL version means a lot more
shaders can run on it.  The old style math box means extra moves are
necessary to resolve source modifiers on operands to complex math
instructions like COS, SQRT, and RCP.

v2: Remove a couple patterns that are now redundant.

All Gen7+ platforms had similar results.  (Ice Lake shown)
total instructions in shared programs: 16282006 -> 16278207 (-0.02%)
instructions in affected programs: 174555 -> 170756 (-2.18%)
helped: 661
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3
helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94%
95% mean confidence interval for instructions value: -6.16 -5.34
95% mean confidence interval for instructions %-change: -3.02% -2.60%
Instructions are helped.

total cycles in shared programs: 367168597 -> 367134284 (<.01%)
cycles in affected programs: 1105276 -> 1070963 (-3.10%)
helped: 460
HURT: 150
helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82
helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27%
HURT stats (abs)   min: 1 max: 901 x̄: 67.49 x̃: 39
HURT stats (rel)   min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22%
95% mean confidence interval for cycles value: -65.68 -46.82
95% mean confidence interval for cycles %-change: -5.59% -4.05%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10824272 -> 10802557 (-0.20%)
instructions in affected programs: 1237988 -> 1216273 (-1.75%)
helped: 8199
HURT: 0
helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2
helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73%
95% mean confidence interval for instructions value: -2.70 -2.59
95% mean confidence interval for instructions %-change: -2.07% -2.00%
Instructions are helped.

total cycles in shared programs: 154009894 -> 153843598 (-0.11%)
cycles in affected programs: 10650486 -> 10484190 (-1.56%)
helped: 4973
HURT: 1533
helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20
helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67%
HURT stats (abs)   min: 1 max: 453 x̄: 21.94 x̃: 8
HURT stats (rel)   min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58%
95% mean confidence interval for cycles value: -28.02 -23.10
95% mean confidence interval for cycles %-change: -1.74% -1.56%
Cycles are helped.

LOST:   0
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8135196 -> 8134888 (<.01%)
instructions in affected programs: 31920 -> 31612 (-0.96%)
helped: 169
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2
helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16%
95% mean confidence interval for instructions value: -2.01 -1.64
95% mean confidence interval for instructions %-change: -1.32% -1.15%
Instructions are helped.

total cycles in shared programs: 188575724 -> 188574092 (<.01%)
cycles in affected programs: 406840 -> 405208 (-0.40%)
helped: 169
HURT: 0
helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10
helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47%
95% mean confidence interval for cycles value: -10.72 -8.59
95% mean confidence interval for cycles %-change: -0.63% -0.50%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:14 -07:00
Ian Romanick 8d14380971 nir/algebraic: Use value range analysis to convert fmin to fsat
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16297320 -> 16282006 (-0.09%)
instructions in affected programs: 2434498 -> 2419184 (-0.63%)
helped: 8091
HURT: 1
helped stats (abs) min: 1 max: 51 x̄: 1.89 x̃: 2
helped stats (rel) min: 0.04% max: 14.29% x̄: 0.98% x̃: 0.95%
HURT stats (abs)   min: 7 max: 7 x̄: 7.00 x̃: 7
HURT stats (rel)   min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for instructions value: -1.94 -1.85
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 367221624 -> 367168597 (-0.01%)
cycles in affected programs: 126409635 -> 126356608 (-0.04%)
helped: 5612
HURT: 1023
helped stats (abs) min: 1 max: 2332 x̄: 31.11 x̃: 16
helped stats (rel) min: <.01% max: 30.31% x̄: 1.69% x̃: 1.42%
HURT stats (abs)   min: 1 max: 2372 x̄: 118.84 x̃: 16
HURT stats (rel)   min: <.01% max: 46.98% x̄: 1.46% x̃: 0.35%
95% mean confidence interval for cycles value: -11.52 -4.46
95% mean confidence interval for cycles %-change: -1.26% -1.14%
Cycles are helped.

total spills in shared programs: 8868 -> 8870 (0.02%)
spills in affected programs: 28 -> 30 (7.14%)
helped: 0
HURT: 1

total fills in shared programs: 21903 -> 21904 (<.01%)
fills in affected programs: 42 -> 43 (2.38%)
helped: 0
HURT: 1

Haswell
total instructions in shared programs: 13353925 -> 13338728 (-0.11%)
instructions in affected programs: 2265850 -> 2250653 (-0.67%)
helped: 8127
HURT: 5
helped stats (abs) min: 1 max: 51 x̄: 1.88 x̃: 2
helped stats (rel) min: 0.04% max: 20.00% x̄: 1.13% x̃: 1.07%
HURT stats (abs)   min: 5 max: 16 x̄: 9.00 x̃: 6
HURT stats (rel)   min: 0.19% max: 0.52% x̄: 0.35% x̃: 0.28%
95% mean confidence interval for instructions value: -1.91 -1.83
95% mean confidence interval for instructions %-change: -1.15% -1.11%
Instructions are helped.

total cycles in shared programs: 375535444 -> 375536343 (<.01%)
cycles in affected programs: 131206582 -> 131207481 (<.01%)
helped: 5590
HURT: 1055
helped stats (abs) min: 1 max: 2844 x̄: 34.15 x̃: 16
helped stats (rel) min: <.01% max: 21.57% x̄: 2.08% x̃: 1.60%
HURT stats (abs)   min: 1 max: 2487 x̄: 181.78 x̃: 21
HURT stats (rel)   min: <.01% max: 40.66% x̄: 1.96% x̃: 0.37%
95% mean confidence interval for cycles value: -4.74 5.01
95% mean confidence interval for cycles %-change: -1.51% -1.37%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23401 -> 23407 (0.03%)
spills in affected programs: 248 -> 254 (2.42%)
helped: 2
HURT: 5

total fills in shared programs: 34850 -> 34845 (-0.01%)
fills in affected programs: 383 -> 378 (-1.31%)
helped: 2
HURT: 5

Ivy Bridge
total instructions in shared programs: 11975423 -> 11968117 (-0.06%)
instructions in affected programs: 845703 -> 838397 (-0.86%)
helped: 4071
HURT: 0
helped stats (abs) min: 1 max: 51 x̄: 1.79 x̃: 1
helped stats (rel) min: 0.08% max: 8.21% x̄: 1.04% x̃: 0.93%
95% mean confidence interval for instructions value: -1.87 -1.71
95% mean confidence interval for instructions %-change: -1.06% -1.02%
Instructions are helped.

total cycles in shared programs: 179674318 -> 179635552 (-0.02%)
cycles in affected programs: 5100065 -> 5061299 (-0.76%)
helped: 2650
HURT: 611
helped stats (abs) min: 1 max: 900 x̄: 21.85 x̃: 16
helped stats (rel) min: <.01% max: 21.55% x̄: 2.39% x̃: 1.40%
HURT stats (abs)   min: 1 max: 1841 x̄: 31.33 x̃: 6
HURT stats (rel)   min: <.01% max: 58.71% x̄: 1.64% x̃: 0.37%
95% mean confidence interval for cycles value: -14.14 -9.64
95% mean confidence interval for cycles %-change: -1.75% -1.52%
Cycles are helped.

LOST:   3
GAINED: 7

Sandy Bridge
total instructions in shared programs: 10828844 -> 10824272 (-0.04%)
instructions in affected programs: 525678 -> 521106 (-0.87%)
helped: 2386
HURT: 0
helped stats (abs) min: 1 max: 51 x̄: 1.92 x̃: 2
helped stats (rel) min: 0.11% max: 7.96% x̄: 1.05% x̃: 0.94%
95% mean confidence interval for instructions value: -2.04 -1.80
95% mean confidence interval for instructions %-change: -1.08% -1.03%
Instructions are helped.

total cycles in shared programs: 154024591 -> 154009894 (<.01%)
cycles in affected programs: 4005766 -> 3991069 (-0.37%)
helped: 1245
HURT: 506
helped stats (abs) min: 1 max: 585 x̄: 21.07 x̃: 16
helped stats (rel) min: 0.02% max: 11.57% x̄: 1.98% x̃: 0.83%
HURT stats (abs)   min: 1 max: 639 x̄: 22.81 x̃: 6
HURT stats (rel)   min: 0.01% max: 26.21% x̄: 1.07% x̃: 0.26%
95% mean confidence interval for cycles value: -10.57 -6.21
95% mean confidence interval for cycles %-change: -1.23% -0.97%
Cycles are helped.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8137248 -> 8135196 (-0.03%)
instructions in affected programs: 148322 -> 146270 (-1.38%)
helped: 992
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 2.07 x̃: 2
helped stats (rel) min: 0.41% max: 9.73% x̄: 1.74% x̃: 1.51%
95% mean confidence interval for instructions value: -2.16 -1.98
95% mean confidence interval for instructions %-change: -1.80% -1.67%
Instructions are helped.

total cycles in shared programs: 188583424 -> 188575724 (<.01%)
cycles in affected programs: 4409620 -> 4401920 (-0.17%)
helped: 956
HURT: 6
helped stats (abs) min: 2 max: 168 x̄: 8.09 x̃: 8
helped stats (rel) min: 0.04% max: 6.76% x̄: 0.27% x̃: 0.18%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: -8.41 -7.60
95% mean confidence interval for cycles %-change: -0.29% -0.25%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:14 -07:00
Ian Romanick b77070e293 nir/algebraic: Use value range analysis to eliminate tautological compares
It's only one application on one platform (Haswell) that's affected,
but spills and fills increase quite dramatically. :(

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16320850 -> 16297320 (-0.14%)
instructions in affected programs: 448012 -> 424482 (-5.25%)
helped: 1938
HURT: 0
helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10
helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38%
95% mean confidence interval for instructions value: -12.80 -11.48
95% mean confidence interval for instructions %-change: -5.99% -5.72%
Instructions are helped.

total cycles in shared programs: 367496943 -> 367221624 (-0.07%)
cycles in affected programs: 8557232 -> 8281913 (-3.22%)
helped: 1907
HURT: 26
helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48
helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94%
HURT stats (abs)   min: 4 max: 1870 x̄: 208.23 x̃: 20
HURT stats (rel)   min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79%
95% mean confidence interval for cycles value: -165.38 -119.48
95% mean confidence interval for cycles %-change: -5.68% -5.04%
Cycles are helped.

LOST:   1
GAINED: 0

Haswell
total instructions in shared programs: 13374211 -> 13353925 (-0.15%)
instructions in affected programs: 349868 -> 329582 (-5.80%)
helped: 1669
HURT: 1
helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10
helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49%
HURT stats (abs)   min: 700 max: 700 x̄: 700.00 x̃: 700
HURT stats (rel)   min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34%
95% mean confidence interval for instructions value: -13.25 -11.04
95% mean confidence interval for instructions %-change: -7.01% -6.63%
Instructions are helped.

total cycles in shared programs: 375763544 -> 375535444 (-0.06%)
cycles in affected programs: 6932686 -> 6704586 (-3.29%)
helped: 1622
HURT: 48
helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68
helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12%
HURT stats (abs)   min: 3 max: 7451 x̄: 259.44 x̃: 41
HURT stats (rel)   min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88%
95% mean confidence interval for cycles value: -159.86 -113.31
95% mean confidence interval for cycles %-change: -5.86% -5.18%
Cycles are helped.

total spills in shared programs: 23258 -> 23401 (0.61%)
spills in affected programs: 54 -> 197 (264.81%)
helped: 4
HURT: 2

total fills in shared programs: 34775 -> 34850 (0.22%)
fills in affected programs: 52 -> 127 (144.23%)
helped: 4
HURT: 1

LOST:   5
GAINED: 0

Ivy Bridge
total instructions in shared programs: 11996051 -> 11977964 (-0.15%)
instructions in affected programs: 346679 -> 328592 (-5.22%)
helped: 1508
HURT: 0
helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10
helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43%
95% mean confidence interval for instructions value: -12.65 -11.34
95% mean confidence interval for instructions %-change: -5.86% -5.60%
Instructions are helped.

total cycles in shared programs: 179891389 -> 179691339 (-0.11%)
cycles in affected programs: 7869479 -> 7669429 (-2.54%)
helped: 1485
HURT: 23
helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54
helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49%
HURT stats (abs)   min: 1 max: 403 x̄: 93.48 x̃: 6
HURT stats (rel)   min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81%
95% mean confidence interval for cycles value: -154.59 -110.73
95% mean confidence interval for cycles %-change: -4.79% -4.19%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10829247 -> 10828844 (<.01%)
instructions in affected programs: 21258 -> 20855 (-1.90%)
helped: 88
HURT: 0
helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5
helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21%
95% mean confidence interval for instructions value: -5.03 -4.13
95% mean confidence interval for instructions %-change: -2.21% -1.89%
Instructions are helped.

total cycles in shared programs: 154035437 -> 154024591 (<.01%)
cycles in affected programs: 430176 -> 419330 (-2.52%)
helped: 78
HURT: 10
helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32
helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07%
HURT stats (abs)   min: 3 max: 265 x̄: 31.30 x̃: 6
HURT stats (rel)   min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21%
95% mean confidence interval for cycles value: -232.53 -13.97
95% mean confidence interval for cycles %-change: -2.13% -1.23%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8137402 -> 8137248 (<.01%)
instructions in affected programs: 2280 -> 2126 (-6.75%)
helped: 10
HURT: 0
helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15
helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95%
95% mean confidence interval for instructions value: -17.69 -13.11
95% mean confidence interval for instructions %-change: -8.99% -5.39%
Instructions are helped.

total cycles in shared programs: 188538716 -> 188583424 (0.02%)
cycles in affected programs: 69326 -> 114034 (64.49%)
helped: 0
HURT: 10
HURT stats (abs)   min: 2068 max: 7686 x̄: 4470.80 x̃: 4870
HURT stats (rel)   min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41%
95% mean confidence interval for cycles value: 2830.86 6110.74
95% mean confidence interval for cycles %-change: 39.18% 99.91%
Cycles are HURT.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick 96fcb3f95b nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements
This just eliminates tautological / contradictory compares that are used
for bcsel and other non-if-statement cases.  If-statements are not
affected because removing flow control can cause the i965 instrution
scheduler to create some very long live ranges resulting in unncessary
spilling.  This causes some shaders to fall of a performance cliff.

Since many small if-statements are already flattened to bcsel, this
optimization covers more than 68% of the possible cases (2417 shaders
helped for instructions on Skylake vs. 3554).

v2: Reorder and add whitespace to make the relationship between the
patterns more obvious.  Suggested by Caio.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333474 -> 16322028 (-0.07%)
instructions in affected programs: 438559 -> 427113 (-2.61%)
helped: 1765
HURT: 0
helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4
helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82%
95% mean confidence interval for instructions value: -6.87 -6.10
95% mean confidence interval for instructions %-change: -4.30% -3.84%
Instructions are helped.

total cycles in shared programs: 367608554 -> 367511103 (-0.03%)
cycles in affected programs: 8368829 -> 8271378 (-1.16%)
helped: 1541
HURT: 129
helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39
helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17%
HURT stats (abs)   min: 1 max: 973 x̄: 42.25 x̃: 10
HURT stats (rel)   min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60%
95% mean confidence interval for cycles value: -64.90 -51.81
95% mean confidence interval for cycles %-change: -3.89% -3.36%
Cycles are helped.

total spills in shared programs: 8867 -> 8868 (0.01%)
spills in affected programs: 18 -> 19 (5.56%)
helped: 0
HURT: 1

total fills in shared programs: 21900 -> 21903 (0.01%)
fills in affected programs: 78 -> 81 (3.85%)
helped: 0
HURT: 1

All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10829877 -> 10829247 (<.01%)
instructions in affected programs: 30240 -> 29610 (-2.08%)
helped: 177
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3
helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94%
95% mean confidence interval for instructions value: -3.93 -3.18
95% mean confidence interval for instructions %-change: -3.04% -2.32%
Instructions are helped.

total cycles in shared programs: 154036580 -> 154035437 (<.01%)
cycles in affected programs: 352402 -> 351259 (-0.32%)
helped: 96
HURT: 28
helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6
helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46%
HURT stats (abs)   min: 1 max: 117 x̄: 9.68 x̃: 4
HURT stats (rel)   min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23%
95% mean confidence interval for cycles value: -13.40 -5.03
95% mean confidence interval for cycles %-change: -1.62% -0.53%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick d24edb4b8c nir/algebraic: Simplify some comparisons like a+constant < constant
v2: Remove unsafe integer versions of the optimizations.  This change
had no effect on shader-db results.  Suggested by Caio.

All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333713 -> 16332631 (<.01%)
instructions in affected programs: 258112 -> 257030 (-0.42%)
helped: 1275
HURT: 407
helped stats (abs) min: 1 max: 7 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.20% max: 8.33% x̄: 1.33% x̃: 0.86%
HURT stats (abs)   min: 1 max: 2 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 2.94% x̄: 0.98% x̃: 0.98%
95% mean confidence interval for instructions value: -0.70 -0.59
95% mean confidence interval for instructions %-change: -0.84% -0.70%
Instructions are helped.

total cycles in shared programs: 367596791 -> 367601268 (<.01%)
cycles in affected programs: 3420062 -> 3424539 (0.13%)
helped: 1553
HURT: 783
helped stats (abs) min: 1 max: 742 x̄: 24.36 x̃: 6
helped stats (rel) min: 0.05% max: 21.12% x̄: 1.47% x̃: 0.65%
HURT stats (abs)   min: 1 max: 557 x̄: 54.04 x̃: 14
HURT stats (rel)   min: 0.01% max: 33.66% x̄: 3.36% x̃: 1.43%
95% mean confidence interval for cycles value: -1.60 5.43
95% mean confidence interval for cycles %-change: -0.03% 0.33%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Iron Lake
total instructions in shared programs: 8137992 -> 8137874 (<.01%)
instructions in affected programs: 17501 -> 17383 (-0.67%)
helped: 104
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.25% max: 2.63% x̄: 0.87% x̃: 0.72%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.22 -1.00
95% mean confidence interval for instructions %-change: -0.94% -0.76%
Instructions are helped.

total cycles in shared programs: 188540038 -> 188539650 (<.01%)
cycles in affected programs: 704574 -> 704186 (-0.06%)
helped: 125
HURT: 84
helped stats (abs) min: 2 max: 96 x̄: 6.45 x̃: 4
helped stats (rel) min: <.01% max: 3.47% x̄: 0.42% x̃: 0.25%
HURT stats (abs)   min: 2 max: 58 x̄: 4.98 x̃: 4
HURT stats (rel)   min: 0.01% max: 2.75% x̄: 0.36% x̃: 0.33%
95% mean confidence interval for cycles value: -3.20 -0.52
95% mean confidence interval for cycles %-change: -0.19% -0.03%
Cycles are helped.

GM45
total instructions in shared programs: 5008889 -> 5008830 (<.01%)
instructions in affected programs: 8824 -> 8765 (-0.67%)
helped: 52
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.25% max: 2.38% x̄: 0.86% x̃: 0.72%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.27 -0.95
95% mean confidence interval for instructions %-change: -0.96% -0.71%
Instructions are helped.

total cycles in shared programs: 128969426 -> 128969128 (<.01%)
cycles in affected programs: 399798 -> 399500 (-0.07%)
helped: 74
HURT: 30
helped stats (abs) min: 2 max: 22 x̄: 6.76 x̃: 6
helped stats (rel) min: <.01% max: 1.83% x̄: 0.46% x̃: 0.29%
HURT stats (abs)   min: 2 max: 58 x̄: 6.73 x̃: 6
HURT stats (rel)   min: 0.06% max: 2.75% x̄: 0.42% x̃: 0.21%
95% mean confidence interval for cycles value: -4.60 -1.14
95% mean confidence interval for cycles %-change: -0.32% -0.08%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick 7c64cbf49d nir/algebraic: Recognize (a < 0 || 0 < b) as min(a, -b) < 0
Similar to commit 97e6c1b9 and f5cf74d8ba.

First apply 0 < b => -b < 0 to get (a < 0 || -b < 0), then apply some
pre-existing rules to get min(a, -b) < 0.

v2: Substantially update the comment explaining the use of is_used_once
and the duplication of patterns.  Suggested by Caio.  Also, while flt
and fge are not commutative, ior and iand are.  Half of the original
patterns were redundant, so delete them.  As alternate justification for
deleting them, fmin(a, -b) < 0 <=> 0 < fmax(-a, b).  Proof left as an
exercise for the reader.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16333789 -> 16333713 (<.01%)
instructions in affected programs: 11424 -> 11348 (-0.67%)
helped: 32
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.20% max: 1.67% x̄: 0.76% x̃: 0.69%
95% mean confidence interval for instructions value: -3.03 -1.72
95% mean confidence interval for instructions %-change: -0.89% -0.62%
Instructions are helped.

total cycles in shared programs: 367598295 -> 367596791 (<.01%)
cycles in affected programs: 141414 -> 139910 (-1.06%)
helped: 23
HURT: 6
helped stats (abs) min: 3 max: 386 x̄: 72.52 x̃: 20
helped stats (rel) min: 0.15% max: 4.86% x̄: 1.01% x̃: 0.76%
HURT stats (abs)   min: 4 max: 88 x̄: 27.33 x̃: 12
HURT stats (rel)   min: 0.22% max: 3.95% x̄: 1.08% x̃: 0.59%
95% mean confidence interval for cycles value: -93.51 -10.21
95% mean confidence interval for cycles %-change: -1.10% -0.05%
Cycles are helped.

total instructions in shared programs: 10830836 -> 10830779 (<.01%)
instructions in affected programs: 6895 -> 6838 (-0.83%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.75 x̃: 1
helped stats (rel) min: 0.14% max: 1.61% x̄: 0.65% x̃: 0.33%
95% mean confidence interval for instructions value: -8.46 -1.04
95% mean confidence interval for instructions %-change: -1.03% -0.27%
Instructions are helped.

total cycles in shared programs: 154028477 -> 154032740 (<.01%)
cycles in affected programs: 178433 -> 182696 (2.39%)
helped: 3
HURT: 9
helped stats (abs) min: 3 max: 20 x̄: 11.00 x̃: 10
helped stats (rel) min: 0.07% max: 0.20% x̄: 0.12% x̃: 0.09%
HURT stats (abs)   min: 27 max: 1415 x̄: 477.33 x̃: 262
HURT stats (rel)   min: 0.22% max: 6.45% x̄: 2.49% x̃: 1.76%
95% mean confidence interval for cycles value: 28.68 681.82
95% mean confidence interval for cycles %-change: 0.37% 3.30%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8137966 -> 8137992 (<.01%)
instructions in affected programs: 3281 -> 3307 (0.79%)
helped: 0
HURT: 6
HURT stats (abs)   min: 3 max: 7 x̄: 4.33 x̃: 3
HURT stats (rel)   min: 0.63% max: 1.01% x̄: 0.76% x̃: 0.64%
95% mean confidence interval for instructions value: 2.17 6.50
95% mean confidence interval for instructions %-change: 0.56% 0.96%
Instructions are HURT.

total cycles in shared programs: 188539386 -> 188540038 (<.01%)
cycles in affected programs: 103826 -> 104478 (0.63%)
helped: 0
HURT: 7
HURT stats (abs)   min: 16 max: 218 x̄: 93.14 x̃: 80
HURT stats (rel)   min: 0.14% max: 0.95% x̄: 0.53% x̃: 0.46%
95% mean confidence interval for cycles value: 10.26 176.02
95% mean confidence interval for cycles %-change: 0.24% 0.81%
Cycles are HURT.

GM45
total instructions in shared programs: 5008876 -> 5008889 (<.01%)
instructions in affected programs: 1645 -> 1658 (0.79%)
helped: 0
HURT: 3
HURT stats (abs)   min: 3 max: 7 x̄: 4.33 x̃: 3
HURT stats (rel)   min: 0.63% max: 1.00% x̄: 0.76% x̃: 0.63%

total cycles in shared programs: 128968950 -> 128969426 (<.01%)
cycles in affected programs: 64854 -> 65330 (0.73%)
helped: 0
HURT: 4
HURT stats (abs)   min: 18 max: 218 x̄: 119.00 x̃: 120
HURT stats (rel)   min: 0.14% max: 0.95% x̄: 0.60% x̃: 0.66%
95% mean confidence interval for cycles value: -62.92 300.92
95% mean confidence interval for cycles %-change: -0.05% 1.26%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Ian Romanick 92b75c126b nir/algebraic: Replace checks that a value is between (or not) [0, 1]
v2: Add an extra line to one of the proofs.  Suggested by Caio.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 16329772 -> 16329427 (<.01%)
instructions in affected programs: 41980 -> 41635 (-0.82%)
helped: 110
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 3.14 x̃: 2
helped stats (rel) min: 0.19% max: 5.56% x̄: 1.12% x̃: 0.94%
95% mean confidence interval for instructions value: -4.10 -2.17
95% mean confidence interval for instructions %-change: -1.28% -0.96%
Instructions are helped.

total cycles in shared programs: 367551273 -> 367549979 (<.01%)
cycles in affected programs: 492462 -> 491168 (-0.26%)
helped: 76
HURT: 25
helped stats (abs) min: 1 max: 400 x̄: 42.86 x̃: 12
helped stats (rel) min: 0.06% max: 10.72% x̄: 1.23% x̃: 0.75%
HURT stats (abs)   min: 2 max: 730 x̄: 78.52 x̃: 16
HURT stats (rel)   min: 0.17% max: 6.89% x̄: 2.08% x̃: 1.23%
95% mean confidence interval for cycles value: -37.79 12.16
95% mean confidence interval for cycles %-change: -0.90% 0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Sandy Bridge
total instructions in shared programs: 10831115 -> 10830836 (<.01%)
instructions in affected programs: 37830 -> 37551 (-0.74%)
helped: 70
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 3.99 x̃: 2
helped stats (rel) min: 0.33% max: 7.14% x̄: 1.21% x̃: 0.97%
95% mean confidence interval for instructions value: -5.47 -2.50
95% mean confidence interval for instructions %-change: -1.49% -0.92%
Instructions are helped.

total cycles in shared programs: 154029323 -> 154028477 (<.01%)
cycles in affected programs: 247909 -> 247063 (-0.34%)
helped: 52
HURT: 6
helped stats (abs) min: 2 max: 254 x̄: 25.81 x̃: 4
helped stats (rel) min: 0.07% max: 4.39% x̄: 0.81% x̃: 0.19%
HURT stats (abs)   min: 4 max: 403 x̄: 82.67 x̃: 8
HURT stats (rel)   min: 0.18% max: 1.60% x̄: 0.71% x̃: 0.53%
95% mean confidence interval for cycles value: -34.83 5.65
95% mean confidence interval for cycles %-change: -0.98% -0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake
total instructions in shared programs: 8138007 -> 8137966 (<.01%)
instructions in affected programs: 4060 -> 4019 (-1.01%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.32 x̃: 1
helped stats (rel) min: 0.68% max: 8.33% x̄: 1.45% x̃: 0.90%
95% mean confidence interval for instructions value: -1.50 -1.15
95% mean confidence interval for instructions %-change: -2.11% -0.79%
Instructions are helped.

total cycles in shared programs: 188539492 -> 188539386 (<.01%)
cycles in affected programs: 26280 -> 26174 (-0.40%)
helped: 25
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4
helped stats (rel) min: 0.08% max: 2.11% x̄: 0.54% x̃: 0.50%
95% mean confidence interval for cycles value: -5.08 -3.40
95% mean confidence interval for cycles %-change: -0.70% -0.37%
Cycles are helped.

GM45
total instructions in shared programs: 5008897 -> 5008876 (<.01%)
instructions in affected programs: 2096 -> 2075 (-1.00%)
helped: 16
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1
helped stats (rel) min: 0.68% max: 7.69% x̄: 1.41% x̃: 0.89%
95% mean confidence interval for instructions value: -1.57 -1.06
95% mean confidence interval for instructions %-change: -2.32% -0.49%
Instructions are helped.

total cycles in shared programs: 128969020 -> 128968950 (<.01%)
cycles in affected programs: 18490 -> 18420 (-0.38%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 4.67 x̃: 4
helped stats (rel) min: 0.08% max: 2.11% x̄: 0.51% x̃: 0.48%
95% mean confidence interval for cycles value: -6.03 -3.30
95% mean confidence interval for cycles %-change: -0.78% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Erico Nunes b3676a6548 nir/algebraic: rename lower_bitshift to lower_bitops
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:04 +02:00
Erico Nunes 4a407df682 nir/algebraic: add new fsum ops and fdot lowering
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-31 21:35:58 +02:00
Jonathan Marek 97c8314c5f nir/algebraic: add scmp algebraic optimizations
When 'x' is the result of a scmp op:

x != 0.0 or x == 1.0: passthrough
x == 0.0 or x != 1.0: invert

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek 9be902097c nir/algebraic: add option to lower fall_equalN/fany_nequalN
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek 397375d3f3 nir/algebraic: add fdot2 optimizations
Add simple fdot2 optimizations that are missing.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek 1e089d0575 nir/algebraic: add option to lower fdph
For backends that don't have a 'fdph' instructions

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek 5a4e71c082 nir/algebraic: allow swizzle in nir_algebraic replace expression
This is to allow optimizations in nir_opt_algebraic not otherwise possible

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Rhys Perry e8644122ed nir/algebraic: mark a few comparison simplifications as precise
No vkpipeline-db changes found.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-19 16:33:01 +00:00
Rhys Perry 79801b9d7d nir/algebraic: optimize contradictory iand operands
Some of these were found in a few GTAV, Rise of the Tomb Raider and
Shadow of the Tomb Raider shaders.

Results from vkpipeline-db run with ACO:
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 220 -> 220 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13492 -> 11560 (-14.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 69 -> 69 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: use False instead of 0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reveiewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-07-19 16:33:01 +00:00
Jason Ekstrand 812b341578 nir/algebraic: Optimize comparisons and up-casts
These seem like obvious enough optimizations in the world of multiple
integer bit sizes.  The only known thing which hits these at the moment
is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast
and check for equality.  However, it's something that's bound to come up
as we start seeing more integers in shaders.

The optimizations of comparisons of casted values with constants are
something which we would ideally do with range analysis.  However,
lacking that, we can do it in opt_algebraic as long as one side is a
constant.

In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along
with the previous commit, reduce the number of instructions emitted on
Skylake from 55328 to 44546, a reduction of 20%.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-17 18:44:35 +00:00
Jason Ekstrand e8505e982a nir/algebraic: Optimize comparing unpacked values
We could, in theory, add the same optimization for 64-bit unpack
operations but that's likely to fight with 64-bit integer lowering on
platforms which require it so it will require more infrastructure before
that will be a good idea.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-17 18:44:35 +00:00
Ian Romanick ef7b4fdf3f nir/algebraic: Recognize open-coded flrp(a, b, a)
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

All Gen6-Gen9 platforms had similar results. (Skylake shown)
total instructions in shared programs: 15041996 -> 15041184 (<.01%)
instructions in affected programs: 71776 -> 70964 (-1.13%)
helped: 312
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3
helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28%
95% mean confidence interval for instructions value: -2.66 -2.55
95% mean confidence interval for instructions %-change: -1.89% -1.61%
Instructions are helped.

total cycles in shared programs: 354303333 -> 354301807 (<.01%)
cycles in affected programs: 433742 -> 432216 (-0.35%)
helped: 206
HURT: 78
helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8
helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82%
HURT stats (abs)   min: 1 max: 220 x̄: 35.95 x̃: 10
HURT stats (rel)   min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56%
95% mean confidence interval for cycles value: -10.68 -0.06
95% mean confidence interval for cycles %-change: -0.99% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick 0c2b3a7fc0 nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form
No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

v2: Convert the pattern directly to flrp.  There were negligible
improvements on Gen4 and Gen5, and Gen11 was actually hurt.  I believe
the problem is this optimization conflicts with the (1-x)*y =>
ffma(-x, y, y) optimization on Gen11.

Skylake
total instructions in shared programs: 15046487 -> 15041996 (-0.03%)
instructions in affected programs: 194681 -> 190190 (-2.31%)
helped: 880
HURT: 20
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17%
95% mean confidence interval for instructions value: -5.25 -4.73
95% mean confidence interval for instructions %-change: -5.11% -4.36%
Instructions are helped.

total cycles in shared programs: 354340839 -> 354303333 (-0.01%)
cycles in affected programs: 1753622 -> 1716116 (-2.14%)
helped: 786
HURT: 182
helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22
helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84%
HURT stats (abs)   min: 1 max: 440 x̄: 37.99 x̃: 9
HURT stats (rel)   min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32%
95% mean confidence interval for cycles value: -45.90 -31.59
95% mean confidence interval for cycles %-change: -3.09% -2.50%
Cycles are helped.

All Gen6-Gen8 platforms had similar results. (Broadwell shown)
total instructions in shared programs: 15055907 -> 15051466 (-0.03%)
instructions in affected programs: 196370 -> 191929 (-2.26%)
helped: 871
HURT: 26
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12%
95% mean confidence interval for instructions value: -5.21 -4.69
95% mean confidence interval for instructions %-change: -4.99% -4.24%
Instructions are helped.

total cycles in shared programs: 387729170 -> 387699745 (<.01%)
cycles in affected programs: 1816409 -> 1786984 (-1.62%)
helped: 788
HURT: 172
helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22
helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76%
HURT stats (abs)   min: 1 max: 404 x̄: 45.59 x̃: 14
HURT stats (rel)   min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43%
95% mean confidence interval for cycles value: -35.69 -25.61
95% mean confidence interval for cycles %-change: -2.88% -2.40%
Cycles are helped.

total fills in shared programs: 34712 -> 34710 (<.01%)
fills in affected programs: 7 -> 5 (-28.57%)
helped: 1
HURT: 0

LOST:   0
GAINED: 2

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick 09705747d7 nir/algebraic: Reassociate fadd into fmul in DPH-like pattern
Moving the add to the other end of the sequence allows it to be fused
into an FMA.

Ice Lake
total instructions in shared programs: 17173074 -> 16933147 (-1.40%)
instructions in affected programs: 7938745 -> 7698818 (-3.02%)
helped: 35583
HURT: 90
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45%
HURT stats (abs)   min: 1 max: 41 x̄: 2.46 x̃: 1
HURT stats (rel)   min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.32% -5.22%
Instructions are helped.

total cycles in shared programs: 360881386 -> 359533568 (-0.37%)
cycles in affected programs: 189489144 -> 188141326 (-0.71%)
helped: 27250
HURT: 6707
helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35%
HURT stats (abs)   min: 1 max: 3507 x̄: 51.56 x̃: 14
HURT stats (rel)   min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -44.70 -34.68
95% mean confidence interval for cycles %-change: -2.75% -2.65%
Cycles are helped.

total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3

total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10

LOST:   11
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15271996 -> 15040882 (-1.51%)
instructions in affected programs: 7193699 -> 6962585 (-3.21%)
helped: 33985
HURT: 30
helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6
helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85%
HURT stats (abs)   min: 1 max: 41 x̄: 4.00 x̃: 3
HURT stats (rel)   min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72%
95% mean confidence interval for instructions value: -6.87 -6.72
95% mean confidence interval for instructions %-change: -5.59% -5.48%
Instructions are helped.

total cycles in shared programs: 355520785 -> 354253799 (-0.36%)
cycles in affected programs: 185869148 -> 184602162 (-0.68%)
helped: 25824
HURT: 6287
helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16
helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41%
HURT stats (abs)   min: 1 max: 3327 x̄: 51.76 x̃: 14
HURT stats (rel)   min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28%
95% mean confidence interval for cycles value: -44.70 -34.21
95% mean confidence interval for cycles %-change: -2.87% -2.76%
Cycles are helped.

total spills in shared programs: 8835 -> 8818 (-0.19%)
spills in affected programs: 613 -> 596 (-2.77%)
helped: 5
HURT: 2

total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1348 -> 1354 (0.45%)
helped: 5
HURT: 11

LOST:   0
GAINED: 12

Haswell
total instructions in shared programs: 13447102 -> 13381508 (-0.49%)
instructions in affected programs: 3770735 -> 3705141 (-1.74%)
helped: 11999
HURT: 29
helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3
helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87%
HURT stats (abs)   min: 3 max: 750 x̄: 54.90 x̃: 3
HURT stats (rel)   min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82%
95% mean confidence interval for instructions value: -5.71 -5.19
95% mean confidence interval for instructions %-change: -2.39% -2.30%
Instructions are helped.

total cycles in shared programs: 376342236 -> 375690458 (-0.17%)
cycles in affected programs: 155699021 -> 155047243 (-0.42%)
helped: 8397
HURT: 2876
helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18
helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49%
HURT stats (abs)   min: 1 max: 15414 x̄: 94.15 x̃: 22
HURT stats (rel)   min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41%
95% mean confidence interval for cycles value: -67.64 -48.00
95% mean confidence interval for cycles %-change: -0.99% -0.74%
Cycles are helped.

total spills in shared programs: 23134 -> 23184 (0.22%)
spills in affected programs: 1675 -> 1725 (2.99%)
helped: 13
HURT: 11

total fills in shared programs: 34550 -> 34686 (0.39%)
fills in affected programs: 1421 -> 1557 (9.57%)
helped: 13
HURT: 11

LOST:   0
GAINED: 11

Ivy Bridge
total instructions in shared programs: 12019642 -> 11987285 (-0.27%)
instructions in affected programs: 1532236 -> 1499879 (-2.11%)
helped: 5522
HURT: 110
helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3
helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88%
HURT stats (abs)   min: 1 max: 750 x̄: 18.07 x̃: 3
HURT stats (rel)   min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15%
95% mean confidence interval for instructions value: -6.25 -5.24
95% mean confidence interval for instructions %-change: -2.43% -2.26%
Instructions are helped.

total cycles in shared programs: 180214667 -> 179761900 (-0.25%)
cycles in affected programs: 31448723 -> 30995956 (-1.44%)
helped: 7191
HURT: 2838
helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17
helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40%
HURT stats (abs)   min: 1 max: 15540 x̄: 64.63 x̃: 24
HURT stats (rel)   min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51%
95% mean confidence interval for cycles value: -53.34 -36.95
95% mean confidence interval for cycles %-change: -0.81% -0.53%
Cycles are helped.

total spills in shared programs: 3599 -> 3642 (1.19%)
spills in affected programs: 1180 -> 1223 (3.64%)
helped: 12
HURT: 2

total fills in shared programs: 4031 -> 4162 (3.25%)
fills in affected programs: 876 -> 1007 (14.95%)
helped: 12
HURT: 2

LOST:   6
GAINED: 5

Sandy Bridge
total instructions in shared programs: 10850686 -> 10822890 (-0.26%)
instructions in affected programs: 1247986 -> 1220190 (-2.23%)
helped: 4699
HURT: 102
helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3
helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88%
HURT stats (abs)   min: 1 max: 16 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10%
95% mean confidence interval for instructions value: -6.10 -5.47
95% mean confidence interval for instructions %-change: -2.42% -2.30%
Instructions are helped.

total cycles in shared programs: 154044149 -> 153920095 (-0.08%)
cycles in affected programs: 26037392 -> 25913338 (-0.48%)
helped: 5974
HURT: 2521
helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16
helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84%
HURT stats (abs)   min: 1 max: 862 x̄: 34.73 x̃: 20
HURT stats (rel)   min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85%
95% mean confidence interval for cycles value: -16.31 -12.90
95% mean confidence interval for cycles %-change: -0.56% -0.45%
Cycles are helped.

total spills in shared programs: 2876 -> 2957 (2.82%)
spills in affected programs: 592 -> 673 (13.68%)
helped: 6
HURT: 35

total fills in shared programs: 3157 -> 3134 (-0.73%)
fills in affected programs: 402 -> 379 (-5.72%)
helped: 6
HURT: 0

LOST:   5
GAINED: 11

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Ian Romanick ff9f526de3 nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)
v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

v3: Add a couple more patterns that just move the negation around.

No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms
lack a LRP instruction.

Skylake
total instructions in shared programs: 15279687 -> 15256058 (-0.15%)
instructions in affected programs: 4344440 -> 4320811 (-0.54%)
helped: 23455
HURT: 18
helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 2 x̄: 1.06 x̃: 1
HURT stats (rel)   min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34%
95% mean confidence interval for instructions value: -1.01 -1.00
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 355593755 -> 355339981 (-0.07%)
cycles in affected programs: 162089552 -> 161835778 (-0.16%)
helped: 20467
HURT: 7158
helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6
helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58%
HURT stats (abs)   min: 1 max: 4814 x̄: 47.46 x̃: 11
HURT stats (rel)   min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98%
95% mean confidence interval for cycles value: -10.39 -7.98
95% mean confidence interval for cycles %-change: -0.57% -0.47%
Cycles are helped.

total spills in shared programs: 8843 -> 8835 (-0.09%)
spills in affected programs: 190 -> 182 (-4.21%)
helped: 2
HURT: 0

total fills in shared programs: 21738 -> 21738 (0.00%)
fills in affected programs: 372 -> 372 (0.00%)
helped: 1
HURT: 1

LOST:   12
GAINED: 22

Broadwell
total instructions in shared programs: 15290523 -> 15266818 (-0.16%)
instructions in affected programs: 4314738 -> 4291033 (-0.55%)
helped: 23391
HURT: 11
helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 189 x̄: 18.09 x̃: 1
HURT stats (rel)   min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50%
95% mean confidence interval for instructions value: -1.04 -0.99
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 388911660 -> 388830827 (-0.02%)
cycles in affected programs: 172903324 -> 172822491 (-0.05%)
helped: 15601
HURT: 13269
helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6
helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55%
HURT stats (abs)   min: 1 max: 14904 x̄: 28.21 x̃: 6
HURT stats (rel)   min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60%
95% mean confidence interval for cycles value: -4.20 -1.40
95% mean confidence interval for cycles %-change: -0.17% -0.08%
Cycles are helped.

total spills in shared programs: 23110 -> 23069 (-0.18%)
spills in affected programs: 656 -> 615 (-6.25%)
helped: 3
HURT: 1

total fills in shared programs: 34399 -> 34398 (<.01%)
fills in affected programs: 905 -> 904 (-0.11%)
helped: 3
HURT: 1

LOST:   6
GAINED: 23

Haswell
total instructions in shared programs: 13465303 -> 13441142 (-0.18%)
instructions in affected programs: 3726999 -> 3702838 (-0.65%)
helped: 22139
HURT: 347
helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12%
95% mean confidence interval for instructions value: -1.08 -1.07
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 376271308 -> 376273090 (<.01%)
cycles in affected programs: 167496811 -> 167498593 (<.01%)
helped: 13206
HURT: 13281
helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8
helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80%
HURT stats (abs)   min: 1 max: 3828 x̄: 35.32 x̃: 8
HURT stats (rel)   min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61%
95% mean confidence interval for cycles value: -1.33 1.47
95% mean confidence interval for cycles %-change: 0.22% 0.36%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23158 -> 23134 (-0.10%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 34580 -> 34550 (-0.09%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   23
GAINED: 13

Ivy Bridge
total instructions in shared programs: 12034154 -> 12014301 (-0.16%)
instructions in affected programs: 3636209 -> 3616356 (-0.55%)
helped: 18771
HURT: 459
helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11%
95% mean confidence interval for instructions value: -1.04 -1.02
95% mean confidence interval for instructions %-change: -0.86% -0.84%
Instructions are helped.

total cycles in shared programs: 180186960 -> 180175147 (<.01%)
cycles in affected programs: 44652745 -> 44640932 (-0.03%)
helped: 12979
HURT: 11033
helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6
helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74%
HURT stats (abs)   min: 1 max: 4811 x̄: 37.61 x̃: 9
HURT stats (rel)   min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69%
95% mean confidence interval for cycles value: -2.29 1.31
95% mean confidence interval for cycles %-change: 0.11% 0.26%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3623 -> 3599 (-0.66%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 4061 -> 4031 (-0.74%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   17
GAINED: 18

Sandy Bridge
total instructions in shared programs: 10853968 -> 10834932 (-0.18%)
instructions in affected programs: 3769957 -> 3750921 (-0.50%)
helped: 17944
HURT: 204
helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93%
95% mean confidence interval for instructions value: -1.05 -1.04
95% mean confidence interval for instructions %-change: -0.81% -0.78%
Instructions are helped.

total cycles in shared programs: 153894864 -> 153885988 (<.01%)
cycles in affected programs: 50643925 -> 50635049 (-0.02%)
helped: 9361
HURT: 10534
helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4
helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22%
HURT stats (abs)   min: 1 max: 1371 x̄: 16.42 x̃: 5
HURT stats (rel)   min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27%
95% mean confidence interval for cycles value: -1.27 0.38
95% mean confidence interval for cycles %-change: -0.03% 0.04%
Inconclusive result (value mean confidence interval includes 0).

LOST:   6
GAINED: 24

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Sagar Ghuge 80117117bd nir: Add optimization to use ROR/ROL instructions
v2: 1) Add more optimization rules for ROL/ROR (Matt Turner)
    2) Add lowering rules for ROL/ROR (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Ian Romanick 8d6b35fffd nir/algebraic: Fail build when too many commutative expressions are used
Search patterns that are expected to have too many (e.g., the giant
bitfield_reverse pattern) can be added to a white list.

This would have saved me a few hours debugging. :(

v2: Implement the expected-failure annotation as a property of the
search-replace pattern instead of as a property of the whole list of
patterns.  Suggested by Connor.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-06-28 18:56:19 -07:00
Eric Anholt 8fd8964302 nir: Fix lowering of bitfield_insert to shifts.
The bfi/bfm behavior change replaced the bfi/bfm usage in
lower_bitfield_insert_to_shifts with actual shifts like the name says,
but it failed to handle the offset=0, bits==32 case in the new
lowering.

v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits,
    31) -> false optimization.

Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.

Fixes: 165b7f3a44 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-06-28 16:38:23 -07:00
Caio Marcelo de Oliveira Filho 085c0f1f13 nir/algebraic: Add helpers and a rule involving wrapping
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules.  Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-26 14:13:02 -07:00
Jonathan Marek a70ff70158 nir: remove fnot/fxor/fand/for opcodes
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-26 15:26:10 -04:00
Daniel Schürmann a8b0b6e52b nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select.
bitfield_select is defined as:
bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert)
matching the behavior of AMD's BFI instruction.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann 1403c3a7bf nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract
This lets us use the optimization pattern
(('ult', 31, ('iand', b, 31)), False) to remove the
bcsel instruction for code originating in D3D shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann 4eeb49ea71 nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources
The [iu]bfe and bfm instructions are defined to only use the five
least significant bits.
This optimizes a common pattern from D3D -> SPIR-V translation.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann 165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Daniel Schürmann a74f256c58 nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends.
These optimizations are based on the fact that
'and(a,b) <= umin(a,b)'.
For AMD, this series moves the optimization from LLVM to NIR,
so currently no vkpipeline-db changes here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-06-24 18:42:20 +02:00
Eduardo Lima Mitev fb2169040a nir/opt_algebraic: Fix rules for imadsh_mix16
The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-10 22:27:46 +02:00
Eduardo Lima Mitev 3addd7c8d9 nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
For umul_low (al * bl), zero is returned if the low 16-bits word of either
source is zero.

for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src are zero.

is_lower_half_zero() returns true if the lowest word of all components of
an integer nir alu src are zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Kenneth Graunke c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke edd45af9ba nir: Drop lower_fmod64 option.
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Alyssa Rosenzweig d2d3cc66cf nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)
This pattern was noticed in glmark's jellyfish scene.

v2: Add inexact qualifier due to NaN behaviour.

Minimal shader-db changes (slightly helped).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-06-04 19:57:19 +00:00
Jonathan Marek f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Alyssa Rosenzweig 46494c3dc1 nir/algebraic: Remove problematic "optimization"
This line is no longer relevant now that booleans are 1-bit, and in fact
causes issues (infinite progress loop between algebraic optimizations
and copy prop) with constant vector masks.

No shader-db changes on Intel platforms (Jason).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2019-05-16 02:08:37 +00:00
Ian Romanick 32d259713b nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions
The goal is to avoid having an extra MOV instruction to perform the
saturate.  Doing the subtraction first allows the saturate to be applied
to the ADD instruction making the MOV unnecessary.  Values generated in
different block and values from non-ALU instructions (e.g., texture
instructions) almost always need the extra MOV.

Multiply instructions are restricted because doing this rearrangement
can interfere with the generation of flrp and ffma instructions.

v2: Now that the final method has been selected, squash three commits
into one.

All Intel platforms has similar results. (Ice Lake shown)
total instructions in shared programs: 17223214 -> 17219386 (-0.02%)
instructions in affected programs: 1524376 -> 1520548 (-0.25%)
helped: 2686
HURT: 26
helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37%
HURT stats (abs)   min: 1 max: 2 x̄: 1.69 x̃: 2
HURT stats (rel)   min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35%
95% mean confidence interval for instructions value: -1.46 -1.36
95% mean confidence interval for instructions %-change: -0.56% -0.50%
Instructions are helped.

total cycles in shared programs: 360811571 -> 360791896 (<.01%)
cycles in affected programs: 103650214 -> 103630539 (-0.02%)
helped: 1557
HURT: 675
helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16
helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1513 x̄: 66.44 x̃: 14
HURT stats (rel)   min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49%
95% mean confidence interval for cycles value: -14.82 -2.81
95% mean confidence interval for cycles %-change: -0.50% -0.20%
Cycles are helped.

LOST:   2
GAINED: 0

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick a7f0c57673 nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)
v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224337 -> 17224269 (<.01%)
instructions in affected programs: 13578 -> 13510 (-0.50%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.05% -0.63%
Instructions are helped.

total cycles in shared programs: 360826090 -> 360825137 (<.01%)
cycles in affected programs: 94867 -> 93914 (-1.00%)
helped: 58
HURT: 1
helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18
helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22%
HURT stats (abs)   min: 76 max: 76 x̄: 76.00 x̃: 76
HURT stats (rel)   min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86%
95% mean confidence interval for cycles value: -19.53 -12.78
95% mean confidence interval for cycles %-change: -1.56% -1.08%
Cycles are helped.

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:23 -07:00
Ian Romanick 281f20e26d nir/algebraic: Strip double negatives from comparison sources
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224623 -> 17224337 (<.01%)
instructions in affected programs: 32648 -> 32362 (-0.88%)
helped: 148
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2
helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08%
95% mean confidence interval for instructions value: -1.97 -1.89
95% mean confidence interval for instructions %-change: -1.15% -1.00%
Instructions are helped.

total cycles in shared programs: 360828714 -> 360826090 (<.01%)
cycles in affected programs: 347416 -> 344792 (-0.76%)
helped: 148
HURT: 26
helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18
helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41%
HURT stats (abs)   min: 2 max: 337 x̄: 48.96 x̃: 6
HURT stats (rel)   min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27%
95% mean confidence interval for cycles value: -23.78 -6.38
95% mean confidence interval for cycles %-change: -1.59% -0.79%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick 45c7ff95fc intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop.  Future patches
will benefit from this change.

Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.

v2: Fix typo in comment.  Noticed by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.

total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs)   min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel)   min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0

total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs)   min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel)   min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.

total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs)   min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick d2a9ba03e3 Revert "nir: add late opt to turn inot/b2f combos back to bcsel"
This reverts commit 7acc865226.

With these optimizations in place, the extra constant folding added in
the next commit extends some live ranges of 0.0 and ±1.0 constants, and
that causes several hundred shaders to have more spills and fills.

I believe this optimization we made basically irrelevant by 7725d60938
"intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))".

All Gen7.5+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225303 -> 17224634 (<.01%)
instructions in affected programs: 879402 -> 878733 (-0.08%)
helped: 679
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45%
95% mean confidence interval for instructions value: -1.02 -0.95
95% mean confidence interval for instructions %-change: -0.26% -0.22%
Instructions are helped.

total cycles in shared programs: 360842595 -> 360828542 (<.01%)
cycles in affected programs: 110443594 -> 110429541 (-0.01%)
helped: 389
HURT: 265
helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28
helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11%
HURT stats (abs)   min: 1 max: 7614 x̄: 185.96 x̃: 48
HURT stats (rel)   min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10%
95% mean confidence interval for cycles value: -75.65 32.67
95% mean confidence interval for cycles %-change: -0.49% -0.06%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12159 -> 12161 (0.02%)
spills in affected programs: 13 -> 15 (15.38%)
helped: 0
HURT: 1

total fills in shared programs: 25207 -> 25208 (<.01%)
fills in affected programs: 25 -> 26 (4.00%)
helped: 0
HURT: 1

Ivy Bridge
total instructions in shared programs: 12082019 -> 12082013 (<.01%)
instructions in affected programs: 1033 -> 1027 (-0.58%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.78% -0.45%
Instructions are helped.

total cycles in shared programs: 179849270 -> 179849157 (<.01%)
cycles in affected programs: 4735 -> 4622 (-2.39%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18
helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36%
95% mean confidence interval for cycles value: -82.73 26.23
95% mean confidence interval for cycles %-change: -7.98% 2.28%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10882750 -> 10882748 (<.01%)
instructions in affected programs: 266 -> 264 (-0.75%)
helped: 2
HURT: 0

Iron Lake
total cycles in shared programs: 188609440 -> 188609448 (<.01%)
cycles in affected programs: 4320 -> 4328 (0.19%)
helped: 0
HURT: 2

GM45
total cycles in shared programs: 129016868 -> 129016872 (<.01%)
cycles in affected programs: 2302 -> 2306 (0.17%)
helped: 0
HURT: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick 3cb091f8b4 nir/algebraic: Eliminate a tautological compare
The value-range tracking pass that is coming is not clever enough to
know that the result of the ffma must be non-negative.  Making it that
smart will require quite a bit of work.  It might be possible to add a
special case that detects that a whole tree of fadd(fmul(fsat(a),
fneg(fsat(a))), 1.0) cannot be negative.

For cases when the comparison is used in the domain guard for a
square-root (see nir/algebraic: Simplify fsqrt domain guard), the
compare may be converted to a fmax.  This patch also handles that case.

All of the affected cases are in DiRT: Showdown.

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17225365 -> 17225303 (<.01%)
instructions in affected programs: 40051 -> 39989 (-0.15%)
helped: 62
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.31% -0.22%
Instructions are helped.

total cycles in shared programs: 360842788 -> 360842595 (<.01%)
cycles in affected programs: 1818081 -> 1817888 (-0.01%)
helped: 29
HURT: 22
helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14
helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42%
HURT stats (abs)   min: 1 max: 108 x̄: 18.45 x̃: 7
HURT stats (rel)   min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19%
95% mean confidence interval for cycles value: -14.48 6.91
95% mean confidence interval for cycles %-change: -0.71% 0.21%
Inconclusive result (value mean confidence interval includes 0).

No changes on any other Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick 9725e45b3d nir/algebraic: Simplify fsqrt domain guard
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228376 -> 17225365 (-0.02%)
instructions in affected programs: 280732 -> 277721 (-1.07%)
helped: 1072
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2
helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07%
95% mean confidence interval for instructions value: -2.92 -2.70
95% mean confidence interval for instructions %-change: -1.48% -1.37%
Instructions are helped.

total cycles in shared programs: 360935690 -> 360842788 (-0.03%)
cycles in affected programs: 7838017 -> 7745115 (-1.19%)
helped: 1569
HURT: 69
helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20
helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12%
HURT stats (abs)   min: 1 max: 2820 x̄: 98.22 x̃: 47
HURT stats (rel)   min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31%
95% mean confidence interval for cycles value: -63.55 -49.89
95% mean confidence interval for cycles %-change: -3.33% -2.96%
Cycles are helped.

No changes on any other platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick 5116646a76 nir/algebraic: Recognize open-coded fsat with modifiers
This change also enables a later change (nir/algebraic: Replace
1-fsat(a) with fsat(1-a)) to affect more shaders.

Almost all of the affected shaders are in Bioshock Infinite, and all of
those shaders all require GLSL 4.10.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17228584 -> 17228376 (<.01%)
instructions in affected programs: 31438 -> 31230 (-0.66%)
helped: 105
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1
helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70%
95% mean confidence interval for instructions value: -2.20 -1.76
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 360936431 -> 360935690 (<.01%)
cycles in affected programs: 420100 -> 419359 (-0.18%)
helped: 71
HURT: 21
helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10
helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48%
HURT stats (abs)   min: 1 max: 198 x̄: 29.90 x̃: 10
HURT stats (rel)   min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90%
95% mean confidence interval for cycles value: -16.77 0.66
95% mean confidence interval for cycles %-change: -0.85% -0.06%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick c769641c8e nir/algebraic: Push unary operations into source operands of fsat source
Pushing a unary operation, like fneg, into the operation that generates
its operand allows the fsat to be applied to the inner instruction
instead of on a separate instruction that performs the unary operation.
This changes

    fmul	ssa_100, ssa_99, ssa_98
    fmov.sat	ssa_101, -ssa_100

into

    fmul.sat	ssa_100, -ssa_99, ssa_98

Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown)
total instructions in shared programs: 17228658 -> 17228584 (<.01%)
instructions in affected programs: 3163 -> 3089 (-2.34%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2
helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51%
95% mean confidence interval for instructions value: -1.66 -1.37
95% mean confidence interval for instructions %-change: -4.37% -3.00%
Instructions are helped.

total cycles in shared programs: 360937144 -> 360936431 (<.01%)
cycles in affected programs: 24029 -> 23316 (-2.97%)
helped: 47
HURT: 2
helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16
helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50%
95% mean confidence interval for cycles value: -16.05 -13.05
95% mean confidence interval for cycles %-change: -4.07% -3.15%
Cycles are helped.

All Gen7 and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 13536059 -> 13535884 (<.01%)
instructions in affected programs: 8797 -> 8622 (-1.99%)
helped: 150
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96%
95% mean confidence interval for instructions value: -1.23 -1.11
95% mean confidence interval for instructions %-change: -3.97% -3.05%
Instructions are helped.

total cycles in shared programs: 357696119 -> 357694193 (<.01%)
cycles in affected programs: 50216 -> 48290 (-3.84%)
helped: 109
HURT: 14
helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16
helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37%
HURT stats (abs)   min: 2 max: 26 x̄: 10.14 x̃: 5
HURT stats (rel)   min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92%
95% mean confidence interval for cycles value: -19.27 -12.05
95% mean confidence interval for cycles %-change: -7.34% -5.31%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick 3b74790941 nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))
All Gen6+ GPUs had similar results. (Skylake shown)
total instructions in shared programs: 15336712 -> 15336622 (<.01%)
instructions in affected programs: 3952 -> 3862 (-2.28%)
helped: 24
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4
helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46%
95% mean confidence interval for instructions value: -4.06 -3.44
95% mean confidence interval for instructions %-change: -2.47% -2.22%
Instructions are helped.

total cycles in shared programs: 355722052 -> 355721235 (<.01%)
cycles in affected programs: 27326 -> 26509 (-2.99%)
helped: 20
HURT: 4
helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14
helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23%
HURT stats (abs)   min: 2 max: 64 x̄: 19.50 x̃: 6
HURT stats (rel)   min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55%
95% mean confidence interval for cycles value: -61.61 -6.47
95% mean confidence interval for cycles %-change: -5.59% -0.39%
Cycles are helped.

No changes on Ice Lake, Iron Lake, or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:21 -07:00
Ian Romanick a7724b1cbb nir/algebraic: Add missing ffma(-1, a, b) pattern
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17229439 -> 17229377 (<.01%)
instructions in affected programs: 9859 -> 9797 (-0.63%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1
helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67%
95% mean confidence interval for instructions value: -1.88 -1.14
95% mean confidence interval for instructions %-change: -2.48% -0.81%
Instructions are helped.

total cycles in shared programs: 360944145 -> 360942989 (<.01%)
cycles in affected programs: 178167 -> 177011 (-0.65%)
helped: 36
HURT: 19
helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5
helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45%
HURT stats (abs)   min: 1 max: 34 x̄: 11.21 x̃: 6
HURT stats (rel)   min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50%
95% mean confidence interval for cycles value: -36.01 -6.02
95% mean confidence interval for cycles %-change: -4.18% -0.57%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:03 -07:00
Ian Romanick 7b4ff6a1af nir: Mark ffma as 2src_commutative
This doesn't make any real difference now, but future work (not in this
series) will add a LOT of ffma patterns.  Having to duplicate all of
them for ffma(a, b, c) and ffma(b, a, c) is just terrible.

No shader-db changes on any Intel platform.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 11:25:02 -07:00
Ian Romanick ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Jason Ekstrand 00d4e78ea9 nir/algebraic: Optimize integer cast-of-cast
These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-04-26 04:26:08 -05:00
Erico Nunes 4577eb7b7c nir/algebraic: add lowering for fsign
The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-19 15:42:23 +00:00
Ian Romanick 6b97fa9a99 nir/algebraic: Strength reduce some compares of x and -x
Converting the x vs -x comparison to an x vs 0 comparison enable cmod
propagation to help.

The seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick f3d6df719c nir/algebraic: Fix some 1-bit Boolean weirdness
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 403aac7500 nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the affect on cycles was general negative.  It may be worth revisiting
this later.

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Ian Romanick 25bfba3335 nir/algebraic: Recognize open-coded copysign(1.0, a)
All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00
Christian Gmeiner b6bed115a5 nir: add lower_ftrunc
Port TGSI TRUNC lowering to nir

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-13 17:54:48 +00:00
Caio Marcelo de Oliveira Filho d08a74d2bf nir/algebraic: Lower CS derivatives to zero when no group defined
In compute shaders if no derivative group is defined, the derivatives
will always be zero.  Specified in NV_compute_shader_derivatives.

To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules.  (Jason)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-08 19:29:32 -07:00
Jason Ekstrand ad8c145658 nir/algebraic: Add some logical OR and AND patterns
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang.  Not sure about the other two new patterns but we
may as well throw them in for completeness.  While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15321227 -> 15321129 (<.01%)
    instructions in affected programs: 3594 -> 3496 (-2.73%)
    helped: 6
    HURT: 0

    total cycles in shared programs: 357481321 -> 357479725 (<.01%)
    cycles in affected programs: 44109 -> 42513 (-3.62%)
    helped: 6
    HURT: 0

VkPipeline-DB results on Kaby Lake:

    total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
    instructions in affected programs: 19058 -> 18288 (-4.04%)
    helped: 163
    HURT: 0

    total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
    cycles in affected programs: 750958 -> 736984 (-1.86%)
    helped: 158
    HURT: 1

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:06 -05:00
Jason Ekstrand 03a72d96d8 nir/algebraic: Drop some @bool specifiers
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time.  We can drop these specifiers and now the optimizations
will apply more generally.

Shader-DB results on Kaby Lake:

    total instructions in shared programs: 15321168 -> 15321227 (<.01%)
    instructions in affected programs: 8836 -> 8895 (0.67%)
    helped: 1
    HURT: 31

    total cycles in shared programs: 357481781 -> 357481321 (<.01%)
    cycles in affected programs: 146524 -> 146064 (-0.31%)
    helped: 22
    HURT: 10

    total spills in shared programs: 23675 -> 23673 (<.01%)
    spills in affected programs: 11 -> 9 (-18.18%)
    helped: 1
    HURT: 0

    total fills in shared programs: 32040 -> 32036 (-0.01%)
    fills in affected programs: 27 -> 23 (-14.81%)
    helped: 1
    HURT: 0

No change in VkPipeline-DB

Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb.  In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL.  Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win?  Also it helps spilling in one Car Chase compute shader.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-05 18:39:00 -05:00
Ian Romanick ae21b52e1d nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick cbad201c2b nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Ian Romanick bc17f5a2a3 nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-28 15:35:52 -07:00
Ian Romanick c152672e68 nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%

total cycles in shared programs: 372286583 -> 372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%

No changes on any other Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8.  This broke ~180 tests.  This
bug was introduced in v2.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
2019-03-28 15:35:52 -07:00
Iago Toral Quiroga 763c8aabed compiler/nir: add lowering for 16-bit ldexp
v2 (Topi):
 - Make bit-size handling order be 16-bit, 32-bit, 64-bit
 - Clamp lower exponent range at -28 instead of -30.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga 3766334923 compiler/nir: add lowering for 16-bit flrp
And enable it on Intel.

v2:
 - Squash the change to enable it on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Iago Toral Quiroga ca31df6f1f compiler/nir: add lowering option for 16-bit fmod
And enable it on Intel.

v2:
 - Squash the change to enable this lowering on Intel (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-25 16:08:25 +01:00
Jason Ekstrand 2b76de9b5d nir/algebraic: Add a couple optimizations for iabs and ishr
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
    instructions in affected programs: 43524 -> 40676 (-6.54%)
    helped: 203
    HURT: 0

Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 01:02:19 +00:00
Kenneth Graunke da51e3f1b0 Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers)
This broke piles of image load store tests (179 failures on CI,
mesa_master build #15546, previous build right before this landed
was green).  I'd rather not leave the tree on fire over the weekend,
so let's revert for now, and we can figure out what happened next week.
2019-03-09 01:42:16 -08:00
Ian Romanick 18e4bf65de nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00
Ian Romanick 55c1ac4b75 nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.

v2: Use a loop to generate patterns.  Suggested by Jason.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-08 22:24:19 -08:00