mesa/src/compiler/nir/nir_opt_algebraic.py

1836 lines
94 KiB
Python
Raw Normal View History

#
# Copyright (C) 2014 Intel Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# Authors:
# Jason Ekstrand (jason@jlekstrand.net)
from __future__ import print_function
from collections import OrderedDict
import nir_algebraic
from nir_opcodes import type_sizes
nir: Use De Morgan's Law on logic compounded comparisons The replacement of the comparison operators must happen during this step. If it does not, the next pass of nir_opt_algebraic will reapply De Morgan's Law in the "opposite direction" before performing dead code elimination. The resulting infinite loop will eventually get OOM killed. Haswell, Broadwell, and Skylake had similar results. (Broadwell shown) total instructions in shared programs: 14808185 -> 14808036 (<.01%) instructions in affected programs: 13758 -> 13609 (-1.08%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3 helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01% 95% mean confidence interval for instructions value: -4.67 -2.97 95% mean confidence interval for instructions %-change: -1.09% -0.88% Instructions are helped. total cycles in shared programs: 559438333 -> 559435832 (<.01%) cycles in affected programs: 199160 -> 196659 (-1.26%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51 helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40% HURT stats (abs) min: 2 max: 40 x̄: 27.33 x̃: 40 HURT stats (rel) min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74% 95% mean confidence interval for cycles value: -71.47 -39.69 95% mean confidence interval for cycles %-change: -1.64% -0.93% Cycles are helped. Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11811776 -> 11811553 (<.01%) instructions in affected programs: 15201 -> 14978 (-1.47%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6 helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26% 95% mean confidence interval for instructions value: -7.21 -4.23 95% mean confidence interval for instructions %-change: -1.48% -1.12% Instructions are helped. total cycles in shared programs: 257617270 -> 257614589 (<.01%) cycles in affected programs: 212107 -> 209426 (-1.26%) helped: 45 HURT: 0 helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54 helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32% 95% mean confidence interval for cycles value: -74.02 -45.14 95% mean confidence interval for cycles %-change: -1.59% -1.01% Cycles are helped. Iron Lake total instructions in shared programs: 7886648 -> 7886515 (<.01%) instructions in affected programs: 14106 -> 13973 (-0.94%) helped: 29 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4 helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81% 95% mean confidence interval for instructions value: -5.65 -3.52 95% mean confidence interval for instructions %-change: -1.03% -0.76% Instructions are helped. total cycles in shared programs: 178100812 -> 178100396 (<.01%) cycles in affected programs: 67970 -> 67554 (-0.61%) helped: 29 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54% 95% mean confidence interval for cycles value: -18.30 -10.39 95% mean confidence interval for cycles %-change: -0.71% -0.45% Cycles are helped. GM45 total instructions in shared programs: 4857939 -> 4857872 (<.01%) instructions in affected programs: 7426 -> 7359 (-0.90%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4 helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77% 95% mean confidence interval for instructions value: -6.06 -2.87 95% mean confidence interval for instructions %-change: -1.06% -0.67% Instructions are helped. total cycles in shared programs: 122167930 -> 122167654 (<.01%) cycles in affected programs: 43118 -> 42842 (-0.64%) helped: 15 HURT: 0 helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54% 95% mean confidence interval for cycles value: -25.03 -11.77 95% mean confidence interval for cycles %-change: -0.82% -0.41% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-07 01:27:53 +00:00
import itertools
nir/algebraic: Simplify some Inf and NaN avoidance code Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0. The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders. An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14661140 -> 14661104 (<.01%) instructions in affected programs: 7520 -> 7484 (-0.48%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.52% -0.47% Instructions are helped. total cycles in shared programs: 228585416 -> 228584806 (<.01%) cycles in affected programs: 56321 -> 55711 (-1.08%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10 helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65% 95% mean confidence interval for cycles value: -28.32 -9.80 95% mean confidence interval for cycles %-change: -1.63% -0.54% Cycles are helped. Sandy Bridge total cycles in shared programs: 152991077 -> 152991075 (<.01%) cycles in affected programs: 11525 -> 11523 (-0.02%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -5.27 4.27 95% mean confidence interval for cycles %-change: -0.16% 0.15% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45.
2019-11-02 02:53:06 +00:00
import struct
from math import pi
# Convenience variables
a = 'a'
b = 'b'
c = 'c'
d = 'd'
e = 'e'
# Written in the form (<search>, <replace>) where <search> is an expression
# and <replace> is either an expression or a value. An expression is
# defined as a tuple of the form ([~]<op>, <src0>, <src1>, <src2>, <src3>)
# where each source is either an expression or a value. A value can be
# either a numeric constant or a string representing a variable name.
#
# If the opcode in a search expression is prefixed by a '~' character, this
# indicates that the operation is inexact. Such operations will only get
# applied to SSA values that do not have the exact bit set. This should be
# used by by any optimizations that are not bit-for-bit exact. It should not,
# however, be used for backend-requested lowering operations as those need to
# happen regardless of precision.
#
# Variable names are specified as "[#]name[@type][(cond)][.swiz]" where:
# "#" indicates that the given variable will only match constants,
# type indicates that the given variable will only match values from ALU
# instructions with the given output type,
# (cond) specifies an additional condition function (see nir_search_helpers.h),
# swiz is a swizzle applied to the variable (only in the <replace> expression)
#
# For constants, you have to be careful to make sure that it is the right
# type because python is unaware of the source and destination types of the
# opcodes.
#
# All expression types can have a bit-size specified. For opcodes, this
# looks like "op@32", for variables it is "a@32" or "a@uint32" to specify a
# type and size. In the search half of the expression this indicates that it
# should only match that particular bit-size. In the replace half of the
# expression this indicates that the constructed value should have that
# bit-size.
#
# If the opcode in a replacement expression is prefixed by a '!' character,
# this indicated that the new expression will be marked exact.
#
# A special condition "many-comm-expr" can be used with expressions to note
# that the expression and its subexpressions have more commutative expressions
# than nir_replace_instr can handle. If this special condition is needed with
# another condition, the two can be separated by a comma (e.g.,
# "(many-comm-expr,is_used_once)").
# based on https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648
def lowered_sincos(c):
x = ('fsub', ('fmul', 2.0, ('ffract', ('fadd', ('fmul', 0.5 / pi, a), c))), 1.0)
x = ('fmul', ('fsub', x, ('fmul', x, ('fabs', x))), 4.0)
return ('ffma', ('ffma', x, ('fabs', x), ('fneg', x)), 0.225, x)
nir/algebraic: Simplify some Inf and NaN avoidance code Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0. The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders. An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14661140 -> 14661104 (<.01%) instructions in affected programs: 7520 -> 7484 (-0.48%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.52% -0.47% Instructions are helped. total cycles in shared programs: 228585416 -> 228584806 (<.01%) cycles in affected programs: 56321 -> 55711 (-1.08%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10 helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65% 95% mean confidence interval for cycles value: -28.32 -9.80 95% mean confidence interval for cycles %-change: -1.63% -0.54% Cycles are helped. Sandy Bridge total cycles in shared programs: 152991077 -> 152991075 (<.01%) cycles in affected programs: 11525 -> 11523 (-0.02%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -5.27 4.27 95% mean confidence interval for cycles %-change: -0.16% 0.15% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45.
2019-11-02 02:53:06 +00:00
def intBitsToFloat(i):
return struct.unpack('!f', struct.pack('!I', i))[0]
optimizations = [
(('imul', a, '#b@32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b)), '!options->lower_bitops'),
(('imul', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('ishl', a, ('find_lsb', ('iabs', b)))), '!options->lower_bitops'),
(('ishl', a, '#b@32'), ('imul', a, ('ishl', 1, b)), 'options->lower_bitops'),
(('unpack_64_2x32_split_x', ('imul_2x32_64(is_used_once)', a, b)), ('imul', a, b)),
(('unpack_64_2x32_split_x', ('umul_2x32_64(is_used_once)', a, b)), ('imul', a, b)),
(('imul_2x32_64', a, b), ('pack_64_2x32_split', ('imul', a, b), ('imul_high', a, b)), 'options->lower_mul_2x32_64'),
(('umul_2x32_64', a, b), ('pack_64_2x32_split', ('imul', a, b), ('umul_high', a, b)), 'options->lower_mul_2x32_64'),
(('udiv', a, 1), a),
(('idiv', a, 1), a),
(('umod', a, 1), 0),
(('imod', a, 1), 0),
(('udiv', a, '#b@32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b)), '!options->lower_bitops'),
(('idiv', a, '#b@32(is_pos_power_of_two)'), ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),
(('idiv', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))), 'options->lower_idiv'),
(('umod', a, '#b(is_pos_power_of_two)'), ('iand', a, ('isub', b, 1))),
(('~fneg', ('fneg', a)), a),
(('ineg', ('ineg', a)), a),
(('fabs', ('fabs', a)), ('fabs', a)),
(('fabs', ('fneg', a)), ('fabs', a)),
(('fabs', ('u2f', a)), ('u2f', a)),
(('iabs', ('iabs', a)), ('iabs', a)),
(('iabs', ('ineg', a)), ('iabs', a)),
(('f2b', ('fneg', a)), ('f2b', a)),
(('i2b', ('ineg', a)), ('i2b', a)),
(('~fadd', a, 0.0), a),
(('iadd', a, 0), a),
(('usadd_4x8', a, 0), a),
(('usadd_4x8', a, ~0), ~0),
(('~fadd', ('fmul', a, b), ('fmul', a, c)), ('fmul', a, ('fadd', b, c))),
(('iadd', ('imul', a, b), ('imul', a, c)), ('imul', a, ('iadd', b, c))),
(('~fadd', ('fneg', a), a), 0.0),
(('iadd', ('ineg', a), a), 0),
(('iadd', ('ineg', a), ('iadd', a, b)), b),
(('iadd', a, ('iadd', ('ineg', a), b)), b),
(('~fadd', ('fneg', a), ('fadd', a, b)), b),
(('~fadd', a, ('fadd', ('fneg', a), b)), b),
(('fadd', ('fsat', a), ('fsat', ('fneg', a))), ('fsat', ('fabs', a))),
(('~fmul', a, 0.0), 0.0),
(('imul', a, 0), 0),
(('umul_unorm_4x8', a, 0), 0),
(('umul_unorm_4x8', a, ~0), a),
(('~fmul', a, 1.0), a),
(('imul', a, 1), a),
(('fmul', a, -1.0), ('fneg', a)),
(('imul', a, -1), ('ineg', a)),
# If a < 0: fsign(a)*a*a => -1*a*a => -a*a => abs(a)*a
# If a > 0: fsign(a)*a*a => 1*a*a => a*a => abs(a)*a
# If a == 0: fsign(a)*a*a => 0*0*0 => abs(0)*0
(('fmul', ('fsign', a), ('fmul', a, a)), ('fmul', ('fabs', a), a)),
(('fmul', ('fmul', ('fsign', a), a), a), ('fmul', ('fabs', a), a)),
(('~ffma', 0.0, a, b), b),
(('~ffma', a, b, 0.0), ('fmul', a, b)),
(('ffma', 1.0, a, b), ('fadd', a, b)),
(('ffma', -1.0, a, b), ('fadd', ('fneg', a), b)),
(('~flrp', a, b, 0.0), a),
(('~flrp', a, b, 1.0), b),
(('~flrp', a, a, b), a),
(('~flrp', 0.0, a, b), ('fmul', a, b)),
nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-23 03:15:15 +01:00
# flrp(a, a + b, c) => a + flrp(0, b, c) => a + (b * c)
(('~flrp', a, ('fadd(is_used_once)', a, b), c), ('fadd', ('fmul', b, c), a)),
(('~flrp@32', a, ('fadd', a, b), c), ('fadd', ('fmul', b, c), a), 'options->lower_flrp32'),
(('~flrp@64', a, ('fadd', a, b), c), ('fadd', ('fmul', b, c), a), 'options->lower_flrp64'),
(('~flrp@32', ('fadd', a, b), ('fadd', a, c), d), ('fadd', ('flrp', b, c, d), a), 'options->lower_flrp32'),
(('~flrp@64', ('fadd', a, b), ('fadd', a, c), d), ('fadd', ('flrp', b, c, d), a), 'options->lower_flrp64'),
(('~flrp@32', a, ('fmul(is_used_once)', a, b), c), ('fmul', ('flrp', 1.0, b, c), a), 'options->lower_flrp32'),
(('~flrp@64', a, ('fmul(is_used_once)', a, b), c), ('fmul', ('flrp', 1.0, b, c), a), 'options->lower_flrp64'),
(('~flrp', ('fmul(is_used_once)', a, b), ('fmul(is_used_once)', a, c), d), ('fmul', ('flrp', b, c, d), a)),
(('~flrp', a, b, ('b2f', 'c@1')), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~flrp', a, 0.0, c), ('fadd', ('fmul', ('fneg', a), c), a)),
(('ftrunc', a), ('bcsel', ('flt', a, 0.0), ('fneg', ('ffloor', ('fabs', a))), ('ffloor', ('fabs', a))), 'options->lower_ftrunc'),
(('ffloor', a), ('fsub', a, ('ffract', a)), 'options->lower_ffloor'),
nir/algebraic: Replace a-fract(a) with floor(a) I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 version would be hurt. In many of those case, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to differently schedule a SEND the occurs somewhere early in the shader. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and and ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycles regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped. total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-02-23 00:47:06 +00:00
(('fadd', a, ('fneg', ('ffract', a))), ('ffloor', a), '!options->lower_ffloor'),
(('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
(('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'),
(('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', 'c@1')))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg', c ) )), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp32'),
(('~fadd@64', ('fmul', a, ('fadd', 1.0, ('fneg', c ) )), ('fmul', b, c )), ('flrp', a, b, c), '!options->lower_flrp64'),
# These are the same as the previous three rules, but it depends on
nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms has similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-03-28 06:57:07 +01:00
# 1-fsat(x) <=> fsat(1-x). See below.
(('~fadd@32', ('fmul', a, ('fsat', ('fadd', 1.0, ('fneg', c )))), ('fmul', b, ('fsat', c))), ('flrp', a, b, ('fsat', c)), '!options->lower_flrp32'),
(('~fadd@64', ('fmul', a, ('fsat', ('fadd', 1.0, ('fneg', c )))), ('fmul', b, ('fsat', c))), ('flrp', a, b, ('fsat', c)), '!options->lower_flrp64'),
(('~fadd', a, ('fmul', ('b2f', 'c@1'), ('fadd', b, ('fneg', a)))), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~fadd@32', a, ('fmul', c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp32'),
(('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp64'),
(('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
(('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
nir/algebraic: Optimize common array indexing sequence Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 01:47:38 +01:00
(('~fmul', ('fadd', ('iand', ('ineg', ('b2i32', 'a@bool')), ('fmul', b, c)), '#d'), '#e'),
('bcsel', a, ('fmul', ('fadd', ('fmul', b, c), d), e), ('fmul', d, e))),
(('fdph', a, b), ('fdot4', ('vec4', 'a.x', 'a.y', 'a.z', 1.0), b), 'options->lower_fdph'),
(('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d), '!options->lower_fdph'),
nir: Narrow some dot product operations On vector platforms, this helps elide some constant loads. v2: Reorder the transformations. No changes on Broadwell or Skylake. Haswell total instructions in shared programs: 13093793 -> 13060163 (-0.26%) instructions in affected programs: 1277532 -> 1243902 (-2.63%) helped: 13216 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.57 -2.49 95% mean confidence interval for instructions %-change: -3.65% -3.54% Instructions are helped. total cycles in shared programs: 409580819 -> 409268463 (-0.08%) cycles in affected programs: 71730652 -> 71418296 (-0.44%) helped: 9898 HURT: 2352 helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16 helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50% HURT stats (abs) min: 2 max: 276 x̄: 23.25 x̃: 6 HURT stats (rel) min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97% 95% mean confidence interval for cycles value: -33.19 -17.80 95% mean confidence interval for cycles %-change: -4.50% -4.26% Cycles are helped. total fills in shared programs: 82059 -> 82052 (<.01%) fills in affected programs: 21 -> 14 (-33.33%) helped: 7 HURT: 0 Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown) total instructions in shared programs: 11811851 -> 11780605 (-0.26%) instructions in affected programs: 1155007 -> 1123761 (-2.71%) helped: 12304 HURT: 95 helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2 helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86% HURT stats (abs) min: 1 max: 6 x̄: 1.77 x̃: 1 HURT stats (rel) min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19% 95% mean confidence interval for instructions value: -2.56 -2.48 95% mean confidence interval for instructions %-change: -3.71% -3.59% Instructions are helped. total cycles in shared programs: 257618409 -> 257316805 (-0.12%) cycles in affected programs: 71999580 -> 71697976 (-0.42%) helped: 9155 HURT: 2380 helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16 helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62% HURT stats (abs) min: 2 max: 290 x̄: 21.14 x̃: 4 HURT stats (rel) min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33% 95% mean confidence interval for cycles value: -34.32 -17.97 95% mean confidence interval for cycles %-change: -4.55% -4.29% Cycles are helped. GM45 and Iron Lake had nearly identical results (Iron Lake shown) total instructions in shared programs: 7886750 -> 7879944 (-0.09%) instructions in affected programs: 373781 -> 366975 (-1.82%) helped: 3715 HURT: 47 helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1 helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06% HURT stats (abs) min: 1 max: 6 x̄: 2.55 x̃: 2 HURT stats (rel) min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35% 95% mean confidence interval for instructions value: -1.85 -1.77 95% mean confidence interval for instructions %-change: -2.91% -2.73% Instructions are helped. total cycles in shared programs: 178114636 -> 178095452 (-0.01%) cycles in affected programs: 7227666 -> 7208482 (-0.27%) helped: 3349 HURT: 301 helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4 helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63% HURT stats (abs) min: 2 max: 42 x̄: 9.13 x̃: 10 HURT stats (rel) min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50% 95% mean confidence interval for cycles value: -5.52 -4.99 95% mean confidence interval for cycles %-change: -0.81% -0.73% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
2018-02-15 22:49:55 +00:00
(('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
(('fdot4', ('vec4', a, b, 0.0, 0.0), c), ('fdot2', ('vec2', a, b), c)),
(('fdot4', ('vec4', a, b, c, 0.0), d), ('fdot3', ('vec3', a, b, c), d)),
(('fdot3', ('vec3', a, 0.0, 0.0), b), ('fmul', a, b)),
(('fdot3', ('vec3', a, b, 0.0), c), ('fdot2', ('vec2', a, b), c)),
(('fdot2', ('vec2', a, 0.0), b), ('fmul', a, b)),
(('fdot2', a, 1.0), ('fadd', 'a.x', 'a.y')),
# Lower fdot to fsum when it is available
(('fdot2', a, b), ('fsum2', ('fmul', a, b)), 'options->lower_fdot'),
(('fdot3', a, b), ('fsum3', ('fmul', a, b)), 'options->lower_fdot'),
(('fdot4', a, b), ('fsum4', ('fmul', a, b)), 'options->lower_fdot'),
(('fsum2', a), ('fadd', 'a.x', 'a.y'), 'options->lower_fdot'),
nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms has similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-03-28 06:57:07 +01:00
# If x >= 0 and x <= 1: fsat(1 - x) == 1 - fsat(x) trivially
# If x < 0: 1 - fsat(x) => 1 - 0 => 1 and fsat(1 - x) => fsat(> 1) => 1
# If x > 1: 1 - fsat(x) => 1 - 1 => 0 and fsat(1 - x) => fsat(< 0) => 0
(('~fadd', ('fneg(is_used_once)', ('fsat(is_used_once)', 'a(is_not_fmul)')), 1.0), ('fsat', ('fadd', 1.0, ('fneg', a)))),
nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Convert the pattern directly to flrp. There were negligible improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe the problem is this optimization conflicts with the (1-x)*y => ffma(-x, y, y) optimization on Gen11. Skylake total instructions in shared programs: 15046487 -> 15041996 (-0.03%) instructions in affected programs: 194681 -> 190190 (-2.31%) helped: 880 HURT: 20 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17% 95% mean confidence interval for instructions value: -5.25 -4.73 95% mean confidence interval for instructions %-change: -5.11% -4.36% Instructions are helped. total cycles in shared programs: 354340839 -> 354303333 (-0.01%) cycles in affected programs: 1753622 -> 1716116 (-2.14%) helped: 786 HURT: 182 helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22 helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84% HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9 HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32% 95% mean confidence interval for cycles value: -45.90 -31.59 95% mean confidence interval for cycles %-change: -3.09% -2.50% Cycles are helped. All Gen6-Gen8 platforms had similar results. (Broadwell shown) total instructions in shared programs: 15055907 -> 15051466 (-0.03%) instructions in affected programs: 196370 -> 191929 (-2.26%) helped: 871 HURT: 26 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12% 95% mean confidence interval for instructions value: -5.21 -4.69 95% mean confidence interval for instructions %-change: -4.99% -4.24% Instructions are helped. total cycles in shared programs: 387729170 -> 387699745 (<.01%) cycles in affected programs: 1816409 -> 1786984 (-1.62%) helped: 788 HURT: 172 helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22 helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76% HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14 HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43% 95% mean confidence interval for cycles value: -35.69 -25.61 95% mean confidence interval for cycles %-change: -2.88% -2.40% Cycles are helped. total fills in shared programs: 34712 -> 34710 (<.01%) fills in affected programs: 7 -> 5 (-28.57%) helped: 1 HURT: 0 LOST: 0 GAINED: 2 Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-25 07:49:30 +01:00
# 1 - ((1 - a) * (1 - b))
# 1 - (1 - a - b + a*b)
# 1 - 1 + a + b - a*b
# a + b - a*b
# a + b*(1 - a)
# b*(1 - a) + 1*a
# flrp(b, 1, a)
(('~fadd@32', 1.0, ('fneg', ('fmul', ('fadd', 1.0, ('fneg', a)), ('fadd', 1.0, ('fneg', b))))),
('flrp', b, 1.0, a), '!options->lower_flrp32'),
nir/algebraic: Optimize common array indexing sequence Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 01:47:38 +01:00
# (a * #b + #c) << #d
# ((a * #b) << #d) + (#c << #d)
# (a * (#b << #d)) + (#c << #d)
(('ishl', ('iadd', ('imul', a, '#b'), '#c'), '#d'),
('iadd', ('imul', a, ('ishl', b, d)), ('ishl', c, d))),
# (a * #b) << #c
# a * (#b << #c)
(('ishl', ('imul', a, '#b'), '#c'), ('imul', a, ('ishl', b, c))),
nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correct account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-06 21:11:56 +01:00
]
# Care must be taken here. Shifts in NIR uses only the lower log2(bitsize)
# bits of the second source. These replacements must correctly handle the
# case where (b % bitsize) + (c % bitsize) >= bitsize.
for s in [8, 16, 32, 64]:
mask = (1 << s) - 1
ishl = "ishl@{}".format(s)
ishr = "ishr@{}".format(s)
ushr = "ushr@{}".format(s)
in_bounds = ('ult', ('iadd', ('iand', b, mask), ('iand', c, mask)), s)
nir/algebraic: Optimize common array indexing sequence Some shaders include code that looks like: uniform int i; uniform vec4 bones[...]; foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]); CSE would do some work on this: x = i * 3 foo(bones[x], bones[x + 1], bones[x + 2]); The compiler may then add '<< 4 + base' to the index calculations. This results in expressions like x = i * 3 foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]); Just rearranging the math to produce (i * 48) + 16 saves an instruction, and it allows CSE to do more work. x = i * 48; foo(bones[x], bones[x + 16], bones[x + 32]); So, ~6 instructions becomes ~3. Some individual shader-db results look pretty bad. However, I have a really, really hard time believing the change in estimated cycles in, for example, 3dmmes-taiji/51.shader_test after looking that change in the generated code. G45 total instructions in shared programs: 4020840 -> 4010070 (-0.27%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 98829000 -> 98784990 (-0.04%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Ironlake total instructions in shared programs: 6418887 -> 6408117 (-0.17%) instructions in affected programs: 177460 -> 166690 (-6.07%) helped: 894 HURT: 0 total cycles in shared programs: 143504542 -> 143460532 (-0.03%) cycles in affected programs: 3936648 -> 3892638 (-1.12%) helped: 894 HURT: 0 Sandy Bridge total instructions in shared programs: 8357887 -> 8339251 (-0.22%) instructions in affected programs: 432715 -> 414079 (-4.31%) helped: 2795 HURT: 0 total cycles in shared programs: 118284184 -> 118207412 (-0.06%) cycles in affected programs: 6114626 -> 6037854 (-1.26%) helped: 2478 HURT: 317 Ivy Bridge total instructions in shared programs: 7669390 -> 7653822 (-0.20%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68381982 -> 68263684 (-0.17%) cycles in affected programs: 1972658 -> 1854360 (-6.00%) helped: 2458 HURT: 307 Haswell total instructions in shared programs: 7082636 -> 7067068 (-0.22%) instructions in affected programs: 388234 -> 372666 (-4.01%) helped: 2795 HURT: 0 total cycles in shared programs: 68282020 -> 68164158 (-0.17%) cycles in affected programs: 1891820 -> 1773958 (-6.23%) helped: 2459 HURT: 261 Broadwell total instructions in shared programs: 9002466 -> 8985875 (-0.18%) instructions in affected programs: 658784 -> 642193 (-2.52%) helped: 2795 HURT: 5 total cycles in shared programs: 78503092 -> 78450404 (-0.07%) cycles in affected programs: 2873304 -> 2820616 (-1.83%) helped: 2275 HURT: 415 Skylake total instructions in shared programs: 9156978 -> 9140387 (-0.18%) instructions in affected programs: 682625 -> 666034 (-2.43%) helped: 2795 HURT: 5 total cycles in shared programs: 75591392 -> 75550574 (-0.05%) cycles in affected programs: 3192120 -> 3151302 (-1.28%) helped: 2271 HURT: 425 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-20 01:47:38 +01:00
nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correct account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-06 21:11:56 +01:00
optimizations.extend([
((ishl, (ishl, a, '#b'), '#c'), ('bcsel', in_bounds, (ishl, a, ('iadd', b, c)), 0)),
((ushr, (ushr, a, '#b'), '#c'), ('bcsel', in_bounds, (ushr, a, ('iadd', b, c)), 0)),
# To get get -1 for large shifts of negative values, ishr must instead
# clamp the shift count to the maximum value.
((ishr, (ishr, a, '#b'), '#c'),
(ishr, a, ('imin', ('iadd', ('iand', b, mask), ('iand', c, mask)), s - 1))),
])
# Optimize a pattern of address calculation created by DXVK where the offset is
# divided by 4 and then multipled by 4. This can be turned into an iand and the
# additions before can be reassociated to CSE the iand instruction.
for log2 in range(1, 7): # powers of two from 2 to 64
v = 1 << log2
mask = 0xffffffff & ~(v - 1)
b_is_multiple = '#b(is_unsigned_multiple_of_{})'.format(v)
optimizations.extend([
# 'a >> #b << #b' -> 'a & ~((1 << #b) - 1)'
(('ishl@32', ('ushr@32', a, log2), log2), ('iand', a, mask)),
# Reassociate for improved CSE
(('iand@32', ('iadd@32', a, b_is_multiple), mask), ('iadd', ('iand', a, mask), b)),
])
nir/algebraic: Generalize some and-of-shift-right patterns [v2] Generalizes some of the patterns from 76289fbfa84a and 905ff8619824. In particular, some of the soft-fp64 code generates (a & 0x7fffffff) << 1 when constant 0.0 is compared (flt or feq). v2: Reduce the set of added patterns to those that actually help something. This reduces the size of the state transition tables by about 29k. Suggested by Jason. Remove the existing patterns that this commit subsumes. Results on the 308 shaders extracted from the fp64 portion of the OpenGL CTS: Tiger Lake total instructions in shared programs: 903171 -> 903108 (<.01%) instructions in affected programs: 635903 -> 635840 (<.01%) helped: 25 HURT: 11 helped stats (abs) min: 1 max: 16 x̄: 5.04 x̃: 3 helped stats (rel) min: <.01% max: 0.15% x̄: 0.04% x̃: 0.03% HURT stats (abs) min: 2 max: 14 x̄: 5.73 x̃: 5 HURT stats (rel) min: <.01% max: 0.11% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for instructions value: -3.91 0.41 95% mean confidence interval for instructions %-change: -0.03% <.01% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 7059527 -> 7059681 (<.01%) cycles in affected programs: 5249401 -> 5249555 (<.01%) helped: 41 HURT: 9 helped stats (abs) min: 2 max: 76 x̄: 11.90 x̃: 10 helped stats (rel) min: <.01% max: 11.86% x̄: 0.99% x̃: 0.01% HURT stats (abs) min: 2 max: 380 x̄: 71.33 x̃: 12 HURT stats (rel) min: <.01% max: 0.22% x̄: 0.04% x̃: 0.01% 95% mean confidence interval for cycles value: -14.93 21.09 95% mean confidence interval for cycles %-change: -1.40% -0.20% Inconclusive result (value mean confidence interval includes 0). Ice Lake total instructions in shared programs: 895506 -> 895384 (-0.01%) instructions in affected programs: 658800 -> 658678 (-0.02%) helped: 37 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 3.30 x̃: 2 helped stats (rel) min: <.01% max: 0.03% x̄: 0.02% x̃: 0.02% 95% mean confidence interval for instructions value: -4.00 -2.59 95% mean confidence interval for instructions %-change: -0.02% -0.02% Instructions are helped. total cycles in shared programs: 7092748 -> 7092224 (<.01%) cycles in affected programs: 5272008 -> 5271484 (<.01%) helped: 36 HURT: 14 helped stats (abs) min: 2 max: 440 x̄: 21.67 x̃: 8 helped stats (rel) min: <.01% max: 11.86% x̄: 1.12% x̃: 0.02% HURT stats (abs) min: 2 max: 122 x̄: 18.29 x̃: 6 HURT stats (rel) min: <.01% max: 0.07% x̄: 0.01% x̃: <.01% 95% mean confidence interval for cycles value: -29.24 8.28 95% mean confidence interval for cycles %-change: -1.40% -0.21% Inconclusive result (value mean confidence interval includes 0). Regular shader-db results: All Haswell+ platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 17611489 -> 17611408 (<.01%) instructions in affected programs: 21188 -> 21107 (-0.38%) helped: 23 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 3.78 x̃: 3 helped stats (rel) min: 0.03% max: 5.82% x̄: 1.13% x̃: 0.85% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.60% max: 0.60% x̄: 0.60% x̃: 0.60% 95% mean confidence interval for instructions value: -5.27 -1.48 95% mean confidence interval for instructions %-change: -1.70% -0.42% Instructions are helped. total cycles in shared programs: 338418502 -> 338366148 (-0.02%) cycles in affected programs: 2289052 -> 2236698 (-2.29%) helped: 18 HURT: 3 helped stats (abs) min: 4 max: 18000 x̄: 2909.67 x̃: 38 helped stats (rel) min: 0.09% max: 4.07% x̄: 0.96% x̃: 0.43% HURT stats (abs) min: 2 max: 14 x̄: 6.67 x̃: 4 HURT stats (rel) min: 0.22% max: 1.13% x̄: 0.66% x̃: 0.64% 95% mean confidence interval for cycles value: -5204.00 217.91 95% mean confidence interval for cycles %-change: -1.31% -0.14% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 11875617 -> 11875615 (<.01%) instructions in affected programs: 1339 -> 1337 (-0.15%) helped: 2 HURT: 0 No changes on any earlier Intel platforms. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-07 00:23:29 +00:00
# To save space in the state tables, reduce to the set that is known to help.
# Previously, this was range(1, 32). In addition, a couple rules inside the
# loop are commented out. Revisit someday, probably after mesa/#2635 has some
# resolution.
for i in [1, 2, 16, 24]:
lo_mask = 0xffffffff >> i
hi_mask = (0xffffffff << i) & 0xffffffff
optimizations.extend([
# This pattern seems to only help in the soft-fp64 code.
(('ishl@32', ('iand', 'a@32', lo_mask), i), ('ishl', a, i)),
# (('ushr@32', ('iand', 'a@32', hi_mask), i), ('ushr', a, i)),
# (('ishr@32', ('iand', 'a@32', hi_mask), i), ('ishr', a, i)),
(('iand', ('ishl', 'a@32', i), hi_mask), ('ishl', a, i)),
(('iand', ('ushr', 'a@32', i), lo_mask), ('ushr', a, i)),
# (('iand', ('ishr', 'a@32', i), lo_mask), ('ushr', a, i)), # Yes, ushr is correct
])
nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correct account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-06 21:11:56 +01:00
optimizations.extend([
nir/algebraic: Reassociate add-and-shift to be shift-and-add A common thing in many shaders: uniform vs { vec4 bones[...]; }; ... x = some_calculation(bones[i + 0]); y = some_calculation(bones[i + 1]); z = some_calculation(bones[i + 2]); This turns into stuff like vec1 32 ssa_12 = iadd ssa_11, ssa_0 vec1 32 ssa_13 = ishl ssa_12, ssa_3 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_15 = iadd ssa_11, ssa_1 vec1 32 ssa_16 = ishl ssa_15, ssa_3 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_18 = iadd ssa_11, ssa_2 vec1 32 ssa_19 = ishl ssa_18, ssa_3 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) By reassociating the shift and the add, we can reduce this to vec1 32 ssa_12 = ishl ssa_11, ssa_3 vec1 32 ssa_13 = iadd ssa_12, ssa_0 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_16 = iadd ssa_12, ssa_1 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_19 = iadd ssa_12, ssa_2 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) v2: Add some commentary from Rhys Perry's nearly identical patch. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16277758 -> 16250704 (-0.17%) instructions in affected programs: 1440284 -> 1413230 (-1.88%) helped: 4920 HURT: 6 helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4 helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79% HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3 HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55% 95% mean confidence interval for instructions value: -5.67 -5.31 95% mean confidence interval for instructions %-change: -2.26% -2.16% Instructions are helped. total cycles in shared programs: 367118526 -> 365895358 (-0.33%) cycles in affected programs: 93504145 -> 92280977 (-1.31%) helped: 2754 HURT: 1269 helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16 helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12% HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9 HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75% 95% mean confidence interval for cycles value: -387.31 -220.78 95% mean confidence interval for cycles %-change: -2.11% -1.68% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-11 00:28:38 +01:00
# This is common for address calculations. Reassociating may enable the
# 'a<<c' to be CSE'd. It also helps architectures that have an ISHLADD
# instruction or a constant offset field for in load / store instructions.
(('ishl', ('iadd', a, '#b'), '#c'), ('iadd', ('ishl', a, c), ('ishl', b, c))),
# Comparison simplifications
(('~inot', ('flt', a, b)), ('fge', a, b)),
(('~inot', ('fge', a, b)), ('flt', a, b)),
(('inot', ('feq', a, b)), ('fne', a, b)),
(('inot', ('fne', a, b)), ('feq', a, b)),
(('inot', ('ilt', a, b)), ('ige', a, b)),
(('inot', ('ult', a, b)), ('uge', a, b)),
(('inot', ('ige', a, b)), ('ilt', a, b)),
(('inot', ('uge', a, b)), ('ult', a, b)),
(('inot', ('ieq', a, b)), ('ine', a, b)),
(('inot', ('ine', a, b)), ('ieq', a, b)),
(('iand', ('feq', a, b), ('fne', a, b)), False),
(('iand', ('flt', a, b), ('flt', b, a)), False),
(('iand', ('ieq', a, b), ('ine', a, b)), False),
(('iand', ('ilt', a, b), ('ilt', b, a)), False),
(('iand', ('ult', a, b), ('ult', b, a)), False),
# This helps some shaders because, after some optimizations, they end up
# with patterns like (-a < -b) || (b < a). In an ideal world, this sort of
# matching would be handled by CSE.
(('flt', ('fneg', a), ('fneg', b)), ('flt', b, a)),
(('fge', ('fneg', a), ('fneg', b)), ('fge', b, a)),
(('feq', ('fneg', a), ('fneg', b)), ('feq', b, a)),
(('fne', ('fneg', a), ('fneg', b)), ('fne', b, a)),
(('flt', ('fneg', a), -1.0), ('flt', 1.0, a)),
(('flt', -1.0, ('fneg', a)), ('flt', a, 1.0)),
(('fge', ('fneg', a), -1.0), ('fge', 1.0, a)),
(('fge', -1.0, ('fneg', a)), ('fge', a, 1.0)),
(('fne', ('fneg', a), -1.0), ('fne', 1.0, a)),
(('feq', -1.0, ('fneg', a)), ('feq', a, 1.0)),
(('flt', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('flt', a, b)),
(('flt', '#b(is_gt_0_and_lt_1)', ('fsat(is_used_once)', a)), ('flt', b, a)),
(('fge', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('fge', a, b)),
(('fge', '#b(is_gt_0_and_lt_1)', ('fsat(is_used_once)', a)), ('fge', b, a)),
(('feq', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('feq', a, b)),
(('fne', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('fne', a, b)),
(('fge', ('fsat(is_used_once)', a), 1.0), ('fge', a, 1.0)),
(('flt', ('fsat(is_used_once)', a), 1.0), ('flt', a, 1.0)),
(('fge', 0.0, ('fsat(is_used_once)', a)), ('fge', 0.0, a)),
(('flt', 0.0, ('fsat(is_used_once)', a)), ('flt', 0.0, a)),
# 0.0 >= b2f(a)
# b2f(a) <= 0.0
# b2f(a) == 0.0 because b2f(a) can only be 0 or 1
# inot(a)
(('fge', 0.0, ('b2f', 'a@1')), ('inot', a)),
(('fge', ('fneg', ('b2f', 'a@1')), 0.0), ('inot', a)),
(('fne', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('ior', a, b)),
(('fne', ('fmax', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('ior', a, b)),
(('fne', ('bcsel', a, 1.0, ('b2f', 'b@1')) , 0.0), ('ior', a, b)),
(('fne', ('b2f', 'a@1'), ('fneg', ('b2f', 'b@1'))), ('ior', a, b)),
(('fne', ('fmul', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('iand', a, b)),
(('fne', ('fmin', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('iand', a, b)),
(('fne', ('bcsel', a, ('b2f', 'b@1'), 0.0) , 0.0), ('iand', a, b)),
(('fne', ('fadd', ('b2f', 'a@1'), ('fneg', ('b2f', 'b@1'))), 0.0), ('ixor', a, b)),
(('fne', ('b2f', 'a@1') , ('b2f', 'b@1') ), ('ixor', a, b)),
(('fne', ('fneg', ('b2f', 'a@1')), ('fneg', ('b2f', 'b@1'))), ('ixor', a, b)),
(('feq', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('inot', ('ior', a, b))),
(('feq', ('fmax', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('inot', ('ior', a, b))),
(('feq', ('bcsel', a, 1.0, ('b2f', 'b@1')) , 0.0), ('inot', ('ior', a, b))),
(('feq', ('b2f', 'a@1'), ('fneg', ('b2f', 'b@1'))), ('inot', ('ior', a, b))),
(('feq', ('fmul', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('inot', ('iand', a, b))),
(('feq', ('fmin', ('b2f', 'a@1'), ('b2f', 'b@1')), 0.0), ('inot', ('iand', a, b))),
(('feq', ('bcsel', a, ('b2f', 'b@1'), 0.0) , 0.0), ('inot', ('iand', a, b))),
(('feq', ('fadd', ('b2f', 'a@1'), ('fneg', ('b2f', 'b@1'))), 0.0), ('ieq', a, b)),
(('feq', ('b2f', 'a@1') , ('b2f', 'b@1') ), ('ieq', a, b)),
(('feq', ('fneg', ('b2f', 'a@1')), ('fneg', ('b2f', 'b@1'))), ('ieq', a, b)),
nir: Transform expressions of b2f(a) and b2f(b) to a || b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 01:18:07 +01:00
# -(b2f(a) + b2f(b)) < 0
# 0 < b2f(a) + b2f(b)
# 0 != b2f(a) + b2f(b) b2f must be 0 or 1, so the sum is non-negative
# a || b
(('flt', ('fneg', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1'))), 0.0), ('ior', a, b)),
(('flt', 0.0, ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1'))), ('ior', a, b)),
nir: Transform expressions of b2f(a) and b2f(b) to a || b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 01:18:07 +01:00
nir: Transform expressions of b2f(a) and b2f(b) to !(a || b) All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 03:27:01 +01:00
# -(b2f(a) + b2f(b)) >= 0
# 0 >= b2f(a) + b2f(b)
# 0 == b2f(a) + b2f(b) b2f must be 0 or 1, so the sum is non-negative
# !(a || b)
(('fge', ('fneg', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1'))), 0.0), ('inot', ('ior', a, b))),
(('fge', 0.0, ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1'))), ('inot', ('ior', a, b))),
nir: Transform expressions of b2f(a) and b2f(b) to !(a || b) All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 03:27:01 +01:00
nir/algebraic: Strength reduce some compares of x and -x Converting the x vs -x comparison to an x vs 0 comparison enable cmod propagation to help. The seems to be a win everywhere except Gen7. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15566733 -> 15566014 (<.01%) instructions in affected programs: 72617 -> 71898 (-0.99%) helped: 302 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98% 95% mean confidence interval for instructions value: -2.55 -2.21 95% mean confidence interval for instructions %-change: -1.40% -1.16% Instructions are helped. total cycles in shared programs: 413014786 -> 413015475 (<.01%) cycles in affected programs: 707594 -> 708283 (0.10%) helped: 227 HURT: 101 helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20 helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49% HURT stats (abs) min: 2 max: 334 x̄: 87.90 x̃: 45 HURT stats (rel) min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36% 95% mean confidence interval for cycles value: -8.12 12.32 95% mean confidence interval for cycles %-change: -0.67% 0.34% Inconclusive result (value mean confidence interval includes 0). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13828220 -> 13827881 (<.01%) instructions in affected programs: 60887 -> 60548 (-0.56%) helped: 253 HURT: 6 helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1 helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27% 95% mean confidence interval for instructions value: -1.39 -1.23 95% mean confidence interval for instructions %-change: -0.85% -0.70% Instructions are helped. total cycles in shared programs: 386870095 -> 386894412 (<.01%) cycles in affected programs: 1537307 -> 1561624 (1.58%) helped: 127 HURT: 188 helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4 helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33% HURT stats (abs) min: 2 max: 5585 x̄: 141.43 x̃: 14 HURT stats (rel) min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06% 95% mean confidence interval for cycles value: 21.95 132.45 95% mean confidence interval for cycles %-change: 0.32% 0.85% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10896339 -> 10896276 (<.01%) instructions in affected programs: 10757 -> 10694 (-0.59%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1 helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89% 95% mean confidence interval for instructions value: -1.42 -1.15 95% mean confidence interval for instructions %-change: -1.03% -0.72% Instructions are helped. total cycles in shared programs: 155091003 -> 155090480 (<.01%) cycles in affected programs: 102761 -> 102238 (-0.51%) helped: 51 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4 helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36% 95% mean confidence interval for cycles value: -12.98 -7.53 95% mean confidence interval for cycles %-change: -0.97% -0.56% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8234667 -> 8234652 (<.01%) instructions in affected programs: 2063 -> 2048 (-0.73%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.97% -0.67% Instructions are helped. total cycles in shared programs: 188700906 -> 188700598 (<.01%) cycles in affected programs: 283480 -> 283172 (-0.11%) helped: 83 HURT: 3 helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4 helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04% 95% mean confidence interval for cycles value: -3.87 -3.29 95% mean confidence interval for cycles %-change: -0.16% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-12-18 06:29:26 +00:00
(('flt', a, ('fneg', a)), ('flt', a, 0.0)),
(('fge', a, ('fneg', a)), ('fge', a, 0.0)),
nir: Transform expressions of b2f(a) and b2f(b) to a || b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 01:18:07 +01:00
# Some optimizations (below) convert things like (a < b || c < b) into
# (min(a, c) < b). However, this interfers with the previous optimizations
# that try to remove comparisons with negated sums of b2f. This just
# breaks that apart.
(('flt', ('fmin', c, ('fneg', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1')))), 0.0),
nir: Transform expressions of b2f(a) and b2f(b) to a || b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-08-02 01:18:07 +01:00
('ior', ('flt', c, 0.0), ('ior', a, b))),
nir: Simplify some comparisons like a+b < a All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14514555 -> 14514547 (<.01%) instructions in affected programs: 1972 -> 1964 (-0.41%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.39% max: 0.42% x̄: 0.41% x̃: 0.41% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.41% -0.40% Instructions are helped. total cycles in shared programs: 533141444 -> 533136780 (<.01%) cycles in affected programs: 164728 -> 160064 (-2.83%) helped: 181 HURT: 3 helped stats (abs) min: 2 max: 94 x̄: 26.17 x̃: 30 helped stats (rel) min: 0.12% max: 5.33% x̄: 3.42% x̃: 3.80% HURT stats (abs) min: 4 max: 54 x̄: 24.00 x̃: 14 HURT stats (rel) min: 0.20% max: 2.39% x̄: 1.09% x̃: 0.68% 95% mean confidence interval for cycles value: -27.12 -23.58 95% mean confidence interval for cycles %-change: -3.54% -3.16% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533667 -> 10533539 (<.01%) instructions in affected programs: 10148 -> 10020 (-1.26%) helped: 124 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.39% max: 4.35% x̄: 2.20% x̃: 2.04% 95% mean confidence interval for instructions value: -1.06 -1.00 95% mean confidence interval for instructions %-change: -2.46% -1.95% Instructions are helped. total cycles in shared programs: 146136887 -> 146132122 (<.01%) cycles in affected programs: 206382 -> 201617 (-2.31%) helped: 171 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 27.87 x̃: 30 helped stats (rel) min: 0.08% max: 5.73% x̄: 2.98% x̃: 2.67% 95% mean confidence interval for cycles value: -29.19 -26.54 95% mean confidence interval for cycles %-change: -3.20% -2.76% Cycles are helped. Iron Lake total instructions in shared programs: 7886515 -> 7886507 (<.01%) instructions in affected programs: 3016 -> 3008 (-0.27%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.28% x̄: 0.27% x̃: 0.27% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.27% -0.26% Instructions are helped. total cycles in shared programs: 178100396 -> 178100388 (<.01%) cycles in affected programs: 156128 -> 156120 (<.01%) helped: 4 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -3.68 1.68 95% mean confidence interval for cycles %-change: -0.03% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857872 -> 4857868 (<.01%) instructions in affected programs: 1544 -> 1540 (-0.26%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.27% x̄: 0.26% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.28% -0.24% Instructions are helped. total cycles in shared programs: 122167654 -> 122167662 (<.01%) cycles in affected programs: 96248 -> 96256 (<.01%) helped: 0 HURT: 4 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: <.01% 0.02% Cycles are HURT. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-13 02:58:53 +00:00
(('~flt', ('fadd', a, b), a), ('flt', b, 0.0)),
(('~fge', ('fadd', a, b), a), ('fge', b, 0.0)),
(('~feq', ('fadd', a, b), a), ('feq', b, 0.0)),
(('~fne', ('fadd', a, b), a), ('fne', b, 0.0)),
nir/algebraic: Simplify some comparisons like a+constant < constant v2: Remove unsafe integer versions of the optimizations. This change had no effect on shader-db results. Suggested by Caio. All Gen6+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333713 -> 16332631 (<.01%) instructions in affected programs: 258112 -> 257030 (-0.42%) helped: 1275 HURT: 407 helped stats (abs) min: 1 max: 7 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.20% max: 8.33% x̄: 1.33% x̃: 0.86% HURT stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 2.94% x̄: 0.98% x̃: 0.98% 95% mean confidence interval for instructions value: -0.70 -0.59 95% mean confidence interval for instructions %-change: -0.84% -0.70% Instructions are helped. total cycles in shared programs: 367596791 -> 367601268 (<.01%) cycles in affected programs: 3420062 -> 3424539 (0.13%) helped: 1553 HURT: 783 helped stats (abs) min: 1 max: 742 x̄: 24.36 x̃: 6 helped stats (rel) min: 0.05% max: 21.12% x̄: 1.47% x̃: 0.65% HURT stats (abs) min: 1 max: 557 x̄: 54.04 x̃: 14 HURT stats (rel) min: 0.01% max: 33.66% x̄: 3.36% x̃: 1.43% 95% mean confidence interval for cycles value: -1.60 5.43 95% mean confidence interval for cycles %-change: -0.03% 0.33% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Iron Lake total instructions in shared programs: 8137992 -> 8137874 (<.01%) instructions in affected programs: 17501 -> 17383 (-0.67%) helped: 104 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.63% x̄: 0.87% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.22 -1.00 95% mean confidence interval for instructions %-change: -0.94% -0.76% Instructions are helped. total cycles in shared programs: 188540038 -> 188539650 (<.01%) cycles in affected programs: 704574 -> 704186 (-0.06%) helped: 125 HURT: 84 helped stats (abs) min: 2 max: 96 x̄: 6.45 x̃: 4 helped stats (rel) min: <.01% max: 3.47% x̄: 0.42% x̃: 0.25% HURT stats (abs) min: 2 max: 58 x̄: 4.98 x̃: 4 HURT stats (rel) min: 0.01% max: 2.75% x̄: 0.36% x̃: 0.33% 95% mean confidence interval for cycles value: -3.20 -0.52 95% mean confidence interval for cycles %-change: -0.19% -0.03% Cycles are helped. GM45 total instructions in shared programs: 5008889 -> 5008830 (<.01%) instructions in affected programs: 8824 -> 8765 (-0.67%) helped: 52 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.38% x̄: 0.86% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.27 -0.95 95% mean confidence interval for instructions %-change: -0.96% -0.71% Instructions are helped. total cycles in shared programs: 128969426 -> 128969128 (<.01%) cycles in affected programs: 399798 -> 399500 (-0.07%) helped: 74 HURT: 30 helped stats (abs) min: 2 max: 22 x̄: 6.76 x̃: 6 helped stats (rel) min: <.01% max: 1.83% x̄: 0.46% x̃: 0.29% HURT stats (abs) min: 2 max: 58 x̄: 6.73 x̃: 6 HURT stats (rel) min: 0.06% max: 2.75% x̄: 0.42% x̃: 0.21% 95% mean confidence interval for cycles value: -4.60 -1.14 95% mean confidence interval for cycles %-change: -0.32% -0.08% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-02-14 01:44:00 +00:00
(('~flt', ('fadd(is_used_once)', a, '#b'), '#c'), ('flt', a, ('fadd', c, ('fneg', b)))),
(('~flt', ('fneg(is_used_once)', ('fadd(is_used_once)', a, '#b')), '#c'), ('flt', ('fneg', ('fadd', c, b)), a)),
(('~fge', ('fadd(is_used_once)', a, '#b'), '#c'), ('fge', a, ('fadd', c, ('fneg', b)))),
(('~fge', ('fneg(is_used_once)', ('fadd(is_used_once)', a, '#b')), '#c'), ('fge', ('fneg', ('fadd', c, b)), a)),
(('~feq', ('fadd(is_used_once)', a, '#b'), '#c'), ('feq', a, ('fadd', c, ('fneg', b)))),
(('~feq', ('fneg(is_used_once)', ('fadd(is_used_once)', a, '#b')), '#c'), ('feq', ('fneg', ('fadd', c, b)), a)),
(('~fne', ('fadd(is_used_once)', a, '#b'), '#c'), ('fne', a, ('fadd', c, ('fneg', b)))),
(('~fne', ('fneg(is_used_once)', ('fadd(is_used_once)', a, '#b')), '#c'), ('fne', ('fneg', ('fadd', c, b)), a)),
nir: Simplify some comparisons like a+b < a All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14514555 -> 14514547 (<.01%) instructions in affected programs: 1972 -> 1964 (-0.41%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.39% max: 0.42% x̄: 0.41% x̃: 0.41% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.41% -0.40% Instructions are helped. total cycles in shared programs: 533141444 -> 533136780 (<.01%) cycles in affected programs: 164728 -> 160064 (-2.83%) helped: 181 HURT: 3 helped stats (abs) min: 2 max: 94 x̄: 26.17 x̃: 30 helped stats (rel) min: 0.12% max: 5.33% x̄: 3.42% x̃: 3.80% HURT stats (abs) min: 4 max: 54 x̄: 24.00 x̃: 14 HURT stats (rel) min: 0.20% max: 2.39% x̄: 1.09% x̃: 0.68% 95% mean confidence interval for cycles value: -27.12 -23.58 95% mean confidence interval for cycles %-change: -3.54% -3.16% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533667 -> 10533539 (<.01%) instructions in affected programs: 10148 -> 10020 (-1.26%) helped: 124 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1 helped stats (rel) min: 0.39% max: 4.35% x̄: 2.20% x̃: 2.04% 95% mean confidence interval for instructions value: -1.06 -1.00 95% mean confidence interval for instructions %-change: -2.46% -1.95% Instructions are helped. total cycles in shared programs: 146136887 -> 146132122 (<.01%) cycles in affected programs: 206382 -> 201617 (-2.31%) helped: 171 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 27.87 x̃: 30 helped stats (rel) min: 0.08% max: 5.73% x̄: 2.98% x̃: 2.67% 95% mean confidence interval for cycles value: -29.19 -26.54 95% mean confidence interval for cycles %-change: -3.20% -2.76% Cycles are helped. Iron Lake total instructions in shared programs: 7886515 -> 7886507 (<.01%) instructions in affected programs: 3016 -> 3008 (-0.27%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.28% x̄: 0.27% x̃: 0.27% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.27% -0.26% Instructions are helped. total cycles in shared programs: 178100396 -> 178100388 (<.01%) cycles in affected programs: 156128 -> 156120 (<.01%) helped: 4 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -3.68 1.68 95% mean confidence interval for cycles %-change: -0.03% <.01% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857872 -> 4857868 (<.01%) instructions in affected programs: 1544 -> 1540 (-0.26%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.25% max: 0.27% x̄: 0.26% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.28% -0.24% Instructions are helped. total cycles in shared programs: 122167654 -> 122167662 (<.01%) cycles in affected programs: 96248 -> 96256 (<.01%) helped: 0 HURT: 4 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: <.01% 0.02% Cycles are HURT. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-13 02:58:53 +00:00
# Cannot remove the addition from ilt or ige due to overflow.
(('ieq', ('iadd', a, b), a), ('ieq', b, 0)),
(('ine', ('iadd', a, b), a), ('ine', b, 0)),
nir: Replace an odd comparison involving fmin of -b2f I noticed the fge version while looking at a shader for an unrelated reason. The feq version prevents a regression in a later change that performs strength reduction of some compares. Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14514808 -> 14514796 (<.01%) instructions in affected programs: 750 -> 738 (-1.60%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.83% max: 1.96% x̄: 1.40% x̃: 1.40% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.43% -0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533144939 -> 533144853 (<.01%) cycles in affected programs: 8911 -> 8825 (-0.97%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 32 x̄: 21.50 x̃: 19 helped stats (rel) min: 0.60% max: 1.89% x̄: 1.28% x̃: 1.31% 95% mean confidence interval for cycles value: -32.94 -10.06 95% mean confidence interval for cycles %-change: -2.30% -0.26% Cycles are helped. Haswell total instructions in shared programs: 13093785 -> 13093775 (<.01%) instructions in affected programs: 924 -> 914 (-1.08%) helped: 4 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.82% max: 1.95% x̄: 1.39% x̃: 1.39% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.53 1.20 95% mean confidence interval for instructions %-change: -2.02% 0.97% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 409580553 -> 409580118 (<.01%) cycles in affected programs: 10909 -> 10474 (-3.99%) helped: 5 HURT: 1 helped stats (abs) min: 6 max: 222 x̄: 89.60 x̃: 18 helped stats (rel) min: 0.16% max: 24.72% x̄: 9.54% x̃: 1.78% HURT stats (abs) min: 13 max: 13 x̄: 13.00 x̃: 13 HURT stats (rel) min: 0.39% max: 0.39% x̄: 0.39% x̃: 0.39% 95% mean confidence interval for cycles value: -180.68 35.68 95% mean confidence interval for cycles %-change: -19.55% 3.79% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 11811851 -> 11811840 (<.01%) instructions in affected programs: 1032 -> 1021 (-1.07%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 1 helped stats (rel) min: 0.63% max: 1.95% x̄: 1.13% x̃: 0.97% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.17 0.51 95% mean confidence interval for instructions %-change: -1.86% 0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 257618403 -> 257618168 (<.01%) cycles in affected programs: 10784 -> 10549 (-2.18%) helped: 4 HURT: 2 helped stats (abs) min: 4 max: 220 x̄: 64.50 x̃: 17 helped stats (rel) min: 0.50% max: 24.34% x̄: 7.07% x̃: 1.72% HURT stats (abs) min: 9 max: 14 x̄: 11.50 x̃: 11 HURT stats (rel) min: 0.24% max: 0.42% x̄: 0.33% x̃: 0.33% 95% mean confidence interval for cycles value: -133.11 54.78 95% mean confidence interval for cycles %-change: -14.79% 5.59% Inconclusive result (value mean confidence interval includes 0). GM45, Iron Lake, and Sandy Bridge had similar results. (Sandy Bridge shown) total instructions in shared programs: 10533871 -> 10533859 (<.01%) instructions in affected programs: 865 -> 853 (-1.39%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.63% max: 1.83% x̄: 1.22% x̃: 1.21% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.16% -0.29% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 146139904 -> 146139852 (<.01%) cycles in affected programs: 15213 -> 15161 (-0.34%) helped: 4 HURT: 0 helped stats (abs) min: 3 max: 18 x̄: 13.00 x̃: 15 helped stats (rel) min: 0.15% max: 0.84% x̄: 0.39% x̃: 0.29% 95% mean confidence interval for cycles value: -23.79 -2.21 95% mean confidence interval for cycles %-change: -0.88% 0.09% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-31 19:11:02 +00:00
# fmin(-b2f(a), b) >= 0.0
# -b2f(a) >= 0.0 && b >= 0.0
# -b2f(a) == 0.0 && b >= 0.0 -b2f can only be 0 or -1, never >0
# b2f(a) == 0.0 && b >= 0.0
# a == False && b >= 0.0
# !a && b >= 0.0
#
# The fge in the second replacement is not a typo. I leave the proof that
# "fmin(-b2f(a), b) >= 0 <=> fmin(-b2f(a), b) == 0" as an exercise for the
# reader.
(('fge', ('fmin', ('fneg', ('b2f', 'a@1')), 'b@1'), 0.0), ('iand', ('inot', a), ('fge', b, 0.0))),
(('feq', ('fmin', ('fneg', ('b2f', 'a@1')), 'b@1'), 0.0), ('iand', ('inot', a), ('fge', b, 0.0))),
nir: Replace an odd comparison involving fmin of -b2f I noticed the fge version while looking at a shader for an unrelated reason. The feq version prevents a regression in a later change that performs strength reduction of some compares. Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14514808 -> 14514796 (<.01%) instructions in affected programs: 750 -> 738 (-1.60%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.83% max: 1.96% x̄: 1.40% x̃: 1.40% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.43% -0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533144939 -> 533144853 (<.01%) cycles in affected programs: 8911 -> 8825 (-0.97%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 32 x̄: 21.50 x̃: 19 helped stats (rel) min: 0.60% max: 1.89% x̄: 1.28% x̃: 1.31% 95% mean confidence interval for cycles value: -32.94 -10.06 95% mean confidence interval for cycles %-change: -2.30% -0.26% Cycles are helped. Haswell total instructions in shared programs: 13093785 -> 13093775 (<.01%) instructions in affected programs: 924 -> 914 (-1.08%) helped: 4 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.82% max: 1.95% x̄: 1.39% x̃: 1.39% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.53 1.20 95% mean confidence interval for instructions %-change: -2.02% 0.97% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 409580553 -> 409580118 (<.01%) cycles in affected programs: 10909 -> 10474 (-3.99%) helped: 5 HURT: 1 helped stats (abs) min: 6 max: 222 x̄: 89.60 x̃: 18 helped stats (rel) min: 0.16% max: 24.72% x̄: 9.54% x̃: 1.78% HURT stats (abs) min: 13 max: 13 x̄: 13.00 x̃: 13 HURT stats (rel) min: 0.39% max: 0.39% x̄: 0.39% x̃: 0.39% 95% mean confidence interval for cycles value: -180.68 35.68 95% mean confidence interval for cycles %-change: -19.55% 3.79% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 11811851 -> 11811840 (<.01%) instructions in affected programs: 1032 -> 1021 (-1.07%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 1 helped stats (rel) min: 0.63% max: 1.95% x̄: 1.13% x̃: 0.97% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 1.19% x̄: 1.19% x̃: 1.19% 95% mean confidence interval for instructions value: -4.17 0.51 95% mean confidence interval for instructions %-change: -1.86% 0.36% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 257618403 -> 257618168 (<.01%) cycles in affected programs: 10784 -> 10549 (-2.18%) helped: 4 HURT: 2 helped stats (abs) min: 4 max: 220 x̄: 64.50 x̃: 17 helped stats (rel) min: 0.50% max: 24.34% x̄: 7.07% x̃: 1.72% HURT stats (abs) min: 9 max: 14 x̄: 11.50 x̃: 11 HURT stats (rel) min: 0.24% max: 0.42% x̄: 0.33% x̃: 0.33% 95% mean confidence interval for cycles value: -133.11 54.78 95% mean confidence interval for cycles %-change: -14.79% 5.59% Inconclusive result (value mean confidence interval includes 0). GM45, Iron Lake, and Sandy Bridge had similar results. (Sandy Bridge shown) total instructions in shared programs: 10533871 -> 10533859 (<.01%) instructions in affected programs: 865 -> 853 (-1.39%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.63% max: 1.83% x̄: 1.22% x̃: 1.21% 95% mean confidence interval for instructions value: -6.67 0.67 95% mean confidence interval for instructions %-change: -2.16% -0.29% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 146139904 -> 146139852 (<.01%) cycles in affected programs: 15213 -> 15161 (-0.34%) helped: 4 HURT: 0 helped stats (abs) min: 3 max: 18 x̄: 13.00 x̃: 15 helped stats (rel) min: 0.15% max: 0.84% x̄: 0.39% x̃: 0.29% 95% mean confidence interval for cycles value: -23.79 -2.21 95% mean confidence interval for cycles %-change: -0.88% 0.09% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-31 19:11:02 +00:00
(('feq', ('b2f', 'a@1'), 0.0), ('inot', a)),
(('~fne', ('b2f', 'a@1'), 0.0), a),
(('ieq', ('b2i', 'a@1'), 0), ('inot', a)),
(('ine', ('b2i', 'a@1'), 0), a),
(('fne', ('u2f', a), 0.0), ('ine', a, 0)),
(('feq', ('u2f', a), 0.0), ('ieq', a, 0)),
(('fge', ('u2f', a), 0.0), True),
(('fge', 0.0, ('u2f', a)), ('uge', 0, a)), # ieq instead?
(('flt', ('u2f', a), 0.0), False),
(('flt', 0.0, ('u2f', a)), ('ult', 0, a)), # ine instead?
(('fne', ('i2f', a), 0.0), ('ine', a, 0)),
(('feq', ('i2f', a), 0.0), ('ieq', a, 0)),
(('fge', ('i2f', a), 0.0), ('ige', a, 0)),
(('fge', 0.0, ('i2f', a)), ('ige', 0, a)),
(('flt', ('i2f', a), 0.0), ('ilt', a, 0)),
(('flt', 0.0, ('i2f', a)), ('ilt', 0, a)),
nir: Don't compare i2f or u2i with zero Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277620 -> 14277230 (<.01%) instructions in affected programs: 36905 -> 36515 (-1.06%) helped: 101 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6 helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51% HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47% 95% mean confidence interval for instructions value: -4.59 -2.70 95% mean confidence interval for instructions %-change: -1.90% -1.41% Instructions are helped. total cycles in shared programs: 532580716 -> 532577947 (<.01%) cycles in affected programs: 940575 -> 937806 (-0.29%) helped: 92 HURT: 12 helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62 helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41% HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63 HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20% 95% mean confidence interval for cycles value: -50.66 -2.59 95% mean confidence interval for cycles %-change: -2.09% -0.73% Cycles are helped. total spills in shared programs: 8116 -> 8124 (0.10%) spills in affected programs: 200 -> 208 (4.00%) helped: 0 HURT: 2 total fills in shared programs: 11086 -> 11094 (0.07%) fills in affected programs: 436 -> 444 (1.83%) helped: 0 HURT: 2 Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12979054 -> 12978067 (<.01%) instructions in affected programs: 33633 -> 32646 (-2.93%) helped: 120 HURT: 2 helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13 helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17% HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00% 95% mean confidence interval for instructions value: -9.19 -6.99 95% mean confidence interval for instructions %-change: -5.27% -3.62% Instructions are helped. total cycles in shared programs: 411212880 -> 411199636 (<.01%) cycles in affected programs: 696441 -> 683197 (-1.90%) helped: 107 HURT: 5 helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146 helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88% HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22 HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25% 95% mean confidence interval for cycles value: -136.96 -99.54 95% mean confidence interval for cycles %-change: -9.75% -6.53% Cycles are helped. total spills in shared programs: 78623 -> 78631 (0.01%) spills in affected programs: 66 -> 74 (12.12%) helped: 0 HURT: 2 total fills in shared programs: 80104 -> 80108 (<.01%) fills in affected programs: 133 -> 137 (3.01%) helped: 0 HURT: 2 No changes on Sandy Bridge, Iron Lake, or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-07-03 20:39:54 +01:00
# 0.0 < fabs(a)
# fabs(a) > 0.0
# fabs(a) != 0.0 because fabs(a) must be >= 0
# a != 0.0
(('~flt', 0.0, ('fabs', a)), ('fne', a, 0.0)),
# -fabs(a) < 0.0
# fabs(a) > 0.0
(('~flt', ('fneg', ('fabs', a)), 0.0), ('fne', a, 0.0)),
# 0.0 >= fabs(a)
# 0.0 == fabs(a) because fabs(a) must be >= 0
# 0.0 == a
(('fge', 0.0, ('fabs', a)), ('feq', a, 0.0)),
# -fabs(a) >= 0.0
# 0.0 >= fabs(a)
(('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
nir/algebraic: Replace checks that a value is between (or not) [0, 1] v2: Add an extra line to one of the proofs. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16329772 -> 16329427 (<.01%) instructions in affected programs: 41980 -> 41635 (-0.82%) helped: 110 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.14 x̃: 2 helped stats (rel) min: 0.19% max: 5.56% x̄: 1.12% x̃: 0.94% 95% mean confidence interval for instructions value: -4.10 -2.17 95% mean confidence interval for instructions %-change: -1.28% -0.96% Instructions are helped. total cycles in shared programs: 367551273 -> 367549979 (<.01%) cycles in affected programs: 492462 -> 491168 (-0.26%) helped: 76 HURT: 25 helped stats (abs) min: 1 max: 400 x̄: 42.86 x̃: 12 helped stats (rel) min: 0.06% max: 10.72% x̄: 1.23% x̃: 0.75% HURT stats (abs) min: 2 max: 730 x̄: 78.52 x̃: 16 HURT stats (rel) min: 0.17% max: 6.89% x̄: 2.08% x̃: 1.23% 95% mean confidence interval for cycles value: -37.79 12.16 95% mean confidence interval for cycles %-change: -0.90% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10831115 -> 10830836 (<.01%) instructions in affected programs: 37830 -> 37551 (-0.74%) helped: 70 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.99 x̃: 2 helped stats (rel) min: 0.33% max: 7.14% x̄: 1.21% x̃: 0.97% 95% mean confidence interval for instructions value: -5.47 -2.50 95% mean confidence interval for instructions %-change: -1.49% -0.92% Instructions are helped. total cycles in shared programs: 154029323 -> 154028477 (<.01%) cycles in affected programs: 247909 -> 247063 (-0.34%) helped: 52 HURT: 6 helped stats (abs) min: 2 max: 254 x̄: 25.81 x̃: 4 helped stats (rel) min: 0.07% max: 4.39% x̄: 0.81% x̃: 0.19% HURT stats (abs) min: 4 max: 403 x̄: 82.67 x̃: 8 HURT stats (rel) min: 0.18% max: 1.60% x̄: 0.71% x̃: 0.53% 95% mean confidence interval for cycles value: -34.83 5.65 95% mean confidence interval for cycles %-change: -0.98% -0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake total instructions in shared programs: 8138007 -> 8137966 (<.01%) instructions in affected programs: 4060 -> 4019 (-1.01%) helped: 31 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.32 x̃: 1 helped stats (rel) min: 0.68% max: 8.33% x̄: 1.45% x̃: 0.90% 95% mean confidence interval for instructions value: -1.50 -1.15 95% mean confidence interval for instructions %-change: -2.11% -0.79% Instructions are helped. total cycles in shared programs: 188539492 -> 188539386 (<.01%) cycles in affected programs: 26280 -> 26174 (-0.40%) helped: 25 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.54% x̃: 0.50% 95% mean confidence interval for cycles value: -5.08 -3.40 95% mean confidence interval for cycles %-change: -0.70% -0.37% Cycles are helped. GM45 total instructions in shared programs: 5008897 -> 5008876 (<.01%) instructions in affected programs: 2096 -> 2075 (-1.00%) helped: 16 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1 helped stats (rel) min: 0.68% max: 7.69% x̄: 1.41% x̃: 0.89% 95% mean confidence interval for instructions value: -1.57 -1.06 95% mean confidence interval for instructions %-change: -2.32% -0.49% Instructions are helped. total cycles in shared programs: 128969020 -> 128968950 (<.01%) cycles in affected programs: 18490 -> 18420 (-0.38%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.67 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.51% x̃: 0.48% 95% mean confidence interval for cycles value: -6.03 -3.30 95% mean confidence interval for cycles %-change: -0.78% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-02-02 23:39:49 +00:00
# (a >= 0.0) && (a <= 1.0) -> fsat(a) == a
(('iand', ('fge', a, 0.0), ('fge', 1.0, a)), ('feq', a, ('fsat', a)), '!options->lower_fsat'),
# (a < 0.0) || (a > 1.0)
# !(!(a < 0.0) && !(a > 1.0))
# !((a >= 0.0) && (a <= 1.0))
# !(a == fsat(a))
# a != fsat(a)
(('ior', ('flt', a, 0.0), ('flt', 1.0, a)), ('fne', a, ('fsat', a)), '!options->lower_fsat'),
(('fmax', ('b2f(is_used_once)', 'a@1'), ('b2f', 'b@1')), ('b2f', ('ior', a, b))),
(('fmax', ('fneg(is_used_once)', ('b2f(is_used_once)', 'a@1')), ('fneg', ('b2f', 'b@1'))), ('fneg', ('b2f', ('ior', a, b)))),
(('fmin', ('b2f(is_used_once)', 'a@1'), ('b2f', 'b@1')), ('b2f', ('iand', a, b))),
(('fmin', ('fneg(is_used_once)', ('b2f(is_used_once)', 'a@1')), ('fneg', ('b2f', 'b@1'))), ('fneg', ('b2f', ('iand', a, b)))),
nir: Replace fmin(b2f(a), b) with a bcsel All of the affected shaders are HDR mappers from Serious Sam 3. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14516285 -> 14516273 (<.01%) instructions in affected programs: 348 -> 336 (-3.45%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.08% max: 6.67% x̄: 4.31% x̃: 4.17% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -5.55% -3.06% Instructions are helped. total cycles in shared programs: 533163876 -> 533163808 (<.01%) cycles in affected programs: 1144 -> 1076 (-5.94%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 5.80% max: 6.08% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for cycles value: -18.84 -15.16 95% mean confidence interval for cycles %-change: -6.20% -5.68% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533321 -> 10533309 (<.01%) instructions in affected programs: 372 -> 360 (-3.23%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.00% max: 5.88% x̄: 3.91% x̃: 3.85% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -4.96% -2.86% Instructions are helped. total cycles in shared programs: 146136632 -> 146136428 (<.01%) cycles in affected programs: 11668 -> 11464 (-1.75%) helped: 12 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 0.99% max: 3.44% x̄: 2.20% x̃: 2.29% 95% mean confidence interval for cycles value: -17.66 -16.34 95% mean confidence interval for cycles %-change: -2.82% -1.58% Cycles are helped. Iron Lake total instructions in shared programs: 7886301 -> 7886277 (<.01%) instructions in affected programs: 576 -> 552 (-4.17%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 6.06% x̄: 4.51% x̃: 4.65% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.30% -3.72% Instructions are helped. total cycles in shared programs: 178113176 -> 178113176 (0.00%) cycles in affected programs: 2116 -> 2116 (0.00%) helped: 2 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.14% max: 1.14% x̄: 1.14% x̃: 1.14% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.50% max: 0.65% x̄: 0.58% x̃: 0.58% 95% mean confidence interval for cycles value: -3.25 3.25 95% mean confidence interval for cycles %-change: -0.93% 0.94% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857756 -> 4857744 (<.01%) instructions in affected programs: 294 -> 282 (-4.08%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 5.71% x̄: 4.40% x̃: 4.55% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.71% -3.09% Instructions are helped. total cycles in shared programs: 122178730 -> 122178722 (<.01%) cycles in affected programs: 700 -> 692 (-1.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-03 01:39:54 +00:00
# fmin(b2f(a), b)
# bcsel(a, fmin(b2f(a), b), fmin(b2f(a), b))
# bcsel(a, fmin(b2f(True), b), fmin(b2f(False), b))
# bcsel(a, fmin(1.0, b), fmin(0.0, b))
#
# Since b is a constant, constant folding will eliminate the fmin and the
# fmax. If b is > 1.0, the bcsel will be replaced with a b2f.
(('fmin', ('b2f', 'a@1'), '#b'), ('bcsel', a, ('fmin', b, 1.0), ('fmin', b, 0.0))),
nir: Replace fmin(b2f(a), b) with a bcsel All of the affected shaders are HDR mappers from Serious Sam 3. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14516285 -> 14516273 (<.01%) instructions in affected programs: 348 -> 336 (-3.45%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.08% max: 6.67% x̄: 4.31% x̃: 4.17% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -5.55% -3.06% Instructions are helped. total cycles in shared programs: 533163876 -> 533163808 (<.01%) cycles in affected programs: 1144 -> 1076 (-5.94%) helped: 4 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 5.80% max: 6.08% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for cycles value: -18.84 -15.16 95% mean confidence interval for cycles %-change: -6.20% -5.68% Cycles are helped. Sandy Bridge total instructions in shared programs: 10533321 -> 10533309 (<.01%) instructions in affected programs: 372 -> 360 (-3.23%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 2.00% max: 5.88% x̄: 3.91% x̃: 3.85% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -4.96% -2.86% Instructions are helped. total cycles in shared programs: 146136632 -> 146136428 (<.01%) cycles in affected programs: 11668 -> 11464 (-1.75%) helped: 12 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 0.99% max: 3.44% x̄: 2.20% x̃: 2.29% 95% mean confidence interval for cycles value: -17.66 -16.34 95% mean confidence interval for cycles %-change: -2.82% -1.58% Cycles are helped. Iron Lake total instructions in shared programs: 7886301 -> 7886277 (<.01%) instructions in affected programs: 576 -> 552 (-4.17%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 6.06% x̄: 4.51% x̃: 4.65% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.30% -3.72% Instructions are helped. total cycles in shared programs: 178113176 -> 178113176 (0.00%) cycles in affected programs: 2116 -> 2116 (0.00%) helped: 2 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.14% max: 1.14% x̄: 1.14% x̃: 1.14% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.50% max: 0.65% x̄: 0.58% x̃: 0.58% 95% mean confidence interval for cycles value: -3.25 3.25 95% mean confidence interval for cycles %-change: -0.93% 0.94% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857756 -> 4857744 (<.01%) instructions in affected programs: 294 -> 282 (-4.08%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 2.94% max: 5.71% x̄: 4.40% x̃: 4.55% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -5.71% -3.09% Instructions are helped. total cycles in shared programs: 122178730 -> 122178722 (<.01%) cycles in affected programs: 700 -> 692 (-1.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-03 01:39:54 +00:00
nir: Don't condition 'a-b < 0' -> 'a < b' on is_not_used_by_conditional Now that i965 recognizes that a-b generates the same conditions as 'a < b', there is no reason to condition this transformation on 'is not used by conditional.' Since this was the only user of the is_not_used_by_conditional function, delete it. All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14400775 -> 14400595 (<.01%) instructions in affected programs: 36712 -> 36532 (-0.49%) helped: 182 HURT: 26 helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.15% max: 1.82% x̄: 0.70% x̃: 0.62% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 1.02% x̄: 0.82% x̃: 0.90% 95% mean confidence interval for instructions value: -0.97 -0.76 95% mean confidence interval for instructions %-change: -0.59% -0.43% Instructions are helped. total cycles in shared programs: 532929592 -> 532926345 (<.01%) cycles in affected programs: 478660 -> 475413 (-0.68%) helped: 187 HURT: 22 helped stats (abs) min: 2 max: 200 x̄: 20.99 x̃: 18 helped stats (rel) min: 0.23% max: 24.10% x̄: 1.48% x̃: 1.03% HURT stats (abs) min: 1 max: 214 x̄: 30.86 x̃: 11 HURT stats (rel) min: 0.01% max: 23.06% x̄: 3.12% x̃: 0.86% 95% mean confidence interval for cycles value: -19.50 -11.57 95% mean confidence interval for cycles %-change: -1.42% -0.58% Cycles are helped. GM45 and Iron Lake had similar results. (Iron Lake shown) total cycles in shared programs: 177851578 -> 177851810 (<.01%) cycles in affected programs: 24408 -> 24640 (0.95%) helped: 2 HURT: 4 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.42% max: 0.47% x̄: 0.44% x̃: 0.44% HURT stats (abs) min: 24 max: 108 x̄: 60.00 x̃: 54 HURT stats (rel) min: 0.52% max: 1.62% x̄: 1.04% x̃: 1.02% 95% mean confidence interval for cycles value: -7.75 85.08 95% mean confidence interval for cycles %-change: -0.39% 1.49% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-14 23:25:07 +00:00
(('flt', ('fadd(is_used_once)', a, ('fneg', b)), 0.0), ('flt', a, b)),
(('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
(('~bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
(('~bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
(('~bcsel', ('fge', a, b), b, a), ('fmin', a, b)),
(('~bcsel', ('fge', b, a), b, a), ('fmax', a, b)),
nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. (Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-12-04 00:30:44 +00:00
(('bcsel', ('i2b', a), b, c), ('bcsel', ('ine', a, 0), b, c)),
(('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
(('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
(('bcsel', a, b, ('bcsel', a, c, d)), ('bcsel', a, b, d)),
(('bcsel', a, ('bcsel', b, c, d), ('bcsel(is_used_once)', b, c, 'e')), ('bcsel', b, c, ('bcsel', a, d, 'e'))),
(('bcsel', a, ('bcsel(is_used_once)', b, c, d), ('bcsel', b, c, 'e')), ('bcsel', b, c, ('bcsel', a, d, 'e'))),
(('bcsel', a, ('bcsel', b, c, d), ('bcsel(is_used_once)', b, 'e', d)), ('bcsel', b, ('bcsel', a, c, 'e'), d)),
(('bcsel', a, ('bcsel(is_used_once)', b, c, d), ('bcsel', b, 'e', d)), ('bcsel', b, ('bcsel', a, c, 'e'), d)),
2019-03-30 03:51:20 +00:00
(('bcsel', a, True, b), ('ior', a, b)),
(('bcsel', a, a, b), ('ior', a, b)),
(('bcsel', a, b, False), ('iand', a, b)),
(('bcsel', a, b, a), ('iand', a, b)),
(('~fmin', a, a), a),
(('~fmax', a, a), a),
(('imin', a, a), a),
(('imax', a, a), a),
(('umin', a, a), a),
(('umax', a, a), a),
(('fmax', ('fmax', a, b), b), ('fmax', a, b)),
(('umax', ('umax', a, b), b), ('umax', a, b)),
(('imax', ('imax', a, b), b), ('imax', a, b)),
(('fmin', ('fmin', a, b), b), ('fmin', a, b)),
(('umin', ('umin', a, b), b), ('umin', a, b)),
(('imin', ('imin', a, b), b), ('imin', a, b)),
(('iand@32', a, ('inot', ('ishr', a, 31))), ('imax', a, 0)),
nir/algebraic: Simplify logic to detect sign of an integer This occurs in a handful of places in the soft-fp64 code, and that is the primary reason for the change. v2: Fix a typo in a comment. Noticed by Matt. Copy the correct fp64 shader-db results to the commit message. I realized that I used accidentally used the results from the next commit. Results on the 308 shaders extracted from the fp64 portion of the OpenGL CTS: Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) total instructions in shared programs: 906235 -> 906149 (<.01%) instructions in affected programs: 353966 -> 353880 (-0.02%) helped: 31 HURT: 2 helped stats (abs) min: 1 max: 8 x̄: 3.03 x̃: 3 helped stats (rel) min: 0.01% max: 1.59% x̄: 0.10% x̃: 0.04% HURT stats (abs) min: 3 max: 5 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02% 95% mean confidence interval for instructions value: -3.51 -1.70 95% mean confidence interval for instructions %-change: -0.19% <.01% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 7076552 -> 7076173 (<.01%) cycles in affected programs: 2878361 -> 2877982 (-0.01%) helped: 37 HURT: 2 helped stats (abs) min: 2 max: 48 x̄: 10.81 x̃: 6 helped stats (rel) min: <.01% max: 2.17% x̄: 0.47% x̃: 0.01% HURT stats (abs) min: 1 max: 20 x̄: 10.50 x̃: 10 HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -13.96 -5.48 95% mean confidence interval for cycles %-change: -0.72% -0.16% Cycles are helped. total fills in shared programs: 2064 -> 2065 (0.05%) fills in affected programs: 45 -> 46 (2.22%) helped: 0 HURT: 1 Regular shader-db results: All Gen7+ platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 17611530 -> 17611506 (<.01%) instructions in affected programs: 5934 -> 5910 (-0.40%) helped: 10 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2 helped stats (rel) min: 0.14% max: 1.24% x̄: 0.47% x̃: 0.34% 95% mean confidence interval for instructions value: -3.53 -1.27 95% mean confidence interval for instructions %-change: -0.78% -0.17% Instructions are helped. total cycles in shared programs: 338419178 -> 338419218 (<.01%) cycles in affected programs: 19244 -> 19284 (0.21%) helped: 4 HURT: 2 helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.05% max: 0.11% x̄: 0.08% x̃: 0.08% HURT stats (abs) min: 26 max: 26 x̄: 26.00 x̃: 26 HURT stats (rel) min: 1.20% max: 1.20% x̄: 1.20% x̃: 1.20% 95% mean confidence interval for cycles value: -9.08 22.41 95% mean confidence interval for cycles %-change: -0.35% 1.04% Inconclusive result (value mean confidence interval includes 0). No changes on any earlier Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-03 18:51:59 +00:00
# Simplify logic to detect sign of an integer.
(('ieq', ('iand', a, 0x80000000), 0x00000000), ('ige', a, 0)),
(('ine', ('iand', a, 0x80000000), 0x80000000), ('ige', a, 0)),
(('ine', ('iand', a, 0x80000000), 0x00000000), ('ilt', a, 0)),
(('ieq', ('iand', a, 0x80000000), 0x80000000), ('ilt', a, 0)),
(('ine', ('ushr', 'a@32', 31), 0), ('ilt', a, 0)),
(('ieq', ('ushr', 'a@32', 31), 0), ('ige', a, 0)),
(('ieq', ('ushr', 'a@32', 31), 1), ('ilt', a, 0)),
(('ine', ('ushr', 'a@32', 31), 1), ('ige', a, 0)),
(('ine', ('ishr', 'a@32', 31), 0), ('ilt', a, 0)),
(('ieq', ('ishr', 'a@32', 31), 0), ('ige', a, 0)),
(('ieq', ('ishr', 'a@32', 31), -1), ('ilt', a, 0)),
(('ine', ('ishr', 'a@32', 31), -1), ('ige', a, 0)),
(('fmin', a, ('fneg', a)), ('fneg', ('fabs', a))),
(('imin', a, ('ineg', a)), ('ineg', ('iabs', a))),
(('fmin', a, ('fneg', ('fabs', a))), ('fneg', ('fabs', a))),
(('imin', a, ('ineg', ('iabs', a))), ('ineg', ('iabs', a))),
(('~fmin', a, ('fabs', a)), a),
(('imin', a, ('iabs', a)), a),
(('~fmax', a, ('fneg', ('fabs', a))), a),
(('imax', a, ('ineg', ('iabs', a))), a),
(('fmax', a, ('fabs', a)), ('fabs', a)),
(('imax', a, ('iabs', a)), ('iabs', a)),
(('fmax', a, ('fneg', a)), ('fabs', a)),
(('imax', a, ('ineg', a)), ('iabs', a)),
(('~fmax', ('fabs', a), 0.0), ('fabs', a)),
(('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
(('~fmax', ('fmin', a, 1.0), 0.0), ('fsat', a), '!options->lower_fsat'),
(('~fmin', ('fmax', a, -1.0), 0.0), ('fneg', ('fsat', ('fneg', a))), '!options->lower_fsat'),
(('~fmax', ('fmin', a, 0.0), -1.0), ('fneg', ('fsat', ('fneg', a))), '!options->lower_fsat'),
(('fsat', ('fsign', a)), ('b2f', ('flt', 0.0, a))),
(('fsat', ('b2f', a)), ('b2f', a)),
(('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
(('fsat', ('fsat', a)), ('fsat', a)),
(('fsat', ('fneg(is_used_once)', ('fadd(is_used_once)', a, b))), ('fsat', ('fadd', ('fneg', a), ('fneg', b))), '!options->lower_fsat'),
(('fsat', ('fneg(is_used_once)', ('fmul(is_used_once)', a, b))), ('fsat', ('fmul', ('fneg', a), b)), '!options->lower_fsat'),
nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-10-23 22:30:41 +01:00
(('fsat', ('fabs(is_used_once)', ('fmul(is_used_once)', a, b))), ('fsat', ('fmul', ('fabs', a), ('fabs', b))), '!options->lower_fsat'),
(('fmin', ('fmax', ('fmin', ('fmax', a, b), c), b), c), ('fmin', ('fmax', a, b), c)),
(('imin', ('imax', ('imin', ('imax', a, b), c), b), c), ('imin', ('imax', a, b), c)),
(('umin', ('umax', ('umin', ('umax', a, b), c), b), c), ('umin', ('umax', a, b), c)),
(('fmax', ('fsat', a), '#b@32(is_zero_to_one)'), ('fsat', ('fmax', a, b))),
(('fmin', ('fsat', a), '#b@32(is_zero_to_one)'), ('fsat', ('fmin', a, b))),
(('extract_u8', ('imin', ('imax', a, 0), 0xff), 0), ('imin', ('imax', a, 0), 0xff)),
nir: Be more conservative about rearranging or-compounded compares If both comparisons are used as sources for instructions other than the ior, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. shader-db results: Skylake total instructions in shared programs: 14526147 -> 14525898 (<.01%) instructions in affected programs: 70239 -> 69990 (-0.35%) helped: 102 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.86 -2.02 95% mean confidence interval for instructions %-change: -0.46% -0.31% Instructions are helped. total cycles in shared programs: 533120531 -> 533119892 (<.01%) cycles in affected programs: 994875 -> 994236 (-0.06%) helped: 76 HURT: 26 helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13 helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18% HURT stats (abs) min: 1 max: 167 x̄: 54.62 x̃: 26 HURT stats (rel) min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39% 95% mean confidence interval for cycles value: -19.44 6.91 95% mean confidence interval for cycles %-change: -0.30% 0.15% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 14816005 -> 14815787 (<.01%) instructions in affected programs: 64658 -> 64440 (-0.34%) helped: 97 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.62 -1.87 95% mean confidence interval for instructions %-change: -0.45% -0.30% Instructions are helped. total cycles in shared programs: 559340386 -> 559339907 (<.01%) cycles in affected programs: 1090491 -> 1090012 (-0.04%) helped: 66 HURT: 28 helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16 helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27% HURT stats (abs) min: 2 max: 226 x̄: 39.07 x̃: 11 HURT stats (rel) min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20% 95% mean confidence interval for cycles value: -15.94 5.75 95% mean confidence interval for cycles %-change: -0.35% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 Haswell total instructions in shared programs: 9034106 -> 9033948 (<.01%) instructions in affected programs: 24096 -> 23938 (-0.66%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4 helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64% 95% mean confidence interval for instructions value: -4.71 -3.60 95% mean confidence interval for instructions %-change: -0.84% -0.58% Instructions are helped. total cycles in shared programs: 84631628 -> 84631402 (<.01%) cycles in affected programs: 148674 -> 148448 (-0.15%) helped: 14 HURT: 14 helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12 helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21% HURT stats (abs) min: 1 max: 10 x̄: 6.00 x̃: 5 HURT stats (rel) min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11% 95% mean confidence interval for cycles value: -19.42 3.28 95% mean confidence interval for cycles %-change: -0.59% 0.05% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10015456 -> 10015293 (<.01%) instructions in affected programs: 27701 -> 27538 (-0.59%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4 helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52% 95% mean confidence interval for instructions value: -4.87 -3.71 95% mean confidence interval for instructions %-change: -0.82% -0.51% Instructions are helped. total cycles in shared programs: 87524771 -> 87524569 (<.01%) cycles in affected programs: 112324 -> 112122 (-0.18%) helped: 6 HURT: 12 helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20 helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26% HURT stats (abs) min: 1 max: 16 x̄: 5.50 x̃: 5 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -29.14 6.69 95% mean confidence interval for cycles %-change: -0.93% 0.08% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10545655 -> 10545465 (<.01%) instructions in affected programs: 37198 -> 37008 (-0.51%) helped: 42 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4 helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49% 95% mean confidence interval for instructions value: -5.14 -3.91 95% mean confidence interval for instructions %-change: -0.68% -0.47% Instructions are helped. total cycles in shared programs: 146113059 -> 146112427 (<.01%) cycles in affected programs: 423514 -> 422882 (-0.15%) helped: 32 HURT: 10 helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12 helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11% HURT stats (abs) min: 12 max: 19 x̄: 14.70 x̃: 14 HURT stats (rel) min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14% 95% mean confidence interval for cycles value: -26.03 -4.07 95% mean confidence interval for cycles %-change: -0.43% -0.05% Cycles are helped. Iron Lake total instructions in shared programs: 7886959 -> 7886925 (<.01%) instructions in affected programs: 1340 -> 1306 (-2.54%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8 helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43% 95% mean confidence interval for instructions value: -20.44 3.44 95% mean confidence interval for instructions %-change: -5.78% 0.89% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 178116996 -> 178116888 (<.01%) cycles in affected programs: 6262 -> 6154 (-1.72%) helped: 2 HURT: 2 helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61 helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62% HURT stats (abs) min: 6 max: 8 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -93.27 39.27 95% mean confidence interval for cycles %-change: -5.38% 2.27% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857887 -> 4857870 (<.01%) instructions in affected programs: 674 -> 657 (-2.52%) helped: 2 HURT: 0 total cycles in shared programs: 122180816 -> 122180744 (<.01%) cycles in affected programs: 3764 -> 3692 (-1.91%) helped: 1 HURT: 1 helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78 helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-04 21:30:49 +00:00
(('~ior', ('flt(is_used_once)', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))),
(('~ior', ('flt(is_used_once)', a, c), ('flt', b, c)), ('flt', ('fmin', a, b), c)),
(('~ior', ('fge(is_used_once)', a, b), ('fge', a, c)), ('fge', a, ('fmin', b, c))),
(('~ior', ('fge(is_used_once)', a, c), ('fge', b, c)), ('fge', ('fmax', a, b), c)),
(('~ior', ('flt', a, '#b'), ('flt', a, '#c')), ('flt', a, ('fmax', b, c))),
(('~ior', ('flt', '#a', c), ('flt', '#b', c)), ('flt', ('fmin', a, b), c)),
(('~ior', ('fge', a, '#b'), ('fge', a, '#c')), ('fge', a, ('fmin', b, c))),
(('~ior', ('fge', '#a', c), ('fge', '#b', c)), ('fge', ('fmax', a, b), c)),
nir: Rearrange and-compounded float compares If both comparisons are used as sources for instructions other than the iand, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. It is interesting to me that on Iron Lake only 81 shaders have instruction counts changed, but 726 shaders have cycle counts changed. shader-db results: Skylake total instructions in shared programs: 14525728 -> 14521017 (-0.03%) instructions in affected programs: 1164726 -> 1160015 (-0.40%) helped: 1692 HURT: 5 helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2 helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 12 x̄: 3.20 x̃: 1 HURT stats (rel) min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86% 95% mean confidence interval for instructions value: -3.52 -2.03 95% mean confidence interval for instructions %-change: -0.86% -0.74% Instructions are helped. total cycles in shared programs: 533115449 -> 532991404 (-0.02%) cycles in affected programs: 119401803 -> 119277758 (-0.10%) helped: 1145 HURT: 467 helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18 helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42% HURT stats (abs) min: 1 max: 1590 x̄: 92.15 x̃: 15 HURT stats (rel) min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39% 95% mean confidence interval for cycles value: -122.16 -31.74 95% mean confidence interval for cycles %-change: -0.94% -0.57% Cycles are helped. total spills in shared programs: 9597 -> 9534 (-0.66%) spills in affected programs: 403 -> 340 (-15.63%) helped: 1 HURT: 1 total fills in shared programs: 13904 -> 13790 (-0.82%) fills in affected programs: 1627 -> 1513 (-7.01%) helped: 2 HURT: 1 LOST: 0 GAINED: 2 Broadwell total instructions in shared programs: 14816966 -> 14812590 (-0.03%) instructions in affected programs: 1499885 -> 1495509 (-0.29%) helped: 1672 HURT: 15 helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33% HURT stats (abs) min: 1 max: 21 x̄: 9.20 x̃: 8 HURT stats (rel) min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53% 95% mean confidence interval for instructions value: -3.14 -2.05 95% mean confidence interval for instructions %-change: -0.85% -0.73% Instructions are helped. total cycles in shared programs: 559353622 -> 559345595 (<.01%) cycles in affected programs: 139893703 -> 139885676 (<.01%) helped: 921 HURT: 697 helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18 helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87% HURT stats (abs) min: 1 max: 2370 x̄: 178.03 x̃: 38 HURT stats (rel) min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14% 95% mean confidence interval for cycles value: -59.64 49.72 95% mean confidence interval for cycles %-change: -1.02% -0.66% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78902 -> 78861 (-0.05%) spills in affected programs: 2418 -> 2377 (-1.70%) helped: 1 HURT: 11 total fills in shared programs: 83782 -> 83678 (-0.12%) fills in affected programs: 3515 -> 3411 (-2.96%) helped: 2 HURT: 11 LOST: 0 GAINED: 5 Haswell and Ivy Bridge had similar results. Haswell shown. total instructions in shared programs: 9033898 -> 9032010 (-0.02%) instructions in affected programs: 308064 -> 306176 (-0.61%) helped: 921 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1 helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23% 95% mean confidence interval for instructions value: -2.21 -1.87 95% mean confidence interval for instructions %-change: -0.88% -0.68% Instructions are helped. total cycles in shared programs: 84628949 -> 84620520 (<.01%) cycles in affected programs: 2164913 -> 2156484 (-0.39%) helped: 518 HURT: 359 helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20 helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01% HURT stats (abs) min: 1 max: 586 x̄: 36.43 x̃: 8 HURT stats (rel) min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40% 95% mean confidence interval for cycles value: -15.17 -4.05 95% mean confidence interval for cycles %-change: -0.77% -0.32% Cycles are helped. LOST: 0 GAINED: 4 Sandy Bridge total instructions in shared programs: 10544860 -> 10542933 (-0.02%) instructions in affected programs: 360019 -> 358092 (-0.54%) helped: 931 HURT: 4 helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33% 95% mean confidence interval for instructions value: -2.23 -1.89 95% mean confidence interval for instructions %-change: -0.76% -0.58% Instructions are helped. total cycles in shared programs: 146106820 -> 146098397 (<.01%) cycles in affected programs: 3435047 -> 3426624 (-0.25%) helped: 572 HURT: 329 helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15 helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33% HURT stats (abs) min: 1 max: 1714 x̄: 30.93 x̃: 6 HURT stats (rel) min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19% 95% mean confidence interval for cycles value: -16.85 -1.85 95% mean confidence interval for cycles %-change: -0.39% -0.01% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 7886925 -> 7886711 (<.01%) instructions in affected programs: 25763 -> 25549 (-0.83%) helped: 75 HURT: 6 helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1 helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53% HURT stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86% 95% mean confidence interval for instructions value: -3.69 -1.60 95% mean confidence interval for instructions %-change: -2.54% -0.57% Instructions are helped. total cycles in shared programs: 178116888 -> 178115324 (<.01%) cycles in affected programs: 5858790 -> 5857226 (-0.03%) helped: 484 HURT: 242 helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 4.07 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03% 95% mean confidence interval for cycles value: -2.76 -1.55 95% mean confidence interval for cycles %-change: -0.12% 0.01% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 4857870 -> 4857762 (<.01%) instructions in affected programs: 13994 -> 13886 (-0.77%) helped: 39 HURT: 5 helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48% HURT stats (abs) min: 1 max: 16 x̄: 4.00 x̃: 1 HURT stats (rel) min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86% 95% mean confidence interval for instructions value: -3.86 -1.05 95% mean confidence interval for instructions %-change: -2.61% 0.04% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 122180744 -> 122179674 (<.01%) cycles in affected programs: 3686646 -> 3685576 (-0.03%) helped: 273 HURT: 141 helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6 helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06% HURT stats (abs) min: 2 max: 76 x̄: 3.66 x̃: 2 HURT stats (rel) min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02% 95% mean confidence interval for cycles value: -3.42 -1.75 95% mean confidence interval for cycles %-change: -0.15% 0.03% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-04 23:21:30 +00:00
(('~iand', ('flt(is_used_once)', a, b), ('flt', a, c)), ('flt', a, ('fmin', b, c))),
(('~iand', ('flt(is_used_once)', a, c), ('flt', b, c)), ('flt', ('fmax', a, b), c)),
(('~iand', ('fge(is_used_once)', a, b), ('fge', a, c)), ('fge', a, ('fmax', b, c))),
(('~iand', ('fge(is_used_once)', a, c), ('fge', b, c)), ('fge', ('fmin', a, b), c)),
(('~iand', ('flt', a, '#b'), ('flt', a, '#c')), ('flt', a, ('fmin', b, c))),
(('~iand', ('flt', '#a', c), ('flt', '#b', c)), ('flt', ('fmax', a, b), c)),
(('~iand', ('fge', a, '#b'), ('fge', a, '#c')), ('fge', a, ('fmax', b, c))),
(('~iand', ('fge', '#a', c), ('fge', '#b', c)), ('fge', ('fmin', a, b), c)),
nir: Be more conservative about rearranging or-compounded compares If both comparisons are used as sources for instructions other than the ior, this transformation is detrimental. If the non-identical value in both compares is constant, the fmin or fmax will be constant-folded away, so the transformation is always a win. shader-db results: Skylake total instructions in shared programs: 14526147 -> 14525898 (<.01%) instructions in affected programs: 70239 -> 69990 (-0.35%) helped: 102 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.86 -2.02 95% mean confidence interval for instructions %-change: -0.46% -0.31% Instructions are helped. total cycles in shared programs: 533120531 -> 533119892 (<.01%) cycles in affected programs: 994875 -> 994236 (-0.06%) helped: 76 HURT: 26 helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13 helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18% HURT stats (abs) min: 1 max: 167 x̄: 54.62 x̃: 26 HURT stats (rel) min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39% 95% mean confidence interval for cycles value: -19.44 6.91 95% mean confidence interval for cycles %-change: -0.30% 0.15% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 14816005 -> 14815787 (<.01%) instructions in affected programs: 64658 -> 64440 (-0.34%) helped: 97 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1 helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20% 95% mean confidence interval for instructions value: -2.62 -1.87 95% mean confidence interval for instructions %-change: -0.45% -0.30% Instructions are helped. total cycles in shared programs: 559340386 -> 559339907 (<.01%) cycles in affected programs: 1090491 -> 1090012 (-0.04%) helped: 66 HURT: 28 helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16 helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27% HURT stats (abs) min: 2 max: 226 x̄: 39.07 x̃: 11 HURT stats (rel) min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20% 95% mean confidence interval for cycles value: -15.94 5.75 95% mean confidence interval for cycles %-change: -0.35% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 Haswell total instructions in shared programs: 9034106 -> 9033948 (<.01%) instructions in affected programs: 24096 -> 23938 (-0.66%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4 helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64% 95% mean confidence interval for instructions value: -4.71 -3.60 95% mean confidence interval for instructions %-change: -0.84% -0.58% Instructions are helped. total cycles in shared programs: 84631628 -> 84631402 (<.01%) cycles in affected programs: 148674 -> 148448 (-0.15%) helped: 14 HURT: 14 helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12 helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21% HURT stats (abs) min: 1 max: 10 x̄: 6.00 x̃: 5 HURT stats (rel) min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11% 95% mean confidence interval for cycles value: -19.42 3.28 95% mean confidence interval for cycles %-change: -0.59% 0.05% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10015456 -> 10015293 (<.01%) instructions in affected programs: 27701 -> 27538 (-0.59%) helped: 38 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4 helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52% 95% mean confidence interval for instructions value: -4.87 -3.71 95% mean confidence interval for instructions %-change: -0.82% -0.51% Instructions are helped. total cycles in shared programs: 87524771 -> 87524569 (<.01%) cycles in affected programs: 112324 -> 112122 (-0.18%) helped: 6 HURT: 12 helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20 helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26% HURT stats (abs) min: 1 max: 16 x̄: 5.50 x̃: 5 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -29.14 6.69 95% mean confidence interval for cycles %-change: -0.93% 0.08% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10545655 -> 10545465 (<.01%) instructions in affected programs: 37198 -> 37008 (-0.51%) helped: 42 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4 helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49% 95% mean confidence interval for instructions value: -5.14 -3.91 95% mean confidence interval for instructions %-change: -0.68% -0.47% Instructions are helped. total cycles in shared programs: 146113059 -> 146112427 (<.01%) cycles in affected programs: 423514 -> 422882 (-0.15%) helped: 32 HURT: 10 helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12 helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11% HURT stats (abs) min: 12 max: 19 x̄: 14.70 x̃: 14 HURT stats (rel) min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14% 95% mean confidence interval for cycles value: -26.03 -4.07 95% mean confidence interval for cycles %-change: -0.43% -0.05% Cycles are helped. Iron Lake total instructions in shared programs: 7886959 -> 7886925 (<.01%) instructions in affected programs: 1340 -> 1306 (-2.54%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8 helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43% 95% mean confidence interval for instructions value: -20.44 3.44 95% mean confidence interval for instructions %-change: -5.78% 0.89% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 178116996 -> 178116888 (<.01%) cycles in affected programs: 6262 -> 6154 (-1.72%) helped: 2 HURT: 2 helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61 helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62% HURT stats (abs) min: 6 max: 8 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -93.27 39.27 95% mean confidence interval for cycles %-change: -5.38% 2.27% Inconclusive result (value mean confidence interval includes 0). GM45 total instructions in shared programs: 4857887 -> 4857870 (<.01%) instructions in affected programs: 674 -> 657 (-2.52%) helped: 2 HURT: 0 total cycles in shared programs: 122180816 -> 122180744 (<.01%) cycles in affected programs: 3764 -> 3692 (-1.91%) helped: 1 HURT: 1 helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78 helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-04 21:30:49 +00:00
(('ior', ('ilt(is_used_once)', a, b), ('ilt', a, c)), ('ilt', a, ('imax', b, c))),
(('ior', ('ilt(is_used_once)', a, c), ('ilt', b, c)), ('ilt', ('imin', a, b), c)),
(('ior', ('ige(is_used_once)', a, b), ('ige', a, c)), ('ige', a, ('imin', b, c))),
(('ior', ('ige(is_used_once)', a, c), ('ige', b, c)), ('ige', ('imax', a, b), c)),
(('ior', ('ult(is_used_once)', a, b), ('ult', a, c)), ('ult', a, ('umax', b, c))),
(('ior', ('ult(is_used_once)', a, c), ('ult', b, c)), ('ult', ('umin', a, b), c)),
(('ior', ('uge(is_used_once)', a, b), ('uge', a, c)), ('uge', a, ('umin', b, c))),
(('ior', ('uge(is_used_once)', a, c), ('uge', b, c)), ('uge', ('umax', a, b), c)),
(('iand', ('ilt(is_used_once)', a, b), ('ilt', a, c)), ('ilt', a, ('imin', b, c))),
(('iand', ('ilt(is_used_once)', a, c), ('ilt', b, c)), ('ilt', ('imax', a, b), c)),
(('iand', ('ige(is_used_once)', a, b), ('ige', a, c)), ('ige', a, ('imax', b, c))),
(('iand', ('ige(is_used_once)', a, c), ('ige', b, c)), ('ige', ('imin', a, b), c)),
(('iand', ('ult(is_used_once)', a, b), ('ult', a, c)), ('ult', a, ('umin', b, c))),
(('iand', ('ult(is_used_once)', a, c), ('ult', b, c)), ('ult', ('umax', a, b), c)),
(('iand', ('uge(is_used_once)', a, b), ('uge', a, c)), ('uge', a, ('umax', b, c))),
(('iand', ('uge(is_used_once)', a, c), ('uge', b, c)), ('uge', ('umin', a, b), c)),
nir/algebraic: Recognize (a < 0 || 0 < b) as min(a, -b) < 0 Similar to commit 97e6c1b9 and f5cf74d8ba8c. First apply 0 < b => -b < 0 to get (a < 0 || -b < 0), then apply some pre-existing rules to get min(a, -b) < 0. v2: Substantially update the comment explaining the use of is_used_once and the duplication of patterns. Suggested by Caio. Also, while flt and fge are not commutative, ior and iand are. Half of the original patterns were redundant, so delete them. As alternate justification for deleting them, fmin(a, -b) < 0 <=> 0 < fmax(-a, b). Proof left as an exercise for the reader. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333789 -> 16333713 (<.01%) instructions in affected programs: 11424 -> 11348 (-0.67%) helped: 32 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.20% max: 1.67% x̄: 0.76% x̃: 0.69% 95% mean confidence interval for instructions value: -3.03 -1.72 95% mean confidence interval for instructions %-change: -0.89% -0.62% Instructions are helped. total cycles in shared programs: 367598295 -> 367596791 (<.01%) cycles in affected programs: 141414 -> 139910 (-1.06%) helped: 23 HURT: 6 helped stats (abs) min: 3 max: 386 x̄: 72.52 x̃: 20 helped stats (rel) min: 0.15% max: 4.86% x̄: 1.01% x̃: 0.76% HURT stats (abs) min: 4 max: 88 x̄: 27.33 x̃: 12 HURT stats (rel) min: 0.22% max: 3.95% x̄: 1.08% x̃: 0.59% 95% mean confidence interval for cycles value: -93.51 -10.21 95% mean confidence interval for cycles %-change: -1.10% -0.05% Cycles are helped. total instructions in shared programs: 10830836 -> 10830779 (<.01%) instructions in affected programs: 6895 -> 6838 (-0.83%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.75 x̃: 1 helped stats (rel) min: 0.14% max: 1.61% x̄: 0.65% x̃: 0.33% 95% mean confidence interval for instructions value: -8.46 -1.04 95% mean confidence interval for instructions %-change: -1.03% -0.27% Instructions are helped. total cycles in shared programs: 154028477 -> 154032740 (<.01%) cycles in affected programs: 178433 -> 182696 (2.39%) helped: 3 HURT: 9 helped stats (abs) min: 3 max: 20 x̄: 11.00 x̃: 10 helped stats (rel) min: 0.07% max: 0.20% x̄: 0.12% x̃: 0.09% HURT stats (abs) min: 27 max: 1415 x̄: 477.33 x̃: 262 HURT stats (rel) min: 0.22% max: 6.45% x̄: 2.49% x̃: 1.76% 95% mean confidence interval for cycles value: 28.68 681.82 95% mean confidence interval for cycles %-change: 0.37% 3.30% Cycles are HURT. Iron Lake total instructions in shared programs: 8137966 -> 8137992 (<.01%) instructions in affected programs: 3281 -> 3307 (0.79%) helped: 0 HURT: 6 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.01% x̄: 0.76% x̃: 0.64% 95% mean confidence interval for instructions value: 2.17 6.50 95% mean confidence interval for instructions %-change: 0.56% 0.96% Instructions are HURT. total cycles in shared programs: 188539386 -> 188540038 (<.01%) cycles in affected programs: 103826 -> 104478 (0.63%) helped: 0 HURT: 7 HURT stats (abs) min: 16 max: 218 x̄: 93.14 x̃: 80 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.53% x̃: 0.46% 95% mean confidence interval for cycles value: 10.26 176.02 95% mean confidence interval for cycles %-change: 0.24% 0.81% Cycles are HURT. GM45 total instructions in shared programs: 5008876 -> 5008889 (<.01%) instructions in affected programs: 1645 -> 1658 (0.79%) helped: 0 HURT: 3 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.00% x̄: 0.76% x̃: 0.63% total cycles in shared programs: 128968950 -> 128969426 (<.01%) cycles in affected programs: 64854 -> 65330 (0.73%) helped: 0 HURT: 4 HURT stats (abs) min: 18 max: 218 x̄: 119.00 x̃: 120 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.60% x̃: 0.66% 95% mean confidence interval for cycles value: -62.92 300.92 95% mean confidence interval for cycles %-change: -0.05% 1.26% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-21 21:52:58 +00:00
# These derive from the previous patterns with the application of b < 0 <=>
# 0 < -b. The transformation should be applied if either comparison is
# used once as this ensures that the number of comparisons will not
# increase. The sources to the ior and iand are not symmetric, so the
# rules have to be duplicated to get this behavior.
(('~ior', ('flt(is_used_once)', 0.0, 'a@32'), ('flt', 'b@32', 0.0)), ('flt', 0.0, ('fmax', a, ('fneg', b)))),
(('~ior', ('flt', 0.0, 'a@32'), ('flt(is_used_once)', 'b@32', 0.0)), ('flt', 0.0, ('fmax', a, ('fneg', b)))),
(('~ior', ('fge(is_used_once)', 0.0, 'a@32'), ('fge', 'b@32', 0.0)), ('fge', 0.0, ('fmin', a, ('fneg', b)))),
(('~ior', ('fge', 0.0, 'a@32'), ('fge(is_used_once)', 'b@32', 0.0)), ('fge', 0.0, ('fmin', a, ('fneg', b)))),
(('~iand', ('flt(is_used_once)', 0.0, 'a@32'), ('flt', 'b@32', 0.0)), ('flt', 0.0, ('fmin', a, ('fneg', b)))),
(('~iand', ('flt', 0.0, 'a@32'), ('flt(is_used_once)', 'b@32', 0.0)), ('flt', 0.0, ('fmin', a, ('fneg', b)))),
(('~iand', ('fge(is_used_once)', 0.0, 'a@32'), ('fge', 'b@32', 0.0)), ('fge', 0.0, ('fmax', a, ('fneg', b)))),
(('~iand', ('fge', 0.0, 'a@32'), ('fge(is_used_once)', 'b@32', 0.0)), ('fge', 0.0, ('fmax', a, ('fneg', b)))),
2019-02-12 00:47:34 +00:00
# Common pattern like 'if (i == 0 || i == 1 || ...)'
(('ior', ('ieq', a, 0), ('ieq', a, 1)), ('uge', 1, a)),
(('ior', ('uge', 1, a), ('ieq', a, 2)), ('uge', 2, a)),
(('ior', ('uge', 2, a), ('ieq', a, 3)), ('uge', 3, a)),
# The (i2f32, ...) part is an open-coded fsign. When that is combined with
# the bcsel, it's basically copysign(1.0, a). There is no copysign in NIR,
# so emit an open-coded version of that.
(('bcsel@32', ('feq', a, 0.0), 1.0, ('i2f32', ('iadd', ('b2i32', ('flt', 0.0, 'a@32')), ('ineg', ('b2i32', ('flt', 'a@32', 0.0)))))),
('ior', 0x3f800000, ('iand', a, 0x80000000))),
2019-03-30 03:51:20 +00:00
(('ior', a, ('ieq', a, False)), True),
(('ior', a, ('inot', a)), -1),
(('ine', ('ineg', ('b2i32', 'a@1')), ('ineg', ('b2i32', 'b@1'))), ('ine', a, b)),
(('b2i32', ('ine', 'a@1', 'b@1')), ('b2i32', ('ixor', a, b))),
nir/algebraic: optimize ior(ine(a, 0), ine(b, 0)) to ine(ior(a, b), 0) Like 70f9e2589e6b. Also scrub the unnecessary size qualifier in both replacement patterns. This occurs in a handful of places in the soft-fp64 code, and that is the primary reason for the change. Perhaps the patterns that generate umin should be conditioned on something, but I'm not sure what. lower_bitops might cover the cases that matter, but it seems ugly. Results on the 308 shaders extracted from the fp64 portion of the OpenGL CTS: Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) total instructions in shared programs: 936505 -> 933388 (-0.33%) instructions in affected programs: 925719 -> 922602 (-0.34%) helped: 154 HURT: 1 helped stats (abs) min: 1 max: 211 x̄: 35.45 x̃: 16 helped stats (rel) min: 0.34% max: 9.30% x̄: 2.28% x̃: 0.96% HURT stats (abs) min: 2342 max: 2342 x̄: 2342.00 x̃: 2342 HURT stats (rel) min: 2.28% max: 2.28% x̄: 2.28% x̃: 2.28% 95% mean confidence interval for instructions value: -51.21 10.99 95% mean confidence interval for instructions %-change: -2.61% -1.89% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 7323502 -> 7306184 (-0.24%) cycles in affected programs: 7220376 -> 7203058 (-0.24%) helped: 126 HURT: 1 helped stats (abs) min: 2 max: 946 x̄: 159.10 x̃: 95 helped stats (rel) min: 0.01% max: 9.62% x̄: 0.80% x̃: 0.37% HURT stats (abs) min: 2728 max: 2728 x̄: 2728.00 x̃: 2728 HURT stats (rel) min: 0.37% max: 0.37% x̄: 0.37% x̃: 0.37% 95% mean confidence interval for cycles value: -192.07 -80.66 95% mean confidence interval for cycles %-change: -1.07% -0.51% Cycles are helped. total spills in shared programs: 635 -> 817 (28.66%) spills in affected programs: 635 -> 817 (28.66%) helped: 0 HURT: 3 total fills in shared programs: 2065 -> 2438 (18.06%) fills in affected programs: 2019 -> 2392 (18.47%) helped: 0 HURT: 2 Regular shader-db results: All Haswell+ platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 17611506 -> 17611489 (<.01%) instructions in affected programs: 33442 -> 33425 (-0.05%) helped: 32 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 1.69 x̃: 1 helped stats (rel) min: 0.08% max: 1.90% x̄: 0.27% x̃: 0.11% HURT stats (abs) min: 1 max: 15 x̄: 6.17 x̃: 5 HURT stats (rel) min: 0.09% max: 1.50% x̄: 0.65% x̃: 0.55% 95% mean confidence interval for instructions value: -1.70 0.80 95% mean confidence interval for instructions %-change: -0.30% 0.05% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 338419218 -> 338418502 (<.01%) cycles in affected programs: 385795 -> 385079 (-0.19%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 192 x̄: 24.57 x̃: 16 helped stats (rel) min: 0.04% max: 2.09% x̄: 0.33% x̃: 0.22% HURT stats (abs) min: 64 max: 164 x̄: 105.33 x̃: 88 HURT stats (rel) min: 0.77% max: 1.58% x̄: 1.09% x̃: 0.93% 95% mean confidence interval for cycles value: -29.76 -2.06 95% mean confidence interval for cycles %-change: -0.40% -0.07% Cycles are helped. Ivy Bridge and Sandy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11875620 -> 11875617 (<.01%) instructions in affected programs: 421 -> 418 (-0.71%) helped: 2 HURT: 0 total cycles in shared programs: 178245336 -> 178245326 (<.01%) cycles in affected programs: 3425 -> 3415 (-0.29%) helped: 2 HURT: 0 No changes on Gen4 or Gen5. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2019-10-31 00:41:41 +00:00
(('iand', ('ieq', 'a@32', 0), ('ieq', 'b@32', 0)), ('ieq', ('ior', a, b), 0), '!options->lower_bitops'),
(('ior', ('ine', 'a@32', 0), ('ine', 'b@32', 0)), ('ine', ('ior', a, b), 0), '!options->lower_bitops'),
nir: Undo possible damage caused by rearranging or-compounded float compares shader-db results: Skylake and Broadwell had similar results (Skylake shown) total instructions in shared programs: 14525898 -> 14525836 (<.01%) instructions in affected programs: 1964 -> 1902 (-3.16%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1 helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86% 95% mean confidence interval for instructions value: -9.46 0.60 95% mean confidence interval for instructions %-change: -3.97% -0.24% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533119892 -> 533115756 (<.01%) cycles in affected programs: 96061 -> 91925 (-4.31%) helped: 13 HURT: 1 helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300 helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46% 95% mean confidence interval for cycles value: -379.43 -211.43 95% mean confidence interval for cycles %-change: -4.84% -3.01% Cycles are helped. Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown). total instructions in shared programs: 9033948 -> 9033898 (<.01%) instructions in affected programs: 535 -> 485 (-9.35%) helped: 2 HURT: 0 total cycles in shared programs: 84631402 -> 84628949 (<.01%) cycles in affected programs: 63197 -> 60744 (-3.88%) helped: 13 HURT: 2 helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140 helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01% HURT stats (abs) min: 4 max: 8 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for cycles value: -253.40 -73.67 95% mean confidence interval for cycles %-change: -4.24% -2.25% Cycles are helped. No changes on GM45 or Iron Lake. v2: Add a couple more tautological compares. Suggested by Elie. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-05 21:29:26 +00:00
# These patterns can result when (a < b || a < c) => (a < min(b, c))
# transformations occur before constant propagation and loop-unrolling.
(('~flt', a, ('fmax', b, a)), ('flt', a, b)),
(('~flt', ('fmin', a, b), a), ('flt', b, a)),
(('~fge', a, ('fmin', b, a)), True),
(('~fge', ('fmax', a, b), a), True),
(('~flt', a, ('fmin', b, a)), False),
(('~flt', ('fmax', a, b), a), False),
nir: Add some missing "optimization undo" patterns d8d18516b0a and 03fb13f6467 added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimization cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-31 02:06:57 +01:00
(('~fge', a, ('fmax', b, a)), ('fge', a, b)),
(('~fge', ('fmin', a, b), a), ('fge', b, a)),
nir: Undo possible damage caused by rearranging or-compounded float compares shader-db results: Skylake and Broadwell had similar results (Skylake shown) total instructions in shared programs: 14525898 -> 14525836 (<.01%) instructions in affected programs: 1964 -> 1902 (-3.16%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1 helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86% 95% mean confidence interval for instructions value: -9.46 0.60 95% mean confidence interval for instructions %-change: -3.97% -0.24% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533119892 -> 533115756 (<.01%) cycles in affected programs: 96061 -> 91925 (-4.31%) helped: 13 HURT: 1 helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300 helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46% 95% mean confidence interval for cycles value: -379.43 -211.43 95% mean confidence interval for cycles %-change: -4.84% -3.01% Cycles are helped. Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown). total instructions in shared programs: 9033948 -> 9033898 (<.01%) instructions in affected programs: 535 -> 485 (-9.35%) helped: 2 HURT: 0 total cycles in shared programs: 84631402 -> 84628949 (<.01%) cycles in affected programs: 63197 -> 60744 (-3.88%) helped: 13 HURT: 2 helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140 helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01% HURT stats (abs) min: 4 max: 8 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for cycles value: -253.40 -73.67 95% mean confidence interval for cycles %-change: -4.24% -2.25% Cycles are helped. No changes on GM45 or Iron Lake. v2: Add a couple more tautological compares. Suggested by Elie. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-05 21:29:26 +00:00
(('ilt', a, ('imax', b, a)), ('ilt', a, b)),
(('ilt', ('imin', a, b), a), ('ilt', b, a)),
(('ige', a, ('imin', b, a)), True),
(('ige', ('imax', a, b), a), True),
(('ult', a, ('umax', b, a)), ('ult', a, b)),
(('ult', ('umin', a, b), a), ('ult', b, a)),
(('uge', a, ('umin', b, a)), True),
(('uge', ('umax', a, b), a), True),
nir: Add some missing "optimization undo" patterns d8d18516b0a and 03fb13f6467 added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimization cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-31 02:06:57 +01:00
(('ilt', a, ('imin', b, a)), False),
(('ilt', ('imax', a, b), a), False),
(('ige', a, ('imax', b, a)), ('ige', a, b)),
(('ige', ('imin', a, b), a), ('ige', b, a)),
(('ult', a, ('umin', b, a)), False),
(('ult', ('umax', a, b), a), False),
(('uge', a, ('umax', b, a)), ('uge', a, b)),
(('uge', ('umin', a, b), a), ('uge', b, a)),
(('ult', a, ('iand', b, a)), False),
(('ult', ('ior', a, b), a), False),
(('uge', a, ('iand', b, a)), True),
(('uge', ('ior', a, b), a), True),
nir: Add some missing "optimization undo" patterns d8d18516b0a and 03fb13f6467 added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimization cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-05-31 02:06:57 +01:00
(('ilt', '#a', ('imax', '#b', c)), ('ior', ('ilt', a, b), ('ilt', a, c))),
(('ilt', ('imin', '#a', b), '#c'), ('ior', ('ilt', a, c), ('ilt', b, c))),
(('ige', '#a', ('imin', '#b', c)), ('ior', ('ige', a, b), ('ige', a, c))),
(('ige', ('imax', '#a', b), '#c'), ('ior', ('ige', a, c), ('ige', b, c))),
(('ult', '#a', ('umax', '#b', c)), ('ior', ('ult', a, b), ('ult', a, c))),
(('ult', ('umin', '#a', b), '#c'), ('ior', ('ult', a, c), ('ult', b, c))),
(('uge', '#a', ('umin', '#b', c)), ('ior', ('uge', a, b), ('uge', a, c))),
(('uge', ('umax', '#a', b), '#c'), ('ior', ('uge', a, c), ('uge', b, c))),
(('ilt', '#a', ('imin', '#b', c)), ('iand', ('ilt', a, b), ('ilt', a, c))),
(('ilt', ('imax', '#a', b), '#c'), ('iand', ('ilt', a, c), ('ilt', b, c))),
(('ige', '#a', ('imax', '#b', c)), ('iand', ('ige', a, b), ('ige', a, c))),
(('ige', ('imin', '#a', b), '#c'), ('iand', ('ige', a, c), ('ige', b, c))),
(('ult', '#a', ('umin', '#b', c)), ('iand', ('ult', a, b), ('ult', a, c))),
(('ult', ('umax', '#a', b), '#c'), ('iand', ('ult', a, c), ('ult', b, c))),
(('uge', '#a', ('umax', '#b', c)), ('iand', ('uge', a, b), ('uge', a, c))),
(('uge', ('umin', '#a', b), '#c'), ('iand', ('uge', a, c), ('uge', b, c))),
# Thanks to sign extension, the ishr(a, b) is negative if and only if a is
# negative.
(('bcsel', ('ilt', a, 0), ('ineg', ('ishr', a, b)), ('ishr', a, b)),
('iabs', ('ishr', a, b))),
(('iabs', ('ishr', ('iabs', a), b)), ('ishr', ('iabs', a), b)),
(('fabs', ('slt', a, b)), ('slt', a, b)),
(('fabs', ('sge', a, b)), ('sge', a, b)),
(('fabs', ('seq', a, b)), ('seq', a, b)),
(('fabs', ('sne', a, b)), ('sne', a, b)),
(('slt', a, b), ('b2f', ('flt', a, b)), 'options->lower_scmp'),
(('sge', a, b), ('b2f', ('fge', a, b)), 'options->lower_scmp'),
(('seq', a, b), ('b2f', ('feq', a, b)), 'options->lower_scmp'),
(('sne', a, b), ('b2f', ('fne', a, b)), 'options->lower_scmp'),
(('seq', ('seq', a, b), 1.0), ('seq', a, b)),
(('seq', ('sne', a, b), 1.0), ('sne', a, b)),
(('seq', ('slt', a, b), 1.0), ('slt', a, b)),
(('seq', ('sge', a, b), 1.0), ('sge', a, b)),
(('sne', ('seq', a, b), 0.0), ('seq', a, b)),
(('sne', ('sne', a, b), 0.0), ('sne', a, b)),
(('sne', ('slt', a, b), 0.0), ('slt', a, b)),
(('sne', ('sge', a, b), 0.0), ('sge', a, b)),
(('seq', ('seq', a, b), 0.0), ('sne', a, b)),
(('seq', ('sne', a, b), 0.0), ('seq', a, b)),
(('seq', ('slt', a, b), 0.0), ('sge', a, b)),
(('seq', ('sge', a, b), 0.0), ('slt', a, b)),
(('sne', ('seq', a, b), 1.0), ('sne', a, b)),
(('sne', ('sne', a, b), 1.0), ('seq', a, b)),
(('sne', ('slt', a, b), 1.0), ('sge', a, b)),
(('sne', ('sge', a, b), 1.0), ('slt', a, b)),
(('fall_equal2', a, b), ('fmin', ('seq', 'a.x', 'b.x'), ('seq', 'a.y', 'b.y')), 'options->lower_vector_cmp'),
(('fall_equal3', a, b), ('seq', ('fany_nequal3', a, b), 0.0), 'options->lower_vector_cmp'),
(('fall_equal4', a, b), ('seq', ('fany_nequal4', a, b), 0.0), 'options->lower_vector_cmp'),
(('fany_nequal2', a, b), ('fmax', ('sne', 'a.x', 'b.x'), ('sne', 'a.y', 'b.y')), 'options->lower_vector_cmp'),
(('fany_nequal3', a, b), ('fsat', ('fdot3', ('sne', a, b), ('sne', a, b))), 'options->lower_vector_cmp'),
(('fany_nequal4', a, b), ('fsat', ('fdot4', ('sne', a, b), ('sne', a, b))), 'options->lower_vector_cmp'),
(('fne', ('fneg', a), a), ('fne', a, 0.0)),
(('feq', ('fneg', a), a), ('feq', a, 0.0)),
# Emulating booleans
(('imul', ('b2i', 'a@1'), ('b2i', 'b@1')), ('b2i', ('iand', a, b))),
(('fmul', ('b2f', 'a@1'), ('b2f', 'b@1')), ('b2f', ('iand', a, b))),
(('fsat', ('fadd', ('b2f', 'a@1'), ('b2f', 'b@1'))), ('b2f', ('ior', a, b))),
(('iand', 'a@bool32', 1.0), ('b2f', a)),
# True/False are ~0 and 0 in NIR. b2i of True is 1, and -1 is ~0 (True).
(('ineg', ('b2i32', 'a@32')), a),
(('flt', ('fneg', ('b2f', 'a@1')), 0), a), # Generated by TGSI KILL_IF.
# Comparison with the same args. Note that these are not done for
# the float versions because NaN always returns false on float
# inequalities.
(('ilt', a, a), False),
(('ige', a, a), True),
(('ieq', a, a), True),
(('ine', a, a), False),
(('ult', a, a), False),
(('uge', a, a), True),
# Logical and bit operations
(('iand', a, a), a),
(('iand', a, ~0), a),
(('iand', a, 0), 0),
(('ior', a, a), a),
(('ior', a, 0), a),
(('ior', a, True), True),
(('ixor', a, a), 0),
(('ixor', a, 0), a),
(('inot', ('inot', a)), a),
(('ior', ('iand', a, b), b), b),
(('ior', ('ior', a, b), b), ('ior', a, b)),
(('iand', ('ior', a, b), b), b),
(('iand', ('iand', a, b), b), ('iand', a, b)),
# DeMorgan's Laws
(('iand', ('inot', a), ('inot', b)), ('inot', ('ior', a, b))),
(('ior', ('inot', a), ('inot', b)), ('inot', ('iand', a, b))),
# Shift optimizations
(('ishl', 0, a), 0),
(('ishl', a, 0), a),
(('ishr', 0, a), 0),
(('ishr', a, 0), a),
(('ushr', 0, a), 0),
(('ushr', a, 0), a),
(('ior', ('ishl@16', a, b), ('ushr@16', a, ('iadd', 16, ('ineg', b)))), ('urol', a, b), '!options->lower_rotate'),
(('ior', ('ishl@16', a, b), ('ushr@16', a, ('isub', 16, b))), ('urol', a, b), '!options->lower_rotate'),
(('ior', ('ishl@32', a, b), ('ushr@32', a, ('iadd', 32, ('ineg', b)))), ('urol', a, b), '!options->lower_rotate'),
(('ior', ('ishl@32', a, b), ('ushr@32', a, ('isub', 32, b))), ('urol', a, b), '!options->lower_rotate'),
(('ior', ('ushr@16', a, b), ('ishl@16', a, ('iadd', 16, ('ineg', b)))), ('uror', a, b), '!options->lower_rotate'),
(('ior', ('ushr@16', a, b), ('ishl@16', a, ('isub', 16, b))), ('uror', a, b), '!options->lower_rotate'),
(('ior', ('ushr@32', a, b), ('ishl@32', a, ('iadd', 32, ('ineg', b)))), ('uror', a, b), '!options->lower_rotate'),
(('ior', ('ushr@32', a, b), ('ishl@32', a, ('isub', 32, b))), ('uror', a, b), '!options->lower_rotate'),
(('urol@16', a, b), ('ior', ('ishl', a, b), ('ushr', a, ('isub', 16, b))), 'options->lower_rotate'),
(('urol@32', a, b), ('ior', ('ishl', a, b), ('ushr', a, ('isub', 32, b))), 'options->lower_rotate'),
(('uror@16', a, b), ('ior', ('ushr', a, b), ('ishl', a, ('isub', 16, b))), 'options->lower_rotate'),
(('uror@32', a, b), ('ior', ('ushr', a, b), ('ishl', a, ('isub', 32, b))), 'options->lower_rotate'),
# Exponential/logarithmic identities
(('~fexp2', ('flog2', a)), a), # 2^lg2(a) = a
(('~flog2', ('fexp2', a)), a), # lg2(2^a) = a
(('fpow', a, b), ('fexp2', ('fmul', ('flog2', a), b)), 'options->lower_fpow'), # a^b = 2^(lg2(a)*b)
(('~fexp2', ('fmul', ('flog2', a), b)), ('fpow', a, b), '!options->lower_fpow'), # 2^(lg2(a)*b) = a^b
(('~fexp2', ('fadd', ('fmul', ('flog2', a), b), ('fmul', ('flog2', c), d))),
('~fmul', ('fpow', a, b), ('fpow', c, d)), '!options->lower_fpow'), # 2^(lg2(a) * b + lg2(c) + d) = a^b * c^d
(('~fexp2', ('fmul', ('flog2', a), 2.0)), ('fmul', a, a)),
(('~fexp2', ('fmul', ('flog2', a), 4.0)), ('fmul', ('fmul', a, a), ('fmul', a, a))),
(('~fpow', a, 1.0), a),
(('~fpow', a, 2.0), ('fmul', a, a)),
(('~fpow', a, 4.0), ('fmul', ('fmul', a, a), ('fmul', a, a))),
(('~fpow', 2.0, a), ('fexp2', a)),
(('~fpow', ('fpow', a, 2.2), 0.454545), a),
(('~fpow', ('fabs', ('fpow', a, 2.2)), 0.454545), ('fabs', a)),
(('~fsqrt', ('fexp2', a)), ('fexp2', ('fmul', 0.5, a))),
(('~frcp', ('fexp2', a)), ('fexp2', ('fneg', a))),
(('~frsq', ('fexp2', a)), ('fexp2', ('fmul', -0.5, a))),
(('~flog2', ('fsqrt', a)), ('fmul', 0.5, ('flog2', a))),
(('~flog2', ('frcp', a)), ('fneg', ('flog2', a))),
(('~flog2', ('frsq', a)), ('fmul', -0.5, ('flog2', a))),
(('~flog2', ('fpow', a, b)), ('fmul', b, ('flog2', a))),
(('~fmul', ('fexp2(is_used_once)', a), ('fexp2(is_used_once)', b)), ('fexp2', ('fadd', a, b))),
(('bcsel', ('flt', a, 0.0), 0.0, ('fsqrt', a)), ('fsqrt', ('fmax', a, 0.0))),
(('~fmul', ('fsqrt', a), ('fsqrt', a)), ('fabs',a)),
# Division and reciprocal
(('~fdiv', 1.0, a), ('frcp', a)),
(('fdiv', a, b), ('fmul', a, ('frcp', b)), 'options->lower_fdiv'),
(('~frcp', ('frcp', a)), a),
(('~frcp', ('fsqrt', a)), ('frsq', a)),
(('fsqrt', a), ('frcp', ('frsq', a)), 'options->lower_fsqrt'),
(('~frcp', ('frsq', a)), ('fsqrt', a), '!options->lower_fsqrt'),
# Trig
(('fsin', a), lowered_sincos(0.5), 'options->lower_sincos'),
(('fcos', a), lowered_sincos(0.75), 'options->lower_sincos'),
# Boolean simplifications
nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. (Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-12-04 00:30:44 +00:00
(('i2b32(is_used_by_if)', a), ('ine32', a, 0)),
(('i2b1(is_used_by_if)', a), ('ine', a, 0)),
2019-03-30 03:51:20 +00:00
(('ieq', a, True), a),
(('ine(is_not_used_by_if)', a, True), ('inot', a)),
(('ine', a, False), a),
(('ieq(is_not_used_by_if)', a, False), ('inot', 'a')),
(('bcsel', a, True, False), a),
(('bcsel', a, False, True), ('inot', a)),
(('bcsel@32', a, 1.0, 0.0), ('b2f', a)),
(('bcsel@32', a, 0.0, 1.0), ('b2f', ('inot', a))),
(('bcsel@32', a, -1.0, -0.0), ('fneg', ('b2f', a))),
(('bcsel@32', a, -0.0, -1.0), ('fneg', ('b2f', ('inot', a)))),
(('bcsel', True, b, c), b),
(('bcsel', False, b, c), c),
(('bcsel', a, ('b2f(is_used_once)', 'b@32'), ('b2f', 'c@32')), ('b2f', ('bcsel', a, b, c))),
(('bcsel', a, b, b), b),
(('~fcsel', a, b, b), b),
# D3D Boolean emulation
(('bcsel', a, -1, 0), ('ineg', ('b2i', 'a@1'))),
(('bcsel', a, 0, -1), ('ineg', ('b2i', ('inot', a)))),
(('iand', ('ineg', ('b2i', 'a@1')), ('ineg', ('b2i', 'b@1'))),
('ineg', ('b2i', ('iand', a, b)))),
(('ior', ('ineg', ('b2i','a@1')), ('ineg', ('b2i', 'b@1'))),
('ineg', ('b2i', ('ior', a, b)))),
(('ieq', ('ineg', ('b2i', 'a@1')), 0), ('inot', a)),
(('ieq', ('ineg', ('b2i', 'a@1')), -1), a),
(('ine', ('ineg', ('b2i', 'a@1')), 0), a),
(('ine', ('ineg', ('b2i', 'a@1')), -1), ('inot', a)),
(('iand', ('ineg', ('b2i', a)), 1.0), ('b2f', a)),
nir/algebraic: Additional D3D Boolean optimization I observed this pattern in several shaders in Hand of Fate 2 while investigating bugzilla #111490. This also led to the related bugzilla #111578. The shaders from HoF2 are *not* in shader-db. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Skylake and Ice Lake had similar results. (Ice Lake shown) total instructions in shared programs: 16222621 -> 16205419 (-0.11%) instructions in affected programs: 798418 -> 781216 (-2.15%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 158 x̄: 31.39 x̃: 35 helped stats (rel) min: 0.45% max: 28.64% x̄: 2.83% x̃: 2.09% 95% mean confidence interval for instructions value: -33.22 -29.56 95% mean confidence interval for instructions %-change: -3.11% -2.56% Instructions are helped. total cycles in shared programs: 364676209 -> 363345763 (-0.36%) cycles in affected programs: 112810504 -> 111480058 (-1.18%) helped: 546 HURT: 7 helped stats (abs) min: 2 max: 118913 x̄: 2439.77 x̃: 2340 helped stats (rel) min: 0.08% max: 37.56% x̄: 1.46% x̃: 1.08% HURT stats (abs) min: 2 max: 770 x̄: 238.00 x̃: 43 HURT stats (rel) min: 0.02% max: 11.24% x̄: 3.71% x̃: 0.35% 95% mean confidence interval for cycles value: -2884.33 -1927.41 95% mean confidence interval for cycles %-change: -1.59% -1.21% Cycles are helped. total spills in shared programs: 8870 -> 8514 (-4.01%) spills in affected programs: 1230 -> 874 (-28.94%) helped: 161 HURT: 0 total fills in shared programs: 21901 -> 21348 (-2.52%) fills in affected programs: 2120 -> 1567 (-26.08%) helped: 155 HURT: 5 Broadwell and Haswell had similar results. (Broadwell shown) total instructions in shared programs: 14994910 -> 14975495 (-0.13%) instructions in affected programs: 839033 -> 819618 (-2.31%) helped: 548 HURT: 0 helped stats (abs) min: 2 max: 299 x̄: 35.43 x̃: 49 helped stats (rel) min: 0.39% max: 19.89% x̄: 2.91% x̃: 2.22% 95% mean confidence interval for instructions value: -37.46 -33.40 95% mean confidence interval for instructions %-change: -3.12% -2.70% Instructions are helped. total cycles in shared programs: 386032453 -> 384450722 (-0.41%) cycles in affected programs: 117807357 -> 116225626 (-1.34%) helped: 547 HURT: 6 helped stats (abs) min: 2 max: 22096 x̄: 2892.01 x̃: 3926 helped stats (rel) min: 0.17% max: 10.34% x̄: 1.56% x̃: 1.31% HURT stats (abs) min: 4 max: 60 x̄: 32.83 x̃: 29 HURT stats (rel) min: 0.38% max: 12.79% x̄: 5.86% x̃: 4.65% 95% mean confidence interval for cycles value: -3060.28 -2660.27 95% mean confidence interval for cycles %-change: -1.59% -1.37% Cycles are helped. total spills in shared programs: 23372 -> 21869 (-6.43%) spills in affected programs: 11730 -> 10227 (-12.81%) helped: 352 HURT: 0 total fills in shared programs: 34747 -> 35351 (1.74%) fills in affected programs: 11013 -> 11617 (5.48%) helped: 3 HURT: 347 Ivy Bridge and Sandybridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956420 -> 11956126 (<.01%) instructions in affected programs: 14898 -> 14604 (-1.97%) helped: 98 HURT: 0 helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 helped stats (rel) min: 1.30% max: 3.57% x̄: 2.08% x̃: 2.00% 95% mean confidence interval for instructions value: -3.00 -3.00 95% mean confidence interval for instructions %-change: -2.18% -1.98% Instructions are helped. total cycles in shared programs: 178791217 -> 178790792 (<.01%) cycles in affected programs: 149763 -> 149338 (-0.28%) helped: 91 HURT: 7 helped stats (abs) min: 3 max: 107 x̄: 20.63 x̃: 16 helped stats (rel) min: 0.13% max: 6.91% x̄: 1.40% x̃: 1.18% HURT stats (abs) min: 3 max: 322 x̄: 207.43 x̃: 322 HURT stats (rel) min: 0.14% max: 19.85% x̄: 12.73% x̃: 17.41% 95% mean confidence interval for cycles value: -18.94 10.27 95% mean confidence interval for cycles %-change: -1.28% 0.49% Inconclusive result (value mean confidence interval includes 0).
2019-09-09 23:47:48 +01:00
(('iand', ('ineg', ('b2i', a)), 1), ('b2i', a)),
# SM5 32-bit shifts are defined to use the 5 least significant bits
(('ishl', 'a@32', ('iand', 31, b)), ('ishl', a, b)),
(('ishr', 'a@32', ('iand', 31, b)), ('ishr', a, b)),
(('ushr', 'a@32', ('iand', 31, b)), ('ushr', a, b)),
# Conversions
(('i2b32', ('b2i', 'a@32')), a),
(('f2i', ('ftrunc', a)), ('f2i', a)),
(('f2u', ('ftrunc', a)), ('f2u', a)),
nir: Simplify i2b with negated or abs operand This enables removing ssa_201 and ssa_202 in sequences like: vec1 ssa_200 = flt ssa_199, ssa_194 vec1 ssa_201 = b2i ssa_200 vec1 ssa_202 = i2b -ssa_201 shader-db results: Sandy Bridge total instructions in shared programs: 8462257 -> 8462180 (-0.00%) instructions in affected programs: 3846 -> 3769 (-2.00%) helped: 35 HURT: 0 total cycles in shared programs: 117542934 -> 117542462 (-0.00%) cycles in affected programs: 20072 -> 19600 (-2.35%) helped: 20 HURT: 1 Ivy Bridge total instructions in shared programs: 7775252 -> 7775137 (-0.00%) instructions in affected programs: 3645 -> 3530 (-3.16%) helped: 35 HURT: 0 total cycles in shared programs: 65760522 -> 65760068 (-0.00%) cycles in affected programs: 21082 -> 20628 (-2.15%) helped: 25 HURT: 2 Haswell total instructions in shared programs: 7108666 -> 7108589 (-0.00%) instructions in affected programs: 3253 -> 3176 (-2.37%) helped: 35 HURT: 0 total cycles in shared programs: 64675726 -> 64675272 (-0.00%) cycles in affected programs: 21034 -> 20580 (-2.16%) helped: 26 HURT: 1 Broadwell / Skylake total instructions in shared programs: 8980912 -> 8980835 (-0.00%) instructions in affected programs: 3223 -> 3146 (-2.39%) helped: 35 HURT: 0 total cycles in shared programs: 70077926 -> 70077904 (-0.00%) cycles in affected programs: 21886 -> 21864 (-0.10%) helped: 21 HURT: 6 G45 and Ironlake showed no change. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-02 02:59:57 +00:00
(('i2b', ('ineg', a)), ('i2b', a)),
(('i2b', ('iabs', a)), ('i2b', a)),
(('inot', ('f2b1', a)), ('feq', a, 0.0)),
# The C spec says, "If the value of the integral part cannot be represented
# by the integer type, the behavior is undefined." "Undefined" can mean
# "the conversion doesn't happen at all."
(('~i2f32', ('f2i32', 'a@32')), ('ftrunc', a)),
nir: Remove f2i(i2f(x)) conversions Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-07-03 21:57:06 +01:00
# Ironically, mark these as imprecise because removing the conversions may
# preserve more precision than doing the conversions (e.g.,
# uint(float(0x81818181u)) == 0x81818200).
(('~f2i32', ('i2f', 'a@32')), a),
(('~f2i32', ('u2f', 'a@32')), a),
(('~f2u32', ('i2f', 'a@32')), a),
(('~f2u32', ('u2f', 'a@32')), a),
nir: Remove f2i(i2f(x)) conversions Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-07-03 21:57:06 +01:00
# Conversions from float16 to float32 and back can always be removed
(('f2f16', ('f2f32', 'a@16')), a),
(('f2fmp', ('f2f32', 'a@16')), a),
# Conversions to float16 would be lossy so they should only be removed if
# the instruction was generated by the precision lowering pass.
(('f2f32', ('f2fmp', 'a@32')), a),
nir/algebraic: Use value range analysis to eliminate useless unary ops Sandy Bridge is the big winner because it lies at something of a crossroads. It supports a fairly high OpenGL version, and it still has the old style math box. The high OpenGL version means a lot more shaders can run on it. The old style math box means extra moves are necessary to resolve source modifiers on operands to complex math instructions like COS, SQRT, and RCP. v2: Remove a couple patterns that are now redundant. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16282006 -> 16278207 (-0.02%) instructions in affected programs: 174555 -> 170756 (-2.18%) helped: 661 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3 helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94% 95% mean confidence interval for instructions value: -6.16 -5.34 95% mean confidence interval for instructions %-change: -3.02% -2.60% Instructions are helped. total cycles in shared programs: 367168597 -> 367134284 (<.01%) cycles in affected programs: 1105276 -> 1070963 (-3.10%) helped: 460 HURT: 150 helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82 helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27% HURT stats (abs) min: 1 max: 901 x̄: 67.49 x̃: 39 HURT stats (rel) min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22% 95% mean confidence interval for cycles value: -65.68 -46.82 95% mean confidence interval for cycles %-change: -5.59% -4.05% Cycles are helped. Sandy Bridge total instructions in shared programs: 10824272 -> 10802557 (-0.20%) instructions in affected programs: 1237988 -> 1216273 (-1.75%) helped: 8199 HURT: 0 helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2 helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73% 95% mean confidence interval for instructions value: -2.70 -2.59 95% mean confidence interval for instructions %-change: -2.07% -2.00% Instructions are helped. total cycles in shared programs: 154009894 -> 153843598 (-0.11%) cycles in affected programs: 10650486 -> 10484190 (-1.56%) helped: 4973 HURT: 1533 helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20 helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67% HURT stats (abs) min: 1 max: 453 x̄: 21.94 x̃: 8 HURT stats (rel) min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58% 95% mean confidence interval for cycles value: -28.02 -23.10 95% mean confidence interval for cycles %-change: -1.74% -1.56% Cycles are helped. LOST: 0 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8135196 -> 8134888 (<.01%) instructions in affected programs: 31920 -> 31612 (-0.96%) helped: 169 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2 helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16% 95% mean confidence interval for instructions value: -2.01 -1.64 95% mean confidence interval for instructions %-change: -1.32% -1.15% Instructions are helped. total cycles in shared programs: 188575724 -> 188574092 (<.01%) cycles in affected programs: 406840 -> 405208 (-0.40%) helped: 169 HURT: 0 helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10 helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47% 95% mean confidence interval for cycles value: -10.72 -8.59 95% mean confidence interval for cycles %-change: -0.63% -0.50% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 09:35:51 +00:00
(('ffloor', 'a(is_integral)'), a),
(('fceil', 'a(is_integral)'), a),
(('ftrunc', 'a(is_integral)'), a),
# fract(x) = x - floor(x), so fract(NaN) = NaN
(('~ffract', 'a(is_integral)'), 0.0),
nir/algebraic: Use value range analysis to eliminate useless unary ops Sandy Bridge is the big winner because it lies at something of a crossroads. It supports a fairly high OpenGL version, and it still has the old style math box. The high OpenGL version means a lot more shaders can run on it. The old style math box means extra moves are necessary to resolve source modifiers on operands to complex math instructions like COS, SQRT, and RCP. v2: Remove a couple patterns that are now redundant. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16282006 -> 16278207 (-0.02%) instructions in affected programs: 174555 -> 170756 (-2.18%) helped: 661 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3 helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94% 95% mean confidence interval for instructions value: -6.16 -5.34 95% mean confidence interval for instructions %-change: -3.02% -2.60% Instructions are helped. total cycles in shared programs: 367168597 -> 367134284 (<.01%) cycles in affected programs: 1105276 -> 1070963 (-3.10%) helped: 460 HURT: 150 helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82 helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27% HURT stats (abs) min: 1 max: 901 x̄: 67.49 x̃: 39 HURT stats (rel) min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22% 95% mean confidence interval for cycles value: -65.68 -46.82 95% mean confidence interval for cycles %-change: -5.59% -4.05% Cycles are helped. Sandy Bridge total instructions in shared programs: 10824272 -> 10802557 (-0.20%) instructions in affected programs: 1237988 -> 1216273 (-1.75%) helped: 8199 HURT: 0 helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2 helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73% 95% mean confidence interval for instructions value: -2.70 -2.59 95% mean confidence interval for instructions %-change: -2.07% -2.00% Instructions are helped. total cycles in shared programs: 154009894 -> 153843598 (-0.11%) cycles in affected programs: 10650486 -> 10484190 (-1.56%) helped: 4973 HURT: 1533 helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20 helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67% HURT stats (abs) min: 1 max: 453 x̄: 21.94 x̃: 8 HURT stats (rel) min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58% 95% mean confidence interval for cycles value: -28.02 -23.10 95% mean confidence interval for cycles %-change: -1.74% -1.56% Cycles are helped. LOST: 0 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8135196 -> 8134888 (<.01%) instructions in affected programs: 31920 -> 31612 (-0.96%) helped: 169 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2 helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16% 95% mean confidence interval for instructions value: -2.01 -1.64 95% mean confidence interval for instructions %-change: -1.32% -1.15% Instructions are helped. total cycles in shared programs: 188575724 -> 188574092 (<.01%) cycles in affected programs: 406840 -> 405208 (-0.40%) helped: 169 HURT: 0 helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10 helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47% 95% mean confidence interval for cycles value: -10.72 -8.59 95% mean confidence interval for cycles %-change: -0.63% -0.50% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 09:35:51 +00:00
(('fabs', 'a(is_not_negative)'), a),
(('iabs', 'a(is_not_negative)'), a),
(('fsat', 'a(is_not_positive)'), 0.0),
nir/algebraic: Convert some f2u to f2i Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says: It is undefined to convert a negative floating-point value to an uint. Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed. This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different". v2: Replicate part of the commit message as a comment in the code. Suggested by Jason. shader-db results compairing *before* "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit: Skylake total cycles in shared programs: 383007996 -> 383007896 (<.01%) cycles in affected programs: 85208 -> 85108 (-0.12%) helped: 13 HURT: 8 helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6 helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14% HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3 HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07% 95% mean confidence interval for cycles value: -9.31 -0.21 95% mean confidence interval for cycles %-change: -0.24% <.01% Cycles are helped. Broadwell total cycles in shared programs: 415251194 -> 415251370 (<.01%) cycles in affected programs: 83750 -> 83926 (0.21%) helped: 7 HURT: 13 helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12 helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30% HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22 HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47% 95% mean confidence interval for cycles value: 0.76 16.84 95% mean confidence interval for cycles %-change: <.01% 0.37% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13823885 -> 13823886 (<.01%) instructions in affected programs: 2249 -> 2250 (0.04%) helped: 0 HURT: 1 total cycles in shared programs: 390094243 -> 390094001 (<.01%) cycles in affected programs: 85640 -> 85398 (-0.28%) helped: 15 HURT: 6 helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18 helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42% HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04% 95% mean confidence interval for cycles value: -17.36 -5.69 95% mean confidence interval for cycles %-change: -0.44% -0.14% Cycles are helped. Ivy Bridge total cycles in shared programs: 180986448 -> 180986552 (<.01%) cycles in affected programs: 34835 -> 34939 (0.30%) helped: 0 HURT: 10 HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10 HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30% 95% mean confidence interval for cycles value: 4.67 16.13 95% mean confidence interval for cycles %-change: 0.20% 0.35% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154603969 -> 154603970 (<.01%) cycles in affected programs: 171514 -> 171515 (<.01%) helped: 25 HURT: 14 helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04% HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3 HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11% 95% mean confidence interval for cycles value: -0.91 0.96 95% mean confidence interval for cycles %-change: -0.02% 0.04% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-13 03:59:39 +00:00
# Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec
# says:
#
# It is undefined to convert a negative floating-point value to an
# uint.
#
# Assuming that (uint)some_float behaves like (uint)(int)some_float allows
# some optimizations in the i965 backend to proceed.
(('ige', ('f2u', a), b), ('ige', ('f2i', a), b)),
(('ige', b, ('f2u', a)), ('ige', b, ('f2i', a))),
(('ilt', ('f2u', a), b), ('ilt', ('f2i', a), b)),
(('ilt', b, ('f2u', a)), ('ilt', b, ('f2i', a))),
nir/algebraic: Use value range analysis to convert fmin to fsat All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16297320 -> 16282006 (-0.09%) instructions in affected programs: 2434498 -> 2419184 (-0.63%) helped: 8091 HURT: 1 helped stats (abs) min: 1 max: 51 x̄: 1.89 x̃: 2 helped stats (rel) min: 0.04% max: 14.29% x̄: 0.98% x̃: 0.95% HURT stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for instructions value: -1.94 -1.85 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 367221624 -> 367168597 (-0.01%) cycles in affected programs: 126409635 -> 126356608 (-0.04%) helped: 5612 HURT: 1023 helped stats (abs) min: 1 max: 2332 x̄: 31.11 x̃: 16 helped stats (rel) min: <.01% max: 30.31% x̄: 1.69% x̃: 1.42% HURT stats (abs) min: 1 max: 2372 x̄: 118.84 x̃: 16 HURT stats (rel) min: <.01% max: 46.98% x̄: 1.46% x̃: 0.35% 95% mean confidence interval for cycles value: -11.52 -4.46 95% mean confidence interval for cycles %-change: -1.26% -1.14% Cycles are helped. total spills in shared programs: 8868 -> 8870 (0.02%) spills in affected programs: 28 -> 30 (7.14%) helped: 0 HURT: 1 total fills in shared programs: 21903 -> 21904 (<.01%) fills in affected programs: 42 -> 43 (2.38%) helped: 0 HURT: 1 Haswell total instructions in shared programs: 13353925 -> 13338728 (-0.11%) instructions in affected programs: 2265850 -> 2250653 (-0.67%) helped: 8127 HURT: 5 helped stats (abs) min: 1 max: 51 x̄: 1.88 x̃: 2 helped stats (rel) min: 0.04% max: 20.00% x̄: 1.13% x̃: 1.07% HURT stats (abs) min: 5 max: 16 x̄: 9.00 x̃: 6 HURT stats (rel) min: 0.19% max: 0.52% x̄: 0.35% x̃: 0.28% 95% mean confidence interval for instructions value: -1.91 -1.83 95% mean confidence interval for instructions %-change: -1.15% -1.11% Instructions are helped. total cycles in shared programs: 375535444 -> 375536343 (<.01%) cycles in affected programs: 131206582 -> 131207481 (<.01%) helped: 5590 HURT: 1055 helped stats (abs) min: 1 max: 2844 x̄: 34.15 x̃: 16 helped stats (rel) min: <.01% max: 21.57% x̄: 2.08% x̃: 1.60% HURT stats (abs) min: 1 max: 2487 x̄: 181.78 x̃: 21 HURT stats (rel) min: <.01% max: 40.66% x̄: 1.96% x̃: 0.37% 95% mean confidence interval for cycles value: -4.74 5.01 95% mean confidence interval for cycles %-change: -1.51% -1.37% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 23401 -> 23407 (0.03%) spills in affected programs: 248 -> 254 (2.42%) helped: 2 HURT: 5 total fills in shared programs: 34850 -> 34845 (-0.01%) fills in affected programs: 383 -> 378 (-1.31%) helped: 2 HURT: 5 Ivy Bridge total instructions in shared programs: 11975423 -> 11968117 (-0.06%) instructions in affected programs: 845703 -> 838397 (-0.86%) helped: 4071 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.79 x̃: 1 helped stats (rel) min: 0.08% max: 8.21% x̄: 1.04% x̃: 0.93% 95% mean confidence interval for instructions value: -1.87 -1.71 95% mean confidence interval for instructions %-change: -1.06% -1.02% Instructions are helped. total cycles in shared programs: 179674318 -> 179635552 (-0.02%) cycles in affected programs: 5100065 -> 5061299 (-0.76%) helped: 2650 HURT: 611 helped stats (abs) min: 1 max: 900 x̄: 21.85 x̃: 16 helped stats (rel) min: <.01% max: 21.55% x̄: 2.39% x̃: 1.40% HURT stats (abs) min: 1 max: 1841 x̄: 31.33 x̃: 6 HURT stats (rel) min: <.01% max: 58.71% x̄: 1.64% x̃: 0.37% 95% mean confidence interval for cycles value: -14.14 -9.64 95% mean confidence interval for cycles %-change: -1.75% -1.52% Cycles are helped. LOST: 3 GAINED: 7 Sandy Bridge total instructions in shared programs: 10828844 -> 10824272 (-0.04%) instructions in affected programs: 525678 -> 521106 (-0.87%) helped: 2386 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.92 x̃: 2 helped stats (rel) min: 0.11% max: 7.96% x̄: 1.05% x̃: 0.94% 95% mean confidence interval for instructions value: -2.04 -1.80 95% mean confidence interval for instructions %-change: -1.08% -1.03% Instructions are helped. total cycles in shared programs: 154024591 -> 154009894 (<.01%) cycles in affected programs: 4005766 -> 3991069 (-0.37%) helped: 1245 HURT: 506 helped stats (abs) min: 1 max: 585 x̄: 21.07 x̃: 16 helped stats (rel) min: 0.02% max: 11.57% x̄: 1.98% x̃: 0.83% HURT stats (abs) min: 1 max: 639 x̄: 22.81 x̃: 6 HURT stats (rel) min: 0.01% max: 26.21% x̄: 1.07% x̃: 0.26% 95% mean confidence interval for cycles value: -10.57 -6.21 95% mean confidence interval for cycles %-change: -1.23% -0.97% Cycles are helped. GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8137248 -> 8135196 (-0.03%) instructions in affected programs: 148322 -> 146270 (-1.38%) helped: 992 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.07 x̃: 2 helped stats (rel) min: 0.41% max: 9.73% x̄: 1.74% x̃: 1.51% 95% mean confidence interval for instructions value: -2.16 -1.98 95% mean confidence interval for instructions %-change: -1.80% -1.67% Instructions are helped. total cycles in shared programs: 188583424 -> 188575724 (<.01%) cycles in affected programs: 4409620 -> 4401920 (-0.17%) helped: 956 HURT: 6 helped stats (abs) min: 2 max: 168 x̄: 8.09 x̃: 8 helped stats (rel) min: 0.04% max: 6.76% x̄: 0.27% x̃: 0.18% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: -8.41 -7.60 95% mean confidence interval for cycles %-change: -0.29% -0.25% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 13:11:53 +00:00
(('~fmin', 'a(is_not_negative)', 1.0), ('fsat', a), '!options->lower_fsat'),
nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-10-25 04:13:45 +01:00
# The result of the multiply must be in [-1, 0], so the result of the ffma
# must be in [0, 1].
(('flt', ('fadd', ('fmul', ('fsat', a), ('fneg', ('fsat', a))), 1.0), 0.0), False),
(('flt', ('fadd', ('fneg', ('fmul', ('fsat', a), ('fsat', a))), 1.0), 0.0), False),
(('fmax', ('fadd', ('fmul', ('fsat', a), ('fneg', ('fsat', a))), 1.0), 0.0), ('fadd', ('fmul', ('fsat', a), ('fneg', ('fsat', a))), 1.0)),
(('fmax', ('fadd', ('fneg', ('fmul', ('fsat', a), ('fsat', a))), 1.0), 0.0), ('fadd', ('fneg', ('fmul', ('fsat', a), ('fsat', a))), 1.0)),
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('fne', 'a(is_not_zero)', 0.0), True),
(('feq', 'a(is_not_zero)', 0.0), False),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
# In this chart, + means value > 0 and - means value < 0.
#
# + >= + -> unknown 0 >= + -> false - >= + -> false
# + >= 0 -> true 0 >= 0 -> true - >= 0 -> false
# + >= - -> true 0 >= - -> true - >= - -> unknown
#
# Using grouping conceptually similar to a Karnaugh map...
#
# (+ >= 0, + >= -, 0 >= 0, 0 >= -) == (is_not_negative >= is_not_positive) -> true
# (0 >= +, - >= +) == (is_not_positive >= gt_zero) -> false
# (- >= +, - >= 0) == (lt_zero >= is_not_negative) -> false
#
# The flt / ilt cases just invert the expected result.
#
# The results expecting true, must be marked imprecise. The results
# expecting false are fine because NaN compared >= or < anything is false.
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
(('~fge', 'a(is_not_negative)', 'b(is_not_positive)'), True),
(('fge', 'a(is_not_positive)', 'b(is_gt_zero)'), False),
(('fge', 'a(is_lt_zero)', 'b(is_not_negative)'), False),
(('flt', 'a(is_not_negative)', 'b(is_not_positive)'), False),
(('~flt', 'a(is_not_positive)', 'b(is_gt_zero)'), True),
(('~flt', 'a(is_lt_zero)', 'b(is_not_negative)'), True),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ine', 'a(is_not_zero)', 0), True),
(('ieq', 'a(is_not_zero)', 0), False),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ige', 'a(is_not_negative)', 'b(is_not_positive)'), True),
(('ige', 'a(is_not_positive)', 'b(is_gt_zero)'), False),
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ige', 'a(is_lt_zero)', 'b(is_not_negative)'), False),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ilt', 'a(is_not_negative)', 'b(is_not_positive)'), False),
(('ilt', 'a(is_not_positive)', 'b(is_gt_zero)'), True),
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ilt', 'a(is_lt_zero)', 'b(is_not_negative)'), True),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-01-23 02:00:55 +00:00
(('ult', 0, 'a(is_gt_zero)'), True),
nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instrution scheduler to create some very long live ranges resulting in unncessary spilling. This causes some shaders to fall of a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped. total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-10-11 22:21:42 +01:00
# Packing and then unpacking does nothing
(('unpack_64_2x32_split_x', ('pack_64_2x32_split', a, b)), a),
(('unpack_64_2x32_split_y', ('pack_64_2x32_split', a, b)), b),
(('pack_64_2x32_split', ('unpack_64_2x32_split_x', a),
('unpack_64_2x32_split_y', a)), a),
# Comparing two halves of an unpack separately. While this optimization
# should be correct for non-constant values, it's less obvious that it's
# useful in that case. For constant values, the pack will fold and we're
# guaranteed to reduce the whole tree to one instruction.
(('iand', ('ieq', ('unpack_32_2x16_split_x', a), '#b'),
('ieq', ('unpack_32_2x16_split_y', a), '#c')),
('ieq', a, ('pack_32_2x16_split', b, c))),
# Byte extraction
(('ushr', 'a@16', 8), ('extract_u8', a, 1), '!options->lower_extract_byte'),
(('ushr', 'a@32', 24), ('extract_u8', a, 3), '!options->lower_extract_byte'),
(('ushr', 'a@64', 56), ('extract_u8', a, 7), '!options->lower_extract_byte'),
(('ishr', 'a@16', 8), ('extract_i8', a, 1), '!options->lower_extract_byte'),
(('ishr', 'a@32', 24), ('extract_i8', a, 3), '!options->lower_extract_byte'),
(('ishr', 'a@64', 56), ('extract_i8', a, 7), '!options->lower_extract_byte'),
(('iand', 0xff, a), ('extract_u8', a, 0), '!options->lower_extract_byte'),
# Useless masking before unpacking
(('unpack_half_2x16_split_x', ('iand', a, 0xffff)), ('unpack_half_2x16_split_x', a)),
(('unpack_32_2x16_split_x', ('iand', a, 0xffff)), ('unpack_32_2x16_split_x', a)),
(('unpack_64_2x32_split_x', ('iand', a, 0xffffffff)), ('unpack_64_2x32_split_x', a)),
(('unpack_half_2x16_split_y', ('iand', a, 0xffff0000)), ('unpack_half_2x16_split_y', a)),
(('unpack_32_2x16_split_y', ('iand', a, 0xffff0000)), ('unpack_32_2x16_split_y', a)),
(('unpack_64_2x32_split_y', ('iand', a, 0xffffffff00000000)), ('unpack_64_2x32_split_y', a)),
# Optimize half packing
(('ishl', ('pack_half_2x16', ('vec2', a, 0)), 16), ('pack_half_2x16', ('vec2', 0, a))),
(('ishr', ('pack_half_2x16', ('vec2', 0, a)), 16), ('pack_half_2x16', ('vec2', a, 0))),
(('iadd', ('pack_half_2x16', ('vec2', a, 0)), ('pack_half_2x16', ('vec2', 0, b))),
('pack_half_2x16', ('vec2', a, b))),
(('ior', ('pack_half_2x16', ('vec2', a, 0)), ('pack_half_2x16', ('vec2', 0, b))),
('pack_half_2x16', ('vec2', a, b))),
nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correct account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-06 21:11:56 +01:00
])
# After the ('extract_u8', a, 0) pattern, above, triggers, there will be
# patterns like those below.
for op in ('ushr', 'ishr'):
optimizations.extend([(('extract_u8', (op, 'a@16', 8), 0), ('extract_u8', a, 1))])
optimizations.extend([(('extract_u8', (op, 'a@32', 8 * i), 0), ('extract_u8', a, i)) for i in range(1, 4)])
optimizations.extend([(('extract_u8', (op, 'a@64', 8 * i), 0), ('extract_u8', a, i)) for i in range(1, 8)])
optimizations.extend([(('extract_u8', ('extract_u16', a, 1), 0), ('extract_u8', a, 2))])
# After the ('extract_[iu]8', a, 3) patterns, above, trigger, there will be
# patterns like those below.
for op in ('extract_u8', 'extract_i8'):
optimizations.extend([((op, ('ishl', 'a@16', 8), 1), (op, a, 0))])
optimizations.extend([((op, ('ishl', 'a@32', 24 - 8 * i), 3), (op, a, i)) for i in range(2, -1, -1)])
optimizations.extend([((op, ('ishl', 'a@64', 56 - 8 * i), 7), (op, a, i)) for i in range(6, -1, -1)])
optimizations.extend([
# Word extraction
(('ushr', ('ishl', 'a@32', 16), 16), ('extract_u16', a, 0), '!options->lower_extract_word'),
(('ushr', 'a@32', 16), ('extract_u16', a, 1), '!options->lower_extract_word'),
(('ishr', ('ishl', 'a@32', 16), 16), ('extract_i16', a, 0), '!options->lower_extract_word'),
(('ishr', 'a@32', 16), ('extract_i16', a, 1), '!options->lower_extract_word'),
(('iand', 0xffff, a), ('extract_u16', a, 0), '!options->lower_extract_word'),
# Subtracts
(('ussub_4x8', a, 0), a),
(('ussub_4x8', a, ~0), 0),
# Lower all Subtractions first - they can get recombined later
(('fsub', a, b), ('fadd', a, ('fneg', b))),
(('isub', a, b), ('iadd', a, ('ineg', b))),
(('uabs_usub', a, b), ('bcsel', ('ult', a, b), ('ineg', ('isub', a, b)), ('isub', a, b))),
# This is correct. We don't need isub_sat because the result type is unsigned, so it cannot overflow.
(('uabs_isub', a, b), ('bcsel', ('ilt', a, b), ('ineg', ('isub', a, b)), ('isub', a, b))),
# Propagate negation up multiplication chains
nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-10-23 22:30:41 +01:00
(('fmul(is_used_by_non_fsat)', ('fneg', a), b), ('fneg', ('fmul', a, b))),
2016-03-31 03:18:16 +01:00
(('imul', ('ineg', a), b), ('ineg', ('imul', a, b))),
nir: shuffle constants to the top V2: mark float opts as inexact If one of the inputs to an mul/add is the result of another mul/add there is a chance that we can reuse the result of that mul/add in other calls if we do the multiplication in the right order. Also by attempting to move all constants to the top we increase the chance of constant folding. For example it is a fairly common pattern for shaders to do something similar to this: const float a = 0.5; in vec4 b; in float c; ... b.x = b.x * c; b.y = b.y * c; ... b.x = b.x * a + a; b.y = b.y * a + a; So by simply detecting that constant a is part of the multiplication in ffma and switching it with previous fmul that updates b we end up with: ... c = a * c; ... b.x = b.x * c + a; b.y = b.y * c + a; Shader-db results BDW: total instructions in shared programs: 13011050 -> 12967888 (-0.33%) instructions in affected programs: 4118366 -> 4075204 (-1.05%) helped: 17739 HURT: 1343 total cycles in shared programs: 246717952 -> 246410716 (-0.12%) cycles in affected programs: 166870802 -> 166563566 (-0.18%) helped: 18493 HURT: 7965 total spills in shared programs: 14937 -> 14560 (-2.52%) spills in affected programs: 9331 -> 8954 (-4.04%) helped: 284 HURT: 33 total fills in shared programs: 20211 -> 19671 (-2.67%) fills in affected programs: 12586 -> 12046 (-4.29%) helped: 286 HURT: 33 LOST: 39 GAINED: 33 Some of the hurt will go away when we shuffle things back down to the bottom in the following patch. It's also noteworthy that almost all of the spill changes are in Deus Ex both hurt and helped. Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-12 02:10:55 +00:00
# Propagate constants up multiplication chains
(('~fmul(is_used_once)', ('fmul(is_used_once)', 'a(is_not_const)', 'b(is_not_const)'), '#c'), ('fmul', ('fmul', a, c), b)),
(('imul(is_used_once)', ('imul(is_used_once)', 'a(is_not_const)', 'b(is_not_const)'), '#c'), ('imul', ('imul', a, c), b)),
(('~fadd(is_used_once)', ('fadd(is_used_once)', 'a(is_not_const)', 'b(is_not_const)'), '#c'), ('fadd', ('fadd', a, c), b)),
(('iadd(is_used_once)', ('iadd(is_used_once)', 'a(is_not_const)', 'b(is_not_const)'), '#c'), ('iadd', ('iadd', a, c), b)),
# Reassociate constants in add/mul chains so they can be folded together.
nir: shuffle constants to the top V2: mark float opts as inexact If one of the inputs to an mul/add is the result of another mul/add there is a chance that we can reuse the result of that mul/add in other calls if we do the multiplication in the right order. Also by attempting to move all constants to the top we increase the chance of constant folding. For example it is a fairly common pattern for shaders to do something similar to this: const float a = 0.5; in vec4 b; in float c; ... b.x = b.x * c; b.y = b.y * c; ... b.x = b.x * a + a; b.y = b.y * a + a; So by simply detecting that constant a is part of the multiplication in ffma and switching it with previous fmul that updates b we end up with: ... c = a * c; ... b.x = b.x * c + a; b.y = b.y * c + a; Shader-db results BDW: total instructions in shared programs: 13011050 -> 12967888 (-0.33%) instructions in affected programs: 4118366 -> 4075204 (-1.05%) helped: 17739 HURT: 1343 total cycles in shared programs: 246717952 -> 246410716 (-0.12%) cycles in affected programs: 166870802 -> 166563566 (-0.18%) helped: 18493 HURT: 7965 total spills in shared programs: 14937 -> 14560 (-2.52%) spills in affected programs: 9331 -> 8954 (-4.04%) helped: 284 HURT: 33 total fills in shared programs: 20211 -> 19671 (-2.67%) fills in affected programs: 12586 -> 12046 (-4.29%) helped: 286 HURT: 33 LOST: 39 GAINED: 33 Some of the hurt will go away when we shuffle things back down to the bottom in the following patch. It's also noteworthy that almost all of the spill changes are in Deus Ex both hurt and helped. Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-12 02:10:55 +00:00
# For now, we mostly only handle cases where the constants are separated by
# a single non-constant. We could do better eventually.
(('~fmul', '#a', ('fmul', 'b(is_not_const)', '#c')), ('fmul', ('fmul', a, c), b)),
(('imul', '#a', ('imul', 'b(is_not_const)', '#c')), ('imul', ('imul', a, c), b)),
(('~fadd', '#a', ('fadd', 'b(is_not_const)', '#c')), ('fadd', ('fadd', a, c), b)),
(('~fadd', '#a', ('fneg', ('fadd', 'b(is_not_const)', '#c'))), ('fadd', ('fadd', a, ('fneg', c)), ('fneg', b))),
(('iadd', '#a', ('iadd', 'b(is_not_const)', '#c')), ('iadd', ('iadd', a, c), b)),
# Drop mul-div by the same value when there's no wrapping.
(('idiv', ('imul(no_signed_wrap)', a, b), b), a),
# By definition...
(('bcsel', ('ige', ('find_lsb', a), 0), ('find_lsb', a), -1), ('find_lsb', a)),
(('bcsel', ('ige', ('ifind_msb', a), 0), ('ifind_msb', a), -1), ('ifind_msb', a)),
(('bcsel', ('ige', ('ufind_msb', a), 0), ('ufind_msb', a), -1), ('ufind_msb', a)),
(('bcsel', ('ine', a, 0), ('find_lsb', a), -1), ('find_lsb', a)),
(('bcsel', ('ine', a, 0), ('ifind_msb', a), -1), ('ifind_msb', a)),
(('bcsel', ('ine', a, 0), ('ufind_msb', a), -1), ('ufind_msb', a)),
(('bcsel', ('ine', a, -1), ('ifind_msb', a), -1), ('ifind_msb', a)),
# Misc. lowering
(('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod'),
(('frem', a, b), ('fsub', a, ('fmul', b, ('ftrunc', ('fdiv', a, b)))), 'options->lower_fmod'),
(('uadd_carry@32', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 'options->lower_uadd_carry'),
(('usub_borrow@32', a, b), ('b2i', ('ult', a, b)), 'options->lower_usub_borrow'),
(('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'), 'insert',
('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
'options->lower_bitfield_insert'),
(('ihadd', a, b), ('iadd', ('iand', a, b), ('ishr', ('ixor', a, b), 1)), 'options->lower_hadd'),
(('uhadd', a, b), ('iadd', ('iand', a, b), ('ushr', ('ixor', a, b), 1)), 'options->lower_hadd'),
(('irhadd', a, b), ('isub', ('ior', a, b), ('ishr', ('ixor', a, b), 1)), 'options->lower_hadd'),
(('urhadd', a, b), ('isub', ('ior', a, b), ('ushr', ('ixor', a, b), 1)), 'options->lower_hadd'),
(('ihadd@64', a, b), ('iadd', ('iand', a, b), ('ishr', ('ixor', a, b), 1)), 'options->lower_hadd64 || (options->lower_int64_options & nir_lower_iadd64) != 0'),
(('uhadd@64', a, b), ('iadd', ('iand', a, b), ('ushr', ('ixor', a, b), 1)), 'options->lower_hadd64 || (options->lower_int64_options & nir_lower_iadd64) != 0'),
(('irhadd@64', a, b), ('isub', ('ior', a, b), ('ishr', ('ixor', a, b), 1)), 'options->lower_hadd64 || (options->lower_int64_options & nir_lower_iadd64) != 0'),
(('urhadd@64', a, b), ('isub', ('ior', a, b), ('ushr', ('ixor', a, b), 1)), 'options->lower_hadd64 || (options->lower_int64_options & nir_lower_iadd64) != 0'),
(('uadd_sat@64', a, b), ('bcsel', ('ult', ('iadd', a, b), a), -1, ('iadd', a, b)), 'options->lower_add_sat || (options->lower_int64_options & nir_lower_iadd64) != 0'),
(('uadd_sat', a, b), ('bcsel', ('ult', ('iadd', a, b), a), -1, ('iadd', a, b)), 'options->lower_add_sat'),
(('usub_sat', a, b), ('bcsel', ('ult', a, b), 0, ('isub', a, b)), 'options->lower_add_sat'),
(('usub_sat@64', a, b), ('bcsel', ('ult', a, b), 0, ('isub', a, b)), 'options->lower_usub_sat64 || (options->lower_int64_options & nir_lower_iadd64) != 0'),
# int64_t sum = a + b;
#
# if (a < 0 && b < 0 && a < sum)
# sum = INT64_MIN;
# } else if (a >= 0 && b >= 0 && sum < a)
# sum = INT64_MAX;
# }
#
# A couple optimizations are applied.
#
# 1. a < sum => sum >= 0. This replacement works because it is known that
# a < 0 and b < 0, so sum should also be < 0 unless there was
# underflow.
#
# 2. sum < a => sum < 0. This replacement works because it is known that
# a >= 0 and b >= 0, so sum should also be >= 0 unless there was
# overflow.
#
# 3. Invert the second if-condition and swap the order of parameters for
# the bcsel. !(a >= 0 && b >= 0 && sum < 0) becomes !(a >= 0) || !(b >=
# 0) || !(sum < 0), and that becomes (a < 0) || (b < 0) || (sum >= 0)
#
# On Intel Gen11, this saves ~11 instructions.
(('iadd_sat@64', a, b), ('bcsel',
('iand', ('iand', ('ilt', a, 0), ('ilt', b, 0)), ('ige', ('iadd', a, b), 0)),
0x8000000000000000,
('bcsel',
('ior', ('ior', ('ilt', a, 0), ('ilt', b, 0)), ('ige', ('iadd', a, b), 0)),
('iadd', a, b),
0x7fffffffffffffff)),
'(options->lower_int64_options & nir_lower_iadd64) != 0'),
# int64_t sum = a - b;
#
# if (a < 0 && b >= 0 && a < sum)
# sum = INT64_MIN;
# } else if (a >= 0 && b < 0 && a >= sum)
# sum = INT64_MAX;
# }
#
# Optimizations similar to the iadd_sat case are applied here.
(('isub_sat@64', a, b), ('bcsel',
('iand', ('iand', ('ilt', a, 0), ('ige', b, 0)), ('ige', ('isub', a, b), 0)),
0x8000000000000000,
('bcsel',
('ior', ('ior', ('ilt', a, 0), ('ige', b, 0)), ('ige', ('isub', a, b), 0)),
('isub', a, b),
0x7fffffffffffffff)),
'(options->lower_int64_options & nir_lower_iadd64) != 0'),
# These are done here instead of in the backend because the int64 lowering
# pass will make a mess of the patterns. The first patterns are
# conditioned on nir_lower_minmax64 because it was not clear that it was
# always an improvement on platforms that have real int64 support. No
# shaders in shader-db hit this, so it was hard to say one way or the
# other.
(('ilt', ('imax(is_used_once)', 'a@64', 'b@64'), 0), ('ilt', ('imax', ('unpack_64_2x32_split_y', a), ('unpack_64_2x32_split_y', b)), 0), '(options->lower_int64_options & nir_lower_minmax64) != 0'),
(('ilt', ('imin(is_used_once)', 'a@64', 'b@64'), 0), ('ilt', ('imin', ('unpack_64_2x32_split_y', a), ('unpack_64_2x32_split_y', b)), 0), '(options->lower_int64_options & nir_lower_minmax64) != 0'),
(('ige', ('imax(is_used_once)', 'a@64', 'b@64'), 0), ('ige', ('imax', ('unpack_64_2x32_split_y', a), ('unpack_64_2x32_split_y', b)), 0), '(options->lower_int64_options & nir_lower_minmax64) != 0'),
(('ige', ('imin(is_used_once)', 'a@64', 'b@64'), 0), ('ige', ('imin', ('unpack_64_2x32_split_y', a), ('unpack_64_2x32_split_y', b)), 0), '(options->lower_int64_options & nir_lower_minmax64) != 0'),
(('ilt', 'a@64', 0), ('ilt', ('unpack_64_2x32_split_y', a), 0), '(options->lower_int64_options & nir_lower_icmp64) != 0'),
(('ige', 'a@64', 0), ('ige', ('unpack_64_2x32_split_y', a), 0), '(options->lower_int64_options & nir_lower_icmp64) != 0'),
(('ine', 'a@64', 0), ('ine', ('ior', ('unpack_64_2x32_split_x', a), ('unpack_64_2x32_split_y', a)), 0), '(options->lower_int64_options & nir_lower_icmp64) != 0'),
(('ieq', 'a@64', 0), ('ieq', ('ior', ('unpack_64_2x32_split_x', a), ('unpack_64_2x32_split_y', a)), 0), '(options->lower_int64_options & nir_lower_icmp64) != 0'),
# 0u < uint(a) <=> uint(a) != 0u
(('ult', 0, 'a@64'), ('ine', ('ior', ('unpack_64_2x32_split_x', a), ('unpack_64_2x32_split_y', a)), 0), '(options->lower_int64_options & nir_lower_icmp64) != 0'),
# Alternative lowering that doesn't rely on bfi.
(('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'),
'insert',
(('ior',
('iand', 'base', ('inot', ('ishl', ('isub', ('ishl', 1, 'bits'), 1), 'offset'))),
('iand', ('ishl', 'insert', 'offset'), ('ishl', ('isub', ('ishl', 1, 'bits'), 1), 'offset'))))),
'options->lower_bitfield_insert_to_shifts'),
# Alternative lowering that uses bitfield_select.
(('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'), 'insert',
('bitfield_select', ('bfm', 'bits', 'offset'), ('ishl', 'insert', 'offset'), 'base')),
'options->lower_bitfield_insert_to_bitfield_select'),
(('ibitfield_extract', 'value', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'), 'value',
('ibfe', 'value', 'offset', 'bits')),
'options->lower_bitfield_extract'),
(('ubitfield_extract', 'value', 'offset', 'bits'),
('bcsel', ('ult', 31, 'bits'), 'value',
('ubfe', 'value', 'offset', 'bits')),
'options->lower_bitfield_extract'),
# Note that these opcodes are defined to only use the five least significant bits of 'offset' and 'bits'
(('ubfe', 'value', 'offset', ('iand', 31, 'bits')), ('ubfe', 'value', 'offset', 'bits')),
(('ubfe', 'value', ('iand', 31, 'offset'), 'bits'), ('ubfe', 'value', 'offset', 'bits')),
(('ibfe', 'value', 'offset', ('iand', 31, 'bits')), ('ibfe', 'value', 'offset', 'bits')),
(('ibfe', 'value', ('iand', 31, 'offset'), 'bits'), ('ibfe', 'value', 'offset', 'bits')),
(('bfm', 'bits', ('iand', 31, 'offset')), ('bfm', 'bits', 'offset')),
(('bfm', ('iand', 31, 'bits'), 'offset'), ('bfm', 'bits', 'offset')),
(('ibitfield_extract', 'value', 'offset', 'bits'),
('bcsel', ('ieq', 0, 'bits'),
0,
('ishr',
('ishl', 'value', ('isub', ('isub', 32, 'bits'), 'offset')),
('isub', 32, 'bits'))),
'options->lower_bitfield_extract_to_shifts'),
(('ubitfield_extract', 'value', 'offset', 'bits'),
('iand',
('ushr', 'value', 'offset'),
('bcsel', ('ieq', 'bits', 32),
0xffffffff,
('isub', ('ishl', 1, 'bits'), 1))),
'options->lower_bitfield_extract_to_shifts'),
(('ifind_msb', 'value'),
('ufind_msb', ('bcsel', ('ilt', 'value', 0), ('inot', 'value'), 'value')),
'options->lower_ifind_msb'),
(('find_lsb', 'value'),
('ufind_msb', ('iand', 'value', ('ineg', 'value'))),
'options->lower_find_lsb'),
nir: Shift count for shift opcodes is always 32-bits Previously both sources were unsized. This caused problems when the thing being shifted was 64-bit but the shift count was 32-bit. The expectation in NIR is that all unsized sources (and destination) will ultimately have the same size. The changes in nir_opt_algebraic.py are to prevent errors like: Failed to parse transformation: 03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte') 03:12:25 Traceback (most recent call last): 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__ 03:12:25 xform = SearchAndReplace(xform) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__ 03:12:25 BitSizeValidator(varset).validate(self.search, self.replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate 03:12:25 validate_dst_class = self._validate_bit_class_up(replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up 03:12:25 src_class = self._validate_bit_class_up(val.sources[i]) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up 03:12:25 assert src_class == src_type_bits 03:12:25 AssertionError Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net>
2016-10-27 11:02:58 +01:00
(('extract_i8', a, 'b@32'),
('ishr', ('ishl', a, ('imul', ('isub', 3, b), 8)), 24),
'options->lower_extract_byte'),
nir: Shift count for shift opcodes is always 32-bits Previously both sources were unsized. This caused problems when the thing being shifted was 64-bit but the shift count was 32-bit. The expectation in NIR is that all unsized sources (and destination) will ultimately have the same size. The changes in nir_opt_algebraic.py are to prevent errors like: Failed to parse transformation: 03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte') 03:12:25 Traceback (most recent call last): 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__ 03:12:25 xform = SearchAndReplace(xform) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__ 03:12:25 BitSizeValidator(varset).validate(self.search, self.replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate 03:12:25 validate_dst_class = self._validate_bit_class_up(replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up 03:12:25 src_class = self._validate_bit_class_up(val.sources[i]) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up 03:12:25 assert src_class == src_type_bits 03:12:25 AssertionError Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net>
2016-10-27 11:02:58 +01:00
(('extract_u8', a, 'b@32'),
('iand', ('ushr', a, ('imul', b, 8)), 0xff),
'options->lower_extract_byte'),
nir: Shift count for shift opcodes is always 32-bits Previously both sources were unsized. This caused problems when the thing being shifted was 64-bit but the shift count was 32-bit. The expectation in NIR is that all unsized sources (and destination) will ultimately have the same size. The changes in nir_opt_algebraic.py are to prevent errors like: Failed to parse transformation: 03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte') 03:12:25 Traceback (most recent call last): 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__ 03:12:25 xform = SearchAndReplace(xform) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__ 03:12:25 BitSizeValidator(varset).validate(self.search, self.replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate 03:12:25 validate_dst_class = self._validate_bit_class_up(replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up 03:12:25 src_class = self._validate_bit_class_up(val.sources[i]) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up 03:12:25 assert src_class == src_type_bits 03:12:25 AssertionError Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net>
2016-10-27 11:02:58 +01:00
(('extract_i16', a, 'b@32'),
('ishr', ('ishl', a, ('imul', ('isub', 1, b), 16)), 16),
'options->lower_extract_word'),
nir: Shift count for shift opcodes is always 32-bits Previously both sources were unsized. This caused problems when the thing being shifted was 64-bit but the shift count was 32-bit. The expectation in NIR is that all unsized sources (and destination) will ultimately have the same size. The changes in nir_opt_algebraic.py are to prevent errors like: Failed to parse transformation: 03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte') 03:12:25 Traceback (most recent call last): 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__ 03:12:25 xform = SearchAndReplace(xform) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__ 03:12:25 BitSizeValidator(varset).validate(self.search, self.replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate 03:12:25 validate_dst_class = self._validate_bit_class_up(replace) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up 03:12:25 src_class = self._validate_bit_class_up(val.sources[i]) 03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up 03:12:25 assert src_class == src_type_bits 03:12:25 AssertionError Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net>
2016-10-27 11:02:58 +01:00
(('extract_u16', a, 'b@32'),
('iand', ('ushr', a, ('imul', b, 16)), 0xffff),
'options->lower_extract_word'),
(('pack_unorm_2x16', 'v'),
('pack_uvec2_to_uint',
('f2u32', ('fround_even', ('fmul', ('fsat', 'v'), 65535.0)))),
'options->lower_pack_unorm_2x16'),
(('pack_unorm_4x8', 'v'),
('pack_uvec4_to_uint',
('f2u32', ('fround_even', ('fmul', ('fsat', 'v'), 255.0)))),
'options->lower_pack_unorm_4x8'),
(('pack_snorm_2x16', 'v'),
('pack_uvec2_to_uint',
('f2i32', ('fround_even', ('fmul', ('fmin', 1.0, ('fmax', -1.0, 'v')), 32767.0)))),
'options->lower_pack_snorm_2x16'),
(('pack_snorm_4x8', 'v'),
('pack_uvec4_to_uint',
('f2i32', ('fround_even', ('fmul', ('fmin', 1.0, ('fmax', -1.0, 'v')), 127.0)))),
'options->lower_pack_snorm_4x8'),
(('unpack_unorm_2x16', 'v'),
('fdiv', ('u2f32', ('vec2', ('extract_u16', 'v', 0),
('extract_u16', 'v', 1))),
65535.0),
'options->lower_unpack_unorm_2x16'),
(('unpack_unorm_4x8', 'v'),
('fdiv', ('u2f32', ('vec4', ('extract_u8', 'v', 0),
('extract_u8', 'v', 1),
('extract_u8', 'v', 2),
('extract_u8', 'v', 3))),
255.0),
'options->lower_unpack_unorm_4x8'),
(('unpack_snorm_2x16', 'v'),
('fmin', 1.0, ('fmax', -1.0, ('fdiv', ('i2f', ('vec2', ('extract_i16', 'v', 0),
('extract_i16', 'v', 1))),
32767.0))),
'options->lower_unpack_snorm_2x16'),
(('unpack_snorm_4x8', 'v'),
('fmin', 1.0, ('fmax', -1.0, ('fdiv', ('i2f', ('vec4', ('extract_i8', 'v', 0),
('extract_i8', 'v', 1),
('extract_i8', 'v', 2),
('extract_i8', 'v', 3))),
127.0))),
'options->lower_unpack_snorm_4x8'),
(('pack_half_2x16_split', 'a@32', 'b@32'),
('ior', ('ishl', ('u2u32', ('f2f16', b)), 16), ('u2u32', ('f2f16', a))),
'options->lower_pack_half_2x16_split'),
(('unpack_half_2x16_split_x', 'a@32'),
('f2f32', ('u2u16', a)),
'options->lower_unpack_half_2x16_split'),
(('unpack_half_2x16_split_y', 'a@32'),
('f2f32', ('u2u16', ('ushr', a, 16))),
'options->lower_unpack_half_2x16_split'),
(('isign', a), ('imin', ('imax', a, -1), 1), 'options->lower_isign'),
(('fsign', a), ('fsub', ('b2f', ('flt', 0.0, a)), ('b2f', ('flt', a, 0.0))), 'options->lower_fsign'),
# Address/offset calculations:
# Drivers supporting imul24 should use the nir_lower_amul() pass, this
# rule converts everyone else to imul:
(('amul', a, b), ('imul', a, b), '!options->has_imul24'),
(('imad24_ir3', a, b, 0), ('imul24', a, b)),
(('imad24_ir3', a, 0, c), (c)),
(('imad24_ir3', a, 1, c), ('iadd', a, c)),
# if first two srcs are const, crack apart the imad so constant folding
# can clean up the imul:
# TODO ffma should probably get a similar rule:
(('imad24_ir3', '#a', '#b', c), ('iadd', ('imul', a, b), c)),
# These will turn 24b address/offset calc back into 32b shifts, but
# it should be safe to get back some of the bits of precision that we
# already decided were no necessary:
(('imul24', a, '#b@32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b)), '!options->lower_bitops'),
(('imul24', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('ishl', a, ('find_lsb', ('iabs', b)))), '!options->lower_bitops'),
(('imul24', a, 0), (0)),
])
# bit_size dependent lowerings
for bit_size in [8, 16, 32, 64]:
# convenience constants
intmax = (1 << (bit_size - 1)) - 1
intmin = 1 << (bit_size - 1)
optimizations += [
(('iadd_sat@' + str(bit_size), a, b),
('bcsel', ('ige', b, 1), ('bcsel', ('ilt', ('iadd', a, b), a), intmax, ('iadd', a, b)),
('bcsel', ('ilt', a, ('iadd', a, b)), intmin, ('iadd', a, b))), 'options->lower_add_sat'),
(('isub_sat@' + str(bit_size), a, b),
('bcsel', ('ilt', b, 0), ('bcsel', ('ilt', ('isub', a, b), a), intmax, ('isub', a, b)),
('bcsel', ('ilt', a, ('isub', a, b)), intmin, ('isub', a, b))), 'options->lower_add_sat'),
]
invert = OrderedDict([('feq', 'fne'), ('fne', 'feq')])
nir: Use De Morgan's Law on logic compounded comparisons The replacement of the comparison operators must happen during this step. If it does not, the next pass of nir_opt_algebraic will reapply De Morgan's Law in the "opposite direction" before performing dead code elimination. The resulting infinite loop will eventually get OOM killed. Haswell, Broadwell, and Skylake had similar results. (Broadwell shown) total instructions in shared programs: 14808185 -> 14808036 (<.01%) instructions in affected programs: 13758 -> 13609 (-1.08%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3 helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01% 95% mean confidence interval for instructions value: -4.67 -2.97 95% mean confidence interval for instructions %-change: -1.09% -0.88% Instructions are helped. total cycles in shared programs: 559438333 -> 559435832 (<.01%) cycles in affected programs: 199160 -> 196659 (-1.26%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51 helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40% HURT stats (abs) min: 2 max: 40 x̄: 27.33 x̃: 40 HURT stats (rel) min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74% 95% mean confidence interval for cycles value: -71.47 -39.69 95% mean confidence interval for cycles %-change: -1.64% -0.93% Cycles are helped. Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11811776 -> 11811553 (<.01%) instructions in affected programs: 15201 -> 14978 (-1.47%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6 helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26% 95% mean confidence interval for instructions value: -7.21 -4.23 95% mean confidence interval for instructions %-change: -1.48% -1.12% Instructions are helped. total cycles in shared programs: 257617270 -> 257614589 (<.01%) cycles in affected programs: 212107 -> 209426 (-1.26%) helped: 45 HURT: 0 helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54 helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32% 95% mean confidence interval for cycles value: -74.02 -45.14 95% mean confidence interval for cycles %-change: -1.59% -1.01% Cycles are helped. Iron Lake total instructions in shared programs: 7886648 -> 7886515 (<.01%) instructions in affected programs: 14106 -> 13973 (-0.94%) helped: 29 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4 helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81% 95% mean confidence interval for instructions value: -5.65 -3.52 95% mean confidence interval for instructions %-change: -1.03% -0.76% Instructions are helped. total cycles in shared programs: 178100812 -> 178100396 (<.01%) cycles in affected programs: 67970 -> 67554 (-0.61%) helped: 29 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54% 95% mean confidence interval for cycles value: -18.30 -10.39 95% mean confidence interval for cycles %-change: -0.71% -0.45% Cycles are helped. GM45 total instructions in shared programs: 4857939 -> 4857872 (<.01%) instructions in affected programs: 7426 -> 7359 (-0.90%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4 helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77% 95% mean confidence interval for instructions value: -6.06 -2.87 95% mean confidence interval for instructions %-change: -1.06% -0.67% Instructions are helped. total cycles in shared programs: 122167930 -> 122167654 (<.01%) cycles in affected programs: 43118 -> 42842 (-0.64%) helped: 15 HURT: 0 helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54% 95% mean confidence interval for cycles value: -25.03 -11.77 95% mean confidence interval for cycles %-change: -0.82% -0.41% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-07 01:27:53 +00:00
for left, right in itertools.combinations_with_replacement(invert.keys(), 2):
nir: Use De Morgan's Law on logic compounded comparisons The replacement of the comparison operators must happen during this step. If it does not, the next pass of nir_opt_algebraic will reapply De Morgan's Law in the "opposite direction" before performing dead code elimination. The resulting infinite loop will eventually get OOM killed. Haswell, Broadwell, and Skylake had similar results. (Broadwell shown) total instructions in shared programs: 14808185 -> 14808036 (<.01%) instructions in affected programs: 13758 -> 13609 (-1.08%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 3.82 x̃: 3 helped stats (rel) min: 0.44% max: 1.55% x̄: 0.98% x̃: 1.01% 95% mean confidence interval for instructions value: -4.67 -2.97 95% mean confidence interval for instructions %-change: -1.09% -0.88% Instructions are helped. total cycles in shared programs: 559438333 -> 559435832 (<.01%) cycles in affected programs: 199160 -> 196659 (-1.26%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 184 x̄: 61.50 x̃: 51 helped stats (rel) min: 0.02% max: 6.94% x̄: 1.41% x̃: 1.40% HURT stats (abs) min: 2 max: 40 x̄: 27.33 x̃: 40 HURT stats (rel) min: 0.05% max: 0.74% x̄: 0.51% x̃: 0.74% 95% mean confidence interval for cycles value: -71.47 -39.69 95% mean confidence interval for cycles %-change: -1.64% -0.93% Cycles are helped. Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11811776 -> 11811553 (<.01%) instructions in affected programs: 15201 -> 14978 (-1.47%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 5.72 x̃: 6 helped stats (rel) min: 0.44% max: 2.53% x̄: 1.30% x̃: 1.26% 95% mean confidence interval for instructions value: -7.21 -4.23 95% mean confidence interval for instructions %-change: -1.48% -1.12% Instructions are helped. total cycles in shared programs: 257617270 -> 257614589 (<.01%) cycles in affected programs: 212107 -> 209426 (-1.26%) helped: 45 HURT: 0 helped stats (abs) min: 2 max: 180 x̄: 59.58 x̃: 54 helped stats (rel) min: 0.02% max: 6.02% x̄: 1.30% x̃: 1.32% 95% mean confidence interval for cycles value: -74.02 -45.14 95% mean confidence interval for cycles %-change: -1.59% -1.01% Cycles are helped. Iron Lake total instructions in shared programs: 7886648 -> 7886515 (<.01%) instructions in affected programs: 14106 -> 13973 (-0.94%) helped: 29 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.59 x̃: 4 helped stats (rel) min: 0.35% max: 1.83% x̄: 0.90% x̃: 0.81% 95% mean confidence interval for instructions value: -5.65 -3.52 95% mean confidence interval for instructions %-change: -1.03% -0.76% Instructions are helped. total cycles in shared programs: 178100812 -> 178100396 (<.01%) cycles in affected programs: 67970 -> 67554 (-0.61%) helped: 29 HURT: 0 helped stats (abs) min: 2 max: 40 x̄: 14.34 x̃: 12 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.58% x̃: 0.54% 95% mean confidence interval for cycles value: -18.30 -10.39 95% mean confidence interval for cycles %-change: -0.71% -0.45% Cycles are helped. GM45 total instructions in shared programs: 4857939 -> 4857872 (<.01%) instructions in affected programs: 7426 -> 7359 (-0.90%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 10 x̄: 4.47 x̃: 4 helped stats (rel) min: 0.33% max: 1.80% x̄: 0.87% x̃: 0.77% 95% mean confidence interval for instructions value: -6.06 -2.87 95% mean confidence interval for instructions %-change: -1.06% -0.67% Instructions are helped. total cycles in shared programs: 122167930 -> 122167654 (<.01%) cycles in affected programs: 43118 -> 42842 (-0.64%) helped: 15 HURT: 0 helped stats (abs) min: 4 max: 40 x̄: 18.40 x̃: 16 helped stats (rel) min: 0.15% max: 1.69% x̄: 0.62% x̃: 0.54% 95% mean confidence interval for cycles value: -25.03 -11.77 95% mean confidence interval for cycles %-change: -0.82% -0.41% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-02-07 01:27:53 +00:00
optimizations.append((('inot', ('ior(is_used_once)', (left, a, b), (right, c, d))),
('iand', (invert[left], a, b), (invert[right], c, d))))
optimizations.append((('inot', ('iand(is_used_once)', (left, a, b), (right, c, d))),
('ior', (invert[left], a, b), (invert[right], c, d))))
# Optimize x2bN(b2x(x)) -> x
for size in type_sizes('bool'):
aN = 'a@' + str(size)
f2bN = 'f2b' + str(size)
i2bN = 'i2b' + str(size)
optimizations.append(((f2bN, ('b2f', aN)), a))
optimizations.append(((i2bN, ('b2i', aN)), a))
# Optimize x2yN(b2x(x)) -> b2y
for x, y in itertools.product(['f', 'u', 'i'], ['f', 'u', 'i']):
if x != 'f' and y != 'f' and x != y:
continue
b2x = 'b2f' if x == 'f' else 'b2i'
b2y = 'b2f' if y == 'f' else 'b2i'
x2yN = '{}2{}'.format(x, y)
optimizations.append(((x2yN, (b2x, a)), (b2y, a)))
# Optimize away x2xN(a@N)
for t in ['int', 'uint', 'float']:
for N in type_sizes(t):
x2xN = '{0}2{0}{1}'.format(t[0], N)
aN = 'a@{0}'.format(N)
optimizations.append(((x2xN, aN), a))
# Optimize x2xN(y2yM(a@P)) -> y2yN(a) for integers
# In particular, we can optimize away everything except upcast of downcast and
# upcasts where the type differs from the other cast
for N, M in itertools.product(type_sizes('uint'), type_sizes('uint')):
if N < M:
# The outer cast is a down-cast. It doesn't matter what the size of the
# argument of the inner cast is because we'll never been in the upcast
# of downcast case. Regardless of types, we'll always end up with y2yN
# in the end.
for x, y in itertools.product(['i', 'u'], ['i', 'u']):
x2xN = '{0}2{0}{1}'.format(x, N)
y2yM = '{0}2{0}{1}'.format(y, M)
y2yN = '{0}2{0}{1}'.format(y, N)
optimizations.append(((x2xN, (y2yM, a)), (y2yN, a)))
elif N > M:
# If the outer cast is an up-cast, we have to be more careful about the
# size of the argument of the inner cast and with types. In this case,
# the type is always the type of type up-cast which is given by the
# outer cast.
for P in type_sizes('uint'):
# We can't optimize away up-cast of down-cast.
if M < P:
continue
# Because we're doing down-cast of down-cast, the types always have
# to match between the two casts
for x in ['i', 'u']:
x2xN = '{0}2{0}{1}'.format(x, N)
x2xM = '{0}2{0}{1}'.format(x, M)
aP = 'a@{0}'.format(P)
optimizations.append(((x2xN, (x2xM, aP)), (x2xN, a)))
else:
# The N == M case is handled by other optimizations
pass
# Optimize comparisons with up-casts
for t in ['int', 'uint', 'float']:
for N, M in itertools.product(type_sizes(t), repeat=2):
if N == 1 or N >= M:
continue
x2xM = '{0}2{0}{1}'.format(t[0], M)
x2xN = '{0}2{0}{1}'.format(t[0], N)
aN = 'a@' + str(N)
bN = 'b@' + str(N)
xeq = 'feq' if t == 'float' else 'ieq'
xne = 'fne' if t == 'float' else 'ine'
xge = '{0}ge'.format(t[0])
xlt = '{0}lt'.format(t[0])
# Up-casts are lossless so for correctly signed comparisons of
# up-casted values we can do the comparison at the largest of the two
# original sizes and drop one or both of the casts. (We have
# optimizations to drop the no-op casts which this may generate.)
for P in type_sizes(t):
if P == 1 or P > N:
continue
bP = 'b@' + str(P)
optimizations += [
((xeq, (x2xM, aN), (x2xM, bP)), (xeq, a, (x2xN, b))),
((xne, (x2xM, aN), (x2xM, bP)), (xne, a, (x2xN, b))),
((xge, (x2xM, aN), (x2xM, bP)), (xge, a, (x2xN, b))),
((xlt, (x2xM, aN), (x2xM, bP)), (xlt, a, (x2xN, b))),
((xge, (x2xM, bP), (x2xM, aN)), (xge, (x2xN, b), a)),
((xlt, (x2xM, bP), (x2xM, aN)), (xlt, (x2xN, b), a)),
]
# The next bit doesn't work on floats because the range checks would
# get way too complicated.
if t in ['int', 'uint']:
if t == 'int':
xN_min = -(1 << (N - 1))
xN_max = (1 << (N - 1)) - 1
elif t == 'uint':
xN_min = 0
xN_max = (1 << N) - 1
else:
assert False
# If we're up-casting and comparing to a constant, we can unfold
# the comparison into a comparison with the shrunk down constant
# and a check that the constant fits in the smaller bit size.
optimizations += [
((xeq, (x2xM, aN), '#b'),
('iand', (xeq, a, (x2xN, b)), (xeq, (x2xM, (x2xN, b)), b))),
((xne, (x2xM, aN), '#b'),
('ior', (xne, a, (x2xN, b)), (xne, (x2xM, (x2xN, b)), b))),
((xlt, (x2xM, aN), '#b'),
('iand', (xlt, xN_min, b),
('ior', (xlt, xN_max, b), (xlt, a, (x2xN, b))))),
((xlt, '#a', (x2xM, bN)),
('iand', (xlt, a, xN_max),
('ior', (xlt, a, xN_min), (xlt, (x2xN, a), b)))),
((xge, (x2xM, aN), '#b'),
('iand', (xge, xN_max, b),
('ior', (xge, xN_min, b), (xge, a, (x2xN, b))))),
((xge, '#a', (x2xM, bN)),
('iand', (xge, a, xN_min),
('ior', (xge, a, xN_max), (xge, (x2xN, a), b)))),
]
def fexp2i(exp, bits):
# Generate an expression which constructs value 2.0^exp or 0.0.
#
# We assume that exp is already in a valid range:
#
# * [-15, 15] for 16-bit float
# * [-127, 127] for 32-bit float
# * [-1023, 1023] for 16-bit float
#
# If exp is the lowest value in the valid range, a value of 0.0 is
# constructed. Otherwise, the value 2.0^exp is constructed.
if bits == 16:
return ('i2i16', ('ishl', ('iadd', exp, 15), 10))
elif bits == 32:
return ('ishl', ('iadd', exp, 127), 23)
elif bits == 64:
return ('pack_64_2x32_split', 0, ('ishl', ('iadd', exp, 1023), 20))
else:
assert False
def ldexp(f, exp, bits):
# The maximum possible range for a normal exponent is [-126, 127] and,
# throwing in denormals, you get a maximum range of [-149, 127]. This
# means that we can potentially have a swing of +-276. If you start with
# FLT_MAX, you actually have to do ldexp(FLT_MAX, -278) to get it to flush
# all the way to zero. The GLSL spec only requires that we handle a subset
# of this range. From version 4.60 of the spec:
#
# "If exp is greater than +128 (single-precision) or +1024
# (double-precision), the value returned is undefined. If exp is less
# than -126 (single-precision) or -1022 (double-precision), the value
# returned may be flushed to zero. Additionally, splitting the value
# into a significand and exponent using frexp() and then reconstructing
# a floating-point value using ldexp() should yield the original input
# for zero and all finite non-denormalized values."
#
# The SPIR-V spec has similar language.
#
# In order to handle the maximum value +128 using the fexp2i() helper
# above, we have to split the exponent in half and do two multiply
# operations.
#
# First, we clamp exp to a reasonable range. Specifically, we clamp to
# twice the full range that is valid for the fexp2i() function above. If
# exp/2 is the bottom value of that range, the fexp2i() expression will
# yield 0.0f which, when multiplied by f, will flush it to zero which is
# allowed by the GLSL and SPIR-V specs for low exponent values. If the
# value is clamped from above, then it must have been above the supported
# range of the GLSL built-in and therefore any return value is acceptable.
if bits == 16:
exp = ('imin', ('imax', exp, -30), 30)
elif bits == 32:
exp = ('imin', ('imax', exp, -254), 254)
elif bits == 64:
exp = ('imin', ('imax', exp, -2046), 2046)
else:
assert False
# Now we compute two powers of 2, one for exp/2 and one for exp-exp/2.
# (We use ishr which isn't the same for -1, but the -1 case still works
# since we use exp-exp/2 as the second exponent.) While the spec
# technically defines ldexp as f * 2.0^exp, simply multiplying once doesn't
# work with denormals and doesn't allow for the full swing in exponents
# that you can get with normalized values. Instead, we create two powers
# of two and multiply by them each in turn. That way the effective range
# of our exponent is doubled.
pow2_1 = fexp2i(('ishr', exp, 1), bits)
pow2_2 = fexp2i(('isub', exp, ('ishr', exp, 1)), bits)
return ('fmul', ('fmul', f, pow2_1), pow2_2)
optimizations += [
(('ldexp@16', 'x', 'exp'), ldexp('x', 'exp', 16), 'options->lower_ldexp'),
(('ldexp@32', 'x', 'exp'), ldexp('x', 'exp', 32), 'options->lower_ldexp'),
(('ldexp@64', 'x', 'exp'), ldexp('x', 'exp', 64), 'options->lower_ldexp'),
]
# Unreal Engine 4 demo applications open-codes bitfieldReverse()
def bitfield_reverse(u):
step1 = ('ior', ('ishl', u, 16), ('ushr', u, 16))
step2 = ('ior', ('ishl', ('iand', step1, 0x00ff00ff), 8), ('ushr', ('iand', step1, 0xff00ff00), 8))
step3 = ('ior', ('ishl', ('iand', step2, 0x0f0f0f0f), 4), ('ushr', ('iand', step2, 0xf0f0f0f0), 4))
step4 = ('ior', ('ishl', ('iand', step3, 0x33333333), 2), ('ushr', ('iand', step3, 0xcccccccc), 2))
step5 = ('ior(many-comm-expr)', ('ishl', ('iand', step4, 0x55555555), 1), ('ushr', ('iand', step4, 0xaaaaaaaa), 1))
return step5
optimizations += [(bitfield_reverse('x@32'), ('bitfield_reverse', 'x'), '!options->lower_bitfield_reverse')]
# For any float comparison operation, "cmp", if you have "a == a && a cmp b"
# then the "a == a" is redundant because it's equivalent to "a is not NaN"
# and, if a is a NaN then the second comparison will fail anyway.
for op in ['flt', 'fge', 'feq']:
optimizations += [
2019-10-24 22:41:20 +01:00
(('iand', ('feq', a, a), (op, a, b)), ('!' + op, a, b)),
(('iand', ('feq', a, a), (op, b, a)), ('!' + op, b, a)),
]
# Add optimizations to handle the case where the result of a ternary is
# compared to a constant. This way we can take things like
#
# (a ? 0 : 1) > 0
#
# and turn it into
#
# a ? (0 > 0) : (1 > 0)
#
# which constant folding will eat for lunch. The resulting ternary will
# further get cleaned up by the boolean reductions above and we will be
# left with just the original variable "a".
for op in ['flt', 'fge', 'feq', 'fne',
'ilt', 'ige', 'ieq', 'ine', 'ult', 'uge']:
optimizations += [
((op, ('bcsel', 'a', '#b', '#c'), '#d'),
('bcsel', 'a', (op, 'b', 'd'), (op, 'c', 'd'))),
((op, '#d', ('bcsel', a, '#b', '#c')),
('bcsel', 'a', (op, 'd', 'b'), (op, 'd', 'c'))),
]
nir: Distribute binary operations with constants into bcsel This was specifically designed to simplify 1+mix(0, a-1, condition) to mix(1, a, condition) by pushing the 1+ inside. Skylake, Broadwell, and Haswell had similar results. Skylake shown. total instructions in shared programs: 14521753 -> 14521716 (<.01%) instructions in affected programs: 10619 -> 10582 (-0.35%) helped: 51 HURT: 14 helped stats (abs) min: 1 max: 12 x̄: 1.43 x̃: 1 helped stats (rel) min: 0.20% max: 3.58% x̄: 1.01% x̃: 0.95% HURT stats (abs) min: 1 max: 11 x̄: 2.57 x̃: 1 HURT stats (rel) min: 0.22% max: 1.75% x̄: 1.20% x̃: 1.32% 95% mean confidence interval for instructions value: -1.31 0.17 95% mean confidence interval for instructions %-change: -0.80% -0.27% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 533000205 -> 533003533 (<.01%) cycles in affected programs: 110610 -> 113938 (3.01%) helped: 43 HURT: 28 helped stats (abs) min: 6 max: 440 x̄: 27.12 x̃: 16 helped stats (rel) min: 0.39% max: 4.84% x̄: 1.60% x̃: 1.67% HURT stats (abs) min: 2 max: 3066 x̄: 160.50 x̃: 14 HURT stats (rel) min: 0.08% max: 77.78% x̄: 5.16% x̃: 0.62% 95% mean confidence interval for cycles value: -43.81 137.56 95% mean confidence interval for cycles %-change: -1.47% 3.60% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 10018840 -> 10018713 (<.01%) instructions in affected programs: 9431 -> 9304 (-1.35%) helped: 51 HURT: 3 helped stats (abs) min: 1 max: 80 x̄: 2.76 x̃: 1 helped stats (rel) min: 0.20% max: 16.43% x̄: 1.16% x̃: 0.81% HURT stats (abs) min: 1 max: 12 x̄: 4.67 x̃: 1 HURT stats (rel) min: 0.22% max: 1.33% x̄: 0.59% x̃: 0.22% 95% mean confidence interval for instructions value: -5.36 0.66 95% mean confidence interval for instructions %-change: -1.66% -0.46% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 87571944 -> 87572785 (<.01%) cycles in affected programs: 117234 -> 118075 (0.72%) helped: 42 HURT: 23 helped stats (abs) min: 2 max: 114 x̄: 51.90 x̃: 30 helped stats (rel) min: 0.11% max: 11.01% x̄: 4.45% x̃: 2.74% HURT stats (abs) min: 1 max: 2341 x̄: 131.35 x̃: 10 HURT stats (rel) min: 0.06% max: 37.11% x̄: 2.75% x̃: 0.61% 95% mean confidence interval for cycles value: -61.05 86.93 95% mean confidence interval for cycles %-change: -3.47% -0.33% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10542933 -> 10542844 (<.01%) instructions in affected programs: 11487 -> 11398 (-0.77%) helped: 52 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.08% max: 8.16% x̄: 0.90% x̃: 0.72% HURT stats (abs) min: 1 max: 11 x̄: 4.33 x̃: 1 HURT stats (rel) min: 0.22% max: 1.22% x̄: 0.55% x̃: 0.22% 95% mean confidence interval for instructions value: -3.17 -0.07 95% mean confidence interval for instructions %-change: -1.13% -0.52% Instructions are helped. total cycles in shared programs: 146098397 -> 146097094 (<.01%) cycles in affected programs: 128140 -> 126837 (-1.02%) helped: 47 HURT: 8 helped stats (abs) min: 2 max: 333 x̄: 29.21 x̃: 18 helped stats (rel) min: 0.13% max: 5.04% x̄: 1.18% x̃: 0.95% HURT stats (abs) min: 1 max: 16 x̄: 8.75 x̃: 9 HURT stats (rel) min: 0.08% max: 0.43% x̄: 0.30% x̃: 0.34% 95% mean confidence interval for cycles value: -37.49 -9.90 95% mean confidence interval for cycles %-change: -1.22% -0.71% Cycles are helped. Iron Lake total instructions in shared programs: 7886711 -> 7886509 (<.01%) instructions in affected programs: 10425 -> 10223 (-1.94%) helped: 50 HURT: 2 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 15.38% x̄: 1.12% x̃: 0.54% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.91% x̄: 0.89% x̃: 0.89% 95% mean confidence interval for instructions value: -8.05 0.28 95% mean confidence interval for instructions %-change: -1.83% -0.26% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 178115324 -> 178114612 (<.01%) cycles in affected programs: 765726 -> 765014 (-0.09%) helped: 39 HURT: 1 helped stats (abs) min: 2 max: 276 x̄: 18.31 x̃: 8 helped stats (rel) min: <.01% max: 8.47% x̄: 0.39% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -32.07 -3.53 95% mean confidence interval for cycles %-change: -0.86% 0.10% Inconclusive result (%-change mean confidence interval includes 0). GM45 total instructions in shared programs: 4857762 -> 4857661 (<.01%) instructions in affected programs: 5523 -> 5422 (-1.83%) helped: 25 HURT: 1 helped stats (abs) min: 1 max: 78 x̄: 4.08 x̃: 1 helped stats (rel) min: 0.34% max: 13.61% x̄: 1.04% x̃: 0.52% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.86% max: 0.86% x̄: 0.86% x̃: 0.86% 95% mean confidence interval for instructions value: -9.99 2.22 95% mean confidence interval for instructions %-change: -2.01% 0.08% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 122179674 -> 122179194 (<.01%) cycles in affected programs: 530162 -> 529682 (-0.09%) helped: 22 HURT: 1 helped stats (abs) min: 2 max: 292 x̄: 21.91 x̃: 7 helped stats (rel) min: <.01% max: 8.65% x̄: 0.44% x̃: 0.04% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -46.56 4.82 95% mean confidence interval for cycles %-change: -1.20% 0.36% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2016-03-02 23:39:09 +00:00
# For example, this converts things like
#
# 1 + mix(0, a - 1, condition)
#
# into
#
# mix(1, (a-1)+1, condition)
#
# Other optimizations will rearrange the constants.
for op in ['fadd', 'fmul', 'iadd', 'imul']:
optimizations += [
((op, ('bcsel(is_used_once)', a, '#b', c), '#d'), ('bcsel', a, (op, b, d), (op, c, d)))
]
# For derivatives in compute shaders, GLSL_NV_compute_shader_derivatives
# states:
#
# If neither layout qualifier is specified, derivatives in compute shaders
# return zero, which is consistent with the handling of built-in texture
# functions like texture() in GLSL 4.50 compute shaders.
for op in ['fddx', 'fddx_fine', 'fddx_coarse',
'fddy', 'fddy_fine', 'fddy_coarse']:
optimizations += [
((op, 'a'), 0.0, 'info->stage == MESA_SHADER_COMPUTE && info->cs.derivative_group == DERIVATIVE_GROUP_NONE')
]
# Some optimizations for ir3-specific instructions.
optimizations += [
# 'al * bl': If either 'al' or 'bl' is zero, return zero.
(('umul_low', '#a(is_lower_half_zero)', 'b'), (0)),
# '(ah * bl) << 16 + c': If either 'ah' or 'bl' is zero, return 'c'.
(('imadsh_mix16', '#a@32(is_lower_half_zero)', 'b@32', 'c@32'), ('c')),
(('imadsh_mix16', 'a@32', '#b@32(is_upper_half_zero)', 'c@32'), ('c')),
]
# These kinds of sequences can occur after nir_opt_peephole_select.
#
# NOTE: fadd is not handled here because that gets in the way of ffma
# generation in the i965 driver. Instead, fadd and ffma are handled in
# late_optimizations.
for op in ['flrp']:
optimizations += [
(('bcsel', a, (op + '(is_used_once)', b, c, d), (op, b, c, e)), (op, b, c, ('bcsel', a, d, e))),
(('bcsel', a, (op, b, c, d), (op + '(is_used_once)', b, c, e)), (op, b, c, ('bcsel', a, d, e))),
(('bcsel', a, (op + '(is_used_once)', b, c, d), (op, b, e, d)), (op, b, ('bcsel', a, c, e), d)),
(('bcsel', a, (op, b, c, d), (op + '(is_used_once)', b, e, d)), (op, b, ('bcsel', a, c, e), d)),
(('bcsel', a, (op + '(is_used_once)', b, c, d), (op, e, c, d)), (op, ('bcsel', a, b, e), c, d)),
(('bcsel', a, (op, b, c, d), (op + '(is_used_once)', e, c, d)), (op, ('bcsel', a, b, e), c, d)),
]
for op in ['fmul', 'iadd', 'imul', 'iand', 'ior', 'ixor', 'fmin', 'fmax', 'imin', 'imax', 'umin', 'umax']:
optimizations += [
(('bcsel', a, (op + '(is_used_once)', b, c), (op, b, 'd(is_not_const)')), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op + '(is_used_once)', b, 'c(is_not_const)'), (op, b, d)), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op, b, 'c(is_not_const)'), (op + '(is_used_once)', b, d)), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op, b, c), (op + '(is_used_once)', b, 'd(is_not_const)')), (op, b, ('bcsel', a, c, d))),
]
for op in ['fpow']:
optimizations += [
(('bcsel', a, (op + '(is_used_once)', b, c), (op, b, d)), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op, b, c), (op + '(is_used_once)', b, d)), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op + '(is_used_once)', b, c), (op, d, c)), (op, ('bcsel', a, b, d), c)),
(('bcsel', a, (op, b, c), (op + '(is_used_once)', d, c)), (op, ('bcsel', a, b, d), c)),
]
for op in ['frcp', 'frsq', 'fsqrt', 'fexp2', 'flog2', 'fsign', 'fsin', 'fcos']:
optimizations += [
(('bcsel', a, (op + '(is_used_once)', b), (op, c)), (op, ('bcsel', a, b, c))),
(('bcsel', a, (op, b), (op + '(is_used_once)', c)), (op, ('bcsel', a, b, c))),
]
# This section contains "late" optimizations that should be run before
# creating ffmas and calling regular optimizations for the final time.
# Optimizations should go here if they help code generation and conflict
# with the regular optimizations.
before_ffma_optimizations = [
# Propagate constants down multiplication chains
(('~fmul(is_used_once)', ('fmul(is_used_once)', 'a(is_not_const)', '#b'), 'c(is_not_const)'), ('fmul', ('fmul', a, c), b)),
(('imul(is_used_once)', ('imul(is_used_once)', 'a(is_not_const)', '#b'), 'c(is_not_const)'), ('imul', ('imul', a, c), b)),
(('~fadd(is_used_once)', ('fadd(is_used_once)', 'a(is_not_const)', '#b'), 'c(is_not_const)'), ('fadd', ('fadd', a, c), b)),
(('iadd(is_used_once)', ('iadd(is_used_once)', 'a(is_not_const)', '#b'), 'c(is_not_const)'), ('iadd', ('iadd', a, c), b)),
(('~fadd', ('fmul', a, b), ('fmul', a, c)), ('fmul', a, ('fadd', b, c))),
(('iadd', ('imul', a, b), ('imul', a, c)), ('imul', a, ('iadd', b, c))),
(('~fadd', ('fneg', a), a), 0.0),
(('iadd', ('ineg', a), a), 0),
(('iadd', ('ineg', a), ('iadd', a, b)), b),
(('iadd', a, ('iadd', ('ineg', a), b)), b),
(('~fadd', ('fneg', a), ('fadd', a, b)), b),
(('~fadd', a, ('fadd', ('fneg', a), b)), b),
nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-23 03:15:15 +01:00
(('~flrp@32', ('fadd(is_used_once)', a, -1.0), ('fadd(is_used_once)', a, 1.0), d), ('fadd', ('flrp', -1.0, 1.0, d), a)),
(('~flrp@32', ('fadd(is_used_once)', a, 1.0), ('fadd(is_used_once)', a, -1.0), d), ('fadd', ('flrp', 1.0, -1.0, d), a)),
(('~flrp@32', ('fadd(is_used_once)', a, '#b'), ('fadd(is_used_once)', a, '#c'), d), ('fadd', ('fmul', d, ('fadd', c, ('fneg', b))), ('fadd', a, b))),
]
# This section contains "late" optimizations that should be run after the
# regular optimizations have finished. Optimizations should go here if
# they help code generation but do not necessarily produce code that is
# more easily optimizable.
late_optimizations = [
# Most of these optimizations aren't quite safe when you get infinity or
# Nan involved but the first one should be fine.
nir: See through an fneg to apply existing optimizations Doing the same for the existing feq and fne transformations didn't help anything in shader-db. shader-db results: Broadwell and Skylake (Skylake shown) total instructions in shared programs: 14529463 -> 14526147 (-0.02%) instructions in affected programs: 402420 -> 399104 (-0.82%) helped: 2136 HURT: 131 helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1 helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57% 95% mean confidence interval for instructions value: -1.51 -1.41 95% mean confidence interval for instructions %-change: -3.06% -2.78% Instructions are helped. total cycles in shared programs: 533146915 -> 533120531 (<.01%) cycles in affected programs: 10356261 -> 10329877 (-0.25%) helped: 1933 HURT: 844 helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16 helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88% HURT stats (abs) min: 1 max: 423 x̄: 36.17 x̃: 12 HURT stats (rel) min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59% 95% mean confidence interval for cycles value: -11.78 -7.22 95% mean confidence interval for cycles %-change: -1.98% -1.65% Cycles are helped. Haswell total instructions in shared programs: 9037416 -> 9034106 (-0.04%) instructions in affected programs: 389831 -> 386521 (-0.85%) helped: 2184 HURT: 120 helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57% 95% mean confidence interval for instructions value: -1.49 -1.39 95% mean confidence interval for instructions %-change: -2.68% -2.41% Instructions are helped. total cycles in shared programs: 84636243 -> 84631628 (<.01%) cycles in affected programs: 4745058 -> 4740443 (-0.10%) helped: 1904 HURT: 960 helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18 helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38% HURT stats (abs) min: 1 max: 1080 x̄: 55.11 x̃: 14 HURT stats (rel) min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81% 95% mean confidence interval for cycles value: -4.51 1.29 95% mean confidence interval for cycles %-change: -1.64% -1.25% Inconclusive result (value mean confidence interval includes 0). LOST: 1 GAINED: 0 Sandy Bridge and Ivy Bridge (Ivy Bridge shown) total instructions in shared programs: 10018873 -> 10015456 (-0.03%) instructions in affected programs: 512820 -> 509403 (-0.67%) helped: 2268 HURT: 162 helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88% HURT stats (abs) min: 1 max: 4 x̄: 1.59 x̃: 1 HURT stats (rel) min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50% 95% mean confidence interval for instructions value: -1.46 -1.35 95% mean confidence interval for instructions %-change: -2.38% -2.12% Instructions are helped. total cycles in shared programs: 87538223 -> 87524771 (-0.02%) cycles in affected programs: 5435520 -> 5422068 (-0.25%) helped: 1916 HURT: 946 helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18 helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97% HURT stats (abs) min: 1 max: 633 x̄: 45.41 x̃: 11 HURT stats (rel) min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62% 95% mean confidence interval for cycles value: -7.34 -2.06 95% mean confidence interval for cycles %-change: -1.62% -1.26% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 7888446 -> 7886959 (-0.02%) instructions in affected programs: 331581 -> 330094 (-0.45%) helped: 1160 HURT: 97 helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1 helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25% 95% mean confidence interval for instructions value: -1.25 -1.12 95% mean confidence interval for instructions %-change: -0.91% -0.75% Instructions are helped. total cycles in shared programs: 178130766 -> 178116996 (<.01%) cycles in affected programs: 12534564 -> 12520794 (-0.11%) helped: 1856 HURT: 187 helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4 helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11% HURT stats (abs) min: 2 max: 26 x̄: 3.55 x̃: 2 HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02% 95% mean confidence interval for cycles value: -7.41 -6.07 95% mean confidence interval for cycles %-change: -0.28% -0.22% Cycles are helped. GM45 total instructions in shared programs: 4858912 -> 4857887 (-0.02%) instructions in affected programs: 237565 -> 236540 (-0.43%) helped: 867 HURT: 57 helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1 helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22% 95% mean confidence interval for instructions value: -1.18 -1.04 95% mean confidence interval for instructions %-change: -0.88% -0.71% Instructions are helped. total cycles in shared programs: 122189118 -> 122180816 (<.01%) cycles in affected programs: 8776418 -> 8768116 (-0.09%) helped: 1213 HURT: 166 helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4 helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11% HURT stats (abs) min: 2 max: 26 x̄: 3.35 x̃: 2 HURT stats (rel) min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02% 95% mean confidence interval for cycles value: -6.78 -5.26 95% mean confidence interval for cycles %-change: -0.24% -0.18% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-09 23:32:47 +00:00
(('flt', ('fadd', a, b), 0.0), ('flt', a, ('fneg', b))),
(('flt', ('fneg', ('fadd', a, b)), 0.0), ('flt', ('fneg', a), b)),
(('~fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
(('~fge', ('fneg', ('fadd', a, b)), 0.0), ('fge', ('fneg', a), b)),
(('~feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
(('~fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-03-05 20:08:29 +00:00
# nir_lower_to_source_mods will collapse this, but its existence during the
# optimization loop can prevent other optimizations.
(('fneg', ('fneg', a)), a),
# Subtractions get lowered during optimization, so we need to recombine them
(('fadd', 'a', ('fneg', 'b')), ('fsub', 'a', 'b'), '!options->lower_sub'),
(('iadd', 'a', ('ineg', 'b')), ('isub', 'a', 'b'), '!options->lower_sub'),
(('fneg', a), ('fsub', 0.0, a), 'options->lower_negate'),
(('ineg', a), ('isub', 0, a), 'options->lower_negate'),
# These are duplicated from the main optimizations table. The late
# patterns that rearrange expressions like x - .5 < 0 to x < .5 can create
# new patterns like these. The patterns that compare with zero are removed
# because they are unlikely to be created in by anything in
# late_optimizations.
(('flt', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('flt', a, b)),
(('flt', '#b(is_gt_0_and_lt_1)', ('fsat(is_used_once)', a)), ('flt', b, a)),
(('fge', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('fge', a, b)),
(('fge', '#b(is_gt_0_and_lt_1)', ('fsat(is_used_once)', a)), ('fge', b, a)),
(('feq', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('feq', a, b)),
(('fne', ('fsat(is_used_once)', a), '#b(is_gt_0_and_lt_1)'), ('fne', a, b)),
(('fge', ('fsat(is_used_once)', a), 1.0), ('fge', a, 1.0)),
(('flt', ('fsat(is_used_once)', a), 1.0), ('flt', a, 1.0)),
(('~fge', ('fmin(is_used_once)', ('fadd(is_used_once)', a, b), ('fadd', c, d)), 0.0), ('iand', ('fge', a, ('fneg', b)), ('fge', c, ('fneg', d)))),
(('flt', ('fneg', a), ('fneg', b)), ('flt', b, a)),
(('fge', ('fneg', a), ('fneg', b)), ('fge', b, a)),
(('feq', ('fneg', a), ('fneg', b)), ('feq', b, a)),
(('fne', ('fneg', a), ('fneg', b)), ('fne', b, a)),
(('flt', ('fneg', a), -1.0), ('flt', 1.0, a)),
(('flt', -1.0, ('fneg', a)), ('flt', a, 1.0)),
(('fge', ('fneg', a), -1.0), ('fge', 1.0, a)),
(('fge', -1.0, ('fneg', a)), ('fge', a, 1.0)),
(('fne', ('fneg', a), -1.0), ('fne', 1.0, a)),
(('feq', -1.0, ('fneg', a)), ('feq', a, 1.0)),
(('ior', a, a), a),
(('iand', a, a), a),
nir/algebraic: optimize ior(ine(a, 0), ine(b, 0)) to ine(ior(a, b), 0) Like 70f9e2589e6b. Also scrub the unnecessary size qualifier in both replacement patterns. This occurs in a handful of places in the soft-fp64 code, and that is the primary reason for the change. Perhaps the patterns that generate umin should be conditioned on something, but I'm not sure what. lower_bitops might cover the cases that matter, but it seems ugly. Results on the 308 shaders extracted from the fp64 portion of the OpenGL CTS: Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) total instructions in shared programs: 936505 -> 933388 (-0.33%) instructions in affected programs: 925719 -> 922602 (-0.34%) helped: 154 HURT: 1 helped stats (abs) min: 1 max: 211 x̄: 35.45 x̃: 16 helped stats (rel) min: 0.34% max: 9.30% x̄: 2.28% x̃: 0.96% HURT stats (abs) min: 2342 max: 2342 x̄: 2342.00 x̃: 2342 HURT stats (rel) min: 2.28% max: 2.28% x̄: 2.28% x̃: 2.28% 95% mean confidence interval for instructions value: -51.21 10.99 95% mean confidence interval for instructions %-change: -2.61% -1.89% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 7323502 -> 7306184 (-0.24%) cycles in affected programs: 7220376 -> 7203058 (-0.24%) helped: 126 HURT: 1 helped stats (abs) min: 2 max: 946 x̄: 159.10 x̃: 95 helped stats (rel) min: 0.01% max: 9.62% x̄: 0.80% x̃: 0.37% HURT stats (abs) min: 2728 max: 2728 x̄: 2728.00 x̃: 2728 HURT stats (rel) min: 0.37% max: 0.37% x̄: 0.37% x̃: 0.37% 95% mean confidence interval for cycles value: -192.07 -80.66 95% mean confidence interval for cycles %-change: -1.07% -0.51% Cycles are helped. total spills in shared programs: 635 -> 817 (28.66%) spills in affected programs: 635 -> 817 (28.66%) helped: 0 HURT: 3 total fills in shared programs: 2065 -> 2438 (18.06%) fills in affected programs: 2019 -> 2392 (18.47%) helped: 0 HURT: 2 Regular shader-db results: All Haswell+ platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 17611506 -> 17611489 (<.01%) instructions in affected programs: 33442 -> 33425 (-0.05%) helped: 32 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 1.69 x̃: 1 helped stats (rel) min: 0.08% max: 1.90% x̄: 0.27% x̃: 0.11% HURT stats (abs) min: 1 max: 15 x̄: 6.17 x̃: 5 HURT stats (rel) min: 0.09% max: 1.50% x̄: 0.65% x̃: 0.55% 95% mean confidence interval for instructions value: -1.70 0.80 95% mean confidence interval for instructions %-change: -0.30% 0.05% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 338419218 -> 338418502 (<.01%) cycles in affected programs: 385795 -> 385079 (-0.19%) helped: 42 HURT: 3 helped stats (abs) min: 2 max: 192 x̄: 24.57 x̃: 16 helped stats (rel) min: 0.04% max: 2.09% x̄: 0.33% x̃: 0.22% HURT stats (abs) min: 64 max: 164 x̄: 105.33 x̃: 88 HURT stats (rel) min: 0.77% max: 1.58% x̄: 1.09% x̃: 0.93% 95% mean confidence interval for cycles value: -29.76 -2.06 95% mean confidence interval for cycles %-change: -0.40% -0.07% Cycles are helped. Ivy Bridge and Sandy Bridge had similar results. (Ivy Bridge shown) total instructions in shared programs: 11875620 -> 11875617 (<.01%) instructions in affected programs: 421 -> 418 (-0.71%) helped: 2 HURT: 0 total cycles in shared programs: 178245336 -> 178245326 (<.01%) cycles in affected programs: 3425 -> 3415 (-0.29%) helped: 2 HURT: 0 No changes on Gen4 or Gen5. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2019-10-31 00:41:41 +00:00
(('iand', ('ine(is_used_once)', 'a@32', 0), ('ine', 'b@32', 0)), ('ine', ('umin', a, b), 0)),
(('ior', ('ieq(is_used_once)', 'a@32', 0), ('ieq', 'b@32', 0)), ('ieq', ('umin', a, b), 0)),
nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms has similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-03-28 06:57:07 +01:00
(('~fadd', ('fneg(is_used_once)', ('fsat(is_used_once)', 'a(is_not_fmul)')), 1.0), ('fsat', ('fadd', 1.0, ('fneg', a)))),
(('fdot2', a, b), ('fdot_replicated2', a, b), 'options->fdot_replicates'),
(('fdot3', a, b), ('fdot_replicated3', a, b), 'options->fdot_replicates'),
(('fdot4', a, b), ('fdot_replicated4', a, b), 'options->fdot_replicates'),
(('fdph', a, b), ('fdph_replicated', a, b), 'options->fdot_replicates'),
nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-23 03:15:15 +01:00
(('~flrp@32', ('fadd(is_used_once)', a, b), ('fadd(is_used_once)', a, c), d), ('fadd', ('flrp', b, c, d), a)),
(('~flrp@64', ('fadd(is_used_once)', a, b), ('fadd(is_used_once)', a, c), d), ('fadd', ('flrp', b, c, d), a)),
nir/algebraic: Reassociate open-coded flrp(1, b, c) In a previous verion of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0. However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc. The lowering happens after all constant folding, so we would litterally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it. This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations. A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW. I tried a similar thing for open-coded flrp(-1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles was about even. No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8172020 -> 8164367 (-0.09%) instructions in affected programs: 1089851 -> 1082198 (-0.70%) helped: 3285 HURT: 64 helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2 helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38% 95% mean confidence interval for instructions value: -2.32 -2.25 95% mean confidence interval for instructions %-change: -1.16% -1.09% Instructions are helped. total cycles in shared programs: 188758338 -> 188719974 (-0.02%) cycles in affected programs: 20004922 -> 19966558 (-0.19%) helped: 3012 HURT: 477 helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12 helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24% HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4 HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11% 95% mean confidence interval for cycles value: -11.38 -10.62 95% mean confidence interval for cycles %-change: -0.46% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-19 20:42:05 +01:00
(('~fadd@32', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp32'),
(('~fadd@64', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp64'),
nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a) v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. v3: Add a couple more patterns that just move the negation around. No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. Skylake total instructions in shared programs: 15279687 -> 15256058 (-0.15%) instructions in affected programs: 4344440 -> 4320811 (-0.54%) helped: 23455 HURT: 18 helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34% 95% mean confidence interval for instructions value: -1.01 -1.00 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 355593755 -> 355339981 (-0.07%) cycles in affected programs: 162089552 -> 161835778 (-0.16%) helped: 20467 HURT: 7158 helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6 helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58% HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11 HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98% 95% mean confidence interval for cycles value: -10.39 -7.98 95% mean confidence interval for cycles %-change: -0.57% -0.47% Cycles are helped. total spills in shared programs: 8843 -> 8835 (-0.09%) spills in affected programs: 190 -> 182 (-4.21%) helped: 2 HURT: 0 total fills in shared programs: 21738 -> 21738 (0.00%) fills in affected programs: 372 -> 372 (0.00%) helped: 1 HURT: 1 LOST: 12 GAINED: 22 Broadwell total instructions in shared programs: 15290523 -> 15266818 (-0.16%) instructions in affected programs: 4314738 -> 4291033 (-0.55%) helped: 23391 HURT: 11 helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1 HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50% 95% mean confidence interval for instructions value: -1.04 -0.99 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 388911660 -> 388830827 (-0.02%) cycles in affected programs: 172903324 -> 172822491 (-0.05%) helped: 15601 HURT: 13269 helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6 helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55% HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6 HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60% 95% mean confidence interval for cycles value: -4.20 -1.40 95% mean confidence interval for cycles %-change: -0.17% -0.08% Cycles are helped. total spills in shared programs: 23110 -> 23069 (-0.18%) spills in affected programs: 656 -> 615 (-6.25%) helped: 3 HURT: 1 total fills in shared programs: 34399 -> 34398 (<.01%) fills in affected programs: 905 -> 904 (-0.11%) helped: 3 HURT: 1 LOST: 6 GAINED: 23 Haswell total instructions in shared programs: 13465303 -> 13441142 (-0.18%) instructions in affected programs: 3726999 -> 3702838 (-0.65%) helped: 22139 HURT: 347 helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12% 95% mean confidence interval for instructions value: -1.08 -1.07 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 376271308 -> 376273090 (<.01%) cycles in affected programs: 167496811 -> 167498593 (<.01%) helped: 13206 HURT: 13281 helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8 helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80% HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8 HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61% 95% mean confidence interval for cycles value: -1.33 1.47 95% mean confidence interval for cycles %-change: 0.22% 0.36% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 23158 -> 23134 (-0.10%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 34580 -> 34550 (-0.09%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 23 GAINED: 13 Ivy Bridge total instructions in shared programs: 12034154 -> 12014301 (-0.16%) instructions in affected programs: 3636209 -> 3616356 (-0.55%) helped: 18771 HURT: 459 helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11% 95% mean confidence interval for instructions value: -1.04 -1.02 95% mean confidence interval for instructions %-change: -0.86% -0.84% Instructions are helped. total cycles in shared programs: 180186960 -> 180175147 (<.01%) cycles in affected programs: 44652745 -> 44640932 (-0.03%) helped: 12979 HURT: 11033 helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6 helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74% HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9 HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69% 95% mean confidence interval for cycles value: -2.29 1.31 95% mean confidence interval for cycles %-change: 0.11% 0.26% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 3623 -> 3599 (-0.66%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 4061 -> 4031 (-0.74%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 17 GAINED: 18 Sandy Bridge total instructions in shared programs: 10853968 -> 10834932 (-0.18%) instructions in affected programs: 3769957 -> 3750921 (-0.50%) helped: 17944 HURT: 204 helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93% 95% mean confidence interval for instructions value: -1.05 -1.04 95% mean confidence interval for instructions %-change: -0.81% -0.78% Instructions are helped. total cycles in shared programs: 153894864 -> 153885988 (<.01%) cycles in affected programs: 50643925 -> 50635049 (-0.02%) helped: 9361 HURT: 10534 helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4 helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22% HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5 HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27% 95% mean confidence interval for cycles value: -1.27 0.38 95% mean confidence interval for cycles %-change: -0.03% 0.04% Inconclusive result (value mean confidence interval includes 0). LOST: 6 GAINED: 24 Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-08-06 21:07:59 +01:00
# A similar operation could apply to any ffma(#a, b, #(-a/2)), but this
# particular operation is common for expanding values stored in a texture
# from [0,1] to [-1,1].
(('~ffma@32', a, 2.0, -1.0), ('flrp', -1.0, 1.0, a ), '!options->lower_flrp32'),
(('~ffma@32', a, -2.0, -1.0), ('flrp', -1.0, 1.0, ('fneg', a)), '!options->lower_flrp32'),
(('~ffma@32', a, -2.0, 1.0), ('flrp', 1.0, -1.0, a ), '!options->lower_flrp32'),
(('~ffma@32', a, 2.0, 1.0), ('flrp', 1.0, -1.0, ('fneg', a)), '!options->lower_flrp32'),
(('~fadd@32', ('fmul(is_used_once)', 2.0, a), -1.0), ('flrp', -1.0, 1.0, a ), '!options->lower_flrp32'),
(('~fadd@32', ('fmul(is_used_once)', -2.0, a), -1.0), ('flrp', -1.0, 1.0, ('fneg', a)), '!options->lower_flrp32'),
(('~fadd@32', ('fmul(is_used_once)', -2.0, a), 1.0), ('flrp', 1.0, -1.0, a ), '!options->lower_flrp32'),
(('~fadd@32', ('fmul(is_used_once)', 2.0, a), 1.0), ('flrp', 1.0, -1.0, ('fneg', a)), '!options->lower_flrp32'),
# flrp(a, b, a)
# a*(1-a) + b*a
# a + -a*a + a*b (1)
# a + a*(b - a)
# Option 1: ffma(a, (b-a), a)
#
# Alternately, after (1):
# a*(1+b) + -a*a
# a*((1+b) + -a)
#
# Let b=1
#
# Option 2: ffma(a, 2, -(a*a))
# Option 3: ffma(a, 2, (-a)*a)
# Option 4: ffma(a, -a, (2*a)
# Option 5: a * (2 - a)
#
# There are a lot of other possible combinations.
(('~ffma@32', ('fadd', b, ('fneg', a)), a, a), ('flrp', a, b, a), '!options->lower_flrp32'),
(('~ffma@32', a, 2.0, ('fneg', ('fmul', a, a))), ('flrp', a, 1.0, a), '!options->lower_flrp32'),
(('~ffma@32', a, 2.0, ('fmul', ('fneg', a), a)), ('flrp', a, 1.0, a), '!options->lower_flrp32'),
(('~ffma@32', a, ('fneg', a), ('fmul', 2.0, a)), ('flrp', a, 1.0, a), '!options->lower_flrp32'),
(('~fmul@32', a, ('fadd', 2.0, ('fneg', a))), ('flrp', a, 1.0, a), '!options->lower_flrp32'),
# we do these late so that we don't get in the way of creating ffmas
(('fmin', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmin', a, b))),
(('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmax', a, b))),
nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b)) I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-12-03 22:41:07 +00:00
2019-03-30 03:51:20 +00:00
(('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', ('ior', a, b)))),
nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-06 01:23:11 +01:00
nir/algebraic: Simplify some Inf and NaN avoidance code Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0. The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders. An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14661140 -> 14661104 (<.01%) instructions in affected programs: 7520 -> 7484 (-0.48%) helped: 36 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.52% -0.47% Instructions are helped. total cycles in shared programs: 228585416 -> 228584806 (<.01%) cycles in affected programs: 56321 -> 55711 (-1.08%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10 helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65% 95% mean confidence interval for cycles value: -28.32 -9.80 95% mean confidence interval for cycles %-change: -1.63% -0.54% Cycles are helped. Sandy Bridge total cycles in shared programs: 152991077 -> 152991075 (<.01%) cycles in affected programs: 11525 -> 11523 (-0.02%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -5.27 4.27 95% mean confidence interval for cycles %-change: -0.16% 0.15% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45.
2019-11-02 02:53:06 +00:00
# Putting this in 'optimizations' interferes with the bcsel(a, op(b, c),
# op(b, d)) => op(b, bcsel(a, c, d)) transformations. I do not know why.
(('bcsel', ('feq', ('fsqrt', 'a(is_not_negative)'), 0.0), intBitsToFloat(0x7f7fffff), ('frsq', a)),
('fmin', ('frsq', a), intBitsToFloat(0x7f7fffff))),
nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-06 01:23:11 +01:00
# Things that look like DPH in the source shader may get expanded to
# something that looks like dot(v1.xyz, v2.xyz) + v1.w by the time it gets
# to NIR. After FFMA is generated, this can look like:
#
# fadd(ffma(v1.z, v2.z, ffma(v1.y, v2.y, fmul(v1.x, v2.x))), v1.w)
#
# Reassociate the last addition into the first multiplication.
nir/algebraic: Do not apply late DPH optimization in vertex processing stages Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages. In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490 Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16194726 -> 16344745 (0.93%) instructions in affected programs: 2855172 -> 3005191 (5.25%) helped: 6 HURT: 20279 helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44% HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7 HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56% 95% mean confidence interval for instructions value: 7.34 7.45 95% mean confidence interval for instructions %-change: 8.48% 8.67% Instructions are HURT. total cycles in shared programs: 364471296 -> 365014683 (0.15%) cycles in affected programs: 32421530 -> 32964917 (1.68%) helped: 2925 HURT: 16144 helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5 helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15% HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15 HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87% 95% mean confidence interval for cycles value: 21.58 35.41 95% mean confidence interval for cycles %-change: 4.36% 4.52% Cycles are HURT.
2019-08-31 19:40:32 +01:00
#
# Some shaders do not use 'invariant' in vertex and (possibly) geometry
# shader stages on some outputs that are intended to be invariant. For
# various reasons, this optimization may not be fully applied in all
# shaders used for different rendering passes of the same geometry. This
# can result in Z-fighting artifacts (at best). For now, disable this
# optimization in these stages. See bugzilla #111490. In tessellation
# stages applications seem to use 'precise' when necessary, so allow the
# optimization in those stages.
nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-06 01:23:11 +01:00
(('~fadd', ('ffma(is_used_once)', a, b, ('ffma', c, d, ('fmul', 'e(is_not_const_and_not_fsign)', 'f(is_not_const_and_not_fsign)'))), 'g(is_not_const)'),
nir/algebraic: Do not apply late DPH optimization in vertex processing stages Some shaders do not use 'invariant' in vertex and (possibly) geometry shader stages on some outputs that are intended to be invariant. For various reasons, this optimization may not be fully applied in all shaders used for different rendering passes of the same geometry. This can result in Z-fighting artifacts (at best). For now, disable this optimization in these stages. In tessellation stages applications seem to use 'precise' when necessary, so allow the optimization in those stages. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111490 Fixes: 09705747d72 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16194726 -> 16344745 (0.93%) instructions in affected programs: 2855172 -> 3005191 (5.25%) helped: 6 HURT: 20279 helped stats (abs) min: 1 max: 3 x̄: 1.33 x̃: 1 helped stats (rel) min: 0.44% max: 1.00% x̄: 0.54% x̃: 0.44% HURT stats (abs) min: 1 max: 32 x̄: 7.40 x̃: 7 HURT stats (rel) min: 0.14% max: 42.86% x̄: 8.58% x̃: 6.56% 95% mean confidence interval for instructions value: 7.34 7.45 95% mean confidence interval for instructions %-change: 8.48% 8.67% Instructions are HURT. total cycles in shared programs: 364471296 -> 365014683 (0.15%) cycles in affected programs: 32421530 -> 32964917 (1.68%) helped: 2925 HURT: 16144 helped stats (abs) min: 1 max: 403 x̄: 18.39 x̃: 5 helped stats (rel) min: <.01% max: 22.61% x̄: 1.97% x̃: 1.15% HURT stats (abs) min: 1 max: 18471 x̄: 36.99 x̃: 15 HURT stats (rel) min: 0.02% max: 52.58% x̄: 5.60% x̃: 3.87% 95% mean confidence interval for cycles value: 21.58 35.41 95% mean confidence interval for cycles %-change: 4.36% 4.52% Cycles are HURT.
2019-08-31 19:40:32 +01:00
('ffma', a, b, ('ffma', c, d, ('ffma', e, 'f', 'g'))), '(info->stage != MESA_SHADER_VERTEX && info->stage != MESA_SHADER_GEOMETRY) && !options->intel_vec4'),
(('~fadd', ('ffma(is_used_once)', a, b, ('fmul', 'c(is_not_const_and_not_fsign)', 'd(is_not_const_and_not_fsign)') ), 'e(is_not_const)'),
('ffma', a, b, ('ffma', c, d, e)), '(info->stage != MESA_SHADER_VERTEX && info->stage != MESA_SHADER_GEOMETRY) && !options->intel_vec4'),
# Convert f2fmp instructions to concrete f2f16 instructions. At this point
# any conversions that could have been removed will have been removed in
# nir_opt_algebraic so any remaining ones are required.
(('f2fmp', a), ('f2f16', a)),
]
for op in ['fadd']:
late_optimizations += [
(('bcsel', a, (op + '(is_used_once)', b, c), (op, b, d)), (op, b, ('bcsel', a, c, d))),
(('bcsel', a, (op, b, c), (op + '(is_used_once)', b, d)), (op, b, ('bcsel', a, c, d))),
]
for op in ['ffma']:
late_optimizations += [
(('bcsel', a, (op + '(is_used_once)', b, c, d), (op, b, c, e)), (op, b, c, ('bcsel', a, d, e))),
(('bcsel', a, (op, b, c, d), (op + '(is_used_once)', b, c, e)), (op, b, c, ('bcsel', a, d, e))),
(('bcsel', a, (op + '(is_used_once)', b, c, d), (op, b, e, d)), (op, b, ('bcsel', a, c, e), d)),
(('bcsel', a, (op, b, c, d), (op + '(is_used_once)', b, e, d)), (op, b, ('bcsel', a, c, e), d)),
]
print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render())
print(nir_algebraic.AlgebraicPass("nir_opt_algebraic_before_ffma",
before_ffma_optimizations).render())
print(nir_algebraic.AlgebraicPass("nir_opt_algebraic_late",
late_optimizations).render())