Commit Graph

116 Commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho 0345aeeb40 intel/compiler: Use nir_opt_conditional_discard
anv vkpipeline-db results for SKL:

total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
instructions in affected programs: 396452 -> 385272 (-2.82%)
helped: 2062
HURT: 1

total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
cycles in affected programs: 4171830 -> 4132481 (-0.94%)
helped: 1874
HURT: 180

total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 8745 -> 8748 (0.03%)
spills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

total fills in shared programs: 23392 -> 23395 (0.01%)
fills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1

LOST:   0
GAINED: 1

No changes to shader-db on i965 or iris.  The glsl compiler already
does a similar optimization.

Improvement suggested by Daniel Schürmann.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-22 09:33:48 -07:00
Jason Ekstrand 110669c85c st,i965: Stop looping on 64-bit lowering
Now that the 64-bit lowering passes do a complete lowering in one go, we
don't need to loop anymore.  We do, however, have to ensure that int64
lowering happens after double lowering because double lowering can
produce int64 ops.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand 974fabe810 intel: Run the optimization loop before and after lowering int64
For bindless SSBO access, we have to do 64-bit address calculations.  On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic.  If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards.  This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-07-13 02:59:28 +00:00
Andres Gomez f4d2be03b1 intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
2019-07-12 16:15:20 +00:00
Connor Abbott 6b28808b22 intel/nir: Extract add_const_offset_to_base
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:14:53 +02:00
Jason Ekstrand fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, the removed the PLN instruction so we have to emit a pile of
MAD to emulate it.  We may as well do that in NIR so we can optimize and
later schedule it.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 2b79a9e5a5 intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Jason Ekstrand 811c05dfe6 intel/nir: Take nir_shader*s in brw_nir_link_shaders
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand bb67a99a2d intel/nir: Stop returning the shader from helpers
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Ian Romanick 65df6122da intel/compiler: Use compare rematerialization pass
Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Jason Ekstrand 4fde459563 intel/nir: Call alu_to_scalar one last time before out-of-ssa
A few of our very late passes can end up generating vectors accidentally
so we need to get rid of them.  The only known case of this is the ffma
peephole which generates fneg and fabs as vectors.  Currently, they're
not a problem because they get turned into fmov which the back-end
compiler knows how to handle as a vector.  That's about to change.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Ian Romanick 45c7ff95fc intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop.  Future patches
will benefit from this change.

Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.

v2: Fix typo in comment.  Noticed by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.

total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs)   min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel)   min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0

total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs)   min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel)   min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.

total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs)   min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Jonathan Marek d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Ian Romanick c866500525 intel/compiler: Don't always require precise lowering of flrp
No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Caio Marcelo de Oliveira Filho 055f6281d4 intel/fs: Don't handle texop_tex for shaders without implicit LOD
These will be lowered by nir_lower_tex() with the
lower_tex_when_implicit_lod_not_supported, so don't need the extra
handling here.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-25 12:13:06 -07:00
Jason Ekstrand 2edf29b933 intel,nir: Lower TXD with a bindless sampler
When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand b1a633d9fb intel/nir: Re-run int64 lowering in postprocess_nir
We're about to start doing 64-bit pointer calculations in ANV.  They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering.  Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Iago Toral Quiroga 472244b374 intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
Particularly, we need the same lowewrings we use for 16-bit
integers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga 4588f4a604 intel/compiler: handle extended math restrictions for half-float
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.

v2: quashed together the following patches (Jason):
  - intel/compiler: allow extended math functions with HF operands
  - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  - intel/compiler: extended Math is limited to SIMD8 on half-float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  (allow extended math functions with HF operands,
   extended Math is limited to SIMD8 on half-float)
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga 114f4e6c29 intel/compiler: lower some 16-bit float operations to 32-bit
The hardware doesn't support half-float for these.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga 3e377c68f8 intel/compiler: add a NIR pass to lower conversions
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.

v2:
 - Consider fp16 rounding conversion opcodes
 - Properly handle swizzles on conversion sources.

v3
 - Run the pass earlier, right after nir_opt_algebraic_late (Jason)
 - NIR alu output types already have the bit-size (Jason)
 - Use 'is_conversion' to identify conversion operations (Jason)

v4:
 - Be careful about the intermediate types we use so we don't lose
   range and avoid incorrect rounding semantics (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Karol Herbst bbf2ecaf35 intel/nir: use nir_src_is_const and nir_src_as_uint
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Mark Janes 2393cc7f00 intel/common: move gen_debug to intel/dev
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler.  Break the circular
dependency by moving gen_debug files to libintel_dev.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-10 13:15:33 -07:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Ian Romanick 7832fb7889 intel/compiler: Use partial redundancy elimination for compares
Almost all of the hurt shaders are repeated instances of the same shader
in synmark's compilation speed tests.

shader-db results:

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256389 (<.01%)
instructions in affected programs: 54137 -> 53686 (-0.83%)
helped: 288
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
95% mean confidence interval for instructions value: -1.76 -1.38
95% mean confidence interval for instructions %-change: -2.47% -1.50%
Instructions are helped.

total cycles in shared programs: 372286583 -> 372283851 (<.01%)
cycles in affected programs: 833829 -> 831097 (-0.33%)
helped: 265
HURT: 16
helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
HURT stats (abs)   min: 2 max: 130 x̄: 24.88 x̃: 8
HURT stats (rel)   min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
95% mean confidence interval for cycles value: -12.30 -7.15
95% mean confidence interval for cycles %-change: -1.06% -0.64%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5038653 -> 5038495 (<.01%)
instructions in affected programs: 13939 -> 13781 (-1.13%)
helped: 50
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -3.73 -2.47
95% mean confidence interval for instructions %-change: -3.16% -1.21%
Instructions are helped.

total cycles in shared programs: 128118922 -> 128118228 (<.01%)
cycles in affected programs: 134906 -> 134212 (-0.51%)
helped: 50
HURT: 0
helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
95% mean confidence interval for cycles value: -16.54 -11.22
95% mean confidence interval for cycles %-change: -0.95% -0.53%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Jason Ekstrand 08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Jason Ekstrand d3386e73c5 intel/nir: Lower array-deref-of-vector UBO and SSBO loads
This fixes a serious performance issue with DXVK:

https://github.com/doitsujin/dxvk/issues/937

This was caused by a recent change that to improve performance on RADV
which back-fired on ANV and killed performance for some apps:

e5a06d3f4a

Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO load) and get one load where we previously
would have gotten several.

VkPipeline-db results on Kaby Lake:

    total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
    instructions in affected programs: 1754333 -> 1712157 (-2.40%)
    helped: 5331
    HURT: 63

    total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
    cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
    helped: 9202
    HURT: 4323

    total loops in shared programs: 3340 -> 3331 (-0.27%)
    loops in affected programs: 9 -> 0
    helped: 9
    HURT: 0

    total spills in shared programs: 3246 -> 3053 (-5.95%)
    spills in affected programs: 384 -> 191 (-50.26%)
    helped: 10
    HURT: 5

    total fills in shared programs: 4626 -> 4452 (-3.76%)
    fills in affected programs: 439 -> 265 (-39.64%)
    helped: 10
    HURT: 5

All of the shaders with hurt spilling were in Rise of the Tomb Raider
which also had shaders solidly helped in the spilling department.  Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3 where this reduces spilling and improves over-all
perf by around 20-25%.  There were no shader-db changes.  Apparently,
this just isn't a pattern that happens in OpenGL.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: "19.0" mesa-stable@lists.freedesktop.org
2019-03-15 23:10:27 -05:00
Caio Marcelo de Oliveira Filho 65e8761474 intel/nir: Combine store_derefs to improve code from SPIR-V
Due to lack of write mask in SPIR-V store, generators may produce
multiple stores to the same vector but using different array derefs.
Use the combining store pass to clean this up.  For example,

    layout(binding = 3) buffer block {
        vec4 v;
    };

    void main() {
        v.x = 11;
        v.y = 22;
    }

after going to SPIR-V and NIR, ends up with in two store_derefs to
v[0] and v[1]

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */
    intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */
    vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */)
    vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */
    intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */

producing two different sends instructions in skl.  The combining pass
transform the snippet above into

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17
    intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */

producing a single sends instruction.

v2: Move this from spirv_to_nir into the general optimization pass for
    intel compiler.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho 10dfb0011e intel/nir: Combine store_derefs after vectorizing IO
Shader-db results for skl:

    total instructions in shared programs: 15232903 -> 15224781 (-0.05%)
    instructions in affected programs: 61246 -> 53124 (-13.26%)
    helped: 221
    HURT: 0

    total cycles in shared programs: 371440470 -> 371398018 (-0.01%)
    cycles in affected programs: 281363 -> 238911 (-15.09%)
    helped: 221
    HURT: 0

Results for bdw are very similar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho 822a8865e4 nir: Add a pass to combine store_derefs to same vector
v2: (all from Jason)
    Reuse existing function for the end of the block combinations.
    Check the SSA values are coming from the right place in tests.
    Document the case when the store to array_deref is reused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Jason Ekstrand 6d5d89d25a intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking end up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one.  To
alleviate this, we now vectorize the I/O once again.  This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

Cycles was just a lot of churn do to moves being different places.  Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-03-12 15:34:06 +00:00
Jason Ekstrand 179d254cba intel/nir: Move lower_mem_access_bit_sizes to postprocess_nir
It doesn't really matter where this pass goes as long as it's after we
call nir_lower_explicit_io and before we go into the back-end.  Putting
it brw_postprocess_nir lets us move nir_lower_explicit_io significantly
later in the pipeline.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-08 22:03:14 -06:00
Jason Ekstrand 656ace3dd8 intel/nir: Move 64-bit lowering later
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later.  This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything.  This substantially reduces both fp64 shader compile times
and the resulting code size.  On the vs-isnan-dvec test from piglit:

Before this commit:

    1684.63s user 17.29s system 99% cpu 28:28.24 total
    101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
    Peak memory usage (according to massif): 1.435 GB

After this commit:

    179.64s user 7.75s system 99% cpu 3:07.92 total
    57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
    Peak memory usage (according to massif): 531.0 MB

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves.  This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand 8993e0973f intel/nir: Drop an unneeded lower_constant_initializers call
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed.  We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand 5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jordan Justen 10c5579921
intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Kasireddy, Vivek 7cab8d3661 i965: Add support for sampling from XYUV images
Add support to the i965 DRI driver to sample from XYUV8888 buffers.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:52 +00:00
Tapani Pälli 3da858a6b9 intel/compiler: add scale_factors to sampler_prog_key_data
Patch propagates given scale_factors to lowering options.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 08:42:25 +02:00
Karol Herbst b9fec2b38c nir: replace more nir_load_system_value calls with builder functions
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-21 00:16:51 +01:00
Karol Herbst 9b24028426 nir: rename nir_var_function to nir_var_function_temp
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-19 20:01:41 +01:00
Jason Ekstrand 24c8108ea6 intel/nir: Call nir_opt_deref in brw_nir_optimize
It's an optimization so we should probably be calling it in the
optimization loop.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-01-12 17:55:49 -06:00
Matt Turner 613ac3aaa2 i965: Compile fp64 software routines and lower double-ops
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Karol Herbst d0c6ef2793 nir: rename global/local to private/function memory
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.

v2: rename local to function as well
v3: rename vtn_variable_mode_local as well

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 18:51:46 +01:00
Timothy Arceri 50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Iago Toral Quiroga d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
The former expects to see SSA-only things, but the latter injects registers.

The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.

Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-20 08:02:44 +01:00
Ian Romanick af07141b33 intel/compiler: More peephole_select for pre-Gen6
No shader-db changes on any Gen6+ platform.

All of the shaders with cycles hurt by more than ~2% are from Master of
Orion.  All of the shaders have instructions helped.  It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things.  These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.

Iron Lake
total instructions in shared programs: 8214327 -> 8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.

total cycles in shared programs: 187736850 -> 187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs)   min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel)   min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 5044014 -> 5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.

total cycles in shared programs: 128143042 -> 128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs)   min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel)   min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 8fb8ebfbb0 intel/compiler: More peephole select
Shader-db results:

The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.

v2: Fix typo in comment noticed by Caio.

v3: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15072761 -> 15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs)   min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel)   min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.

total cycles in shared programs: 369738930 -> 369535732 (-0.05%)
cycles in affected programs: 68027851 -> 67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs)   min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel)   min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.

total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1

total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1

Ivy Bridge
total instructions in shared programs: 12021190 -> 11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.

total cycles in shared programs: 179077826 -> 178570196 (-0.28%)
cycles in affected programs: 63205667 -> 62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs)   min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10852569 -> 10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.

total cycles in shared programs: 154732047 -> 154608941 (-0.08%)
cycles in affected programs: 4063110 -> 3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs)   min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel)   min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.

No change on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00