Commit Graph

99717 Commits

Author SHA1 Message Date
Ian Romanick 053be9f020 nir: Rearrange and-compounded float compares
If both comparisons are used as sources for instructions other than the
iand, this transformation is detrimental.  If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.

It is interesting to me that on Iron Lake only 81 shaders have
instruction counts changed, but 726 shaders have cycle counts changed.

shader-db results:

Skylake
total instructions in shared programs: 14525728 -> 14521017 (-0.03%)
instructions in affected programs: 1164726 -> 1160015 (-0.40%)
helped: 1692
HURT: 5
helped stats (abs) min: 1 max: 637 x̄: 2.79 x̃: 2
helped stats (rel) min: 0.07% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs)   min: 1 max: 12 x̄: 3.20 x̃: 1
HURT stats (rel)   min: 0.38% max: 2.86% x̄: 2.36% x̃: 2.86%
95% mean confidence interval for instructions value: -3.52 -2.03
95% mean confidence interval for instructions %-change: -0.86% -0.74%
Instructions are helped.

total cycles in shared programs: 533115449 -> 532991404 (-0.02%)
cycles in affected programs: 119401803 -> 119277758 (-0.10%)
helped: 1145
HURT: 467
helped stats (abs) min: 1 max: 34644 x̄: 145.92 x̃: 18
helped stats (rel) min: <.01% max: 45.33% x̄: 1.58% x̃: 0.42%
HURT stats (abs)   min: 1 max: 1590 x̄: 92.15 x̃: 15
HURT stats (rel)   min: <.01% max: 13.48% x̄: 1.26% x̃: 0.39%
95% mean confidence interval for cycles value: -122.16 -31.74
95% mean confidence interval for cycles %-change: -0.94% -0.57%
Cycles are helped.

total spills in shared programs: 9597 -> 9534 (-0.66%)
spills in affected programs: 403 -> 340 (-15.63%)
helped: 1
HURT: 1

total fills in shared programs: 13904 -> 13790 (-0.82%)
fills in affected programs: 1627 -> 1513 (-7.01%)
helped: 2
HURT: 1

LOST:   0
GAINED: 2

Broadwell
total instructions in shared programs: 14816966 -> 14812590 (-0.03%)
instructions in affected programs: 1499885 -> 1495509 (-0.29%)
helped: 1672
HURT: 15
helped stats (abs) min: 1 max: 455 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.05% max: 16.36% x̄: 0.81% x̃: 0.33%
HURT stats (abs)   min: 1 max: 21 x̄: 9.20 x̃: 8
HURT stats (rel)   min: 0.08% max: 2.86% x̄: 1.06% x̃: 0.53%
95% mean confidence interval for instructions value: -3.14 -2.05
95% mean confidence interval for instructions %-change: -0.85% -0.73%
Instructions are helped.

total cycles in shared programs: 559353622 -> 559345595 (<.01%)
cycles in affected programs: 139893703 -> 139885676 (<.01%)
helped: 921
HURT: 697
helped stats (abs) min: 1 max: 42424 x̄: 143.45 x̃: 18
helped stats (rel) min: <.01% max: 36.23% x̄: 2.02% x̃: 0.87%
HURT stats (abs)   min: 1 max: 2370 x̄: 178.03 x̃: 38
HURT stats (rel)   min: <.01% max: 17.35% x̄: 0.71% x̃: 0.14%
95% mean confidence interval for cycles value: -59.64 49.72
95% mean confidence interval for cycles %-change: -1.02% -0.66%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 78902 -> 78861 (-0.05%)
spills in affected programs: 2418 -> 2377 (-1.70%)
helped: 1
HURT: 11

total fills in shared programs: 83782 -> 83678 (-0.12%)
fills in affected programs: 3515 -> 3411 (-2.96%)
helped: 2
HURT: 11

LOST:   0
GAINED: 5

Haswell and Ivy Bridge had similar results. Haswell shown.
total instructions in shared programs: 9033898 -> 9032010 (-0.02%)
instructions in affected programs: 308064 -> 306176 (-0.61%)
helped: 921
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.05 x̃: 1
helped stats (rel) min: 0.17% max: 17.54% x̄: 0.80% x̃: 0.35%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
95% mean confidence interval for instructions value: -2.21 -1.87
95% mean confidence interval for instructions %-change: -0.88% -0.68%
Instructions are helped.

total cycles in shared programs: 84628949 -> 84620520 (<.01%)
cycles in affected programs: 2164913 -> 2156484 (-0.39%)
helped: 518
HURT: 359
helped stats (abs) min: 1 max: 440 x̄: 41.52 x̃: 20
helped stats (rel) min: <.01% max: 17.17% x̄: 1.95% x̃: 1.01%
HURT stats (abs)   min: 1 max: 586 x̄: 36.43 x̃: 8
HURT stats (rel)   min: 0.04% max: 18.65% x̄: 1.47% x̃: 0.40%
95% mean confidence interval for cycles value: -15.17 -4.05
95% mean confidence interval for cycles %-change: -0.77% -0.32%
Cycles are helped.

LOST:   0
GAINED: 4

Sandy Bridge
total instructions in shared programs: 10544860 -> 10542933 (-0.02%)
instructions in affected programs: 360019 -> 358092 (-0.54%)
helped: 931
HURT: 4
helped stats (abs) min: 1 max: 20 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.11% max: 15.52% x̄: 0.68% x̃: 0.30%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 3.33% max: 3.33% x̄: 3.33% x̃: 3.33%
95% mean confidence interval for instructions value: -2.23 -1.89
95% mean confidence interval for instructions %-change: -0.76% -0.58%
Instructions are helped.

total cycles in shared programs: 146106820 -> 146098397 (<.01%)
cycles in affected programs: 3435047 -> 3426624 (-0.25%)
helped: 572
HURT: 329
helped stats (abs) min: 1 max: 1289 x̄: 32.52 x̃: 15
helped stats (rel) min: <.01% max: 26.29% x̄: 0.97% x̃: 0.33%
HURT stats (abs)   min: 1 max: 1714 x̄: 30.93 x̃: 6
HURT stats (rel)   min: 0.02% max: 41.31% x̄: 1.13% x̃: 0.19%
95% mean confidence interval for cycles value: -16.85 -1.85
95% mean confidence interval for cycles %-change: -0.39% -0.01%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 7886925 -> 7886711 (<.01%)
instructions in affected programs: 25763 -> 25549 (-0.83%)
helped: 75
HURT: 6
helped stats (abs) min: 1 max: 13 x̄: 3.33 x̃: 1
helped stats (rel) min: 0.35% max: 17.57% x̄: 1.96% x̃: 0.53%
HURT stats (abs)   min: 1 max: 16 x̄: 6.00 x̃: 1
HURT stats (rel)   min: 2.86% max: 4.79% x̄: 3.49% x̃: 2.86%
95% mean confidence interval for instructions value: -3.69 -1.60
95% mean confidence interval for instructions %-change: -2.54% -0.57%
Instructions are helped.

total cycles in shared programs: 178116888 -> 178115324 (<.01%)
cycles in affected programs: 5858790 -> 5857226 (-0.03%)
helped: 484
HURT: 242
helped stats (abs) min: 2 max: 76 x̄: 5.27 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs)   min: 2 max: 76 x̄: 4.07 x̃: 2
HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.19% x̃: 0.03%
95% mean confidence interval for cycles value: -2.76 -1.55
95% mean confidence interval for cycles %-change: -0.12% 0.01%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857870 -> 4857762 (<.01%)
instructions in affected programs: 13994 -> 13886 (-0.77%)
helped: 39
HURT: 5
helped stats (abs) min: 1 max: 13 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.33% max: 17.11% x̄: 1.86% x̃: 0.48%
HURT stats (abs)   min: 1 max: 16 x̄: 4.00 x̃: 1
HURT stats (rel)   min: 2.86% max: 4.71% x̄: 3.23% x̃: 2.86%
95% mean confidence interval for instructions value: -3.86 -1.05
95% mean confidence interval for instructions %-change: -2.61% 0.04%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 122180744 -> 122179674 (<.01%)
cycles in affected programs: 3686646 -> 3685576 (-0.03%)
helped: 273
HURT: 141
helped stats (abs) min: 2 max: 76 x̄: 5.81 x̃: 6
helped stats (rel) min: 0.01% max: 10.70% x̄: 0.18% x̃: 0.06%
HURT stats (abs)   min: 2 max: 76 x̄: 3.66 x̃: 2
HURT stats (rel)   min: 0.01% max: 3.99% x̄: 0.16% x̃: 0.02%
95% mean confidence interval for cycles value: -3.42 -1.75
95% mean confidence interval for cycles %-change: -0.15% 0.03%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 821e7a4d32 nir: Separate a weird compare with zero to two compares with zero
min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0).

No shader-db changes, but it does prevent 6 to 12 instruction
regressions in the next patch on all measured Intel platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 68420d8322 nir: Simplify min and max of b2f
v2: Rebase on almost 2 years.  Require that one of the arguments to fmin
or fmax be used only once.  This prevents some regressions.

shader-db results:

Skylake and Broadwell had similar results.  Skylake shown.
total instructions in shared programs: 14526021 -> 14525913 (<.01%)
instructions in affected programs: 4613 -> 4505 (-2.34%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%

total cycles in shared programs: 533118710 -> 533118403 (<.01%)
cycles in affected programs: 34334 -> 34027 (-0.89%)
helped: 24
HURT: 0
helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%

No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick d8d18516b0 nir: Undo possible damage caused by rearranging or-compounded float compares
shader-db results:

Skylake and Broadwell had similar results (Skylake shown)
total instructions in shared programs: 14525898 -> 14525836 (<.01%)
instructions in affected programs: 1964 -> 1902 (-3.16%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 25 x̄: 4.43 x̃: 1
helped stats (rel) min: 0.68% max: 9.77% x̄: 2.10% x̃: 0.86%
95% mean confidence interval for instructions value: -9.46 0.60
95% mean confidence interval for instructions %-change: -3.97% -0.24%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 533119892 -> 533115756 (<.01%)
cycles in affected programs: 96061 -> 91925 (-4.31%)
helped: 13
HURT: 1
helped stats (abs) min: 60 max: 596 x̄: 318.77 x̃: 300
helped stats (rel) min: 1.15% max: 5.49% x̄: 4.27% x̃: 4.42%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.46% max: 0.46% x̄: 0.46% x̃: 0.46%
95% mean confidence interval for cycles value: -379.43 -211.43
95% mean confidence interval for cycles %-change: -4.84% -3.01%
Cycles are helped.

Haswell, Ivy Bridge and Sandy Bridge had similar results (Haswell shown).
total instructions in shared programs: 9033948 -> 9033898 (<.01%)
instructions in affected programs: 535 -> 485 (-9.35%)
helped: 2
HURT: 0

total cycles in shared programs: 84631402 -> 84628949 (<.01%)
cycles in affected programs: 63197 -> 60744 (-3.88%)
helped: 13
HURT: 2
helped stats (abs) min: 1 max: 594 x̄: 189.62 x̃: 140
helped stats (rel) min: 0.07% max: 5.04% x̄: 3.79% x̃: 4.01%
HURT stats (abs)   min: 4 max: 8 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.17% max: 0.45% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for cycles value: -253.40 -73.67
95% mean confidence interval for cycles %-change: -4.24% -2.25%
Cycles are helped.

No changes on GM45 or Iron Lake.

v2: Add a couple more tautological compares.  Suggested by Elie.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick 3941cba0f7 nir: Be more conservative about rearranging or-compounded compares
If both comparisons are used as sources for instructions other than the
ior, this transformation is detrimental.  If the non-identical value in
both compares is constant, the fmin or fmax will be constant-folded
away, so the transformation is always a win.

shader-db results:

Skylake
total instructions in shared programs: 14526147 -> 14525898 (<.01%)
instructions in affected programs: 70239 -> 69990 (-0.35%)
helped: 102
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.86 -2.02
95% mean confidence interval for instructions %-change: -0.46% -0.31%
Instructions are helped.

total cycles in shared programs: 533120531 -> 533119892 (<.01%)
cycles in affected programs: 994875 -> 994236 (-0.06%)
helped: 76
HURT: 26
helped stats (abs) min: 1 max: 324 x̄: 27.09 x̃: 13
helped stats (rel) min: <.01% max: 4.21% x̄: 0.45% x̃: 0.18%
HURT stats (abs)   min: 1 max: 167 x̄: 54.62 x̃: 26
HURT stats (rel)   min: <.01% max: 4.36% x̄: 1.01% x̃: 0.39%
95% mean confidence interval for cycles value: -19.44 6.91
95% mean confidence interval for cycles %-change: -0.30% 0.15%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 14816005 -> 14815787 (<.01%)
instructions in affected programs: 64658 -> 64440 (-0.34%)
helped: 97
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.25 x̃: 1
helped stats (rel) min: 0.07% max: 2.30% x̄: 0.38% x̃: 0.20%
95% mean confidence interval for instructions value: -2.62 -1.87
95% mean confidence interval for instructions %-change: -0.45% -0.30%
Instructions are helped.

total cycles in shared programs: 559340386 -> 559339907 (<.01%)
cycles in affected programs: 1090491 -> 1090012 (-0.04%)
helped: 66
HURT: 28
helped stats (abs) min: 2 max: 198 x̄: 23.83 x̃: 16
helped stats (rel) min: 0.01% max: 4.21% x̄: 0.47% x̃: 0.27%
HURT stats (abs)   min: 2 max: 226 x̄: 39.07 x̃: 11
HURT stats (rel)   min: <.01% max: 4.61% x̄: 0.64% x̃: 0.20%
95% mean confidence interval for cycles value: -15.94 5.75
95% mean confidence interval for cycles %-change: -0.35% 0.07%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

Haswell
total instructions in shared programs: 9034106 -> 9033948 (<.01%)
instructions in affected programs: 24096 -> 23938 (-0.66%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.16 x̃: 4
helped stats (rel) min: 0.42% max: 2.29% x̄: 0.71% x̃: 0.64%
95% mean confidence interval for instructions value: -4.71 -3.60
95% mean confidence interval for instructions %-change: -0.84% -0.58%
Instructions are helped.

total cycles in shared programs: 84631628 -> 84631402 (<.01%)
cycles in affected programs: 148674 -> 148448 (-0.15%)
helped: 14
HURT: 14
helped stats (abs) min: 1 max: 114 x̄: 22.14 x̃: 12
helped stats (rel) min: 0.02% max: 2.98% x̄: 0.66% x̃: 0.21%
HURT stats (abs)   min: 1 max: 10 x̄: 6.00 x̃: 5
HURT stats (rel)   min: 0.01% max: 0.20% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: -19.42 3.28
95% mean confidence interval for cycles %-change: -0.59% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total instructions in shared programs: 10015456 -> 10015293 (<.01%)
instructions in affected programs: 27701 -> 27538 (-0.59%)
helped: 38
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 4.29 x̃: 4
helped stats (rel) min: 0.33% max: 2.79% x̄: 0.66% x̃: 0.52%
95% mean confidence interval for instructions value: -4.87 -3.71
95% mean confidence interval for instructions %-change: -0.82% -0.51%
Instructions are helped.

total cycles in shared programs: 87524771 -> 87524569 (<.01%)
cycles in affected programs: 112324 -> 112122 (-0.18%)
helped: 6
HURT: 12
helped stats (abs) min: 2 max: 111 x̄: 44.67 x̃: 20
helped stats (rel) min: 0.02% max: 2.94% x̄: 1.45% x̃: 1.26%
HURT stats (abs)   min: 1 max: 16 x̄: 5.50 x̃: 5
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -29.14 6.69
95% mean confidence interval for cycles %-change: -0.93% 0.08%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 2

Sandy Bridge
total instructions in shared programs: 10545655 -> 10545465 (<.01%)
instructions in affected programs: 37198 -> 37008 (-0.51%)
helped: 42
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 4.52 x̃: 4
helped stats (rel) min: 0.31% max: 2.15% x̄: 0.58% x̃: 0.49%
95% mean confidence interval for instructions value: -5.14 -3.91
95% mean confidence interval for instructions %-change: -0.68% -0.47%
Instructions are helped.

total cycles in shared programs: 146113059 -> 146112427 (<.01%)
cycles in affected programs: 423514 -> 422882 (-0.15%)
helped: 32
HURT: 10
helped stats (abs) min: 4 max: 162 x̄: 24.34 x̃: 12
helped stats (rel) min: 0.06% max: 2.74% x̄: 0.37% x̃: 0.11%
HURT stats (abs)   min: 12 max: 19 x̄: 14.70 x̃: 14
HURT stats (rel)   min: 0.10% max: 0.18% x̄: 0.16% x̃: 0.14%
95% mean confidence interval for cycles value: -26.03 -4.07
95% mean confidence interval for cycles %-change: -0.43% -0.05%
Cycles are helped.

Iron Lake
total instructions in shared programs: 7886959 -> 7886925 (<.01%)
instructions in affected programs: 1340 -> 1306 (-2.54%)
helped: 4
HURT: 0
helped stats (abs) min: 2 max: 15 x̄: 8.50 x̃: 8
helped stats (rel) min: 0.63% max: 4.30% x̄: 2.45% x̃: 2.43%
95% mean confidence interval for instructions value: -20.44 3.44
95% mean confidence interval for instructions %-change: -5.78% 0.89%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 178116996 -> 178116888 (<.01%)
cycles in affected programs: 6262 -> 6154 (-1.72%)
helped: 2
HURT: 2
helped stats (abs) min: 44 max: 78 x̄: 61.00 x̃: 61
helped stats (rel) min: 3.31% max: 3.94% x̄: 3.62% x̃: 3.62%
HURT stats (abs)   min: 6 max: 8 x̄: 7.00 x̃: 7
HURT stats (rel)   min: 0.34% max: 0.68% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -93.27 39.27
95% mean confidence interval for cycles %-change: -5.38% 2.27%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4857887 -> 4857870 (<.01%)
instructions in affected programs: 674 -> 657 (-2.52%)
helped: 2
HURT: 0

total cycles in shared programs: 122180816 -> 122180744 (<.01%)
cycles in affected programs: 3764 -> 3692 (-1.91%)
helped: 1
HURT: 1
helped stats (abs) min: 78 max: 78 x̄: 78.00 x̃: 78
helped stats (rel) min: 3.94% max: 3.94% x̄: 3.94% x̃: 3.94%
HURT stats (abs)   min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel)   min: 0.34% max: 0.34% x̄: 0.34% x̃: 0.34%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Ian Romanick cfc0d34802 nir: See through an fneg to apply existing optimizations
Doing the same for the existing feq and fne transformations didn't help
anything in shader-db.

shader-db results:

Broadwell and Skylake (Skylake shown)
total instructions in shared programs: 14529463 -> 14526147 (-0.02%)
instructions in affected programs: 402420 -> 399104 (-0.82%)
helped: 2136
HURT: 131
helped stats (abs) min: 1 max: 10 x̄: 1.61 x̃: 1
helped stats (rel) min: 0.03% max: 16.22% x̄: 3.14% x̃: 1.12%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.75% x̃: 0.57%
95% mean confidence interval for instructions value: -1.51 -1.41
95% mean confidence interval for instructions %-change: -3.06% -2.78%
Instructions are helped.

total cycles in shared programs: 533146915 -> 533120531 (<.01%)
cycles in affected programs: 10356261 -> 10329877 (-0.25%)
helped: 1933
HURT: 844
helped stats (abs) min: 1 max: 490 x̄: 29.44 x̃: 16
helped stats (rel) min: <.01% max: 28.57% x̄: 3.43% x̃: 1.88%
HURT stats (abs)   min: 1 max: 423 x̄: 36.17 x̃: 12
HURT stats (rel)   min: <.01% max: 23.75% x̄: 1.90% x̃: 0.59%
95% mean confidence interval for cycles value: -11.78 -7.22
95% mean confidence interval for cycles %-change: -1.98% -1.65%
Cycles are helped.

Haswell
total instructions in shared programs: 9037416 -> 9034106 (-0.04%)
instructions in affected programs: 389831 -> 386521 (-0.85%)
helped: 2184
HURT: 120
helped stats (abs) min: 1 max: 11 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.73% x̃: 1.02%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.19% max: 7.69% x̄: 0.81% x̃: 0.57%
95% mean confidence interval for instructions value: -1.49 -1.39
95% mean confidence interval for instructions %-change: -2.68% -2.41%
Instructions are helped.

total cycles in shared programs: 84636243 -> 84631628 (<.01%)
cycles in affected programs: 4745058 -> 4740443 (-0.10%)
helped: 1904
HURT: 960
helped stats (abs) min: 1 max: 466 x̄: 30.21 x̃: 18
helped stats (rel) min: 0.02% max: 36.36% x̄: 3.57% x̃: 2.38%
HURT stats (abs)   min: 1 max: 1080 x̄: 55.11 x̃: 14
HURT stats (rel)   min: 0.02% max: 51.33% x̄: 2.77% x̃: 0.81%
95% mean confidence interval for cycles value: -4.51 1.29
95% mean confidence interval for cycles %-change: -1.64% -1.25%
Inconclusive result (value mean confidence interval includes 0).

LOST:   1
GAINED: 0

Sandy Bridge and Ivy Bridge (Ivy Bridge shown)
total instructions in shared programs: 10018873 -> 10015456 (-0.03%)
instructions in affected programs: 512820 -> 509403 (-0.67%)
helped: 2268
HURT: 162
helped stats (abs) min: 1 max: 11 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 2.47% x̃: 0.88%
HURT stats (abs)   min: 1 max: 4 x̄: 1.59 x̃: 1
HURT stats (rel)   min: 0.09% max: 7.69% x̄: 0.86% x̃: 0.50%
95% mean confidence interval for instructions value: -1.46 -1.35
95% mean confidence interval for instructions %-change: -2.38% -2.12%
Instructions are helped.

total cycles in shared programs: 87538223 -> 87524771 (-0.02%)
cycles in affected programs: 5435520 -> 5422068 (-0.25%)
helped: 1916
HURT: 946
helped stats (abs) min: 1 max: 1392 x̄: 29.44 x̃: 18
helped stats (rel) min: <.01% max: 34.51% x̄: 3.34% x̃: 1.97%
HURT stats (abs)   min: 1 max: 633 x̄: 45.41 x̃: 11
HURT stats (rel)   min: 0.02% max: 25.95% x̄: 2.41% x̃: 0.62%
95% mean confidence interval for cycles value: -7.34 -2.06
95% mean confidence interval for cycles %-change: -1.62% -1.26%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 7888446 -> 7886959 (-0.02%)
instructions in affected programs: 331581 -> 330094 (-0.45%)
helped: 1160
HURT: 97
helped stats (abs) min: 1 max: 10 x̄: 1.37 x̃: 1
helped stats (rel) min: 0.02% max: 9.68% x̄: 0.93% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.17% max: 4.17% x̄: 0.37% x̃: 0.25%
95% mean confidence interval for instructions value: -1.25 -1.12
95% mean confidence interval for instructions %-change: -0.91% -0.75%
Instructions are helped.

total cycles in shared programs: 178130766 -> 178116996 (<.01%)
cycles in affected programs: 12534564 -> 12520794 (-0.11%)
helped: 1856
HURT: 187
helped stats (abs) min: 2 max: 202 x̄: 7.78 x̃: 4
helped stats (rel) min: <.01% max: 6.47% x̄: 0.28% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.55 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.08% x̃: 0.02%
95% mean confidence interval for cycles value: -7.41 -6.07
95% mean confidence interval for cycles %-change: -0.28% -0.22%
Cycles are helped.

GM45
total instructions in shared programs: 4858912 -> 4857887 (-0.02%)
instructions in affected programs: 237565 -> 236540 (-0.43%)
helped: 867
HURT: 57
helped stats (abs) min: 1 max: 10 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.02% max: 9.38% x̄: 0.87% x̃: 0.43%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 3.85% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: -1.18 -1.04
95% mean confidence interval for instructions %-change: -0.88% -0.71%
Instructions are helped.

total cycles in shared programs: 122189118 -> 122180816 (<.01%)
cycles in affected programs: 8776418 -> 8768116 (-0.09%)
helped: 1213
HURT: 166
helped stats (abs) min: 2 max: 202 x̄: 7.30 x̃: 4
helped stats (rel) min: <.01% max: 6.43% x̄: 0.25% x̃: 0.11%
HURT stats (abs)   min: 2 max: 26 x̄: 3.35 x̃: 2
HURT stats (rel)   min: 0.01% max: 2.14% x̄: 0.06% x̃: 0.02%
95% mean confidence interval for cycles value: -6.78 -5.26
95% mean confidence interval for cycles %-change: -0.24% -0.18%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-01-30 15:40:14 -08:00
Timothy Arceri 283e25102b st/glsl_to_nir: disable io lowering and array splitting of fs inputs
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.

The lowering and splitting is made conditional on lower_all_io_to_temps
because vc4 and freedreno both expect these passes to be enabled and
niether support glsl 400 so don't need to deal with the interpolateAt
builtins.

We leave the other stages for now as to avoid regressions. Ideally we
could remove the stage checks and just set the nir options correctly
for each stage. However all gallium drivers currently just use return
the same nir compiler options for all stages, and it's probably more
trouble than its worth to change this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Timothy Arceri 9a2e085680 nir: add lower_all_io_to_temps flag
This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Timothy Arceri 3218756262 nir/st_glsl_to_nir: add param to disable splitting of inputs
We need this because we will always copy fs outputs to temps and
split the arrays, but do not want to do either of these with fs
inputs as it is unnessisary and makes handling interpolateAt
builtins difficult.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Timothy Arceri 93e213f91f st/glsl_to_nir: copy nir compiler options to context
Various nir passes may expect this to be here as does the nir
serialisation pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:08 +11:00
Timothy Arceri dd6d6c63a7 radeonsi/nir: add input support for arrays that have not been copied to temps and split
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri d185190222 ac/radeonsi: add lookup_interp_param and load_sample_position to the abi
This will enable the interpolateAt builtins to work on the radeonsi
nir backend.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri 97058168a4 radeonsi/nir: add prim_mask to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri 3ff012f142 radeonsi/nir: adjust load_sample_position() to be shared between backends
With this interface change it can be shared between the tgsi and
nir backends.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri 3a47b138e3 radeonsi/nir: add si_nir_lookup_interp_param() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri b8808848ce ac/nir_to_llvm: move some interp defines to the header
These will be used in the following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri fea6da9aaa radeonsi/nir: move the interpolation qualifier scanning
We need to collect this when scanning over the instruction rather
than when scanning over the inputs otherwise we might get confliting
values for inputs that are use by the interpolateAt* builtins.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri 580f1aa247 radeonsi/nir: add interpolate at intrinsics to scan_instruction()
V2: use the uses_*_opcode_interp_* flags

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Bas Nieuwenhuizen 882eff4d20 radv: Merge raster state with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:02:05 +01:00
Bas Nieuwenhuizen 69364f1c34 radv: Move gs state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:02:01 +01:00
Bas Nieuwenhuizen e4e060d135 radv: Split out cliprect rule generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:56 +01:00
Bas Nieuwenhuizen acbaef3005 radv: Merge VGT_GS_MODE computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:52 +01:00
Bas Nieuwenhuizen 4ae6a8b0cd radv: Split out processing the vertex input state.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:41 +01:00
Bas Nieuwenhuizen 9062b1c241 radv: Move tessellation state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:38 +01:00
Bas Nieuwenhuizen 4aa1cb4e90 radv: Move blend state out of pipeline.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:34 +01:00
Bas Nieuwenhuizen 0f72f0eacb radv: Split out generating VGT_SHADER_STAGES_EN.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:30 +01:00
Bas Nieuwenhuizen 694c34314b radv: Split out the ia_multi_vgt_param precomputation.
Also moved everything in a struct and then return the struct from
the helper function, so it is clear in the caller what part of the
pipeline gets modified.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:26 +01:00
Bas Nieuwenhuizen 0bea0851aa radv: Split out db_shader_control computation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:18 +01:00
Bas Nieuwenhuizen 5dce47ae6d radv: Compute shader_z_format when emitting it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:13 +01:00
Bas Nieuwenhuizen df2e7ab0db radv: Merge depth stencil state with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:06 +01:00
Bas Nieuwenhuizen d5a0af84ec radv: Merge ps_input_cntl computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:01:01 +01:00
Bas Nieuwenhuizen e2bf18030d radv: Merge vtx_reuse_depth computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:55 +01:00
Bas Nieuwenhuizen c80747b32c radv: Merge vs state computation with PM4 generation.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:50 +01:00
Bas Nieuwenhuizen c4191cf944 radv: Merge binning state generation with pm4 emission.
We don't need the pipeline state struct anymore.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:45 +01:00
Bas Nieuwenhuizen 6f1a3f081e radv: Constify some pipeline helpers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:40 +01:00
Bas Nieuwenhuizen f0c9ef410a radv: Add PM4 pregeneration for compute pipelines.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:34 +01:00
Bas Nieuwenhuizen beeab44190 radv: Record a PM4 sequence for graphics pipeline switches.
This gives about 2% performance improvement on dota2 for me.

This is mostly a mechanical copy and replacement, but at bind time
we still do:

1) Some stuff that is only based on num_samples changes.
2) Some command buffer state setting.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:22 +01:00
Bas Nieuwenhuizen 7c366bc152 radv: Determine unneeded dynamic states.
Which avoids setting or emitting them.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-30 22:00:17 +01:00
Andres Rodriguez 0a89784bcc mesa: check for invalid index on UUID glGet queries
This fixes the piglit test:
spec/ext_semaphore/api-errors/usigned-byte-i-v-bad-value

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 566ed727a4 mesa: fix glGet for ext_external_objects parameters
This allows the client to actually query the enums specified in the
ext_external_objects spec.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 0ebd3cc863 mesa: fix error codes for importing memory/semaphore FDs
This fixes the following piglit tests:
spec/ext_semaphore_fd/api-errors/import-semaphore-fd-bad-enum
spec/ext_memory_object_fd/api-errors/import-memory-fd-bad-enum

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 50b06cbc10 radeonsi: fix fence_server_sync() holding up extra work v2
When calling si_fence_server_sync(), the wait operation is associated
with the next kernel submission. Therefore, any unflushed work
submitted previous to fence_server_sync() will also be affected by
the wait.

To avoid adding the dependency to the unflushed work, we flush before
emitting the fence dependency.

v2: s/semaphore/fence

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez e0f16ee666 radeonsi: implement semaphore_server_signal v2
Syncobj based waits or signals only happen at submission boundaries. In
order to guarantee that the requested signal event will occur when the
state tracker requested it, we must issue a flush.

v2: s/fence/semaphore for pipe objects

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 5b07b06d6b radeonsi: add support for importing PIPE_FD_TYPE_SYNCOBJ semaphores
Hook up importing semaphores of type PIPE_FD_TYPE_SYNCOBJ

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez cc9762d74d winsys/amdgpu: add support for syncobj signaling v3
Add the ability to signal a syncobj when a cs completes execution.

v2: corresponding changes for gallium fence->semaphore rename
v3: s/semaphore/fence for pipe objects

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 29b9bd0539 mesa/st: add support for semaphore object signal/wait v4
Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject

v2:
  - corresponding changes for gallium fence->semaphore rename
  - flushing moved to mesa/main

v3: s/semaphore/fence for pipe objects
v4: add bitmap flushing

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 89b52891fd mesa: add support for semaphore object signal/wait v3
Memory synchronization is left for a future patch.

v2: flush vertices/bitmaps moved to mesa/main
v3: removed spaces before/after braces

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 260f7fcc46 mesa: add semaphore parameter stub v2
EXT_semaphore and EXT_semaphore_fd define no pnames. Therefore there
isn't much to do besides determining the correct error code.

v2: removed useless return

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 382067f065 mesa/st: add support for semaphore object create/import/delete v3
Add basic semaphore object operations.

v2: s/semaphore/fence for pipe objects
v3: added missing license headers

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00
Andres Rodriguez 67d5d08682 mesa: add support for semaphore object creation/import/delete v3
Used by EXT_semmaphore and EXT_semaphore_fd

v2: Removed unnecessary dummy callback initialization
v3: Fixed attempting to free the DummySemaphoreObject

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-30 15:13:49 -05:00