Commit Graph

111105 Commits

Author SHA1 Message Date
Mike Blumenkrantz ddd716e746 iris: support dmabuf imports with offsets
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264

fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-07 13:36:08 -07:00
Roland Scheidegger 748f603390 gallivm: fix broken 8-wide s3tc decoding
Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, leading to decode artifacts.
Fix these all up, which makes the code actually work 8-wide. Note that
it's still not used - I've verified it works, and the generated assembly
does look quite a bit simpler actually (20-30% less instructions for the
s3tc decode part with avx2), however in practice it still seems to be
sligthly slower for some unknown reason (tested with openarena) on my
haswell box, so for now continue to split things into 4-wide vectors
before decoding.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2019-05-07 18:59:38 +02:00
Juan A. Suarez Romero 92dba1c66e docs: Add relnotes stub for 19.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-05-07 16:07:29 +00:00
Juan A. Suarez Romero 14a7959cfa Bump version for 19.1 branch
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-05-07 16:02:34 +00:00
Vasily Khoruzhick 6b46399e2f lima: enable sin and cos lowering for GP
GP doesn't support sin/cos natively, so we have to lower them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Vasily Khoruzhick e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Rob Clark b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 9403184ddd freedreno/ir3: move immediates to const_state
They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark ef3eecd66b freedreno/ir3: move ir3_pointer_size()
Move to ir3_compiler so it doesn't depend on the compile context.  Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Lionel Landwerlin 2d2927938f vulkan/overlay-layer: fix cast errors
Not quite sure what version of GCC/Clang produces errors (8.3.0
locally was fine).

v2: also fix an integer literal issue (Karol)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-07 10:45:45 +01:00
Samuel Iglesias Gonsálvez bc66cebc0d anv: fix alphaToCoverage when there is no color attachment
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering only, we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were no handling correctly either and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 09:35:47 +02:00
Ian Romanick c866500525 intel/compiler: Don't always require precise lowering of flrp
No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick ab86926156 nir/algebraic: Reassociate open-coded flrp(1, b, c)
In a previous verion of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would litterally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped /
hurt cycles was about even.

No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8172020 -> 8164367 (-0.09%)
instructions in affected programs: 1089851 -> 1082198 (-0.70%)
helped: 3285
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.09%
Instructions are helped.

total cycles in shared programs: 188758338 -> 188719974 (-0.02%)
cycles in affected programs: 20004922 -> 19966558 (-0.19%)
helped: 3012
HURT: 477
helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.27 x̃: 4
HURT stats (rel)   min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.38 -10.62
95% mean confidence interval for cycles %-change: -0.46% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick c995d1ca3a nir/flrp: Lower flrp(a, b, #c) differently
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick ae02622d8f nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists
There is little effect on Intel GPUs now because we almost always take
the "always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any other Intel platforms.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 188852500 -> 188852484 (<.01%)
cycles in affected programs: 14612 -> 14596 (-0.11%)
helped: 4
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11%
95% mean confidence interval for cycles value: -4.00 -4.00
95% mean confidence interval for cycles %-change: -0.13% -0.09%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick 6698d861a5 nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists
This doesn't help on Intel GPUs now because we always take the
"always_precise" path first.  It may help on other GPUs, and it does
prevent a bunch of regressions in "intel/compiler: Don't always require
precise lowering of flrp".

No changes on any Intel platform.  Before a number of large rebases this
helped cycles in a couple shaders on Iron Lake and GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick 5b908db604 nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently
No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8189888 -> 8153912 (-0.44%)
instructions in affected programs: 1199037 -> 1163061 (-3.00%)
helped: 4124
HURT: 10
helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9
helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06%
95% mean confidence interval for instructions value: -8.84 -8.56
95% mean confidence interval for instructions %-change: -5.12% -4.77%
Instructions are helped.

total cycles in shared programs: 188606710 -> 188426964 (-0.10%)
cycles in affected programs: 27505596 -> 27325850 (-0.65%)
helped: 4026
HURT: 77
helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46
helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85%
HURT stats (abs)   min: 2 max: 376 x̄: 17.79 x̃: 6
HURT stats (rel)   min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04%
95% mean confidence interval for cycles value: -44.75 -42.87
95% mean confidence interval for cycles %-change: -2.44% -2.17%
Cycles are helped.

LOST:   3
GAINED: 35

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick 23c5501b77 nir/flrp: Lower flrp(#a, #b, c) differently
If the magnitudes of #a and #b are such that (b-a) won't lose too much
precision, lower as a+c(b-a).

No changes on any other Intel platforms.

v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of
nir_opt_algebraic")

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8192503 -> 8192383 (<.01%)
instructions in affected programs: 18417 -> 18297 (-0.65%)
helped: 68
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43%
95% mean confidence interval for instructions value: -2.48 -1.05
95% mean confidence interval for instructions %-change: -1.56% -0.63%
Instructions are helped.

total cycles in shared programs: 188662536 -> 188661956 (<.01%)
cycles in affected programs: 744476 -> 743896 (-0.08%)
helped: 62
HURT: 0
helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6
helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06%
95% mean confidence interval for cycles value: -12.37 -6.34
95% mean confidence interval for cycles %-change: -0.48% -0.06%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick dd7135d55d intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5
Previously lower_flrp32 was only set for vertex shaders.  Fragment
shaders performed a(1-c)+bc lowering during code generation.

The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
text-identical fragment shader.

v2: Rebase on 26391cceaa ("intel/compiler: Lower ffma on Gen4 and
Gen5").

v3: Rebase on a004e95dd7 ("radeonsi/nir: create si_nir_opts() helper")

Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
instructions in affected programs: 2503898 -> 2478487 (-1.01%)
helped: 9936
HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: -2.43 -2.25
95% mean confidence interval for instructions %-change: -1.41% -1.33%
Instructions are helped.

total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
cycles in affected programs: 71541604 -> 71419616 (-0.17%)
helped: 11649
HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: -9.42 -8.63
95% mean confidence interval for cycles %-change: -0.54% -0.50%
Cycles are helped.

total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.

LOST:   3
GAINED: 12

GM45
total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
instructions in affected programs: 1303584 -> 1290871 (-0.98%)
helped: 5010
HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: -2.45 -2.20
95% mean confidence interval for instructions %-change: -1.40% -1.28%
Instructions are helped.

total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
cycles in affected programs: 44845402 -> 44768292 (-0.17%)
helped: 6079
HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: -11.63 -10.34
95% mean confidence interval for cycles %-change: -0.58% -0.52%
Cycles are helped.

total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2

total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0
HURT: 1

total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0
HURT: 1

LOST:   15
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
2019-05-06 22:52:29 -07:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick 158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick dc566a033c nir/algebraic: Pull common multiplication out of flrp arguments
All Intel platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342485 -> 15337495 (-0.03%)
instructions in affected programs: 217456 -> 212466 (-2.29%)
helped: 1539
HURT: 1
helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3
helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56%
95% mean confidence interval for instructions value: -3.39 -3.09
95% mean confidence interval for instructions %-change: -3.24% -2.96%
Instructions are helped.

total cycles in shared programs: 355734320 -> 355728237 (<.01%)
cycles in affected programs: 1851555 -> 1845472 (-0.33%)
helped: 835
HURT: 575
helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14
helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81%
HURT stats (abs)   min: 1 max: 322 x̄: 48.40 x̃: 14
HURT stats (rel)   min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43%
95% mean confidence interval for cycles value: -8.50 -0.13
95% mean confidence interval for cycles %-change: 0.48% 1.62%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Ian Romanick a83a6e9690 nir/algebraic: Pull common addition out of flrp arguments
v2: Augment the late optimization patterns with a couple pre-ffma pass
patterns.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15342982 -> 15342485 (<.01%)
instructions in affected programs: 56304 -> 55807 (-0.88%)
helped: 235
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74%
95% mean confidence interval for instructions value: -2.31 -1.92
95% mean confidence interval for instructions %-change: -1.46% -1.09%
Instructions are helped.

total cycles in shared programs: 355734740 -> 355734320 (<.01%)
cycles in affected programs: 1028807 -> 1028387 (-0.04%)
helped: 134
HURT: 104
helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8
helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61%
HURT stats (abs)   min: 1 max: 203 x̄: 29.06 x̃: 8
HURT stats (rel)   min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46%
95% mean confidence interval for cycles value: -8.51 4.98
95% mean confidence interval for cycles %-change: -0.35% 0.39%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10886815 -> 10886390 (<.01%)
instructions in affected programs: 36883 -> 36458 (-1.15%)
helped: 147
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3
helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23%
95% mean confidence interval for instructions value: -3.12 -2.67
95% mean confidence interval for instructions %-change: -1.83% -1.38%
Instructions are helped.

total cycles in shared programs: 154188360 -> 154186902 (<.01%)
cycles in affected programs: 388094 -> 386636 (-0.38%)
helped: 90
HURT: 58
helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15
helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83%
HURT stats (abs)   min: 1 max: 684 x̄: 31.97 x̃: 10
HURT stats (rel)   min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51%
95% mean confidence interval for cycles value: -22.62 2.92
95% mean confidence interval for cycles %-change: -0.68% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8221239 -> 8220357 (-0.01%)
instructions in affected programs: 54560 -> 53678 (-1.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17%
95% mean confidence interval for instructions value: -5.21 -4.28
95% mean confidence interval for instructions %-change: -2.23% -1.72%
Instructions are helped.

total cycles in shared programs: 188654442 -> 188650364 (<.01%)
cycles in affected programs: 1454384 -> 1450306 (-0.28%)
helped: 204
HURT: 0
helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18
helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22%
95% mean confidence interval for cycles value: -22.38 -17.60
95% mean confidence interval for cycles %-change: -0.67% -0.46%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Christian Gmeiner e00fa99b08 glsl_to_nir: drop supports_ints
At initial nir level all drivers are supporting ints.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:59 +02:00
Christian Gmeiner 4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Alyssa Rosenzweig 050b934a24 panfrost: Refactor blend descriptors
This commit does a fairly large cleanup of blend descriptors, although
there should not be any functional changes. In particular, we split
apart the Midgard and Bifrost blend descriptors, since they are
radically different. From there, we can identify that the Midgard
descriptor as previously written was really two render targets'
descriptors stuck together. From this observation, we split the Midgard
descriptor into what a single RT actually needs. This enables us to
correctly dump blending configuration for MRT samples on Midgard. It
also allows the Midgard and Bifrost blend code to peacefully coexist,
with runtime selection rather than a #ifdef. So, as a bonus, this will
help the future Bifrost effort, eliminating one major source of
compile-time architectural divergence.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-07 03:21:08 +00:00
Vasily Khoruzhick d4a249aa09 lima/gpir: enable lowering for ftrunc
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick f4659bea7c lima/gpir: implement nir_op_fmov
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick cf1ab4b96b lima: use int_to_float lowering pass
Neither GP nor PP in Mali4x0 support integers, so utilize new pass
and set native_integers to true for now until this flag is dropped.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Vasily Khoruzhick 443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) use the same
code paths as modern hardware.

It uses newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Timothy Arceri 49025292fb radeonsi: add config entry for Counter-Strike Global Offensive
This fixes rendering issues with gun scopes which is rather
important.

Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239
2019-05-07 09:42:09 +10:00
Vasily Khoruzhick d085920b64 lima/gpir: fix float uniform alignment issue
If PIPE_CAP_PACKED_UNIFORMS is not set uniforms are vec4 aligned,
so lima_nir_lower_uniform_to_scalar should use first channel of vec4
for float uniforms.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-06 14:08:09 -07:00
Erik Faye-Lund d84b85bc28 draw: flush when setting stream-out targets
We need to re-prepare the middle-end state to pick up changes to this
state to react correctly to pausing/resuming stream-out. So let's add a
flush here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: ec8cbd79ac "draw/softpipe: EXT_transform_feedback support (v2)"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-06 22:42:37 +02:00
Erik Faye-Lund ed53e61bec llvmpipe: pass stream-out targets to draw-module early
We currently set this state in the draw-module twice on each draw, but
which trashes this state. So far that's not a problem, because we don't
really do much from that function.

But it turns out, we're going to have to do more; namely flush when the
state changes. This will incur a large performance penalty due to the
excessive setting.

Instead, let's rely on the CSO caching making sure that
llvmpipe_set_so_targets doesn't get called needlessly, and setup the
state directly there instead.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-05-06 22:42:37 +02:00
Uros Bizjak fc7649c4b7 doc: Update GL_KHR_robustness in features.txt for r600
glxinfo for Cypress XT [Radeon HD 5870] lists GL_KHR_robustness
as supported extension.  This was the last missing extension
for GL 4.5, so Mark GL 4.5 as all DONE for r600.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-05-07 06:21:48 +10:00
Chia-I Wu c7078397ca virgl: do not use inline writes for subdata
Inline writes skip transfer map/unamp at the cost of an extra copy
on the data during execbuffer.  That is generally a win for small
transfers.  But the heuristic to use inline writes based on buffer
sizes rather than transfer sizes makes little sense.  More
importantly, inline writes miss optimizations that are done for
buffer transfers.

Let's just use transfers.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-05-06 10:31:56 -07:00
Chia-I Wu 898be8036d virgl: rework queries
virglrender has been changed such that

 - VIRGL_CCMD_GET_QUERY_RESULT is fenced
 - query buffers (PIPE_BIND_CUSTOM) are coherent

We can check if a query is ready using DRM_IOCTL_VIRTGPU_WAIT, and also
avoid a synchronized transfer to retrieve the query result.  When
running against an older virglrenderer, it falls back to the old
behavior automatically.

TF2 @ 640x480 for pts4.dem went from 17fps to 40fps on my testing
machine.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-06 10:20:40 -07:00
Chia-I Wu b4da53b0c3 virgl: export resource_is_busy from winsys
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-05-06 10:20:38 -07:00
Samuel Pitoiset c10808441c radv: fix rowPitch for R32G32B32 formats on GFX9
The pitch is actually the number of components per row. We found
the problem when we implemented some meta operations for these
formats and the wrong pitch has been confirmed with a small test case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-06 19:07:44 +02:00
Kenneth Graunke a032a9665f iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
This makes CompressedTexSubImage from a PBO source do proper GPU
rendering to upload instead of stalling to map the PBO source on
the CPU (then copying it on the CPU).

Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this
functionality, and to Jason Ekstrand for writing the code I adapted.
Vulkan only supports a single layer, however, and this code tries to
support multiple layers as long as it's miplevel 0.

Improves performance in Sid Meier's Civilization VI:

   Average frame time (ms):         -3.67423% +/- 1.46201% (n=5)
   99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)
2019-05-06 09:50:32 -07:00
Bas Nieuwenhuizen 8139efbbbd radv: Use given stride for images imported from Android.
Handled similarly as radeonsi. I checked the offsets are actually used.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-05-06 15:36:39 +00:00
Erico Nunes 11602ccd5d lima/ppir: abort compilation in case of unsupported intrinsic
Currently ppir continues compilation when there is an unsupported
intrinsic, resulting in a shader that will surely not work as intended.

This is a problem during piglit runs as some tests don't compile
properly due to this but actually still get submitted to the gpu and
leave the system in an unstable state after executing, causing further
tests to fail.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-06 17:15:27 +02:00
Erico Nunes 60a128fe81 lima/ir: print names of unsupported intrinsics
While lima still doesn't support some kinds of intrinsics, it is more
helpful to display the name of the unsupported instr->intrinsic to make
debugging easier.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2019-05-06 17:15:06 +02:00
John Stultz c7f2145b4b mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list
In commit a99c360a46 (nir: add pass to lower fb reads), a new
file was added that needs to also be added to the
Makefile.sources list used by the Android and SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: a99c360a46 ("nir: add pass to lower fb reads")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
John Stultz d04f44a459 mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefile.sources
In commit 2f0b9d2249 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by Android and
SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 2f0b9d2249 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
John Stultz c935862127 mesa: android: freedreno: Fix build failure due to path change
The ir3_nir_trig.py file was moved in a previous commit,
aa0fed10d3 (freedreno: move ir3 to common location),
so update the Android.gen.mk file to match.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Amit Pundir 88105375c9 mesa: android: freedreno: build libfreedreno_{drm,ir3} static libs
Add libfreedreno_drm/ir3 to the build

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Alistair Strachan 0fda3eac31 mesa: android: Remove unnecessary dependency tracking rules
The current AOSP master build system breaks building mesa due to the
following error:

external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error:
  writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h"

This error is bogus -- nothing "writes" to ir.h -- but the rule is
unnecessary because the generated header that is a dependency of the
non-generated header should be added to LOCAL_GENERATED_SOURCES and this
will track if the dependency needs to be regenerated.

(This change fixes a similar problem affecting nir.h too.)

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
[jstultz: Forward ported and tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:25 +00:00