Commit Graph

108830 Commits

Author SHA1 Message Date
Mathias Fröhlich 904a0552aa st/mesa: Invalidate the gallium array atom only if needed.
Now that the buffer object usage history tracks if it is
being used as vertex buffer object, we can restrict setting
the ST_NEW_VERTEX_ARRAYS bit to dirty on glBufferData calls to
buffers that are potentially used as vertex buffer object.
Also put a note that the same could be done for index arrays
used in indexed draws.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-03-04 17:03:06 +01:00
Mathias Fröhlich e727f8c8b8 mesa: Track buffer object use also for VAO usage.
We already track the usage history for buffer objects
in a lot of aspects. Add GL_ARRAY_BUFFER and
GL_ELEMENT_ARRAY_BUFFER to gl_buffer_object::UsageHistory.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-03-04 17:03:06 +01:00
Samuel Pitoiset 9e787904d0 rav: use 32_AR instead of 32_ABGR when alpha coverage is required
This export format is faster. Seems to improve performance in
Wreckfest.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-04 12:02:01 +01:00
Alyssa Rosenzweig 72981c92ce panfrost: List primitive restart enable bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-04 05:04:14 +00:00
Alyssa Rosenzweig 2b5cda137f panfrost/midgard: Preview for data hazards
If a selected unit causes a data hazard, the whole block gets cut short.
So, we preview for data hazards _while_ selecting units.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
2019-03-04 05:03:48 +00:00
Alyssa Rosenzweig 93eeba623b panfrost/midgard: Promote smul to vmul
smul comes first in the pipeline, before vmul. Until we have a full
instruction scheduler, it's better to have vmul prioritized to maximize
bundle size.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
2019-03-04 05:02:58 +00:00
Alyssa Rosenzweig 25bbb44dce panfrost: Flush with offscreen rendering
This special-case was needlessly added and breaks purely offscreen
rendering (when there is no scanout involved)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
2019-03-04 05:01:45 +00:00
Alyssa Rosenzweig 4f7460297b panfrost/midgard: Don't force constant on VLUT
Previously, we forced a #0 inline constant tacked on for the lut
instructions to mirror the blob's behaviour, which caused some
suboptimal codegen due to our constant inlining implementation. Instead,
just don't force a constant at all.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
2019-03-04 04:59:58 +00:00
Alyssa Rosenzweig c351cc4e94 panfrost: Cleanup cruft related to clears
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-03-04 04:59:12 +00:00
Alyssa Rosenzweig 40ffee4448 panfrost: Decouple Gallium clear from FBD clear
The operations of gallium->clear() and the hardware callbacks are
fundamentally independent. This routine decouples them by routing shared
information via panfrost_job, allowing the hardware half to be deferred
to the fragment job generation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-03-04 04:58:55 +00:00
Alyssa Rosenzweig 59c9623d0a panfrost: Import job data structures from v3d
At the moment, Panfrost state is ad hoc, which creates issues for FBOs.
This commit imports the skeleton of the v3d_job structure as
panfrost_job, in preparation for refactors to organize this state.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-03-04 04:58:15 +00:00
Ilia Mirkin 4eec3a2a36 glsl: fix recording of variables for XFB in TCS shaders
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.

Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage

v2: Add comment to the new program_resource_visitor::process function.
    (Ilia Mirkin)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo bf1f49482d glsl: TCS outputs can not be transform feedback candidates on GLES
Avoids regression on:

KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

that is uncovered by the following patch.

"glsl: fix recording of variables for XFB in TCS shaders"

v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
    shaders" to avoid temporal regressions. (Illia Mirkin)

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo cc7173b438 glsl: fix typos in comments "transfor" -> "transform"
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-04 01:55:00 +01:00
Gert Wollny 3214f20914 mesa: Expose EXT_texture_query_lod and add support for its use shaders
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.

v2: Set ES 3.0 as minimum GLES version as required by the extension

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-03 21:50:42 +01:00
Greg V 7dc2f47882 util: emulate futex on FreeBSD using umtx
Obtained from: FreeBSD ports
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-03 19:48:49 +00:00
Rob Clark 00f838fa73 freedreno/ir3: track register pressure in sched
Not a perfect solution, and the "pressure" target is hard-coded.  But it
doesn't really seem to much in the common case, and avoids exploding
register usage in dEQP ssbo tests.

So this should serve as a stop-gap solution until I have time to re-
write the scheduler.

Hurts slightly in instruction count, but gains (reduces) slightly the
register usage in shader-db.  Fixes ~150 dEQP-GLES31.functional.ssbo.*
that were failing due to RA fail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-03 13:27:50 -05:00
Rob Clark 8a5f2d9444 freedreno/ir3: add Sethi–Ullman numbering pass
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-03 13:27:50 -05:00
Rob Clark c8e351ee3a freedreno/ir3: include nopN in expanded instruction count
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-03 13:27:50 -05:00
Dave Airlie cb4e3e3ef6
st/mesa: add support for lowering fp64/int64 for nir drivers
This might enough for iris and possible r600 (when it gets NIR)

This appears to work for iris.

v2:
 * change cap return so DOUBLES == 2 means sw emu

v3:
 * Refactor using int64/doubles lowering options which were added
   into nir options
 * Remove DOUBLES == 2 added in v2

[jordan: Remove "2" value on PIPE_CAP_DOUBLES]
[jordan: Use lowering options added to nir options]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Jordan Justen 7de056e1a9
scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Jordan Justen 10c5579921
intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Jordan Justen 31b35916dd
nir: Add int64/doubles options into nir_shader_compiler_options
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:41 -08:00
Ian Romanick bae0c36751 nir/algebraic: Optimize away an fsat of a b2f
The b2f can only produce 0.0 or 1.0, so the fsat does nothing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-02 13:58:56 -08:00
Ian Romanick d1d56f5f9a intel/fs: Don't assert on b2f with a saturate modifier
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
2019-03-02 13:58:50 -08:00
Lionel Landwerlin 32ffd90002 anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)

v2: Also decode simple batches (Caio)
    Fix u_vector usage issues (Lionel)

v3: Make binding/instruction/state/surface available (Lionel)

v4: Going through device pools for simple batches (Lionel)
    Centralize search BO callbacks into anv_device.c (Lionel)

v5: Clear decoded batch buffer var after use (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-02 12:53:21 +00:00
Eric Anholt f1122f78b7 v3d: Fix build of NEON code with Mesa's cflags not targeting NEON.
v3d may be built as part of a set of drivers in a system not requiring
NEON, but we know V3D devices will be paired with CPUs with NEON so we
should be able to use this asm.

Fixes: 0c05198d6b ("v3d: Always enable the NEON utile load/store code.")
2019-03-01 14:21:49 -08:00
Matt Turner e0148bbcfd
intel/compiler: Add commas on final values of compaction table arrays
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-03-01 13:56:25 -08:00
Ian Romanick ecc9ffa778 nir/algebraic: Replace a-fract(a) with floor(a)
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader.  Once that happens, everything
is different.

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-03-01 12:43:25 -08:00
Ian Romanick 1edf67fc3f intel/fs: Generate if instructions with inverted conditions
Per-platform results were all over the place, so I have included all the
results here.  There is an important note at the bottom of the commit
message.

Skylake
total instructions in shared programs: 15184683 -> 15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 370961367 -> 370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 15465366 -> 15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 410938419 -> 410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs)   min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13749760 -> 13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0

total cycles in shared programs: 385398913 -> 385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 180663626 -> 180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel)   min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154568664 -> 154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel)   min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8206589 -> 8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.

total cycles in shared programs: 187657428 -> 187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.

GM45
total instructions in shared programs: 5037569 -> 5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.

total cycles in shared programs: 128101478 -> 128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071022 -> 15089710 (0.12%)
instructions in affected programs: 1022219 -> 1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs)   min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel)   min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.

total cycles in shared programs: 369777134 -> 370092923 (0.09%)
cycles in affected programs: 17516573 -> 17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs)   min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel)   min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.

total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0

total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0

LOST:   1
GAINED: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick d40640efe8 nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))
I have not investigated the result of doing this during code
generation.  That should be possible, but it would be a bit more
effort.

All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2

total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2

total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs)   min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel)   min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 7725d60938 intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))
Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0.  This is equivalent to 1.0 +
float(boolBitsToInt(a)).  On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:

    add    g47F, g26D, 1D

v2: Fix swizzles.

v3: Fix typos in comments.  Noticed by Ken.

All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185583 -> 15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.

total cycles in shared programs: 370964249 -> 370961508 (<.01%)
cycles in affected programs: 1487586 -> 1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs)   min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel)   min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.

No changes on Iron Lake or GM45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick cb3e21cd19 intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+
Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.

v2: Fix swizzles.  Fix types for cmod propagation.

v3: Simplify logic for inverting source of inot(ixor(a, b)).  Suggested
by Ken.

Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185593 -> 15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 370977726 -> 370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs)   min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel)   min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 8eb36c9129 intel/fs: Emit logical-not of operands on Gen8+
On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not.  Take advantage of this to generate fewer
instructions.

v2: Major rebase.  Use nir_src_as_alu_instr.  Fix swizzle handling.

No changes on any pre-Gen8 platform.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15466902 -> 15466274 (<.01%)
instructions in affected programs: 1262953 -> 1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs)   min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel)   min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.

total cycles in shared programs: 410996093 -> 410950440 (-0.01%)
cycles in affected programs: 144389048 -> 144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs)   min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel)   min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.

total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4

total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick 06eaaf2de9 intel/fs: Refactor ALU source and destination handling to a separate function
Other places will need to do this soon to properly handle source
swizzles.  The patch looks a little odd, but the change is pretty
straight forward.  All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.

I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive.  I am open to
suggestions of alternatives.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick fb3ca9109c intel/fs: Handle OR source modifiers in algebraic optimization
Found by inspection.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick c9d5bd050c intel/fs: Relax type matching rules in cmod propagation from MOV instructions
To allow cmod propagation from a MOV in a sequence like:

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

A similar change to the vec4 backend had no effect.

Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.

A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick eae19f5f19 nir/algebraic: Replace i2b used by bcsel or if-statement with comparison
All of the helped shaders are in Deus Ex.  I looked at a couple shaders,
and they have a pattern like:

    vec1 32 ssa_373 = i2b32 ssa_345.w
    vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0
    ...
    vec1 32 ssa_377 = ine ssa_345.w, ssa_0
    if ssa_377 {
        ...
        vec1 32 ssa_416 = i2b32 ssa_385.w
        vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374
        ...
    }

The massive help occurs because the i2b32 is removed, then other passes
determine that ssa_374 must be ssa_20 inside the if-statement allowing
the first bcsel to also be deleted.

v2: Rebase on 1-bit Boolean changes.

v3: Fix i2b32 vs ine problem in if-statement replacement.  Noticed by
Bas.

Skylake
total instructions in shared programs: 15241394 -> 15186287 (-0.36%)
instructions in affected programs: 890583 -> 835476 (-6.19%)
helped: 355
HURT: 0
helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149
helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59%
95% mean confidence interval for instructions value: -165.07 -145.39
95% mean confidence interval for instructions %-change: -6.42% -5.77%
Instructions are helped.

total cycles in shared programs: 373846583 -> 371023357 (-0.76%)
cycles in affected programs: 118972102 -> 116148876 (-2.37%)
helped: 343
HURT: 14
helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089
helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77%
HURT stats (abs)   min: 120 max: 4126 x̄: 2482.79 x̃: 3019
HURT stats (rel)   min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11%
95% mean confidence interval for cycles value: -8723.28 -7093.12
95% mean confidence interval for cycles %-change: -2.57% -2.02%
Cycles are helped.

total spills in shared programs: 32401 -> 23465 (-27.58%)
spills in affected programs: 24457 -> 15521 (-36.54%)
helped: 343
HURT: 0

total fills in shared programs: 37866 -> 31765 (-16.11%)
fills in affected programs: 18889 -> 12788 (-32.30%)
helped: 343
HURT: 0

Broadwell and Haswell had similar results. (Haswell shown)
Haswell
total instructions in shared programs: 13764783 -> 13750679 (-0.10%)
instructions in affected programs: 1176256 -> 1162152 (-1.20%)
helped: 334
HURT: 21
helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47
helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37%
HURT stats (abs)   min: 1 max: 61 x̄: 5.76 x̃: 1
HURT stats (rel)   min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03%
95% mean confidence interval for instructions value: -43.99 -35.47
95% mean confidence interval for instructions %-change: -1.35% -1.08%
Instructions are helped.

total cycles in shared programs: 386511910 -> 385402528 (-0.29%)
cycles in affected programs: 143831110 -> 142721728 (-0.77%)
helped: 327
HURT: 39
helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570
helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96%
HURT stats (abs)   min: 16 max: 4881 x̄: 1065.95 x̃: 997
HURT stats (rel)   min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24%
95% mean confidence interval for cycles value: -3375.59 -2686.60
95% mean confidence interval for cycles %-change: -0.92% -0.64%
Cycles are helped.

total spills in shared programs: 100480 -> 97846 (-2.62%)
spills in affected programs: 84702 -> 82068 (-3.11%)
helped: 316
HURT: 21

total fills in shared programs: 96877 -> 94369 (-2.59%)
fills in affected programs: 69167 -> 66659 (-3.63%)
helped: 316
HURT: 9

No changes on Ivy Bridge or earlier platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:42:14 -08:00
Ian Romanick d2056ab993 intel/vec4: Emit constants for some ALU sources as immediate values
In some cases of flow control, the constant propagation is not able to
determine that the source of an instruction must be a constant value.
When we still have NIR SSA values, we can easily determine this.  Emit
the immediate value during code generation to possible avoid spurious
loads of constants into registers.

I wrote this patch to prevent a couple trivial regressions in vec4
shaders caused by "nir/algebraic: Replace i2b used by bcsel or
if-statement with comparison".  The final result was quite a bit better
than that...

No shader-db changes on any Gen8+ platform.

v2: Assert that we never get a negation source modifier on Gen8+.
Suggested by Ken.  This should never happen because we don't normally
use vec4 for Gen8+ (requires and environment variable to force it), and
there's no code to generate these negations.  Still, erring on the side
of caution is better.

Haswell
total instructions in shared programs: 13776218 -> 13764783 (-0.08%)
instructions in affected programs: 663931 -> 652496 (-1.72%)
helped: 3495
HURT: 1
helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49%
HURT stats (abs)   min: 24 max: 24 x̄: 24.00 x̃: 24
HURT stats (rel)   min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24%
95% mean confidence interval for instructions value: -3.39 -3.15
95% mean confidence interval for instructions %-change: -1.84% -1.75%
Instructions are helped.

total cycles in shared programs: 386818984 -> 386511910 (-0.08%)
cycles in affected programs: 20379636 -> 20072562 (-1.51%)
helped: 3052
HURT: 476
helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6
helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69%
HURT stats (abs)   min: 2 max: 416 x̄: 62.76 x̃: 24
HURT stats (rel)   min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18%
95% mean confidence interval for cycles value: -115.57 -58.51
95% mean confidence interval for cycles %-change: -0.93% -0.73%
Cycles are helped.

total spills in shared programs: 100482 -> 100480 (<.01%)
spills in affected programs: 79 -> 77 (-2.53%)
helped: 3
HURT: 1

total fills in shared programs: 96883 -> 96877 (<.01%)
fills in affected programs: 85 -> 79 (-7.06%)
helped: 4
HURT: 0

Ivy Bridge
total instructions in shared programs: 12000562 -> 11990113 (-0.09%)
instructions in affected programs: 572581 -> 562132 (-1.82%)
helped: 3106
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49%
95% mean confidence interval for instructions value: -3.49 -3.23
95% mean confidence interval for instructions %-change: -1.91% -1.81%
Instructions are helped.

total cycles in shared programs: 180958504 -> 180664500 (-0.16%)
cycles in affected programs: 19991810 -> 19697806 (-1.47%)
helped: 2654
HURT: 486
helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6
helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68%
HURT stats (abs)   min: 2 max: 396 x̄: 59.18 x̃: 24
HURT stats (rel)   min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16%
95% mean confidence interval for cycles value: -125.62 -61.64
95% mean confidence interval for cycles %-change: -0.76% -0.56%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10842336 -> 10835438 (-0.06%)
instructions in affected programs: 395340 -> 388442 (-1.74%)
helped: 1926
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2
helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42%
95% mean confidence interval for instructions value: -3.73 -3.43
95% mean confidence interval for instructions %-change: -1.84% -1.72%
Instructions are helped.

total cycles in shared programs: 154590074 -> 154569050 (-0.01%)
cycles in affected programs: 8159932 -> 8138908 (-0.26%)
helped: 1670
HURT: 228
helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6
helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28%
HURT stats (abs)   min: 2 max: 1798 x̄: 40.58 x̃: 14
HURT stats (rel)   min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31%
95% mean confidence interval for cycles value: -13.51 -8.64
95% mean confidence interval for cycles %-change: -0.60% -0.46%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8212357 -> 8206587 (-0.07%)
instructions in affected programs: 323664 -> 317894 (-1.78%)
helped: 1457
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3
helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44%
95% mean confidence interval for instructions value: -4.14 -3.78
95% mean confidence interval for instructions %-change: -1.93% -1.78%
Instructions are helped.

total cycles in shared programs: 187668016 -> 187657422 (<.01%)
cycles in affected programs: 14856234 -> 14845640 (-0.07%)
helped: 1372
HURT: 83
helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6
helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08%
HURT stats (abs)   min: 2 max: 14 x̄: 3.20 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for cycles value: -7.65 -6.91
95% mean confidence interval for cycles %-change: -0.11% -0.10%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-01 12:41:46 -08:00
Eric Engestrom fc82ea1350 Revert "swr/rast: Archrast codegen updates"
This reverts the following commits:
71a76a47cc "swr/codegen: fix autotools build"
7763e664ce "meson/swr: replace hard-coded path with current_build_dir()"
773b3ceaca "swr/rast: Fix autotools and scons codegen"
16e10b8c30 "swr/rast: Add general SWTag statistics"
b45a15a39f "swr/rast: Add string handling to AR event framework"
8608a747aa "swr/rast: Add initial SWTag proto definitions"
93cd9905c8 "swr/rast: Cleanup and generalize gen_archrast"

The last one in this list broke all the build systems that can build
this (meson, autotools & scons).

See MR !304 for more details:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/304

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-01 16:46:32 +00:00
Fritz Koenig 12af6b30a3 freedreno/a6xx: Enable UBWC modifier
Adding the supported_modifiers allows buffers
to be created with UBWC
2019-03-01 15:51:16 +00:00
Fritz Koenig 4715e7a98a freedreno: UBWC allocator
UBWC requires space for a metadata or flag buffer
that contains compression data. Each 16x4 tile of image
data corresponds to a byte of compression data.

This buffer needs to be stored before (at a lower address)
the image buffer in order to match up with what the
display driver. This allows the display driver to directly
scan-out at UBWC buffer.
2019-03-01 15:51:16 +00:00
Fritz Koenig 3e6758a4e7 freedreno/a6xx: UBWC support
Universal bandwidth compression(UBWC) reduces memory bandwidth
by compressing buffers. This compression takes the form of
a full sized image buffer as well as a smaller metadata buffer.
2019-03-01 15:51:16 +00:00
Fritz Koenig 41082446db freedreno: pass count to query_dmabuf_modifiers
query_dmabuf_modifiers needs to know the max number
of modifiers that the list will hold.
2019-03-01 15:51:16 +00:00
Eric Engestrom 2793417ec6 anv: fix typo
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 258e463db5 anv: remove spaces around kwargs assignment
pylint complains:
> C0326: No space allowed around keyword argument assignment

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom 7b704fd2fd anv: drop unused parameter
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Eric Engestrom b503d4e458 anv: simplify chained comparison
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-01 11:20:28 +00:00
Caio Marcelo de Oliveira Filho 1458aa1f78 nir/copy_prop_vars: handle indirect vector elements
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load.  Such SSA
values don't help loading of the indirect unless we emit an if-ladder.

Copy_derefs are supported for indirects.

Also enable two tests that now pass.

v2: Remove unnecessary temporaries.  Be clearer when identifying the
    case where copy_entry doesn't help when we are dealing with an
    indirect array_deref (of a vector).  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho 6c0de78cc2 nir/copy_prop_vars: prefer using entries from equal derefs
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.

This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 23:55:31 -08:00