Commit Graph

106479 Commits

Author SHA1 Message Date
Tapani Pälli aa94e01bfe anv: add from/to helpers with android and vulkan formats
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define
    for now to avoid direct dependency to minigbm headers

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli c1f15a0a1a anv: make anv_get_image_format_features public
This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli 8a469fd335 anv: refactor make_surface to use data from anv_image
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli 2a98e5bbb9 anv: add create_flags as part of anv_image
This will make it possible for next patch to rip
anv_image_create_info out from make_surface function.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Ian Romanick 96c4b135e3 nir/algebraic: Don't put quotes around floating point literals
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value.  The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.

v2: Remove misleading comment about sized literals (suggested by
Timothy).  Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
2018-12-18 23:28:31 -08:00
Vinson Lee 0f7ba5758b meson: Fix libsensors detection.
Fixes: 5e71efef44 ("meson: Add lmsensors support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-12-18 19:24:01 -08:00
Vinson Lee 84f39e5971 meson: Fix typo.
Fixes: 6b4c7047d5 ("meson: build gallium nine state_tracker")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-12-18 19:14:11 -08:00
Sagar Ghuge 933c44bcc4 nir: Add a new lowering option to lower 3D surfaces from txd to txl.
Tested on gen9.

v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-18 13:44:09 -08:00
Christian Gmeiner 7ea8e54dd6 meson: add etnaviv to the tools option
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-12-18 21:50:58 +01:00
Adam Jackson e36d136102 specs: Bump GLX_MESA_query_renderer to version 9
Note that we have an official GL extension number, pick the appropriate
section of the GLX spec to modify, and add changelog.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2018-12-18 15:46:10 -05:00
Adam Jackson 9e8332ebc2 specs: Remove GLX_RENDERER_ID_MESA from GLX_MESA_query_renderer
This has not even had an attempt at implementation. If you asked for
renderer 0 - which, the spec implies, should always work - then
dri2_convert_glx_attribs would fail, we'd silently fall back to creating
an indirect context, and xserver would also not recognize the attribute
and would throw BadValue at you.

The API would be difficult to use in any case, since there's no way to
enumerate how many renderers the screen has. I'd be tempted to add that
by defining:

    glXQueryRendererIntegerMESA(dpy, screen,
                                /* renderer = */ -1,
                                0, &value);

to return the number of renderers, but a new entrypoint might be
cleaner. Still, better to not specify it at all than to lie about it.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2018-12-18 15:46:10 -05:00
Adam Jackson c63c391756 specs: Remove GLES profile interaction text from GLX_MESA_query_renderer
In one place we say, if GLES isn't supported then the profile version
will be 0.0. Then later we say, if the GLES profile extension isn't
supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not
mentioned in the spec. A strict reading of the latter would mean that
GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token,
and the query should instead return False.

The implementation does not check for the GLES profile extensions, and
the additional complexity doesn't seem worth it. Removing the
interaction text makes the spec match the implementation.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2018-12-18 15:46:10 -05:00
Eduardo Lima Mitev 5820e63418 freedreno/ir3: Make imageStore use num components from image format
emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.

This patch uses the actual number of components from the image format.

For example, in a shader like:

layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
   ...
   imageStore (u_image, some_offset, vec4(1.0));
   ...
}

instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).

This obviously reduces register pressure as well.

v2: - Added support for image formats from NV_image_format extension
    (Ilia Mirkin).
    - Return 4 components by default instead of asserting. (Rob Clark).

v3: Added more missing formats (Ilia Mirkin).

v4: Added a debug message for unknown image formats (Rob Clark).

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2018-12-18 21:15:20 +01:00
Jason Ekstrand 5dad1abfdc nir/dead_write_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand fa40a58fd9 nir/copy_prop_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref.  The modes are guaranteed to be the same by nir_validate whenever
the variable can be found.  This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand cf7fb39805 nir/lower_wpos_center: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand 867fe35a16 nir/lower_io_to_scalar: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand 3fe0363dda nir/lower_io_arrays_to_elements: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand 8cc0f92492 nir/linking_helpers: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs.  This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Jason Ekstrand 8410cf66d7 nir/propagate_invariant: Skip unknown vars
If we can't find the variable from the deref, just assume it isn't
invariant and continue on.  This can happen if, for instance, we're
writing to a deref that points into an SSBO.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-12-18 13:13:28 -06:00
Ian Romanick 29e4b949b4 Revert "nir/lower_indirect: Bail early if modes == 0"
"There's no point in walking the program if we're never going to
    actually lower anything."

Except we might lower compacted local arrays.  In that case, modes will
be 0, but there is still lowering to be done.

This reverts commit 7f75cf2a94.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
2018-12-18 10:47:54 -08:00
Lucas Stach 433ca3127a st/dri: replace format conversion functions with single mapping table
Each time I have to touch the buffer import/export functions in the dri
state tracker I get lost in the maze of functions converting between
DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format.

Rip it out and replace by a single table, which defines the correspondence
between the different representations.

Also this now stores all the known representations in the __DRIimageRec,
to avoid the loss of information we currently have when importing a buffer
with a fourcc, which doesn't have a corresponding dri format.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-18 19:19:45 +01:00
Lucas Stach 67174d40f1 st/dri: allow both render and sampler compatible dma-buf formats
Currently all the EGL APIs are missing a way to specify how an imported
dma-buf is intended to be used. Demanding the format to be both usable
for sampling and rendering artificially restricts the list of formats a
driver is able to import.

Looking at how the Intel driver implements those DRI2 image APIs it
doesn't distinguish between render or sampler compatible formats. So
this patch aligns behavior between Intel and Gallium based drivers.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-18 19:19:40 +01:00
Lucas Stach a3e592e839 etnaviv: use surface format directly
There is no need to do the detour over the resource behind the
surface to get the format. Use the surface format directly.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2018-12-18 19:07:10 +01:00
Dylan Baker 7a90886921 meson: Add toggle for glx-direct
GNU Hurd needs to turn off glx-direct, rather than special case it,
we'll just add a toggle.

CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-12-18 09:20:53 -08:00
Dylan Baker 8c77f4c76d meson: Add support for gnu hurd
CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-12-18 09:20:49 -08:00
Dylan Baker 6cf5f25bc5 meson: remove duplicate definition
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-12-18 09:18:12 -08:00
Dylan Baker e430a034b9 meson: Fix ppc64 little endian detection
Old versions of meson returned ppc64le as the cpu_family for little
endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't
work in that case. This generalizes the check to work for both old and
new versions of meson.

Fixes: 34bbb24ce7
       ("meson: Add support for ppc assembly/optimizations")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-12-18 09:17:54 -08:00
Jason Ekstrand 3feda3cf35 anv: Bump the patch version to 96
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-18 09:40:46 -06:00
Kenneth Graunke 3c71ba3baa i965: Don't override subslice count to 4 on Gen11.
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up.  Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2018-12-17 14:03:45 -08:00
Ian Romanick af07141b33 intel/compiler: More peephole_select for pre-Gen6
No shader-db changes on any Gen6+ platform.

All of the shaders with cycles hurt by more than ~2% are from Master of
Orion.  All of the shaders have instructions helped.  It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things.  These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.

Iron Lake
total instructions in shared programs: 8214327 -> 8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.

total cycles in shared programs: 187736850 -> 187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs)   min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel)   min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 5044014 -> 5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.

total cycles in shared programs: 128143042 -> 128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs)   min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel)   min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 8fb8ebfbb0 intel/compiler: More peephole select
Shader-db results:

The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.

v2: Fix typo in comment noticed by Caio.

v3: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15072761 -> 15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs)   min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel)   min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.

total cycles in shared programs: 369738930 -> 369535732 (-0.05%)
cycles in affected programs: 68027851 -> 67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs)   min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel)   min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.

total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1

total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1

Ivy Bridge
total instructions in shared programs: 12021190 -> 11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.

total cycles in shared programs: 179077826 -> 178570196 (-0.28%)
cycles in affected programs: 63205667 -> 62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs)   min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10852569 -> 10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.

total cycles in shared programs: 154732047 -> 154608941 (-0.08%)
cycles in affected programs: 4063110 -> 3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs)   min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel)   min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.

No change on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 4cd1a0be76 i965/vec4: Propagate conditional modifiers from more compares to other compares
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component.  The specific
case that I saw was:

    cmp.l.f0(8)     g42<1>.xF       g61<4>.xF       (abs)g18<4>.zF
    ...
    cmp.nz.f0(8)    null<1>D        g42<4>.xD       0D

In this case we can just delete the second CMP.

No changes on Broadwell or Skylake because they do not use the vec4
backend.  Also no changes on GM45 or Iron Lake.

Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.

total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs)   min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 9a83c3d3b3 i965/fs: Eliminate unary op on operand of compare-with-zero
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.

All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.

total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.

Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0

total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0

GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 440c051340 i965/vec4/dce: Don't narrow the write mask if the flags are used
In an instruction sequence like

            cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
    (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD

The other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.

To my surprise, nothing in shader-db or any test suite appears to hit
this.  However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen.  This fix prevents a couple dozen
regressions in that patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
2018-12-17 13:47:06 -08:00
Ian Romanick 111bcc8d02 i965/vec4: Silence unused parameter warnings in vec4 compiler tests
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction* copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg* register_coalesce_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction* register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg* cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction* cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Bas Nieuwenhuizen f67dea5e19 radv: Fix multiview depth clears
We were not using the view mask for depth clears, causing only the
first view to be cleared.

Fixes: 2e86f6b259 "radv: Add multiview clears."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-17 20:16:26 +00:00
Bas Nieuwenhuizen 9add63a3a5 radv: Remove redundant format check.
The switch directly after the check has a default case that returns
NULL too, so the effective return value is not changed. Also this
check is wrong once we start dealing with formats introduced by an
extension (e.g. YUV formats).

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-17 20:09:38 +00:00
Eric Anholt 708d8f4d0a nir: Fix clamping of uints for image store lowering.
I botched some copy-and-paste and clamped to signed int max instead of
uint max.  Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.

Fixes: d3e046e76c ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-17 20:02:22 +00:00
Eric Anholt 00e2cbc049 v3d: Fix the argument type for vir_BRANCH().
Apparently this has been spewing warnings for Jason's clang, but not my
gcc.
2018-12-17 09:52:23 -08:00
Eric Anholt 376054fff3 vc4: Reuse nir_format_convert.h in our blend lowering.
These helpers came along after and have effectively the same
implementation.
2018-12-17 09:52:23 -08:00
Samuel Pitoiset 445867c80d radv: report Vulkan version 1.1.90 for real
I thought the value was correctly propagated, but actually not.

Fixes: 2ac6d55f38 ("radv: bump reported version to 1.1.90")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-17 17:51:48 +01:00
Jason Ekstrand cae373117c anv,radv: Re-enable VK_EXT_pci_bus_info
Now at version 2 with the fixed header.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 10:42:35 -06:00
Jason Ekstrand e5b59fe6f5 vulkan: Update the XML and headers to 1.1.96
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 10:41:56 -06:00
Rhys Perry ef198e8c6a radv: switch from nir_bcsel to nir_b32csel
Fixes: 191a1dce92 ('nir: Add 1-bit Boolean opcodes')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-17 14:52:39 +00:00
Rhys Perry bba94a3d85 radv: don't set surf_index for stencil-only images
Fixes: f8d5b377c8 ('radv: set cb base tile swizzles for MRT speedups (v4)')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108116
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-17 14:52:10 +00:00
Ian Romanick 9dc135efa1 nir: Release per-block metadata in nir_sweep
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.

mean soft fp64 using uint64:   1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11:    63,555,571 =>    61,889,811
deus ex mankind divided 148:      62,845,304 =>    62,829,640
deus ex mankind divided 2890:     71,922,686 =>    71,922,686
dirt showdown 676:                69,238,607 =>    69,238,607
dolphin ubershaders 210:          77,822,072 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00
Ian Romanick 7adafd6e1c nir: Fix holes in nir_instr
Found using pahole.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,555,571
deus ex mankind divided 148:      62,887,728 =>    62,845,304
deus ex mankind divided 2890:     72,399,750 =>    71,922,686
dirt showdown 676:                69,464,023 =>    69,238,607
dolphin ubershaders 210:          78,359,728 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00