Commit Graph

108674 Commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho 4c160b6bd8 nir: fix MSVC build
Zero initialize struct with {0} instead of {}.
2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho eb13211997 nir/copy_prop_vars: add tests for load/store elements of vectors
Test using array deref on vectors in loads and stores.  These are
marked DISABLED_ as this optimization is currently not done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho 4f3809d389 nir: nir_build_deref_follower accept array derefs of vectors
Code itself already supports it, just make sure we can use it for
those cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho c4beadd28e nir/copy_prop_vars: change test helper to get intrinsics
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index).  This makes slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about.  The cost is to perform more traversals,
but for such tests this is not a problem.

Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.

Also, drop two nir_print_shader leftover calls in a test.

v2: Remove redundant assertions.  nir_src_comp_as_uint already
    assert what we need.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho fdcb9779d9 nir/copy_prop_vars: keep track of components in copy_entry
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from.  At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size.  This prepares the code for array_derefs of vectors, in which
'component[i] != i'.

Also, extract setting all SSA components into a function of its own.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho 6624decbb5 nir/copy_prop_vars: add debug helpers
Disabled by default, to be used during development.  Adding those
so I don't rewrite some ad-hoc version of them everytime I'm working
with this pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho 60d9bb9ff5 nir/copy_prop_vars: don't get confused by array_deref of vectors
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations.  For load_derefs,
do nothing.  For store_derefs, invalidate whatever the store is
writing to.  For copy_derefs, invalidate whatever the copy is writing
to.

These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Timothy Arceri f48527e51a nir: allow nir_lower_phis_to_scalar() on more src types
Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.

We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.

total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74

total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131

total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0

total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-23 11:11:51 +11:00
Alok Hota 6053499f2e swr/rast: bypass size limit for non-sampled textures
This fixes a bug where SWR will fail to render in cases with large
buffer allocations, e.g. very large meshes whose vertex buffers exceed
2GB

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-22 23:35:11 +00:00
Marek Olšák b326a15eda tgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics
This might have decreased performance for radeonsi/tgsi, because most
most shaders claimed they used bindless.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-22 18:00:54 -05:00
Jordan Justen cf652205cf
iris: Add gitlab-ci build testing
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-22 14:08:21 -08:00
Rob Clark fd360c82f0 freedreno/a6xx: cube image fix
Note that emit_intrinsic_load_image() already swaps a .3d flag with an
.a flag.  I tried doing things the other way around (going back to .3d)
but that didn't work.  And treating cube images as 2d array is also what
blob does, so let's just go with that.

Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.*

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark f90c3b4485 freedreno/a6xx: fix border-color offset
Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when
run after a test that binds textures used in vertex shader.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark bdedb8277a freedreno/ir3: don't hardcode wrmask
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow
and few other similar tests that do multiple texture fetches into
individual components of a packet output.  Mostly works around the
issue mentioned in ra_block_find_definers().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark 5d4fa194b8 freedreno: fix race condition
rsc->write_batch can be cleared behind our back, so we need to acquire
the lock *before* deref'ing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Kenneth Graunke 3090c6b9e9 vulkan: Fix 32-bit build for the new overlay layer
vulkan_core.h defines non-dispatchable handles as (struct object *)
on 64-bit systems, but uint64_t on 32-bit systems.  The former can be
implicitly cast to void *, but the latter requires an explicit cast.

While here, %lu is the wrong format specifier for uint64_t on 32-bit
systems, so use PRIu64, fixing a warning.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-22 08:56:54 -08:00
Juan A. Suarez Romero 4f917e6a61 anv: advertise 8 subpixel precision bits
On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is
used to select between 8 bit subpixel precision (value 0) or 4 bit
subpixel precision (value 1). As this value is not set, means it is
taking the value 0, so 8 bit are used.

On the other side, in the Vulkan CTS tests, if the reference rasterizer,
which uses 8 bit precision, as it is used to check what should be the
expected value for the tests, is changed to use 4 bit as ANV was
advertising so far, some of the tests will fail.

So it seems ANV is actually using 8 bits.

v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)

v3: use _8Bit definition as value (Jason)

v4: (by Jason)
anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect

This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:55 +01:00
Juan A. Suarez Romero 3b423eeb2d genxml: add missing field values for 3DSTATE_SF
Fill out "Vertex Sub Pixel Precision Select" possible values.

CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:45 +01:00
Bas Nieuwenhuizen f324784104 radv: Allow interpolation on non-float types.
In particular structs containing floats and 16-bit floating point
types.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Fixes: da29594636 "spirv: Only split blocks"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen a1fdd4a4a7 radv: Fix float16 interpolation set up.
float16 types can have non-flat interpolation so set up the HW
correctly for that.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Ilia Mirkin ae2cb72804 nv50: disable compute
It causes more trouble than it's worth. Now vl tries to create compute
shaders without all the proper checking. Since there's really no
(current) way to use compute on nv50, just mark it disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742
Fixes: f6ac0b5d71 ("gallium/auxiliary/vl: Add compute shader to support video compositor render")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-22 09:42:41 -05:00
Lionel Landwerlin 1d626fc028 intel: fix urb size for CFL GT1
Same 192Kb amount as SKL/KBL GT1 applies.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: de7ed0ba55 ("i965/CFL: Add PCI Ids for Coffee Lake.")
2019-02-22 11:53:49 +00:00
Samuel Iglesias Gonsálvez bd2c5a8203 isl: the display engine requires 64B alignment for linear surfaces
v2: Add PRM quote (Lionel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-22 11:45:45 +00:00
Gert Wollny 2ee197d6e8 virgl: Enable mixed color FBO attachemnets only when the host supports
it

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2019-02-22 10:44:08 +01:00
Mauro Rossi 338dacc341 android: intel/isl: remove redundant building rules
Fixes the following building error:

including ./external/mesa/Android.mk ...
build/core/base_rules.mk:183: *** external/mesa/src/intel:
MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1

ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c
and that source file includes isl_tiled_memcpy.c source

Fixes: 96bb328 ("iris: add Android build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-22 07:56:11 +02:00
Kenneth Graunke b21de090d6 Revert "iris: Enable auxiliary buffer support"
This reverts commit cd0ced49e7.

It breaks glxgears rendering.
2019-02-21 15:50:46 -08:00
Kenneth Graunke e2cb0c5e0e iris: Enable -msse2 and -mstackrealign
This is needed for gen_clflush.h intrinsics to work on 32-bit builds.
i965 and anv both set these, and iris needs to as well.

Tested-by: Mark Janes <mark.a.janes@intel.com>
2019-02-21 14:51:15 -08:00
Francisco Jerez 7272fe9c08 intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez e03be78252 intel/fs: Implement extended strides greater than 4 for IR source regions.
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez 7f9f6263c1 intel/fs: Cap dst-aligned region stride to maximum representable hstride value.
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez e2f475ddff intel/fs: Lower integer multiply correctly when destination stride equals 4.
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez c3c27762f7 intel/fs: Exclude control sources from execution type and region alignment calculations.
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Timothy Arceri d9e08e753b nir: clone instruction set rather than removing individual entries
This reduces the time spent in nir_opt_cse() by almost a half.

The massif tool from callgrind reported no change in peak
memory use with the large doliphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 08:36:36 +11:00
Jordan Justen cd0ac3a6af
genxml: Remove extra space in gen4/45/5 field name
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:17:10 -08:00
Jordan Justen a9b0b72a78
genxml/gen_bits_header.py: Use regex to strip no alphanum chars
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:15:59 -08:00
Kenneth Graunke cd0ced49e7 iris: Enable auxiliary buffer support
This currently regresses KHR-GL4x.compute_shader.resource-texture,
but that's a pre-existing bug (https://bugs.freedesktop.org/109113)
which should be fixed up once we have fast clear support.
2019-02-21 10:26:12 -08:00
Rafael Antognolli db81445837 iris: Flag ALL_DIRTY_BINDINGS on aux state change.
If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.
2019-02-21 10:26:12 -08:00
Rafael Antognolli 95589652a1 iris: Skip resolve if there's no context.
If iris_resource_get_handle() gets called without a context, we can't
resolve the resource. Hopefully it shouldn't be compressed anyway, so
let's just add an assert to ensure it's correct.
2019-02-21 10:26:12 -08:00
Rafael Antognolli 36138bb7fc iris/clear: Pass on render_condition_enabled. 2019-02-21 10:26:12 -08:00
Rafael Antognolli 8190165d13 iris: Avoid leaking if we fail to allocate the aux buffer.
Otherwise we could leak the aux state map or the aux BO.
2019-02-21 10:26:12 -08:00
Kenneth Graunke 7da53d7188 iris: Only resolve compute resources for compute shaders 2019-02-21 10:26:12 -08:00
Kenneth Graunke 95a36bd55c iris: Fix aux usage in render resolve code 2019-02-21 10:26:12 -08:00
Rafael Antognolli 4f191feb0c iris: Pin HiZ buffers when rendering. 2019-02-21 10:26:12 -08:00
Rafael Antognolli dfd54f9954 iris: Flush before hiz_exec. 2019-02-21 10:26:12 -08:00
Kenneth Graunke f3f7d45a63 iris: Allow disabling aux via INTEL_DEBUG options 2019-02-21 10:26:12 -08:00
Kenneth Graunke 4634b754f4 iris: do flush for buffers still 2019-02-21 10:26:12 -08:00
Kenneth Graunke 15822f33ad iris: make surface states for CCS_D too
CCS_E can fall back to CCS_D with incompatible format views

CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...
2019-02-21 10:26:12 -08:00
Rafael Antognolli 689b590069 iris: Skip msaa16 on gen < 9.
Also needed to add gen information to KEY_INIT.
2019-02-21 10:26:12 -08:00
Kenneth Graunke fd2038b22a iris: Set program key fields for MCS 2019-02-21 10:26:12 -08:00
Kenneth Graunke 92c310fd3f iris: don't use hiz for MSAA buffers 2019-02-21 10:26:12 -08:00