Commit Graph

111458 Commits

Author SHA1 Message Date
Ian Romanick 336eab0630 nir: Add a shallow clone function for nir_alu_instr
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Tomeu Vizoso 0e1c5cc78f panfrost: Remove link stage for jobs
And instead, link them as they are added.

Makes things a bit clearer and prepares future work such as FB reload
jobs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 14:37:10 +02:00
Tomeu Vizoso da9f7ab6d4 panfrost: ci: Switch to kernel 5.2-rc2
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-31 13:51:51 +02:00
Tomeu Vizoso 77f5663cf3 panfrost: ci: Update expectations
A bunch of tests have been fixed, but some regressions have appeared on
T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-05-31 13:51:43 +02:00
Connor Abbott 78f33620e8 radeonsi/nir: Remove hack for builtins
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:03:05 +02:00
Connor Abbott fca1a35163 radeonsi/nir: Use correct location for uniform access bound
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:57 +02:00
Connor Abbott 6571032af1 radeonsi/nir: Correctly handle double TCS/TES varyings
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 11:02:11 +02:00
Christian Gmeiner ca19f7639a etnaviv: blt: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Christian Gmeiner 9e6463e62a etnaviv: rs: s/TRUE/true && s/FALSE/false
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-05-31 10:04:49 +02:00
Bas Nieuwenhuizen e24a7840f6 nir: Actually propagate progress in nir_opt_move_load_ubo.
Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

Fixes: af355aaa07 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 07:45:43 +00:00
Samuel Pitoiset 9178076a46 radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:16 +02:00
Samuel Pitoiset 0e7b029d00 radv: use CmdPushConstants when restoring constants after meta operations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-31 08:50:13 +02:00
Jason Ekstrand f1cb3348f1 nir/split_vars: Properly bail in the presence of complex derefs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand cc59503b16 nir/vars_to_ssa: Properly ignore variables with complex derefs
Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing.  If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand 911ea2c66f nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer
We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand e84194686d nir/deref: Add a has_complex_use helper
This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand 8948048c6f nir/dead_cf: Call instructions aren't dead
When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one.  Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.

Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Dave Airlie 5441d56243 vtn: create cast with type stride.
When creating function parameters, we create pointers from ssa
values, this creates nir casts with stride 0, however we have
no where else to get this value from. Later passes to lower
explicit io need this stride value to do the right thing.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-05-31 09:57:45 +10:00
Rob Clark 372e83b95f list: add some iterator debug
Debugging use of unsafe iterators when you should have used the _safe
version sucks.  Add some DEBUG build support to catch and assert if
someone does that.

I didn't update the UPPERCASE verions of the iterators.  They should
probably be deprecated/removed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-05-30 22:11:26 +00:00
Caio Marcelo de Oliveira Filho 03ce12c5ed nir: Accept nir_var_mem_global in derefs used by phis
This mode is used by PhysicalStorageBufferEXT storage class.

Fixes: 8bdf5a008b "nir: Allow derefs to be used as phi sources"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 14:07:29 -07:00
Jason Ekstrand 5e43a75950 intel/blorp: Use the hardware op for CCS ambiguate on gen10+
Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate.  Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually.  This was tested with a full Vulkan CTS
run on Cannonlake.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-05-30 13:49:48 -07:00
Jan Zielinski b31a31bba5 swr/rast: Enable ARB_GL_texture_buffer_range
No significant changes in the code needed to enable
the extension. Just updating SWR capabilities
and the documentation

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-30 15:42:15 +00:00
Jan Zielinski cf673747ce swr/rast: fix 32-bit compilation on Linux
Removing unused but problematic code from simdlib header to fix
compilation problem on 32-bit Linux.

Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-05-30 15:31:15 +00:00
Jason Ekstrand 9e403dc56e intel/fs: Do a stalling MFENCE in endInvocationInterlock()
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand 859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Samuel Pitoiset 43cc3dc9c0 radv: enable transformFeedbackStreamsLinesTriangles
The driver should already support this without any changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 15:42:36 +02:00
Samuel Pitoiset da26013eb7 radv: implement VK_EXT_sample_locations and disable it
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.

The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 09:52:16 +02:00
Kenneth Graunke e917bb7ad4 iris: Avoid holding the lock while allocating pages.
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA

We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages.  The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages,
now we avoid our own global lock as well.

We do have to re-lock around VMA.  Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-30 00:46:37 -07:00
Kenneth Graunke 0cb380a6b3 iris: Move SET_DOMAIN to alloc_fresh_bo()
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO.  Simplifies the flow a bit.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-30 00:15:26 -07:00
Kenneth Graunke 53878f7a89 iris: Be lazy about cleaning up purged BOs in the cache.
Mathias Fröhlich reported that commit 6244da8e23 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.

To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache.  We're about to
continue the search with the next BO.  If that one's purged, we'll
clean it up too.  And so on.

We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine.  We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.

Fixes: 6244da8e23 iris: Dig through the cache to find a BO in the right memzone
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2019-05-29 23:38:01 -07:00
Kenneth Graunke 6244da8e23 iris: Dig through the cache to find a BO in the right memzone
This saves some util_vma thrash when the first entry in the cache
happens to be in a different memory zone, but one just a tiny bit
ahead is already there and instantly reusable.  Hopefully the cost
of a little extra searching won't break the bank - if it does, we
can consider having separate list heads or keeping a separate VMA
cache.

Improves OglDrvRes performance by 22%, restoring a regression from
deleting the bucket allocators in 694d1a08d3.

Thanks to Clayton Craft for alerting me to the regression.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 20:03:45 -07:00
Kenneth Graunke 4c2d9729df iris: Tidy BO sizing code and comments
Buckets haven't been power of two sized in over a decade.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:42:15 -07:00
Kenneth Graunke 7acc88a47c iris: Move some field setting after we drop the lock.
It's not much, but we may as well hold the lock for a bit less time.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:42:04 -07:00
Kenneth Graunke 76c5a19668 iris: Move cached BO allocation into a helper function.
There's enough going on here to warrant a helper.  This also simplifies
the control flow and eliminates the last non-error-case goto.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:52 -07:00
Kenneth Graunke cea6671395 iris: Fall back to fresh allocations of mapping for zero-memset fails.
It is unlikely that we would fail to map a cached BO in order to zero
its contents.  When we did, we would free the first BO in the cache and
try again with the second.  It's possible that this next BO already had
a map setup, in which case we'd succeed.  But if it didn't, we'd likely
fail again in the same manner.

There's not much point in optimizing this case (and frankly, if we're
out of CPU-side VMA we should probably dump the cache entirely)...so
instead, just fall back to allocating a fresh BO from the kernel which
will already be zeroed so we don't have to try and map it.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:50 -07:00
Kenneth Graunke 042f8514e6 iris: Move fresh BO allocation into a helper function.
There's enough going on here to warrant a helper.  More cleaning coming.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:22 -07:00
Kenneth Graunke 06421e5be7 iris: Do SET_TILING at a single point rather than in two places.
Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
In the cached case, we would retry the allocation on failure, pitching
one BO from the cache each time.  This is silly, because the only time
it should fail is if the tiling or stride parameters are unacceptable,
which has nothing to do with the particular BO in question.  So there's
no point in retrying - we should simply fail the allocation.

This patch moves both calls to bo_set_tiling_internal() below the
cache/fresh split, so we have it at a single point in time instead
of two.

To preserve the ordering between SET_TILING and SET_DOMAIN, we move
that below as well.  (I am unsure if the order matters.)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:41:08 -07:00
Kenneth Graunke 43d835cb0f iris: Use the BO cache even for coherent buffers on non-LLC.
We mark snooped BOs as non-reusable, so we never return them to the
cache.  This means that we'd need to call I915_GEM_SET_CACHING to make
any BO we find in the cache snooped.  But then again, any BO we freshly
allocate from the kernel will also be non-snooped, so it has the same
issue.  There's really no reason to skip the cache - we may as well use
it to avoid the I915_GEM_CREATE overhead.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:18 -07:00
Kenneth Graunke 78003014d0 iris: Fix locking around vma_alloc in iris_bo_create_userptr
util_vma needs to be protected by a lock.  All other callers of
vma_alloc and vma_free appear to be holding a lock already.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:16 -07:00
Kenneth Graunke 5fc11fd988 iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation.
The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock.
We can simply set the bucket to NULL and it will skip the cache without
goto, and without messing up locking.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-29 19:40:15 -07:00
Marek Olšák 2285b93032 radeonsi: fix timestamp queries for compute-only contexts
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
2019-05-29 21:13:35 -04:00
Marek Olšák b5697c311b Change a few frequented uses of DEBUG to !NDEBUG
debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-05-29 21:13:35 -04:00
Kenneth Graunke 0f1b68ebee iris: Re-emit Surface State Base Address when context is lost.
When we hit a GPU hang, we failed to reset Surface State Base Address
right away, and would keep hanging until we filled up the binder.  Then
we'd finally get it right after a lot of repeated stumbles.  Update it
right away so we hopefully hang fewer times before succeeding.
2019-05-29 16:35:02 -07:00
Jason Ekstrand e459d6d6df iris: Enable nir_opt_large_constants
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0

    LOST:   0
    GAINED: 2

The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal
Space Program

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Jason Ekstrand 9dc57eebd5 iris: Don't assume UBO indices are constant
It will be true for the constant/system value buffer because they use a
constant zero but it's not true in general.  If we ever got here when
the source wasn't constant, nir_src_as_uint would assert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2019-05-29 21:09:16 +00:00
Jason Ekstrand 744f93f5c1 iris: Move upload_ubo_ssbo_surf_state to iris_program.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Brian Paul e584fd894e nir: silence three compiler warnings seen with MinGW
Silence two unused var warnings.  And init elem_size, elem_align to
zero to silence "maybe uninitialized" warnings.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-29 13:59:24 -06:00
Brian Paul c71ca65405 svga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS
In case the device reports 15 (or more) buffers.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2019-05-29 13:59:23 -06:00
Kenneth Graunke 6892d2b94a iris: Clone before calling nir_strip and serializing
This is non-destructive and leaves the debugging information in place.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00
Kenneth Graunke e1409aead5 iris: Only store the SHA1 of the NIR in iris_uncompiled_shader
Jason pointed out that we don't need to keep an entire copy of the
serialized NIR around, we just need the SHA1.  This does change our
disk cache key to be taking a SHA1 of a SHA1, which is a bit odd,
but should work out and be faster and use less memory.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00