Creating Android swapchain image from gralloc buffer requires to use
VkImageDrmFormatModifierExplicitCreateInfoEXT. To fill the struct info,
we need to query extended resource info from gralloc.
With the queried modifier from gralloc, we can ask the driver for the
plane count of the given format and modifier pair.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10553>
We allocate them all align16, so mark the unions (and their container
structs) that way so the compiler can do aligned SSE load/stores.
glmark2 -b loop FPS +0.197265% +/- 0.117633% (n=1906)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10604>
When a TSY barrier is hit, the entire supergroup will be synchronized.
If the supergoup is large and uses all available QPU threads it would
mean that we would sychronize and stall all running threads until all
of them reach the barrier, which may be inefficient.
This patch makes it so that if the compute shader has any such barriers
we limit the supergroup size so each supergroup only takes half of the
QPU threads available at most, so that if one supergroup hits a
barrier we have at least one other supergroup we can run, reducing
idle QPU time.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
Each supergroup executes a number batches. Each batch has 16 elements
(one per QPU lane), except possibly the last batch which might be
incomplete. Until now, we packed a single workgroup in each supergroup,
which can lead to more incomplete batches and less efficient use
of the QPUs depending on the configuration of workgroups being dispatched.
This patch computes a number of workgroups per supergroup so that
we reduce or completely eliminate incomplete batches if possible.
It should be noted however, that TSY barriers act on supergroups,
so larger supergroups lead to larger syncpoints on barriers too.
A follow-up patch will try to find a good balance for compute shaders
that use such barriers.
This improves performance of the Sascha Willem's computecloth demo
by ~13%.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
The V3D hardware allows us to pack multiple workgroups together to avoid
wasting execution lanes in shader cores.
For example, if we dispatch 16 workgroups with a local size of 1 element, we
can pack all 16 workgroups in a single 16-wide dispatch where each lane
executes a different workgroup, instead of 16 1-wide dispatches.
When we do this, we don't have a uniform workgroup id any more.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
Now that all drivers are using brw_cs_get_dispatch_info() we can
remove one function (which is now unused) and reduce the scope of the
other.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>
And since right_mask is already provided as part of dispatch_info,
just use that instead of storing it.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>
We have this small calculations repeated in each Intel driver, so move
them to a single place to be reused. Also includes "right_mask" since
is always used in the same context and depends on the dispatch info
values.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10504>
The mode one was used before 0bc5a829dd ("nir: Remove shared support from
lower_io").
The others were used before 5f7c7c9a7f ("nir: add src and dest types
to all IO loads and stores for mediump").
All conditions now are always true, so drop them.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10533>
We recently added 5 more VIM3s to the lavalab, this should be more than
enough to run the full GLES 3.0 testsuite on G52.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10614>
Gallium's background as a Tungstend Graphics technology is no longer
significant; it's a historical detail. Besides, since Tungsten Graphics
were acquired by VMware more than a decade ago, the website no longer
exists.
While we're at it, replace the docs link with a link to the mesa docs,
and point to archive.org copy of the Tungsten Graphics paper.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2770
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10452>
For unsigned comparisons with zero these ops can be eliminated.
v2: Add comparison optimizations with -1 (Rhys Perry)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10583>
NIR provides two helper macros to run transformation passes correctly,
NIR_PASS() and NIR_PASS_V(). So far we've seemingly been a bit haphazard
about when to use them.
Let's correct that, and consistently use the NIR helpers here. This
helps us in two ways:
1. We now run nir_validate_shader after each pass, ensuring we didn't
break the shader
2. We now respect the NIR_PRINT environment variable for all NIR passes,
making debugging much less surprising.
In addition, we had an OPT()-macro that doesn't seem to provide much
help other than to hiding some trivial details. But they make our code
different to other users of NIR, which doesn't seem ideal. So let's drop
that macro while we're at it.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10585>
We've had about 1/day of the texelfetch group in the IRC flake reports
since apr 23, and tex-miplevel-selection that I marked before is actually
all the subtests it looks like. Also, you can't include the ",Fail" if
you want to actually match a test name.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10597>
Since the second source is always a constant that is known to be a
number, this should have the same performance as
GALLIVM_NAN_BEHAVIOR_UNDEFINED.
A lofty goal is to eventually remove GALLIVM_NAN_BEHAVIOR_UNDEFINED.
There's still a lot of (mostly implicit) users, and I don't feel like
tackling that right now. :)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
If it is known that one of the source must be a number, then the (more
efficient) GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN path can be used.
v2: s/know to be/known to be/. Noticed by Roland.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
Like softpipe in mesa!10419, llvmpipe suffers from improper handling
of NaN in nir_op_fmax and nir_op_fmin. nir_op_fsat is already handled
correctly. OpenCL strictly requires the "NaN cleansing" behavior, so
all of the functionality is in place. Just make the graphics APIs use
the OpenCL path.
The majority of the possible performance penalty incurred here should
be resolved in the next commit.
v2: Add updated checksum for bgfx/39-assao.rdc trace. Rendering goes
from mostly garbage to looking correct to me.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
I don't know what I was thinking when I wrote 939bf7a419 ("tgsi_exec:
Fix NaN behavior of min and max") and d1c0f62b42 ("tgsi_exec: Fix NaN
behavior of saturate"). I knew that C99 had fmin and fmax... I just
forgot to use them.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
There's a recently discovered HW bug affecting hardware at least as far
back as Skylake where, if the LOD is out-of-bounds for any SIMD lane,
then garbage may be returned in all SIMD lanes. The easy solution is to
set lower_txs_lod so that we always have a constant LOD of 0 which we
know a priori is always in-bounds. Fortunately, not many shaders
actually use textureSize() with LOD.
Shader-db results on Ice Lake:
total instructions in shared programs: 19948537 -> 19948564 (<.01%)
instructions in affected programs: 3859 -> 3886 (0.70%)
helped: 0
HURT: 7
One of the shaders is in Civilization: Beyond Earth, and the rest are
all in Civilization VI.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10538>
The source_depth_to_render_target flag can get set on old gen4-5 HW in a
few cases which are independent of the app writing gl_FragDepth. It
should be safe to just use fetch_payload_reg in that case instead of
depending in interpolation setup. This fixes a bug with certain very
simple shaders where we might end up not including the depth when we
should have.
While we're here, rework the logic around setting src_depth and add a
comment so it's more clear what's going on.
Fixes: 6d4070f3dd "intel/compiler: add support for fragment coordinate..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10596>
Somehow this was missed, but when we emit a query start/stop we need
have something that will need to be flushed in the batch.
Detected due to TC assert, but this had the potential to cause problems
in the non-TC case as well.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10599>
Add a helper to both set batch->needs_flush and clear ctx->last_fence so
that the two related bits of state do not get out of sync.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10599>