This arg size should be 1 instead of 3. It does not affect functionality
because we does not enable it in SPI_PS_INPUT_ADDR. But it does affect
the VGPR number that LLVM produce when LLVM still count with all PS
function arguments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12922>
I'm going to want to use this in other passes, so lets let the
allocation hang off the pass's context. Also, while we're at it,
fix the error path leak in ir3_ra().
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12923>
The previous code had a number of errors, the most glaring of which was
forcing linear when it was one of the possible layouts requested.
When modifiers are being used, a list of _acceptable_ modifiers is
supplied; it's up to the driver to then make a decision as to which it
thinks is most optimal.
Normally we would select between linear/tiled/UBWC in ascending order of
preference according to what's possible, however we can't use a tiled
layout with explicit modifiers as there is no modifier token defined for
it.
Rewrite the layout-selection mechanism to always try to do the most
optimal thing. If the use flags force us to, or we have a shared
resource without explicit modifiers, we use linear. Failing that, we use
UBWC wherever possible; if this is not possible, we use tiled for
internal resources only or linear for shared resources.
v2 (Rob): respect FD_FORMAT_MOD_QCOM_TILED; do not print perf warning on
user choice of disabling UBWC;
v3: fix several issues breaking CI tests: revert removal of using
MOD_INVALID in various places, and assume implicit modifiers if present;
do not attempt to set UBWC flags when screen->tile_mode(prsc) falls back
to LINEAR (e.g. for small mip-maps levels); use TILED for implicit
modifier case with non-shared resources
v4: fix unintended demotion of UBWC, i.e. only check QCOM_COMPRESSED
modifier and demote UBWC to less optimal format when using explicit
modifiers
Signed-off-by: Daniel Stone <daniels@collabora.com>
Tested-by: Heinrich Fink <hfink@snap.com>
Signed-off-by: Heinrich Fink <hfink@snap.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12595>
Android.mk was removed, so remove a reference to it from Panfrost
documentation. We should document building for Android, but it'll be
through meson. This note as-is adds more confusion than note.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12914>
Some games, ie. Doom Eternal, present from compute following compute post-fx and would benefit from having DCC image stores available.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12862>
Helper function to check if a modifier supports DCC image stores.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12862>
Improves performance of SynMark OglDrvShComp by +241.879%±1.01366% (n=5)
on a random KBL desktop that I have. That seems to put it at about the
same performance as i965, but I did not test that in a statistically
sound way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
There's going to be at least one more shader function set in
pipe_screen, so it makes more sense to do it in iris_program.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Use the flag that was set by nir_lower_passthrough_edgeflags. The
lowering passes will soon be moved to a finalize_nir hook, so there
won't be any choice. Ideally we'd like to eliminate iris_fix_edge_flags
completely, and this is a first step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output. Intel hardware is a little different. Gfx4 and Gfx5 hardware
works in the previously described mannter. Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.
This flag signals only that the vertex shader input is needed. It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging. Not having that code causes other passes to want to
eliminate that shader input.
v2: Convert conditional to assertion. This pass is only called for
vertex shaders. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
The lowering passes will soon be moved to another function, so there
won't be any choice.
As a side benefit, this allows eliminating the uses_atomic_load_store
**pointer** parameter from brw_nir_lower_storage_image. For some reason
crocus was passing false instead of NULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
...instead of looking for "ARB" in the name of the shader. This matches
the behavior of i965. Using "ARB" was added in a1ebac3750 ("iris:
Implement ALT mode for ARB_{vertex,fragment}_shader"), but there's no
explanation of why that method was used.
v2: Just use shader_info::is_arb_asm everywhere instead of
iris_uncompiled_shader::use_alt_mode. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
No need to re-create this every time we transition from stream-out
enabled to disabled.
Creation of streamout_disable_stateobj is deferred until we create
a program state using streamout to avoid creating it unnecessarily
and because fd6_prog_init() is called before ctx->pipe is created.
(Changing that ordering is complicated by the fact that u_blitter
copies pctx->bind_fs_state(), and friends.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12918>
Userspace frequently reads the elapsed fence, but the GPU only writes it
once per submit. So this should be another useful place for cached-
coherent.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11176>
Some more extreme examples, like gl_driver2_off, can be bottlenecked on
writes to cmdstream. OTOH the CP is pretty pipelined in how it slurps
in memory, so the penalty of using coherent buffers should not be so
much.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11176>
It hasn't really mattered until now, as we keep a separate cache for
cmdstream (which is FD_BO_GPU_READONLY), and the only other flag so
far is FD_BO_SCANOUT (which the bo cache probably messes up, but it
does not matter on most hw, and on hw where it does the scanout buffer
will be imported (and therefore won't end up in the bo cache).
But when we add cached-coherent (or if we wanted to use GPU_READONLY
more) it starts to matter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11176>
Add deqp gles2 CI run for GC2000.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12852>
Makes it possible to use e.g. a ser2net server in the lan.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12852>
Switch to v5.13-rc5-for-mesa-ci-2bb5d9ffd79c branch, which includes
etnaviv MMU patches.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12852>
Build the kernel with CONFIG_DRM_ETNAVIV=y and include
imx6q-cubox-i.dtb.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12852>
When moving the batch cache to the context, I added hash table lookups
from batch to rsc for "is this resource in use" because we could no longer
store data in the rsc bo under the batch cache's lock.
We can save that cost by tracking a bitfield of resources referenced by
the batch, which gives us very cheap checks in the draw path at a minor
cost in memory. We can just use the GEM BO handle, since it's a nice
small integer already (we can't use the TC buffer ID, because the frontend
changes that, and we're in the driver thread).
This required moving the !pending() assert up in resource shadowing, since
the BO swap meant we were checking pending on the wrong resource.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11511>
I think the whole submit lock thing should be possible to remove now, but
just getting rid of the lock since we're no longer sharing batches between
ctxes means another several percent draw overhead improvement.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11511>
In moving batch cache to the context, the check for whether there's
pending work being done to this resources ends up accessing the context,
so we can't do it outside of the fd_context_access_begin(). This flag
lets us do the driver-thread asserts before we've decided whether we need
to flush.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11511>
This prevents other optimizations from associating the inner multiply,
which can add inaccuracies that can lead to discontinuities around the
boundary of the ffract. We should use exactly the sequence that the blob
uses to avoid problems.
Since fadd + fmul cannot be combined to ffma when exact is specified, we
have to use ffma ourselves.
Fixes artifacts in PixMark Volplosion with !7458.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12912>
Fixes dEQP-VK.ssbo.phys.layout.random.8bit.all_per_block_buffers.46 on
GFX8.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12172>