Writes to global memory should not be moved over discard,
otherwise we could have unintended side-effects or lack of
side-effects where they should be observed.
Fixes tests:
dEQP-VK.rasterization.frag_side_effects.color_at_beginning.kill
dEQP-VK.rasterization.frag_side_effects.color_at_end.kill
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9365>
Disabling VM_ALWAYS_VALID actually hurts more than it helps
after doing more testing. Managing the global BO list in userspace
is really costly and make a bunch of games CPU bound.
I think re-enabling VM_ALWAYS_VALID is a step in the right direction.
This reverts commit 6ac6e2fbfb.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9341>
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.
So if we have code like this:
unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1
In practice we end up with QPU like this:
unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1
And with this patch we get:
unifa addr + 0
nop
nop
nop
ldunifa.r0 <--- reads offset 0
ldunifa.- <--- reads offset 4
ldunifa.- <--- reads offset 8
ldunifa.r1 <--- reads offset 12
Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.
Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:
total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.
total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
With threaded-context we won't have a chance to apply the workaround in
the backend driver. But the previous commit moves it to a driconf
configured workaround in mesa/st, so we can drop this now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9316>
All the ir3 gens had the same thing, time to move it out into a shared
helper.
The keeping the storage in fdN_context is to avoid namespace clashes
between ir3 and ir2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
I had forgotton on which gens these where used on (which is important if
you need to know which shader stages use these).. expand the comments a
bit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
Found this on top of Karol's patches but it seems like it can just be
applied to master.
Helps with some cases of
kernel_image_methods/test_kernel_image_methods 2Darray
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9381>
TGSI has these nice labels for us for where to jump in this case, let's
use them. Improves piglit arb_shader_image_load_store-shader-mem-barrier
runtime massively, though not enough to make the test really reasonable to
run.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9347>
FP16 rendering is supported on all gen4 hardware.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9379>
Build deqp-runner with the `--locked` option to use dependencies
versions specified in `Cargo.lock`.
v2: Bump image tags.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Suggested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9368>
This has 3 advantages:
- It's SMEM.
- Multiple single component loads are merged into 1 multi-dword load by LLVM.
- The result is always packed for packed instructions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395>
The search helps must *never* modify the instruction passed in, so let
the compiler enforce this.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9378>
Since we don't manually enumerate the drivers using it, we have to retest
all drivers when changing it (which basically never happens, anyway).
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9353>
Available since Vulkan 1.0, and in fact already wired up, just not
advertised. It looks like we could make this dynamic state but this
works for now.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9371>
Some games declare the wrong format, so we might want to disable this
optimization in that case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: e4d75c22 ("nir/opt_shrink_vectors: shrink image stores using the format")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9229>
tu_shader should be freed after pipeline is successfully created.
Fixes tests:
dEQP-VK.api.object_management.alloc_callback_fail.compute_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9364>
Computing the expected buffer size isn't reliable on GFX10+ because
DROPPED_CNTR returns weird results.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
DROPPED_CNTR isn't reliable and might still report non-zero if the
SQTT buffer isn't full. Checking if the number of written bytes by
the hw is equal to the SQTT buffer size seems reliable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
This fixes a GPU hang on my Sienna because the number of SE is
less than the maximum, and SE #1 is disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9370>