If one VMEM instruction uses a sampler and the other doesn't, we can't do
this optimization.
Totals from 47 (0.04% of 127638) affected shaders:
CodeSize: 271744 -> 271656 (-0.03%); split: -0.04%, +0.01%
Instrs: 52783 -> 52761 (-0.04%); split: -0.05%, +0.01%
Cycles: 5547040 -> 5546952 (-0.00%); split: -0.00%, +0.00%
VMEM: 10022 -> 9887 (-1.35%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4949>
Otherwise, we will fail when the traces description file doesn't
contain any checksum for the specified device.
Fixes: efbbf8bb81 ("tracie: Print results in a machine readable format")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4839>
When the LLVM version is too old or missing, SotTR applies shader
workarounds and that reduces performance by 2-5% with ACO.
SotTR workarounds are applied with LLVM 8 and older, so reporting
LLVM 9.0.1 should be fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4984>
ANV and RADV have similar Python code for generating extensions
and dispatch tables.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4987>
The GC880 on iMX6DL indicates in it's minorFeatures2 register that it
does support SEAMLESS_CUBE_MAP, however when the TE.SAMPLER_CONFIG1
VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit is set on GC880 on iMX6DL,
the result is corrupted image. In particular, the following ~112 dEQPs
are affected and fail:
dEQP-GLES2.functional.texture.filtering.cube.*
This only happens on MX6DL GC880, MX6Q GC2000 and STM32MP1 GC400(GCnano)
do not report the minorFeatures2 SEAMLESS_CUBE_MAP bit and ignore the
TE_SAMPLER_CONFIG1 VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit (note
that ss->seamless_cube_map is unconditionally set by mesa at times even
PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE returns 0), so there is no visible
problem and there are no failing dEQP tests on the GC2000 and GCnano.
This might imply that the minorFeatures2 SEAMLESS_CUBE_MAP has some
different meaning on GC880 or the SEAMLESS_CUBE_MAP behaves differently
on the GC880.
This patch does not set the SEAMLESS_CUBE_MAP bit on hardware which does
not indicate support for seamless cube map and on GC880, which results
in reduction in failed dEQPs: 635 to 186 on GC880, 274 to 270 on GC2000
and no change on GC400(GCnano).
Fixes: 8dd26fa2f0 ("etnaviv: support GL_ARB_seamless_cubemap_per_texture")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4865>
On a6xx we need a 0,0 based scissor in the binning pass, but can use the
blit-scissor to avoid restore/resolve of untouched pixels, and use the
conditional execution if the IB to bin to skip bins with no geometry
(due to the scissor).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5021>
Similar to what we do in postsched. It is useful for pre-RA sched to be
a bit aware of things that would cause syncs. In particular for the tex
fetches, since the vecN src/dst tends to limit postsched's ability to
re-order them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
If an instruction's only use is as an output, and it increases register
pressure, then try to avoid scheduling it until there are no other
options.
A semi-common pattern is `fragcolN.a = 1.0`, this pushes all these
immed loads to the end of the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
Similar to avoidance of `(ss)` syncs, it turns out to be helpful to
avoid `(sy)` syncs as well. This helps us turn an tex, (sy)alu, tex,
(sy)alu sequence into tex, tex, (sy)alu, alu, which is a big win in
gfxbench gl_fill2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
Once we schedule an instruction that will require an `(ss)` sync flag,
there is no need to delay any further instructions that consume an
SFU result (until the next SFU instruction is scheduled).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
It seems for short frag shaders, too much prefetch can be detrimental.
I think what we *really* want to do is decide after pre-RA sched, when
we also know about nop's and what the actual ir3 instruction count is.
But that will require re-working how prefetch lowering works. For now
this is a super crude heuristic to attempt to approximate a good
solution.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
We can no longer assume that `state->ranges[0]` is block 0. It *often*
is, but when we encounter a "real" ubo that we lower to `load_uniform`
before a block 0 `load_ubo`, it could end up another entry in the table.
Resulting in the second pass after gathering ubo ranges, not finding a
valid range. Which results in a `load_ubo` for a thing that is not
actually a ubo making it's way into ir3 frontend. Resulting in grabbing
what we think is a ubo address out of some unrelated const register, and
trying to dereference that. Which as you can imagine, fails in amusing
ways.
Fixes: fc850080ee ("ir3: Rewrite UBO push analysis to support bindless")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4954>
Corresponds roughly to what we analyze. Note that "terminate AND
execute" is a contradiction (rather: it's equivalent to just
terminating), hence why there are only three possibilities for the
states of the flags:
.cont = continue, don't execute
.last = don't continue, don't execute
.cont.last = continue and execute
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5014>
This is for an optimization I plan in a following commit. I found I had
to add likely()s to avoid a perf regression from branch prediction.
On the drawoverhead 8 UBOs test, the HW can't quite keep up with the CPU,
but if I set nohw then this change is 1.32023% +/- 0.373053% (n=10).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4996>
This commit fixes a GS regression introduced in !4562 where
ir3's GS lowering pass was moved from common code (ir3_nir) to
freedreno-specific code (ir3_shader). For GS support in turnip, we
need to add the GS lowering pass back in, this time in tu_shader.
As for the nir_gather_info change, the GS lowering pass has always
introduced a discard_if intrinsic into the GS. Previously, we simply
ran nir_shader_gather_info before GS lowering, but now since we lower
the GS before we need to remove the assertion that only a FS can use
the discard_if intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4892>
And try a bit harder to find an optimal layout. Improves on a sub-
optimal layout we arrive at in the 4 MRT pass in manhattan, picking
up a bit more than 3%.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4976>
We potentially don't know yet what the resulting scissor bounds are, so
we can't assume this when estimating number of bins per pipe for VSC
size calculations.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4976>
There is an annoying spec corner on Android. Because VkSwapchain is
implemented in the Vulkan loader on Android which may not know about
this extension, we have to handle it as a special case inside the
driver. We only have to do this on Android and only for VkSwapchainKHR.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4882>
It seems this is actually a "minimum pitch" value. For example
TFETCH6_2_BYTE means a minimum pitch of 128 bytes for mipmap levels.
This fixes breakage with compressed formats. For example this test:
dEQP-VK.pipeline.sampler.view_type.2d.format.eac_r11_snorm_block.mipmap.linear.lod.equal_min_3_max_3
Fixes: a34b3fa198 ("freedreno/fdl: Align after dividing by block size")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5009>
The nir_analyze_ubo_ranges pass removes all UBO block 0 loads to reverse
what nir_lower_uniforms_to_ubo() had done, and we only upload UBO pointers
to the HW for UBO block 1-N, so let's just fix up the shader state.
Fixes an off by one in const state layout setup, and some really dodgy
register addressing trying to deal with dynamic UBO indices when the UBO
pointers happen to be at the start of the constbuf.
There's no fixes tag, though this fixes a bug from September, because it
would require the num_ubos fix in nir_lower_uniforms_to_ubo.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4992>