Noticed while debugging OpenCL. I think this was fallout from the CSF decode
rework?
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
OpenCL can generate large loads and stores that we can't support, so we need to
lower. We can load/store up to 128-bits in a single go. We currently only handle
up to 32-bit components in the load and no more than vec4, so we split up
accordingly.
It's not clear to me what the requirements are for alignment on Valhall, so we
conservatively generate aligned access, at worst there's a performance penalty
in those cases. I think unaligned access is suppoerted, but likely with a
performance penalty of its own? So in the absence of hard data otherwise, let's
just use natural alignment.
Oddly, this shaves off a tiny bit of ALU in a few compute shaders on Valhall,
all in gfxbench. Seems to just be noise from the RA lottery.
total instructions in shared programs: 2686768 -> 2686756 (<.01%)
instructions in affected programs: 584 -> 572 (-2.05%)
helped: 6
HURT: 0
Instructions are helped.
total cvt in shared programs: 14644.33 -> 14644.14 (<.01%)
cvt in affected programs: 5.77 -> 5.58 (-3.25%)
helped: 6
HURT: 0
total quadwords in shared programs: 1455320 -> 1455312 (<.01%)
quadwords in affected programs: 56 -> 48 (-14.29%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
We need to respect the ALU swizzle, this takes a vector. Fixes incorrect
pack_64_2x32 translation hit when wiring up lower_mem_access_bit_sizes for
OpenCL.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
We're going to enforce clang-format in CI, so get with the program! This doesn't
change a *ton* all considered, because panvk was already aiming for the style we
have in the panfrost clang-format file.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22372>
We've regressed the clang-formatting in a few places, since we're not enforcing
formatting in CI yet and I think at one point my editor wasn't quite right.
Reapply so we can get to clang-format-clean.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22372>
Every nir_ssa_def is part of a chain of uses, implemented with doubly linked
lists. That means each requires 2 * 64-bit = 16 bytes per def, which is
memory intensive. Together they require 32 bytes per def. Not cool.
To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src so should not actually affect memory use, and in
the future we sneak it into the bottom bit of a pointer.
However, this creates a new inefficiency: now iterating over regular uses
separate from if-uses is (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
There's a bit of churn, but this is largely a mechanical set of changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
Now that we upload workaround samplers for txf, sampler 0 is guaranteed to be
valid but other samplers are not. So ignore whatever the current sampler_index
value is (it's formally undefined in NIR) and use 0, which we know is valid. We
already do this on Valhall for OpenCL, just need to generalize for Midgard and
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantages, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, leading to significant memory consumption (~10MB)
for the polygon list for even the simplest of 2D blits.
To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We estimate vertex count in the
driver as a proxy for this, and use a simple heuristic to select a bin size
based on the estimated vertex count. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hiearchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave that for the adventurous reader, to
get the memory footprint win there too.
(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)
On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, showing a slight win for 2D workloads No statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
In the next commit, we will refine our algorithm to select hierarchy masks based
on the vertex count. In preparation, augment the driver to track rough estimates
of the vertex count so we have a "geometry complexity" input for the heuristic.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
As Intel does. These functions are written with the expectation that they will
be inlined away, allowing gcc's copy-prop and constant folding to eliminate the
template struct and any unused fields.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
6148e3aae7 ("mesa: Fix ctx->Texture.CubeMapSeamless") introduced a hack, where
seamless cube maps would be requested even for GLES2 contexts despite the spec,
on the assumption that GLES2 gallium drivers would ignore the bit. But that
requires Gallium drivers to know what GLES version they advertise, which is a
horrible layering violation. When the commit was written 8 years ago, there were
classic drivers to contend with so it made sense as a fix to get GLES 3.0 up and
running. With classic drivers gone, it's time to sunset the hack and restore the
intended behaviour by setting ctx->Texture.CubeMapSeamless only once we know the
version.
In addition to fixing a semantic issue in the Gallium contract and preventing a
regression from the next commit, this fixes cube maps on Mali-T720 under
Panfrost. In general, Panfrost supports GLES3 (and honours the seamless flag
everywhere) but on T720 we only advertise GLES2 due to missing MRT support on
older Midgard devices, so we need the flag set properly to distinguish these
cases.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21978>
.lava-test hidden job was setting the HWCI_TEST_SCRIPT variable to deqp
runner. But that is not always the case. When we run piglit traces jobs,
we use piglit-traces.sh instead, for example.
Splitting into:
- .lava-test-deqp (deqp-runner + deqp)
- .lava-traces (deqp-runner + piglit)
- .lava-piglit (piglit-runner + piglit)
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22065>
Previosuly it was assumed that primitives where always converted to
triangles if the driver did not support all primitives, however that's
not true for a driver that supports quads but not quad strips.
Fixes piglit spec@!opengl 1.1@dlist-fdo3129-01 on Panfrost
Fixes: dcbf2423d2 ("vbo/dlist: add vertices to incomplete primitives")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21987>
These were removed and replaced by new Bifrost RSD fields, don't print the wrong
values. Harmless but noises up the decoding.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Since 50b82ca818 ("nir/lower_blend,agx,panfrost: Use lowered I/O"),
nir_lower_blend needs to be called after lowering I/O rather than before.
Furthermore, after lowering blend, we need (in general) to lower the resulting
load_output intrinsics. Now that we have a proper preprocess_nir hook, there is
a natural place in panvk_vX_shader to do this.
Fixes dEQP-VK.pipeline.blend.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This will give the driver [notably, PanVK] a chance to lower dual source
blending without having the dual stores turned into store_combined_output_pan.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
If new load_output are created after preprocessing NIR (namely, from blend
lowering in panvk), this lowering needs to be called to lower load_output to the
vendor intrinsic with conversion descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Only the GL driver produces/consumes these, they shouldn't be in the common
shader_info.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Drop the backend compiler sysval handling in favour of the pass in the GL
driver, bringing us into compliance with Ekstrand's rule.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Blend constants are sysvals, it's just that they can sometimes be inlined
depending on the pipeline state. The old "inline blend constant" pass is a
special case of the new "lower all sysvals" pass in panvk.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Per Ekstrand's Rule. This avoids the "fixed sysval" hack that Faith introduced
to get this behaviour with the GL sysval handling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Now the only passes that depend on the shader key can run late, so we can
preprocess ahead-of-time once and throw away the original shader. This reduces
the cost of shader variants, as well as deduplicates some lowering for
transform feedback shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Do it explicitly in NIR rather than implicitly in the Midgard compiler. This
avoids a nasty sideband input for the render target formats and sample count,
for blend shaders on midgard only.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This is a flag-day change to how we compile. We split preprocessing NIR into a
separate step from compiling, giving the driver a chance to apply its own
lowerings on the preprocessed NIR before the final optimization loop. During
that time, the different producers of NIR (panfrost, panvk, blend shaders, blit
shaders...) will be able to (differently) lower system values.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This will be needed to decouple the lowering in the Midgard compiler from the
specific sampler descriptors used in the blit code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Removes a lot of indentation, and improves metadata handling.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
It doesn't make sense for shader stages other than fragment (and blend which is
fragment-like), assert this.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
To prepare for the new compile flow, where this will be called by the driver
instead of internally in the compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
To prepare for the new compile flow, where this will be called by the driver
instead of internally in the compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Nothing in the optimization loop should remat the lowered instructions, so
there's no need to do it inside the loop.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Nothing in the optimization loop should remat the lowered instructions, so
there's no need to do it inside the loop.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This sideband input is now unused, as the information is available locally
within the NIR as it should be.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
It is unused and should stay unused, as any use is a violation of Ekstrand's
rule.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This gets rid of the hidden gl_BaseVertex system value which violates Ekstrand's
rule.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
We need different settings for Bifrost and Valhall. Keeping everything static
simplifies lifetimes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
See previous commits for justification. Later, we'll split up NIR processing in
a few steps to give the caller a chance to lower the sysval, at which point the
goofy inputs here will go away.
v2: Only lower in fragment shaders. Likely harmless to run elsewhere but still
wrong because the location enum is defined per-stage.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>