Noticed while debugging OpenCL. I think this was fallout from the CSF decode
rework?
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22228>
We've regressed the clang-formatting in a few places, since we're not enforcing
formatting in CI yet and I think at one point my editor wasn't quite right.
Reapply so we can get to clang-format-clean.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22372>
Currently, we always use a hierarchy mask with all levels enabled. While this is
efficient for geometry-heavy workloads like 3D games, it is wasteful for 2D
applications that draw very few vertices. For drawing just a few textured quads,
the overhead of small bin sizes outweighs any performance advantages, so it's a
bit slower. More problematically, small bin sizes require tremendous amounts of
memory for the polygon lists, leading to significant memory consumption (~10MB)
for the polygon list for even the simplest of 2D blits.
To reduce our memory footprint, we need to choose our hierarchy masks more
carefully. In general, we want to allow small bin sizes for geometry-heavy
workloads but not for geometry-light workloads. We estimate vertex count in the
driver as a proxy for this, and use a simple heuristic to select a bin size
based on the estimated vertex count. None of this is an exact science, and the
heuristic could probably be tuned. Nevertheless, the heuristic used (comparing
framebuffer size to vertex count) works well in practice, significantly reducing
the memory footprint of 2D applications like Firefox without hurting the
performance of 3D applications.
I originally wrote this patch while diagnosing high memory footprints on my
Midgard laptop, which is why only Midgard is in scope here. On Bifrost and
Valhall, we have a similar hiearchy mask selection problem. It seems likely that
the same heuristic would work there too, but it's a different code path that I
have not integrated or tested. I'll leave that for the adventurous reader, to
get the memory footprint win there too.
(It's also possible the win is smaller on newer Malis than on Midgard, since Arm
claims they optimized the tiler data structures on the newer parts. There's
probably still some merit to the idea.)
On Mali-T860, glmark2 -bdesktop frametime decreased by 1.35% +/- 0.91% at 95%
confidence, showing a slight win for 2D workloads No statistically significant
difference for glmark2 -bshading:shading=phong, since 3D workloads continue to
use the same hierarchy masks.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
In the next commit, we will refine our algorithm to select hierarchy masks based
on the vertex count. In preparation, augment the driver to track rough estimates
of the vertex count so we have a "geometry complexity" input for the heuristic.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19482>
As Intel does. These functions are written with the expectation that they will
be inlined away, allowing gcc's copy-prop and constant folding to eliminate the
template struct and any unused fields.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21848>
These were removed and replaced by new Bifrost RSD fields, don't print the wrong
values. Harmless but noises up the decoding.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Only the GL driver produces/consumes these, they shouldn't be in the common
shader_info.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Drop the backend compiler sysval handling in favour of the pass in the GL
driver, bringing us into compliance with Ekstrand's rule.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Do it explicitly in NIR rather than implicitly in the Midgard compiler. This
avoids a nasty sideband input for the render target formats and sample count,
for blend shaders on midgard only.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This is a flag-day change to how we compile. We split preprocessing NIR into a
separate step from compiling, giving the driver a chance to apply its own
lowerings on the preprocessed NIR before the final optimization loop. During
that time, the different producers of NIR (panfrost, panvk, blend shaders, blit
shaders...) will be able to (differently) lower system values.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This will be needed to decouple the lowering in the Midgard compiler from the
specific sampler descriptors used in the blit code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This sideband input is now unused, as the information is available locally
within the NIR as it should be.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
This gets rid of the hidden gl_BaseVertex system value which violates Ekstrand's
rule.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
We need different settings for Bifrost and Valhall. Keeping everything static
simplifies lifetimes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
Compilers are free to make the assumption that pointers don't violate
strict aliasing. If that assumption is incorrect, as it is with the
framebuffer pointer packing code here, the job can fail.
This depends heavily on the compiler and optimization levels, so it's
hard to reproduce, but it did happen for at least two users running with
-O2 on gcc.
Fixes: 67cbbf9417 ("panfrost: Use framebuffer pointer XML")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8627
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21991>
mesa/st doesn't like to use 24-bit textures, preferring RGBX over true RGB even
for texture views where this isn't valid. Given how silly true RGB is in
practice, I'd rather drop support and fix texture views than go against the
grain and risk more issues down the line since nobody else in tree is testing
these paths and apps really shouldn't be caring.
Fixes page faults in arb_texture_view-rendering-formats_gles3 which tries to
sample an R8G8B8_UINT texture with a R8G8B8X8_UNORM view in one subcase. That
test is now passing reliably.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
This is signed, not unsigned. We were already passing negatives and silently
relying on 2's complement and C to do the right thing. But that's silly. We
should just, actually do the right thing.
Found while struggling to debug primitive-restart-draw-mode.
v2: Update the other architectures too, including a decode_csf.c change for the
v10 incarnation of this v4-era field.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net> [v1]
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21891>
Minimum/maximum LOD and LOD bias are unsigned and signed fixed point formats
respectively. They are not unsigned integers. Introduce fixed-point types into
our GenXML and use them in the XML, rather than packing in sidebands. This makes
the XML more correct and fixes pretty-printing of texture and sampler
descriptors.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20445>
Now that we're working on lowered I/O, passing in the dual source blend colour
via a sideband doesn't make any sense. The primary source blend colours are
implicitly passed in as the sources of store_output intrinsics; likewise, we
should get dual source blend colours from their respective stores. And since
dual colours are only needed by blending, we can delete the stores as we go.
That means nir_lower_blend now provides an all-in-one software lowering of dual
source blending with no driver support needed! It even works for 8 dual-src
render targets, but I don't have a use case for that.
The only tricky bit here is making sure we are robust against different orders
of store_output within the exit block. In particular, if we naively lower
x = ...
primary color = x
y = ...
dual color = y
we end up emitting uses of y before it has been defined, something like
x = ...
primary color = blend(x, y)
y = ...
Instead, we remove dual stores and sink blend stores to the bottom of the block,
so we end up with the correct
x = ...
y = ...
primary color = blend(x, y)
lower_io_to_temporaries ensures that the stores will be in the same (exit)
block, so we don't need to sink further than that ourselves.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21426>
This is one step towards lowering I/O during shader preprocess rather than at
variant create time, which helps mitigate shader variant jank. It's also a lot
simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> [v1]
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20836>
Harmless in practice (so no need to backport) but still very wrong. Noticed
looking at traces of Dolphin trying to debug acute misrendering.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20780>
Known unsound code.
So far I'm not convinced transaction elimination is doing us much good. Even in
synthetic glmark style benchmarks this seems to be a few % hit at most. Given
that transaction elimination is unsound by design, and that panfrost's
implementation is buggy in several places and getting it right (up to the
unsoundness of the hardware feature itself) would take actual engineering
effort, and the priority is making glamor work... disabling is the obvious
choice here.
For now, we leave the code but gate it behind a env var
flag (PAN_MESA_DEBUG=crc) rather than defaulting to enabled unless
PAN_MESA_DEBUG=nocrc is set. This way, we can still experiment with it if we
need that data ("what performance could we gain if we had this feature,
unsoundness be damned?"). That said, I'm not really ok with having unsoundness
on my devices, y'know? Back of the napkin math suggests that it's not unlikely
that somebody has hit a transaction elimination collision in the wild with the
DDK.
Boils down to values.
Closes: #8113
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21258>
Add support to pandecode for Mali architecture v10, featuring the new command
stream frontend (CSF). This replaces the "job chain" with a new Command
Execution Unit (CEU) that runs a domain-specific assembly language. That
requires us to refactor pandecode substantially, splitting out JM-only code from
shared JM/CSF common code, and adding new CSF-only decode routines to
disassemble and interpret CSF command streams and pretty-printing the
data structures hit.
This is of course impossible to do properly, since the CEU is pretty easily
Turing-complete and hence subject to the halting problem. But we implement some
simple heuristics to follow jumps that are just good enough for the simple
command streams emitting by both the DDK and Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20837>
Correct some errors from the file's initial check in, as we're about to add
corresponding pandecode changes for the file.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20837>
opaque should not be set when logicops are enabled, that needs blending
even on Bifrost. Fixes is for when I believe the bug became possible to hit.
The logical error is older.
Fixes Piglit logicop tests again.
Fixes: d849d9779a ("panfrost: Avoid blend shader when not blending")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20685>
This isn't allowed for the same reason that AFBC of regular luminance-alpha
isn't allowed (and will raise DATA_INVALID_FAULTs). Reorder the checks to
ensure these formats are checked.
Fixes Piglit texwrap GL_EXT_texture_sRGB-s3tc.
Fixes: 476be5cb27 ("panfrost: Don't use texture format swizzles on v7")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20686>
Now that renderonly.h includes util/simple_mtx.h, which itself includes
valgrind.h, dep_valgrind is required by any module that includes
renderonly.h.
In file included from ../src/gallium/auxiliary/renderonly/renderonly.h:33,
from ../src/gallium/winsys/kmsro/drm/kmsro_drm_winsys.c:39:
../src/util/simple_mtx.h:34:12: fatal error: valgrind.h: No such file or directory
34 | # include <valgrind.h>
| ^~~~~~~~~~~~
compilation terminated.
dep_valgrind is part of idep_mesautil, which should be used instead of
copying the list of deps for each util header included (which would
have to be updated every time a util header changes its own includes),
so let's add idep_mesautil everywhere that includes renderonly.h.
Fixes: ad4d7ca833 ("kmsro: Fix renderonly_scanout BO aliasing")
Tested-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20530>
The goal is to make files at the root of src/compiler/ apply to both Bifrost and
Valhall, while ISA-specific code (e.g. instruction packing) code goes in
compiler/bifrost/ or compiler/valhall/. This is what Valhall is already doing,
the Bifrost specific stuff was just grandfathered in.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20455>
This is the compiler for both Bifrost and Valhall, and presumably future
Mali GPUs too. Give it a more generic name so we can use the bifrost/ path for
something a bit more specific.
For historical reasons the compiler's name is still "bifrost" and uses the
prefix `bi_`. I think that's ok in the same way that i915 in the kernel supports
way more than just i915.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20455>
When importing a BO, if it is already imported, then the handle will
alias an existing BO instance. It is possible for the existing owner to
free the BO after the import and leave a dangling handle before we get a
chance to increase the refcount, so we need to lock the BO table mutex
before importing, to make sure nobody else goes through the free path
during that window.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20403>
This switches us over to Mesa's code style [1], normalizing us within the tree.
The results aren't perfect, but they bring us a hell of a lot closer to the rest
of the tree. Panfrost doesn't feel so foreign relative to Mesa with this, which
I think (in retrospect after a bunch of years of being "different") is the right
call.
I skipped PanVK because that's paused right now.
find panfrost/ -type f -name '*.h' | grep -v vulkan | xargs clang-format -i;
find panfrost/ -type f -name '*.c' | grep -v vulkan | xargs clang-format -i;
clang-format -i gallium/drivers/panfrost/*.c gallium/drivers/panfrost/*.h ; find
panfrost/ -type f -name '*.cpp' | grep -v vulkan | xargs clang-format -i
[1] https://docs.mesa3d.org/codingstyle.html
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20425>
There are too many problems with indirect draws on v7 that we never got this
code path to the finish line, and none of us have a good plan (or reason) to fix
this. Proper indirect draws are only possible since v10 on Mali.
There was interest in using this path to implement indexed draws in PanVK, that
MR is stalled and it's not clear how much sense it makes to do Vulkan on
anything older than v9 or v10 at this point. This code isn't *gone*, it'll still
be in git history, but I don't see a lot of reason in keeping it in tree if it's
unused and complicating e.g. the sysval upload path of the driver.
Indirect dispatch remains supported on v7, as that path *is* working and flipped
on for end users. Indirect dispatch on v7 is considerably less complicated than
indirect draws.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20420>
Add the GenXML hardware description for Mali architecture v10, as implemented in
Mali-G610. This is not 100% complete but it should be good enough for parity
with v9.
The XML itself is forked off of v9, with all Job Managerisms replaced with
CSFisms. This notably includes a large number of new structures defining the
instructions that run on the Command Execution Unit (CEU).
This is the first step towards supporting Mali-G610 (i.e. RK3588) upstream. Next
up will be pandecode support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20360>