PAN_MESA_DEBUG=overflow will place objects as close as possible to a
protected region at the end of the buffer, so that overflows segfault.
Caught the bugs in all four of the preceding commits.
v2: memset the BO to 0xbb to catch code expecting zeroed allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
create_vertex_elements_state is sometimes called with a too large
num_elements argument, for example with util_blitter, which causes a
buffer overflow.
There is no documentation to forbid this practice, so don't rely on
so->num_elements being correct and instead use the vertex shader
attribute count, which matches the value used to allocate the
descriptors.
Use attributes_read_count rather than attribute_count because the
latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID.
Fixes: 76de3e691c ("panfrost: Merge attribute packing routines")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Remove the previous compile-time early-ZS implementation and replace it with the
decoupled early-ZS implementation. This uses more efficient settings in some
cases (depth/stencil tests always passes or do not write), and fixes the
settings used in another case (alpha-to-coverage enabled with an otherwise
early-ZS shader.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #6206
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
The new early-ZS helpers are pure functions, leaf nodes of the call graph, and
implemented with a different algorithm from the "oracle" table of correct values
for various combinations of states. Further, incorrect settings often still pass
CTS while causing game bugs or inefficiencies. That combination makes the
helpers an excellent candidate for unit tests. Add some.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Bifrost (and Valhall) separate early-ZS configuration into two fields: when does
the depth/stencil buffer update happen? and when are pixels killed by the
depth/stencil tests? The driver separately configures these to occur early
(before the shader executes) or late (after the ATEST instruction executes at
the end of the shader). Early tests are generally more efficient, but various
combinations of API state and fragment shader properties can require late
updates and/or late kills for correctness. Determining how to configure these
fields is nontrivial.
Our current implementation (on Bifrost) configures these fields at fragment
shader compile time and bakes the settings into the RSD. This is both wrong
(using early testing when late testing is required) and suboptimal (using late
testing when early testing would suffice). We need to defer this configuration
until draw time, when we know rasterizer and Z/S state.
Reclassifying at draw time (as we currently do on Valhall) would be expensive,
especially with the extra terms added in here. To cope, decouple the shader
classification from the draw-time configuration. Since there are only a few bits
of draw state involved, this implementation just calculates all possible states.
Then the draw time classification is just indexing into a lookup table.
The actual algorithm used to classify is written with correctness and clarity in
mind. Unlike the current classification algorithm (which tries to match what the
DDK does, poorly), this algorithm embeds its proofs of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Nos that glsl_to_nir is setting sample_shading_enable whenever FB fetch
is used, we don't need to duplicate it here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Pandecode is not thread-safe (for a large number of reasons) and does not even
try to be. This is a problem when tracing (or just using PAN_MESA_DEBUG=sync)
multithreaded applications. The most common symptom of the problem are assertion
failures deep in the red-black tree implementation, which is not thread-safe.
Just protect the whole thing by a "in pandecode?" mutex, since this is not
performance sensitive code and we don't really care about the extra
serialization incurred. As pandecode does not recurse into itself, we may simply
lock at the beginning and unlock at the end of each entrypoint in pandecode,
which is thread-safe regardless of how pandecode is used. A few entrypoints are
refactored to avoid early returns to keep the lock/unlock calls in obvious
visual pairs.
Fixes flakes when running the CL CTS with PAN_MESA_DEBUG=sync like we would in
CI (e.g: events.event_flush)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17409>
The physical tile buffer size (and hence the maximum available tilebuffer size)
are implementation-defined. Track this information on the device so we can
correctly select tile sizes, instead of hardcoding the value for Midgard.
Implementation values are pulled from the "Tile bits/pixel" row of the public
Mali data sheet [1]. That row lists the maximum number of bits available for a
pixel given the maximum tile size and pipelining. For currently supported
hardware (v9 and older), that maximum tile size is 16x16. So those values should
be multiplied by (16 * 16 * 2) / 8 to get the physical size in bytes.
This may improve Bifrost/Valhall performance on workloads using multiple render
targets. It also gets us ready for the dazzling array of tile sizes available
with v10.
[1] https://developer.arm.com/documentation/102849/latest/
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
Separate out "calculating the size of each pixel", "selecting a tile size", and
"calculating the colour buffer allocation". Then implement the middle (selecting
a tile size) with a simple constant time expression, rather than a loop. There's
a bit of related clean up in here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
Make the separation between entries in the resource table more
obvious.
Increase the indent by two levels to keep descriptors distinct from
the resource entry itself.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
To query the core count, the hardware has a SHADERS_PRESENT register containing
a mask of shader cores connected. The core count equals the number of 1-bits,
regardless of placement. This value is useful for public consumption (like
in clinfo).
However, internally we are interested in the range of core IDs.
We usually query core count to determine how many cores to allocate various
per-core buffers for (performance counters, occlusion queries, and the stack).
In each case, the hardware writes at the index of its core ID, so we have to
allocate enough for entire range of core IDs. If the core mask is
discontiguous, this necessarily overallocates.
Rename the existing core_count to core_id_range, better reflecting its
definition and purpose, and repurpose core_count for the actual core count.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Starting with Valhall, the provoking vertex state is specified per-framebuffer
(batch) instead of per-draw. We use the pan_tristate infrastructure to translate
between desktop OpenGL's per-draw semantics to Valhall's per-framebuffer
semantic. This is notably not required for GLES or Vulkan.
If the provoking vertex is unset when the tiler context is generated, it could
be set (incompatibly) later in the batch, and the tiler context's provoking
vertex field would no longer match the framebuffer's. That would violate a
hardware invariant. To ensure that doesn't happen, we make sure to set provoking
vertexes *before* generating the tiler context so it can't change after.
Fixes arb-provoking-vertex-render on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17068>
Iterating over a util_sparse_array is very expensive; replace this
with a standard dynarray.
Using the sparse 'nodearray' datastructure instead was tested, but
found to be slower in some cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16988>
The hardware writes one CRC per (effective) tile, the tile size of the CRC
buffer is the same as the configured effective tile size. However, all our CRC
infrastructure assumes 16x16 tiles. In case CRC is used with smaller tiles,
buffer overflows and incorrect rendering are all possible. Don't use CRC at
smaller tile sizes. Note disabling CRC correctly invalidates any bound CRC
buffers.
Fixes: 2e97d7c835 ("panfrost: Transaction elimination support")
Closes: #6332
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
It has a single user -- in a section of code that only runs for MFBD GPUs and
that has already decided whether to use CRCs -- so inlining it simplifies its
definition greatly and may avoid redeciding the CRC setting.
[Note for mesa-stable maintainers: This is not a bug fix but is marked for
backport so the next patch applies cleanly.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
info.fs.sidefx considers discard() to be a side effect. That definition is...
dubious at best. It certainly isn't the definition needed for forward pixel
kill. The only reason pixels couldn't be killed by FPK is if the shader has side
effects in the sense of writing to memory. Use that more precise condition so
FPK works more often.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #5607
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16984>
The number of L2 performance counter blocks equals the number of L2 slices, so
add a query to get this. This information isn't needed by the Mesa driver, so
don't get it in the default device initialization path.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16803>
The input is specified in two identical structs, tear that apart.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
Much simpler than creating a UBO and relying on it getting optimized to a push
constant, with possible reordering.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
Everything required for conformant OpenGL ES 3.1 support on Valhall (v9) is now
upstream -- all that's left is to enable implementations! Add the GPU ID for the
Mali-G57 implemented in the MediaTek MT8192 system-on-chip.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16890>
We need to pack special AFBC-specific plane descriptors instead of the generic
plane descriptor. Nothing too fancy here, though.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
Add the required handling when packing render target and depth buffer
descriptors on Valhall. This is mostly equivalent to Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
Map a canonical format (a hardware-independent pipe_format) to a compression
mode (Valhall-specific hardware enum defined in GenXML). To be used for packing
plane descriptors and render target descriptors when AFBC is in use on Valhall.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
For callers that have a device object, it's easy to pass dev->arch instead of
dev. But this requires callers to have a reference to the device, which is
tricky for callers that only have the arch via PAN_ARCH. Pass dev->arch instead
of dev to accommodate them.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
Tiled headers and bounds checking were introduced with v7. The flags don't exist
on v6. Fix the XML accordingly so we don't accidentally use features too new for
the hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
These check the alignments are correct. Of course, ideally these cases aren't
hit in practice, since it's a waste of memory.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
The header size is the header stride times the number of rows in the header
(number of tiles of superblocks). We already calculate the header stride, so
eliminate the separate header size calculation.
Delete the old header size calculation. It has no notion of wide blocks, let
alone tiled AFBC headers.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
Extract a helper for calculating AFBC strides. This is used in two places in
pan_layout. It will need extension for tiled AFBC, and the extended version
could benefit from unit testing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
Let's keep all the AFBC computations inside the layout code, to keep pan_cs
dumb. This helper will need some extension for tiled AFBC.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
The struct is returned from a function, so in debug builds the address
may change after returning, and pointers to patched_s will be broken.
Pass the pointer to the patched stencil view as a parameter to
pan_preload_get_views to avoid this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>