Instead of storing the vertex mask per wave into LDS and then computing
the prefix sum, store 8-bit bitcounts (vertex counts) of the vertex masks
into LDS. This allows us to compute the sum using v_sad_u8, which computes
a sum of 4 i8vec4 components in one instruction.
Each i8vec4 of vertex counts is loaded in parallel threads (one dword
per thread) instead of all being loaded in thread 0, and readlane copies
them to SGPRs instead of readfirstlane.
LDS is no longer initialized before culling. Instead, the counts for
inactive waves are masked with AND later.
Incorrect old comments are also fixed.
This change removes 80 bytes from the code size, and it allows increasing
the workgroup size from 128 to 256. (which is the main motivation for this)
Now changing the workgroup size with wave64 has no effect on the code size.
Switching to wave32 with 8 waves even generates slightly smaller code than
wave64 with 4 waves.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
I don't know which change fixed this, but I can't reproduce the hang anymore.
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
Uncached access can be slow if the box is not aligned nicely.
Also, caching in L2 might enable bigger PCIe bursts.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
If a shader has no param exports (no varyings), the pixel shader can start
after the VS position is written before the vertex shader finishes.
The fix is to wait for the memory stores before the position export.
The code needs to be restructured. First prepare param exports to get
nr_param_exports, then emit position exports with the wait, and then
emit param exports.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
This removes confusing register types due to deduplication, such as:
"name": "SQ_WAVE_TTMP10",
"type_ref": "SPI_SHADER_USER_DATA_PS_0"
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813>
This way if there's a fault, the pipeline won't accidentally pass
and let bugs slip into main. This seems to have occurred on both T720
and G72, leading to flakes on both. I want flakeless CI, so this is a
step in eliminating flakes before they hit the tree.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10938>
It's reliably faulting in CI but not locally, and I can't figure out
what the difference could possibly be. Regardless I can't fix the fault
otherwise, and faultless CI matters more than losing a single trace
(from an app I manually test anyway).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10938>
We previously allocated only 16MB, but this isn't always enough. Now
that we have growable (heap) on recent kernels, there's not much reason
to try to shrink this allocation.
Fixes OUT_OF_MEMORY fault on furmark trace.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10938>
This allows failing fast (optionally still tracing, if set with
PAN_MESA_DEBUG=trace) when a GPU fault is introduced. This is better
behaviour for both use cases:
1. When debugging a known fault, setting this mode together with trace
will stop the driver as soon as a buggy command stream is submitted,
and the offending stream will be the last trace file.
2. When running test suites (particularly in CI), setting this mode
will detect faults and crash, causing the pipeline to fail fast as
opposed to incorrectly marking the run green if the test happens to
pass despite the faults and slow downs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10938>
Due to Midgard ABI silliness. We could fix this properly but I'm not
aware of any app that needs more than 16, so let's just revert to the
behaviour matching the DDK.
Fixes: fdbf8c96fe ("panfrost: Use natural shader limits")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10938>
Some resources like backbuffers are explicitly flushed by the frontend
at the appropriate time, others however won't get flushed explicitly.
Remember those resources when they get emitted as a render buffer and
flush them on a context flush to make their content visible to other
entities sharing the buffer.
We still keep the optimized path for most resources where the frontend
promises to do the flushing for us and only enable implicit flushing
when a buffer handle is exported/imported without the
PIPE_HANDLE_USAGE_EXPLICIT_FLUSH flag set.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7603>
surf->base.texture is already assigned earlier via a proper
pipe_resource_reference call. Remove the superfluous assignement.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7603>
dri2_resource_get_param() is called from two different places right now.
Only one of them adds the EXPLICIT_FLUSH hint to the handle usage, which
may disable the optimizations provided by this hint without a reason.
Make sure to always add this hint when appropriate.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7603>
There is an effort to drop the "Gen" prefix from much of our codebase.
This just applies this to the metrics.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
Adding a register, similar to what was done for RenderBasic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
The availability of those counters depends on the topology.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>