Mark test_full_yaml_log with this new marker to be easily run by the
developers.
Make `debian-testing` skip this test with `not slow` marker hint.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
Implement a log-based retry hint for R8152 issue described in #6681,
which is based on detecting these two consecutive lines:
```
r8152 <USB> eth0: Tx status -71
nfs: server <IP> not responding, still trying
```
Where <IP> and <USB> could be any IP and USB addresses, respectfully.
This commit is a temporary fix since it requires a section-aware log
follower, implemented in !16323. When the cited MR is merged, one will
make a proper fix on top of that.
Closes: #6681
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
if a texture barrier occurs while clears are pending, these clears should
show up if the fb attachments are read in shaders, so trigger a renderpass
to flush out the clears
cc: mesa-stable
fixes#6766
fixes (radv):
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_advanced_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_common.common_separate_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_advanced_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_blend_eq
dEQP-GLES3.functional.draw_buffers_indexed.overwrite_indexed.common_advanced_blend_eq_buffer_separate_blend_eq
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17363>
This got removed by mistake and broke
RADV_DEBUG=shaders,nocache,prologs.
Fixes: 9fe2b6b748 ("aco/radv: provide a vs prolog callback from aco to radv.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17413>
The Vulkan specification states:
> Query commands, for the same query and submitted to the same queue,
> execute in their entirety in submission order, relative to each other. In
> effect there is an implicit execution dependency from each such query
> command to all query commands previously submitted to the same queue.
Fixes dEQP-VK.query_pool.statistics_query.reset_after_copy.*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17400>
This seems to be a relic from before aco added per generation opcodes.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17405>
PS is the only shader type where unpacked 16-bit outputs can be used,
so use ac_shader_abi::is_16bit to pass the proper type to LLVMBuildLoad2.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17361>
Outputs are always f32 except for FS that may use unpacked f16.
Store this information here to make it available to later processing.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17361>
This commit replaces LLVMBuildLoad usage by LLVMBuildLoad2
where possible (= where the pointee type is known).
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17361>
Disabling opaque pointers in LLVM doesn't fix all the issues but
it makes pointers non-opaque by default (eg LLVMPointerType()
returns a typed pointer).
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17361>
Fixed:
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT was able to stop prim counter
when pipeline stats query is running.
- This may have happened when prim gen query was in secondary cmdbuf.
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT counting geometry in each tile.
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT counting geometry in each tile
when pipeline stats query is started inside prim gen query and inside
a renderpass.
The matter of VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT and pipeline stats
interaction is solved by tracking whether pipeline stats query is
running both on CPU (for non secondary cmdbuf case) and on GPU (for
secondary cmdbuf).
Note, prim gen query is not allowed with secondary command buffers, so
only pipeline stats query is tracked on gpu.
See https://gitlab.khronos.org/vulkan/vulkan/-/issues/3142
Counting geometry per each tile is solved by:
- Conditionally executing START/STOP_PRIMITIVE_CTRS to not run in tiling
pass. Solves the case when prim gen query is inside a renderpass.
- Stop prim counters before executing `draw_cs` and restarting them
afterwards. Solves prim gen query being outside a renderpass.
Fixes GL CTS tests with Zink + `TU_DEBUG=gmem`:
GTF-GL46.gtf30.GL3Tests.transform_feedback.transform_feedback_max_separate
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_basic
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_framebuffer
GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_overflow
GTF-GL46.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_queried
GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_states
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6602
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17163>
For a5xx+ renamed:
- RST_PIX_CNT -> START_FRAGMENT_CTRS
- RST_VTX_CNT -> STOP_FRAGMENT_CTRS
- TILE_FLUSH -> START_COMPUTE_CTRS
- STAT_EVENT -> STOP_COMPUTE_CTRS
I'm not sure about a5xx itself but I'll take a chance of it being
similar to a6xx in this regard.
Knowing this emit_begin_stat_query/emit_end_stat_query can now emit
only events that are needed for the pool's flags.
Also primitive generated query clearly doesn't need fragment and
compute counters.
Passes tests:
dEQP-VK.query_pool.statistics_query.*
dEQP-VK.transform_feedback.primitives_generated_query.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17163>
It's legal to use this instruction in a fragment shader, even if the
graphics pipeline doesn't use MSAA.
Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.sample_n_*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17398>
D3D uses an int which seems like it'd support value between 0 and 100,
but in reality it only accepts values of exactly 0, or 100. The space
is left in case future values were to be added, so that comparisons
would work (e.g. MEDIUM_HIGH < HIGH).
Treat higher than 0.5 to be HIGH, and anything less to be NORMAL.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17406>
Flushing the batch midframe (splitting a renderpass) is expensive on a tiler, as
it requires the GPU to flush the framebuffer contents to main memory and read
them back. Clearing the framebuffer should not trigger a flush. Apps expect
clears to be (almost) free, flushing for a clear is at the very least unexpected
behaviour.
The only reason we previously flushed is to ensure we could always use a "fast"
clear. But a slow clear is a heck of a lot faster than a flush ;-) Instead of
flushing, we should clear with a draw (via u_blitter) in case a fast clear isn't
possible.
This fixes pathological performance for applications that rely on partial clears
within a frame. This issue was identified with Inochi2D, which repeatedly clears
the stencil buffer midframe, in order to implement masking efficiently with the
stencil buffer. In total, the all-important workload of rendering Asahi Lina is
improved from 17fps to 29fps on a panfrost device.
Fixes: c138ca80d2 ("panfrost: Make sure a clear does not re-use a pre-existing batch")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17112>
The last few frames of the trace are expensive (in terms of GPU time) and are
close to hitting the timeout. With the next commit, they do hit the timeout due
to using a larger batch. Nevertheless the next commit should be an overall perf
improvement on average, so this remove to unblock CI.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17112>
Make the separation between entries in the resource table more
obvious.
Increase the indent by two levels to keep descriptors distinct from
the resource entry itself.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
Otherwise the list 'next' changing will cause the assertion in
list_for_each_entry to be hit.
This was not hit before because list_assert is defined for debug
builds but not debugoptimized.
Fixes: 5067a26f44 ("pan/bi: Use flow control lowering on Valhall")
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
I wrote fiction about dreaming that these were supported but after
waking finding that they were not. Somehow I came to consider that
fiction as fact and I never thought to test if they did work.
While reverse engineering the polygon list format, I found that these
were supported after all.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
This maps to CL_DEVICE_MAX_COMPUTE_UNITS, which is defined as:
The number of parallel compute cores on the OpenCL device.
Since all supported Malis are unified shader cores, the number of parallel
compute cores is simply the number of shader cores.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Whereas the compiler needs to know the warp size for lowering divergent
indirects, the driver needs to know it to report the subgroup size. Move the
Bifrost-specific helper to common and add the trivial implementation for
Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
To query the core count, the hardware has a SHADERS_PRESENT register containing
a mask of shader cores connected. The core count equals the number of 1-bits,
regardless of placement. This value is useful for public consumption (like
in clinfo).
However, internally we are interested in the range of core IDs.
We usually query core count to determine how many cores to allocate various
per-core buffers for (performance counters, occlusion queries, and the stack).
In each case, the hardware writes at the index of its core ID, so we have to
allocate enough for entire range of core IDs. If the core mask is
discontiguous, this necessarily overallocates.
Rename the existing core_count to core_id_range, better reflecting its
definition and purpose, and repurpose core_count for the actual core count.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Float conversions with explicit rounding modes are required for OpenCL,
as well as for Vulkan with the VK_KHR_16bit_storage extension (mandatory
in Vulkan 1.1). Since the hardware conversion instructions allow
configuring the round mode, this is easy to support :-)
Fixes test_half.vstore_half_rtz.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
This got messed up when scalarizing the IR. Fix the definition of the opcode to
return (instead of break, asserting out) and to respect the swizzle (instead of
failing validation). Noticed when bringing up OpenCL on Valhall.
Fixes: 5febeae58e ("pan/bi: Emit collect and split")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17222>