The kernel driver accepts only a limited range of priority values;
submitting a value outside that range results in `-EINVAL`. To avoid
this, the priority value is now clamped to the range that the kernel
supports.
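As an illustration only, the clamping can be sketched as below. The
helper name and the assumption that valid priorities run from 0 to
`num_priorities - 1` are hypothetical and are not taken from the driver
or the kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a requested priority into the range
 * [0, num_priorities - 1]. The range itself is an assumption made for
 * this sketch; the real bounds come from the kernel driver. */
static uint32_t
example_clamp_priority(int64_t requested, uint32_t num_priorities)
{
   assert(num_priorities > 0);

   if (requested < 0)
      return 0;
   if (requested >= (int64_t)num_priorities)
      return num_priorities - 1;
   return (uint32_t)requested;
}
```

With three supported priorities, a request for 5 would be submitted as
2 rather than being rejected.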
Fixes: 0c6fbfca0c
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18389>
It is just a renamed VK_ARM_rasterization_order_attachment_access.
Zink depends on it to expose KHR_blend_equation_advanced_coherent.
Passes GL tests via Zink:
dEQP-GLES31.functional.blend_equation_advanced.*
KHR-GLES31.core.blend_equation_advanced.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18420>
There are more magic regs which have different values between GPU
subgenerations than we specified.
The updated list and values were obtained by using libwrapfake
with the v631 blob and the
dEQP-VK.draw.renderpass.basic_draw.draw.triangle_list.1 Vulkan CTS test.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18229>
The implementation of isa_decode(..) is already part of isaspec, so let's
move the function declaration and some related structs to src/isaspec.
Also make the header C++ safe.
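For reference, "C++ safe" usually means wrapping the declarations in an
`extern "C"` guard so that a C++ translation unit sees C linkage. A
minimal sketch follows; the guard macro, struct, and function are
hypothetical stand-ins and do not reproduce the real isaspec header.

```c
#ifndef EXAMPLE_DECODE_H
#define EXAMPLE_DECODE_H

#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical options struct, standing in for the moved structs. */
struct example_decode_options {
   unsigned max_errors;
};

/* Hypothetical helper: number of whole instructions in a buffer. */
static inline size_t
example_instr_count(size_t size_bytes, size_t instr_bytes)
{
   return instr_bytes ? size_bytes / instr_bytes : 0;
}

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_DECODE_H */
```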
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18403>
Acked-by: Juan A. Suarez <jasuarez@igalia.com> # for broadcom
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> # for zink
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18318>
A subpass in gfxbench has the depth buffer present, but not written to,
for a render pass using the depth buffer as an input attachment. We can
skip single-prim-mode and the associated "oh no don't use sysmem" in that
case.
Improves gfxbench vk-5-normal perf by 1.56193% +/- 0.0743035% (n=14).
Part of #6327.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18241>
This is the standard pattern in the kernel for providing vfunc tables
for C objects. We're using it in the pipeline cache code but we're
about to start adding more stuff and so it really helps if we have it
for command buffers as well.
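A minimal sketch of the pattern, for readers unfamiliar with it: the
object carries a pointer to a constant table of function pointers, and
common code dispatches through that table. All names below are
hypothetical and do not reflect the actual structs in the tree.

```c
#include <assert.h>
#include <stddef.h>

struct example_cmd_buffer;

/* Table of driver-provided callbacks (the "vfunc table"). */
struct example_cmd_buffer_ops {
   void (*reset)(struct example_cmd_buffer *cmd);
   void (*destroy)(struct example_cmd_buffer *cmd);
};

struct example_cmd_buffer {
   const struct example_cmd_buffer_ops *ops;
   int recorded_cmds;
   int destroyed;
};

/* Driver side: implementations plus one static const table. */
static void
drv_cmd_buffer_reset(struct example_cmd_buffer *cmd)
{
   cmd->recorded_cmds = 0;
}

static void
drv_cmd_buffer_destroy(struct example_cmd_buffer *cmd)
{
   cmd->destroyed = 1;
}

static const struct example_cmd_buffer_ops drv_cmd_buffer_ops = {
   .reset = drv_cmd_buffer_reset,
   .destroy = drv_cmd_buffer_destroy,
};

/* Common side: store the table at init and dispatch through it. */
static void
example_cmd_buffer_init(struct example_cmd_buffer *cmd,
                        const struct example_cmd_buffer_ops *ops)
{
   assert(ops != NULL);
   cmd->ops = ops;
   cmd->recorded_cmds = 0;
   cmd->destroyed = 0;
}

static void
example_cmd_buffer_reset(struct example_cmd_buffer *cmd)
{
   cmd->ops->reset(cmd);
}

static void
example_cmd_buffer_destroy(struct example_cmd_buffer *cmd)
{
   cmd->ops->destroy(cmd);
}
```

Common code then only needs the `ops` pointer and never has to know
which driver owns the object.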
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>
Most other init functions follow the Vulkan API convention of putting
the parent object first.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18324>
This is a counterpart to the previous commit. When we have multiple
depth attachments, in the secondary we currently don't disable LRZ and
so we may need a valid LRZ fast-clear base.
Fixes: 4b5f0d98 ("tu: Overhaul LRZ, implement on-GPU dir tracking and LRZ fast-clear")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18369>
In
dEQP-VK.renderpass.dedicated_allocation.attachment_allocation.input_output.94
we have the following:
- There is more than one subpass, but only one depth attachment.
- The first subpass doesn't use depth.
- The subpass that does use depth has a draw call in a secondary.
We wouldn't hit the case where there's more than one depth attachment,
but because tu_begin_resumed_renderpass() only looked at the first
subpass it wouldn't find the depth attachment and would leave LRZ
invalid and thus a NULL LRZ fast-clear base. Then
tu_begin_secondary_cmdbuf() would leave LRZ enabled and the draw would
have LRZ enabled, leading to a hang.
Fix this by making tu_begin_resumed_renderpass() match
tu_begin_renderpass() with how it finds the depth attachment.
Fixes: 4b5f0d98 ("tu: Overhaul LRZ, implement on-GPU dir tracking and LRZ fast-clear")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18369>
We had a mix of common-macro-and-chip-overrides in static decls, plus
more overrides later in C code. It's way cleaner to just have a static
decl for the base options and chip overrides in C code.
This moves a few things (lower_cs_local_index_to_id, lower_wpos_pntc,
lower_int64_options) to the common static decl that had been pasted into
both a3xx-a5xx and a6xx.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18327>
We'd keep incrementing the costs in a cmd buffer's dynamic_pass on each
BeginRendering. This fixes the main renderpass of aztec ruins on zink to
use gmem, taking fps from ~8 to ~10.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18352>
We weren't setting LOCAL, so unless freedreno GL had set it since the GPU
woke up, we wouldn't get it.
This requires moving the GLOBAL unsetting out of tile_store's IB, since it
would never be executed when it mattered, anyway.
No perf difference detected on gfxbench vk-5-normal, or ANGLE minecraft,
genshin, and pubg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18187>
Move MESA_TRACE_* to the new file.
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18260>
For the virtgpu backend, immediately mmap'ing a buffer can be expensive
(i.e. it can require a sync with the host), so for small transfers we'd
prefer to take the upload path.
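The decision can be sketched roughly as below. The enum, the function,
and the idea of a single size threshold are assumptions made for this
illustration; the actual condition and cutoff in the driver may differ.

```c
#include <stdbool.h>
#include <stddef.h>

enum example_transfer_path {
   EXAMPLE_PATH_MMAP,
   EXAMPLE_PATH_UPLOAD,
};

/* Hypothetical policy: when mapping the buffer would need a host
 * round-trip, route small transfers through the upload path and keep
 * mapping for everything else. */
static enum example_transfer_path
example_choose_path(bool mmap_needs_host_sync, size_t size,
                    size_t small_threshold)
{
   if (mmap_needs_host_sync && size <= small_threshold)
      return EXAMPLE_PATH_UPLOAD;
   return EXAMPLE_PATH_MMAP;
}
```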
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18258>
A better explanation for SP_HS_WAVE_INPUT_SIZE is that it is the size
of local memory to allocate per wave (which can be more than one
patch), in 256B units.
Then the maximum of 64 makes sense because only 16KB of local memory
is reserved for VS<->HS linkage.
The resulting formula matches the blob behaviour, even when
patch_control_points and tcs_vertices_out have different values,
while the past formula gave wrong answers on gen3+.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Suggested-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17957>
Mirrors the change from 31835ac3b8 in freedreno.
Together with "tu: Fix HS input size formula for gen3+", this fixes the
following tests from the GL CTS running via Zink:
dEQP-GLES31.functional.tessellation.invariance.inner_triangle_set.quads_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.invariance.inner_triangle_set.triangles_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.invariance.primitive_set.triangles_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.invariance.primitive_set.triangles_fractional_odd_spacing_cw
dEQP-GLES31.functional.tessellation.invariance.triangle_set.triangles_fractional_odd_spacing
dEQP-GLES31.functional.tessellation.primitive_discard.quads_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.primitive_discard.quads_fractional_odd_spacing_cw
dEQP-GLES31.functional.tessellation.primitive_discard.triangles_fractional_odd_spacing_ccw
dEQP-GLES31.functional.tessellation.primitive_discard.triangles_fractional_odd_spacing_cw
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17957>
Set cmd->trace_renderpass_end after tu6_emit_tile_store in case of gmem.
To be able to do that, we push the update of cmd->trace_renderpass_end
down into tu_cmd_render_tiles/tu_cmd_render_sysmem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18238>
These would only have worked in GCC and Clang, which so far wasn't an
issue, but let's clean it up anyway.
Cc: mesa-stable
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18190>
It was discovered that a substantial amount (several GiB) of
private memory was being used by Skyline Emulator, as it created a
large number of pipelines with private memory that were never
deleted throughout the lifetime of the application.
These private memory allocations are now pooled into per-device
BOs shared among several pipelines instead of a single BO for
every pipeline. This reduces the memory footprint of private
memory allocations from several GiB to 8 MiB in Skyline Emulator
on certain titles.
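The pooling idea can be sketched as a simple bump suballocator that
hands out ranges from a shared BO and only opens a new BO when the
current one is full. The names and the 128 KiB BO size are hypothetical,
and freeing is omitted; this is not the allocator used by the driver.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_POOL_BO_SIZE (128u * 1024u) /* hypothetical BO size */

struct example_pool {
   uint32_t bo_count; /* number of backing BOs opened so far */
   uint32_t offset;   /* next free byte in the current BO */
};

struct example_suballoc {
   uint32_t bo_index;
   uint32_t offset;
   uint32_t size;
};

static uint32_t
example_align_up(uint32_t value, uint32_t align)
{
   return (value + align - 1) & ~(align - 1);
}

/* Carve `size` bytes out of the current BO, opening a new BO only when
 * the request does not fit. `align` must be a power of two. */
static struct example_suballoc
example_pool_alloc(struct example_pool *pool, uint32_t size, uint32_t align)
{
   assert(size > 0 && size <= EXAMPLE_POOL_BO_SIZE);
   assert(align != 0 && (align & (align - 1)) == 0);

   uint32_t start = example_align_up(pool->offset, align);
   if (pool->bo_count == 0 || start + size > EXAMPLE_POOL_BO_SIZE) {
      pool->bo_count++;
      start = 0;
   }
   pool->offset = start + size;

   return (struct example_suballoc){
      .bo_index = pool->bo_count - 1,
      .offset = start,
      .size = size,
   };
}
```

Two 4 KiB requests from an empty pool land in the same BO at offsets 0
and 4096, instead of each getting a BO of its own.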
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7033
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18073>
If a BO is freed while the kernel considers it busy, our VMA state gets
desynchronized from the kernel's VMA state, because the kernel waits
until the BO stops being busy. The kernel decides whether a BO is busy
at submission granularity.
In Vulkan, on the other hand, we may free a resource as soon as we know
it won't be used.
The changes are not reverted completely, in the hope that a proper
resolution will be found soon.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7106
Fixes: e23c4fbd9b ("tu: Switch to userspace iova allocations if kernel supports it")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18201>
gmem is a fractional run of the full caselist, and one of them showed up
crashing on a630_vk_full after the deqp-runner uprev. Add all of them so
we don't fail on the next reshuffle either.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17932>
A later CmdBindPipeline would shrink the two draw states' sizes to the
number of VBs the pipeline actually uses, but we can save some CPU
overhead and memory by not emitting all the unused VBs as well.
Improves zink drawoverhead throughput on test 5 (1 VB change) by 38.5178%
+/- 0.48738% (n=18).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17932>