Figured this while debugging a GPU hang with a simple CTS test. This
is to make sure data written by the CP are coherent on the CPU.
This also explains spurious GPU hang reports generated for Hitman 3
that made no sense without it. Now it's clear that this game hangs
after a DRAW_INDEX_INDIRECT packet.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17183>
The driver can't assume sample positions at (0.5, 0.5) when user
sample locations are used.
This doesn't fix anything in practice because NGGC is only enabled by
default on GFX10.3 and that extension is currently disabled on GFX10+,
but I would like to expose it at some point.
This fixes dEQP-VK.pipeline.*.sample_locations_ext.verify_location.*
(when the extension is enabled locally).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17228>
radeonsi may add extra dword to the stride, so let's pass it
directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16788>
nir_intrinsic_load_ring_esgs_amd
nir_intrinsic_load_ring_es2gs_offset_amd
Will be used by esgs lowering.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16788>
Before lower TES load input to load buffer, mark this flag for this
intrinsic, otherwise we get corruption with GFX10 after the lowering.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
This can remove special handling of tessfactors which also benifit
the nir lower pass which does not handle these as system value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
Remove the store_tcs_outputs abi, we can use common output abi
to handle the tessfactor pass as vgpr.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
This is used by radeonsi to save some lds space when all LS output
is passed by register.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
radv will lower this intrinsic before gets to llvm, so we just need to
implement it in radeonsi.
The tes version will be used in tess lower too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
Used by radeonsi and radv to reflect true wave size used, not minimal size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
radeonsi won't emit tess factor in the lower pass, need to keep
the output for llvm backend to pass it as parameter. This is used
by radeonsi for an optimization to save LDS write.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
radeonsi load this from SGPR arg, can't use static value because TCS output
and TES input may not match (TCS output is not a key for TES) and
determined in runtime.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
Also add radv and radeonsi implementation. Will be used in tess lowering.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
Create nir passthrough shader with explicit input/output and vertex
output count so that it can be handled by compiler same as user tcs.
The drawback is we create more si_shader_selector with different
input/output and vertex output count which was handled by compiler
backend before.
As fixed function tcs can be handled like user tcs, we don't need
the dedicated fixed_func_tcs_shader state either.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
We can do this parse with original nir instead of shader key pass
applied nir in si_get_nir_shader.
This can free si_get_nir_shader to just use si_shader as parameter.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16705>
nir lowering already call this with wave id check, no need to
check inside ac_build_sendmsg_gs_alloc_req again.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17130>
Needs more copy propagation before nir_opt_derefs picks it up.
Note that the full general problem of opaque types stored in
intermediate variables is still open, but that seems like a whole
can of worms, and no sense to have gfxbench stay broken during the
time it takes to solve that.
Cc: mesa-stable
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5945
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17012>
We tracked this down with the HW teams back in 2020 and there's now a
documented workaround. Comments from the HW team say this applies all
the way through XeHP but we're not sure beyond that.
This is a bug that we hit but the Windows drivers didn't because Jason
decided to allocate our memory structures from the top end of the VMA
range explicitly to catch bugs like this, while Windows allocates from
zero and up, so they would need to allocate more than 2GB of dynamic
state before running into it.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4880
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17216>
This lets us avoid the code duplication between BeginRendering and
BeginCommandBuffer and also lets us stop crawling core render pass
structs directly and instead focus on dynamic rendering concepts.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16953>
Instead of making drivers dive into the render pass and framebuffer
themselves, provide a helper that constructs a VkRenderingInfo for a
render pass resume that they can use instead. This should reduce code
duplication between driver implementations of BeginRendering and
BeginCommandBuffer.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16953>
get_dword_size() is misleading, its name implies it's returning
a size in dwords, but it's actually returning a size in bytes.
This led to a wrong size passed to emit_cbv(). Instead of fixing
get_dword_size(), let's inline the code in emit_ubo_var().
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17230>
The original offset value is overwritten in our first for(i: num_states)
iteration, messing up the compute push constant update if stageFlags
applies to both compute and graphics.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17233>
If we don't reset ctx.vm_cnt/gpr_map/etc, this will spam a lot of
s_waitcnt instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17207>
This commit changes to the physical device limits which were
missed during the 1.17 transition.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17206>
This should allow the compiler to optimize this out because it knows that
cmd_buffer is NULL.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17166>
this should be simpler to read through and maintain while providing
the same results as well as some possible perf and compile time improvements
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17192>
update_barriers has steadily grown more and more complex when the original
idea was for it to be a small function to handle deferred jit barriers and
simplify sync in patterns like bind_ubo -> draw -> buffer_subdata -> draw
instead, track the pending barrier info at bind time so that the stages and
access are already updated by the time draw/compute are reached
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17192>
this avoids the scenario where the full bo size isn't accounted for because
no variable for the block has been created
cc: mesa-stable
affects:
KHR-GL33.shaders.uniform_block.random.all_per_block_buffers.3
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17217>
GL_MAP_UNSYNCHRONIZED_BIT depends on the app having its threading
handled correctly. This allows us to force disable the bit when
they get it wrong.
CC: 22.1 22.0 <mesa-stable>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17199>
these values should have all been set during pipeline compositing above,
so reapplying the values is at best, redundant, and at worst, broken
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17219>
only the viewMask parameter of VkPipelineRenderingCreateInfoKHR can
be accessed in the fragment stage, so for pipeline libraries it should
be assumed that zs attachments exist for the purpose of copying dynamic
state values, and then these dynamic states will naturally be pruned
during final pipeline construction if the attachments turn out to not
be present
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17219>
[Alyssa: Add a default CPU implementation of pipe->clear_buffer(). This hook is
mandatory for OpenCL support. Even though this implementation isn't optimal by
any means, having a conformant default available in core will lower the barrier
of entry to OpenCL support.]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16044>
It is now unused. We cannot yet remove the streamout functionality in u_blitter
as r600g still uses it for clear_buffer on GPUs older than Evergreen.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17142>
r600g is the only user of util_blitter_copy_buffer in tree, which implements
buffer copies with streamout. This path for r600g was added in 8ac9801669
("r600g: accelerate buffer copying"), a commit from 2012. At that point there
was no DMA path for buffer copies. Since then, a DMA path has been added,
conditional only on the kernel version -- not the hardware. It appears the
required kernel support has been mainline for at least 4 years now. Mesa 22.2
doesn't need to provide optimal performance on an old kernel -- for performance,
a DMA-capable kernel should be used, and for compatability, the CPU fallback
(used for unaligned buffers as it is) is still available. Remove the streamout
path "in the middle" that appears ~unused today.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17142>
This really, really helps on platforms where fabs() isn't free. A great
many shaders use a * frsq(fabs(fdot(a, a))) to normalize a vector.
Since the result of the fdot must be non-negative, the fabs can be
eliminated by an existing algebraic rule.
shader-db results:
r300 (run on R420 - X800XL)
total instructions in shared programs: 1369807 -> 1368550 (-0.09%)
instructions in affected programs: 59986 -> 58729 (-2.10%)
helped: 609
HURT: 0
total vinst in shared programs: 512899 -> 512861 (<.01%)
vinst in affected programs: 1522 -> 1484 (-2.50%)
helped: 36
HURT: 0
total sinst in shared programs: 260690 -> 260570 (-0.05%)
sinst in affected programs: 1419 -> 1299 (-8.46%)
helped: 120
HURT: 0
total consts in shared programs: 957295 -> 957230 (<.01%)
consts in affected programs: 849 -> 784 (-7.66%)
helped: 65
HURT: 0
LOST: 0
GAINED: 3
The 3 gained shaders are all vertex shaders from XCom: Enemy Unknown.
I'm guessing that game is never going to run on my X800XL. :)
i915
total instructions in shared programs: 791121 -> 780843 (-1.30%)
instructions in affected programs: 220170 -> 209892 (-4.67%)
helped: 2085
HURT: 0
total temps in shared programs: 47765 -> 47766 (<.01%)
temps in affected programs: 9 -> 10 (11.11%)
helped: 0
HURT: 1
total const in shared programs: 93048 -> 92983 (-0.07%)
const in affected programs: 784 -> 719 (-8.29%)
helped: 65
HURT: 0
LOST: 0
GAINED: 36
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16702250 -> 16697908 (-0.03%)
instructions in affected programs: 119277 -> 114935 (-3.64%)
helped: 1065
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 4.08 x̃: 4
helped stats (rel) min: 0.48% max: 10.17% x̄: 3.66% x̃: 3.94%
95% mean confidence interval for instructions value: -4.26 -3.89
95% mean confidence interval for instructions %-change: -3.76% -3.56%
Instructions are helped.
total cycles in shared programs: 880772068 -> 880734134 (<.01%)
cycles in affected programs: 2134456 -> 2096522 (-1.78%)
helped: 941
HURT: 324
helped stats (abs) min: 2 max: 2180 x̄: 123.06 x̃: 44
helped stats (rel) min: 0.04% max: 49.96% x̄: 7.08% x̃: 3.81%
HURT stats (abs) min: 2 max: 2098 x̄: 240.33 x̃: 35
HURT stats (rel) min: 0.04% max: 77.07% x̄: 12.34% x̃: 3.00%
95% mean confidence interval for cycles value: -47.93 -12.04
95% mean confidence interval for cycles %-change: -2.87% -1.34%
Cycles are helped.
No shader-db changes on any other Intel platform.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17181>
The modifiers are u_vectors, but the code was trying to access them
as dynarrays. This resulted in a wrong number of modifiers, which then
later on would also lead to invalid reads used as modifiers.
In the case of the iris driver, a wrongly read number of modifiers > 0
would also trigger an error message.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6643
Fixes: b5848b2dac ("egl/wayland: use surface dma-buf feedback to allocate surface buffers")
Reviewed-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17180>
This is similar to vk_shader_module_to_nir only it takes a
VkPipelineShaderStageCreateInfo and handles
VK_KHR_graphics_pipeline_library semantics for when a
VkShaderModuleCreateInfo is provided instead of an actual module.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17196>
This change involves two enums:
* rogue_texstate.xml: All COMPRESSED_* members of FORMAT are moved
to FORMAT_COMPRESSED (without the prefix). A second field is added
to IMAGE_WORD0 (texformat_compressed) which overlaps with the
original (texformat), and
* rogue_pbestate.xml: REG_WORD0_LINESTRIDE was not a real enum; it's
removed entirely. It only has value when feature
pbe_stride_align_1pixel is present, so a FIXME comment was added to
this effect.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17204>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking rop_reads_dst suggests that it may be
null, but it has already been dereferenced on all paths leading to the
check.
Fixes: 94be0dd0b8 ("tu: Implement extendedDynamicState2LogicOp")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17099>
SSBO access works very differently from UBO access. Straddling
loads/stores isn't an issue, loads/stores instead must be aligned to the
element size and can have up to 4 components.
We support 16-bit access with SSBOs on a650+, and sometimes the
vectorizer tries to create a misaligned 32-bit access when combining
32-bit and 16-bit accesses. The UBO-focused logic didn't reject this,
which is now fixed. This fixes a number of VK-CTS regressions on a650+.
Fixes: bf49d4a084 ("freedreno/ir3: Enable load/store vectorization for SSBO access, too.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17040>
The test can compile, but can not pass, so compile it but not running it
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
error message:
```
../../src/gallium/winsys/d3d12/wgl/d3d12_wgl_framebuffer.cpp:231:42: error: no matching function for call to 'operator new(sizetype, d3d12_wgl_framebuffer*&)'
231 | new (fb) struct d3d12_wgl_framebuffer();
| ^
<built-in>: note: candidate: 'void* operator new(long long unsigned int)'
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
Because we may compile mesa with both rtti=enabled and rtti=disabled because of LLVM
Fixes errors:
../../src/gallium/drivers/d3d12/d3d12_video_enc_h264.cpp:777:7: error: 'dynamic_cast' not permitted with '-fno-rtti'
777 | dynamic_cast<d3d12_video_bitstream_builder_h264 *>(pD3D12Enc->m_upBitstreamBuilder.get());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
Reprogram SF CLIP viewport pointer by not skipping its
dirty flag bit.
Many thanks to Lin, Shuicheng <shuicheng.lin@intel.com>,
Jerez Plata, Francisco <francisco.jerez.plata@intel.com>,
Graunke, Kenneth W <kenneth.w.graunke@intel.com>,
and others for their great help.
Signed-off-by: Zhang, Jianxun <jianxun.zhang@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17171>
We need to push loop nesting to handle this correctly -- at the end of
the innermost loop, the correct nesting is 1 (from the if), not 0.
Fixes assertion failure in
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17128>
We had it set up for arm64 asan already, do it for everyone else too. In
cleaning up the duplication, this fixes a pasteo in rpi3 which had the
"artifacts: false" on the wrong job, causing it to do a slow download of
the mesa build from gitlab.
Doing this required also moving the ".use-debian/arm_test" in as well, so
that its "needs:" didn't overwrite ours if it appeared after us in the
consumer's "extends:"
Should save about 20 seconds on rpi3 jobs.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
There are several places that should have supported the various sized
versions of bcsel and the various nir_op_[fi]csel_* opcodes. Rather
than enumerate the whole list, add a property.
v2: Make the comment for NIR_OP_IS_SELECTION more descriptive.
Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
For example, the proof for this pattern
(('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)),
would be
bcsel(a < 0, b, c)
bcsel(!(a < 0), c, b)
bcsel(a >= 0, c, b)
fcsel_ge(a, c, b)
However, !(a < 0) => (a >= 0) is well known to produce different
results if `a` is NaN.
Instead of that replacement, use this replacement:
bcsel(a < 0, b, c)
bcsel(-0 < -a, b, c)
bcsel(0 < -a, b, c)
fcsel_gt(-a, b, c)
This is NaN-safe and exact.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Fixes: 0f5b3c37c5 ("nir: Add opcodes for fused comp + csel and optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
This was missing, and the added validation caught it.
Fixes: 708c47e663 ("nir: Validate nir_tex_instr::dest_type bitsize")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17172>
All 4 jobs had a total of about 26 minutes of runner time, so squish them
onto 3 runners and use gbm for the .shader_tests to avoid X overhead and
hopefully succeed with full concurrency.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17172>
min/max pointsize clamping affects the value that must be used,
meaning that it may not be 1.0
in the case where clamping changes the value from 1.0, ensure the shader
export path is used if attenuation isn't enabled
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17145>
point size min/max values are provided through the state vars, so ensure
these are always applied in order to respect ARB_point_parameters
cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17145>
As it turns out, MOVs weren't the only instructions that blocked precise
flags propagation in the transition to nir-to-tgsi.
This commit fixes some rendering regressions caused by a4a34cd3.
Fixes: a4a34cd3
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collanora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17144>
The RADEON_GEM_USERPTR_ANONONLY flag is hardcoded here which excludes
shared memory pages. DRM is actually capable of supporting shared file-
backed memory, but only if it's read-only. This mutability intent has to
be conveyed through the stack, so a flags argument is added to the winsys
regime to pass RADEON_FLAG_READ_ONLY.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16115>
This needs to be accurate so that when we split and then schedule a new
a0.x/a1.x/p0.x write we will eventually make progress. It wasn't taking
the kill_path into account which could create an infinite loop as we
keep scheduling writes whose uses are blocked because they are memory
instructions not on the kill_path.
Closes: #6413
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16635>
Even if there is libdrm we shouldn't use it if KGSL is selected.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17173>
For Turnip with KGSL we may have perffeto enabled but we don't
have libdrm.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17173>
Even with a noop FS, the color blend state can still be non-zero, and
then SPI color related registers won't be 0 and this would hang.
Fixes: bdf3797aeb ("ac,radeonsi: don't export null from PS if it has no effect on gfx10+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17169>
This matches specifications of both color and ZS fetch extensions.
Cc: mesa-stable
Signed-off-by: Pavel Asyutchenko <sventeam@yandex.ru>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13979>
This extension adds built-in variables gl_LastFragDepthARM and gl_LastFragStencilARM
which can be implemented almost the same as gl_LastFragData from color fetch extension.
Signed-off-by: Pavel Asyutchenko <sventeam@yandex.ru>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13979>
st/mesa will expose GL_ARM_shader_framebuffer_fetch_depth_stencil
if this new capability is supported by the driver.
Signed-off-by: Pavel Asyutchenko <sventeam@yandex.ru>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13979>
wezterm in fullscreen 4k was exceeding the xcb max request size
on the put image with llvmpipe. This fixes it to send sub-images,
the Xlib put image used in glx does this internally, but not
the xcb one, so just do it in sections here.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17155>
We have this in Anv and it could be reused in Iris for integrated
memory system.
Rework:
* Jordan: Drop regions.valid (Lionel implemented a fallback)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17075>
This advertises ARB_gpu_shader5 on Valhall, which should be working now. On the
GLES3.1 side, this notably adds support for sample variables and dynamic offsets
for texture gathers, both of which should now be working.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This is the instruction that the hardware actually supports. Do the rename, use
the more specific accurate model in the IR, and rework the Valhall texturing
code to emit MKVEC.v2i8 instead of MKVEC.v4i8.
Will fix:
dEQP-GLES31.functional.texture.gather.offset_dynamic.implementation_offset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
They are in a different place, but the encoding is otherwise as usual. This will
be required for texture gathers with dynamic offsets.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
Valhall does not have Bifrost's 4-source MKVEC.v4i8. Instead, it has a (somewhat
limtied) 3-source MKVEC.v2i8. The full MKVEC.v4i8 may be lowered to a pair of
MKVEC.v2i8 instructions.
For good code quality on both Bifrost and Valhall, we need to model both
instructions in their full generality.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This avoids needless variation from Bifrost. While at it, fix the opcode
definition: there are no abs/neg/swizzle modifiers on the signed integer source,
and there's no clamp. However, there are round and infinity modes, like on
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This will fix:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.at_sample_position.default_framebuffer
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
We generate FADD_RSCALE.f32 in our sample variables implementations. Valhall
doesn't have a dedicated FADD_RSCALE.f32 implementation, it should be aliased to
FMA_RSCALE.f32. Handle that alias in isel lowering. This will fix:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
When lowering vars to scratch, we need to be careful with alignment on Valhall,
where packed TLS access must not straddle a 16-byte boundary. Fixes regressions
when enabling indirect access to temps on Valhall.
Fixes: 6761dbf891 ("panfrost: Use packed TLS on Valhall")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This was missing the message, breaking UBO-to-push and who-knows-what-else, when
enabling fp16 const buffers.
Fixes: 3dc2095b07 ("pan/bi: Model LD_BUFFER instructions")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
This pass is super easy to unit test, so we have no excuse not to test
thoroughly. va_mark_last only inserts annotations in a shader without any
annotations, so our test cases are simply annotated shaders. The CASE macro just
has to compare the case against the case with the annotations stripped and added
back with va_mark_last.
In retrospect, I should have used that technique for the flow control insertion
tests too.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
On Valhall, register reads may be marked as "last" [1]. Setting the last flag
promises the hardware that the value of the register is no longer required. This
may enable hardware optimizations. In particular, it may permit the hardware to
avoid register file writes if a write to the marked register is still in the
forwarding buffer. This may improve power efficiency.
In principle, this is trivial: run liveness analysis and mark killed sources,
like we would in an SSA-based register allocator. In practice, there are a few
wrinkles to avoid hazards around staging registers and 64-bit register pairs,
requiring some additional data flow analysis and fix ups. However, nothing here
is particularly "hard", and all the ideas are already in use for the Bifrost
scheduler and the Bifrost/Valhall scoreboard analyses.
[1] In Mesa's compiler, this is called discard for historical reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
This helps "contain the crazy" and avoids special casing BLEND in compiler
passes. The Valhall instruction is roughly the same as its Bifrost counterpart,
as long as we fix up the source order (as we already do for bitwise operations)
everything works out.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
Post-RA liveness relies on the caller updating the live variable with the
results of bi_postra_liveness_ins. It is not automatic, as with regular
liveness. This means ignoring the result of bi_postra_liveness_ins is surely an
error. Mark it as MUST_CHECK to catch that error at compile time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
Add a unit test for the quirk discovered in the previos commit, because this
will cause flakes (instead of fails) if we get it wrong. Better have a
deterministic fail mode.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
For some unknown reason, waiting for general slots (at least for memory stores)
doesn't work properly on a BARRIER instruction. We need to wait for all general
slots right before issuing the BARRIER in addition to the general wait on the
BARRIER itself. I don't know if this is a hardware bug or some hideous
gate-saving quirk, but I observe the Mali-G78 DDK using the same workaround,
which implies this really is necessary.
Fixes rare flakes in:
dEQP-GLES31.functional.compute.shared_var.work_group_size.float_128_1_1
Note that the flakes from that test are extremely timing dependent. Without this
change, that test is racy but we almost always win the race. Reproducing the
issue reliably requires high system load (e.g. running the CTS in the
background) and simultaneously running that test a large number of times.
Minimal shader-db impact. In particular, no cycle count regressions.
total instructions in shared programs: 2699419 -> 2699458 (<.01%)
instructions in affected programs: 22014 -> 22053 (0.18%)
helped: 2
HURT: 25
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.64 x̃: 1
HURT stats (rel) min: 0.07% max: 2.82% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: 1.01 1.87
95% mean confidence interval for instructions %-change: 0.38% 0.88%
Instructions are HURT.
total cvt in shared programs: 14468.81 -> 14469.42 (<.01%)
cvt in affected programs: 221.33 -> 221.94 (0.28%)
helped: 2
HURT: 25
helped stats (abs) min: 0.015625 max: 0.015625 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
HURT stats (abs) min: 0.015625 max: 0.046875 x̄: 0.03 x̃: 0
HURT stats (rel) min: 0.10% max: 4.44% x̄: 1.06% x̃: 0.79%
95% mean confidence interval for cvt value: 0.02 0.03
95% mean confidence interval for cvt %-change: 0.57% 1.36%
Cvt are HURT.
total quadwords in shared programs: 1462496 -> 1462528 (<.01%)
quadwords in affected programs: 4632 -> 4664 (0.69%)
helped: 0
HURT: 4
HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel) min: 0.35% max: 7.69% x̄: 4.03% x̃: 4.03%
95% mean confidence interval for quadwords value: 8.00 8.00
95% mean confidence interval for quadwords %-change: -2.71% 10.76%
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
Test cases for insert flow are necessarily the reference test cases with the
NOPs stripped out. That means we don't need to duplicate the test bodies.
Deduplicate.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
This definition is a hardware property. It's not specific to the flow control
insertion pass, so move it to common code where other passes can use it.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
This should help with "marge got stuck for an hour and all I got was this
failed job with no results/" when a system intermittently wedges.
This replaces the BM_POE_TIMEOUT ("did we get something on serial in the
last 3 minutes?") that rpi had, in favor of checking that the whole test
job gets through in 20 minutes.
Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
If you set a name of on a swapchain object, because the base object
struct has not been initialized with a VkDevice,
vk_object_base_finish() will segfault when trying to free the object
name.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: cb1e0db23e ("vulkan/wsi: Make wsi_swapchain inherit from vk_object_base")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17165>
In the following case :
vkCmdBindPipeline(compute_pipeline);
vkCmdDispatch(...);
vkCmdBindPipeline(graphics_pipeline);
vkCmdBindIndexBuffer(buffer)
vkCmdDraw(...);
We're emitting the 3DSTATE_INDEX_BUFFER instruction while the HW is
still in GPGPU mode, because we're dealing the pipeline selection to
vkCmdDraw().
Found while debugging Age Of Empire 4, HW is hung on
3DSTATE_INDEX_BUFFER instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17153>
Enabling depth clamping ensures that the Vulkan driver respects
the depth range that zink sets on viewport objects in zink_draw.
When depth clipping is required, use VK_EXT_depth_clip_enable to
enable that independently of depth clamping.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16929>
Fix the transform to make sure it doesn't disturb the depth range
of the blitted image. Set the Z coordinates of the vertices
by hand instead of relying on the transform to do it.
This is a pre-requisite to Zink always enabling depth clamping.
Fixes: 26c6640835
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16929>
Per Vulkan spec only "Draw" commands should be counted towards
occlusion query.
Apparently RB_SAMPLE_COUNT_CONTROL::UNK0 bool controls whether
sample counting is enabled, so we could use it to disable
sample counting for 3d blits which are sometimes used for
clear/copy/blit/gmem-store/resolve operations.
Fixes GL CTS tests running through Zink:
dEQP-GLES3.functional.occlusion_query.depth_clear
dEQP-GLES3.functional.occlusion_query.depth_clear_stencil_clear
dEQP-GLES3.functional.occlusion_query.scissor_depth_clear_stencil_clear
dEQP-GLES3.functional.occlusion_query.scissor_stencil_clear
dEQP-GLES3.functional.occlusion_query.stencil_clear
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6559
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17138>