I don't know when this was added but it's really neat and we should use
it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation
things will work better.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>
I don't know when this was added but it's really neat and we should use
it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation
things will work better.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>
implicit sync is hard, and many drivers get it wrong, so assume that
anyone who isn't mesa might need some hand-holding
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17009>
If we have a combined Z/S image, the image has depth, so we proceed down the
depth path, which does not set clear.s even though there's *also* a stencil
component. Unify the control flow to fix this.
Fixes (among others):
dEQP-VK.api.image_clearing.core.clear_depth_stencil_image.single_layer.d24_unorm_s8_uint_multiple_subresourcerange
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
Rather than generating shaders to clear depth and stencil attachments, run the
rasterizer without a shader and configure the depth/stencil hardware to do the
clear. These settings are known to be efficient on Valhall, presumably the
depth/stencil pipeline on Bifrost is similar enough that it is also the
efficient way there. It's certainly much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
These were removed in an earlier series containing ae77c207e0 ("panvk: Use push
constants for copy shaders"), but the unused variables hung around.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
On Bifrost and newer, blend descriptors are decoupled from render target. That
means we can always use a clear shader reading from blend_descriptor_0 and
specify the desired render target in the sole blend descriptor we pass.
Likewise on Bifrost and newer we don't need blend descriptors when we don't
blend, which is the case for the Z/S clears.
This reduces the number of shaders compiled on startup from 468 to 426.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
pCreateInfo pointers have to be completely replaced for graphics
pipeline library.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
usually inlining is optimal for cpu drivers since the majority of
time is spent in the shaders, and any amount of reduction to shader code
will be optimal
if, however, the shaders are still really big after inlining, this improvement
will be negated by the insane amount of time spent doing stupid llvm optimizer
passes, so check post-inline size to see whether it exceeds a size threshold
lavapipe release build - 1700% improvement
* spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-rd-after-barrier
before: 142.15s user 0.42s system 99% cpu 2:23.14 total
after: 8.60s user 0.07s system 99% cpu 8.677 total
fixes#6647
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16977>
The hardware writes one CRC per (effective) tile, the tile size of the CRC
buffer is the same as the configured effective tile size. However, all our CRC
infrastructure assumes 16x16 tiles. In case CRC is used with smaller tiles,
buffer overflows and incorrect rendering are all possible. Don't use CRC at
smaller tile sizes. Note disabling CRC correctly invalidates any bound CRC
buffers.
Fixes: 2e97d7c835 ("panfrost: Transaction elimination support")
Closes: #6332
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
It has a single user -- in a section of code that only runs for MFBD GPUs and
that has already decided whether to use CRCs -- so inlining it simplifies its
definition greatly and may avoid redeciding the CRC setting.
[Note for mesa-stable maintainers: This is not a bug fix but is marked for
backport so the next patch applies cleanly.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
info.fs.sidefx considers discard() to be a side effect. That definition is...
dubious at best. It certainly isn't the definition needed for forward pixel
kill. The only reason pixels couldn't be killed by FPK is if the shader has side
effects in the sense of writing to memory. Use that more precise condition so
FPK works more often.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #5607
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16984>
This fixes many tests in following groups on DG2:
dEQP-VK.memory_model.*
dEQP-VK.fragment_shader_interlock.*
v2: use memory scope and setup descriptor also
for barriers without defined scope (Curro),
use local scope and flush type none with
NIR_SCOPE_NONE scope, cleanups (Lionel)
v3: use LSC_FENCE_THREADGROUP for NIR_SCOPE_WORKGROUP,
remove default case (Curro), use eviction if scope
was not defined, use LSC_FENCE_GPU scope for vertex
stage
v4: use LSC_FENCE_TILE independent of stage for device
scope (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
amdgpu returns 12 TB of GTT on Kaveri, which resulted in 0 KB of GTT
after the conversion to uint32_t, which caused us to report 0 as the UBO
size, which disabled UBOs and downgraded the driver to OpenGL 3.0.
Fixes: aee8ee17a5 - radeonsi: change max TBO/SSBO sizes again and rework max alloc size
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6642
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
We just need to pass the uses_discard flag to the epilog.
The hw skips the export anyway. This will hang if SPI registers declare
an output format or KILL_ENABLE is set because those cases require
an export with done=1.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
If tmp_buffer (in ssbo[1]) is NULL, setting the writable bit causes
the called function to access the NULL buffer.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
ac_llvm_add_target_dep_function_attr has no effect if the function is
inlined.
amdgpu-gds-size determines m0 for ds_sub_u32 gds, which hangs if it's 0.
This helps both gfx10 and gfx11, though it will only be used by gfx11
after we enable streamout.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>