Skip gather_push_constants when shared consts are enabled. This makes
sure push_consts is only zero-initialized, and reserved_user_consts is
0. This saves some space in the const file.
This change also adds a few asserts and a comment to
lower_load_push_constant. Because shared consts share the same range
for all stages, we should not apply per-stage offsets in
lower_load_push_constant. It worked because nir_lower_explicit_io
always sets base to 0 for nir_var_mem_push_const and
shader->push_consts.lo was always 0 for all stages.
Fixes: 0c787d57e6 ("tu: increase maxPushConstantsSize to 256.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17777>
Turnip supports VK_EXT_direct_mode_display and can use the common
implementation of AcquireDrmDisplayEXT() & GetDrmDisplayEXT() (which use
wsi->can_present_on_device() that turnip implements).
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17768>
A simple implementations of breadcrumbs tracking of GPU progress
intended to be the last resort when debugging unrecoverable hangs.
For best results use Vulkan traces to have a predictable place of hang.
Requires compilation with TU_BREADCRUMBS_ENABLED=1. See tu_cs_breadcrumbs.c
for details on how to use this feature.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15452>
Some of these are actually bugfixes, some like the drawcall information
are just for autotune so they are just performance fixes. However this
came from an audit into what state is used in CmdEndRenderPass().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
Emit it at EndRenderPass time, if the renderpass has tessellation. This
avoids all the special handling for secondaries, will work with
suspended/resumed render passes, and will handle secondaries that
contain render passes which will be allowed with dynamic rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
The workaround for draws that need a CP_WAIT_FOR_ME didn't work if the
barrier before the draw is in a separate command buffer from the draw.
The barrier would add a pending CP_WAIT_FOR_ME, but it would get dropped
on the floor at the end of the command buffer and the draw wouldn't have
a pending CP_WAIT_FOR_ME so it wouldn't emit one. We don't know in the
barrier if the destination is a draw with the workaround, so we have two
options:
- Emit any pending CP_WAIT_FOR_ME at the end of the command buffer (and
before secondaries) in case there is a workaround draw later. This
will emit an extra CP_WAIT_FOR_ME at the end of the command buffer in
case there is an indirect command barrier.
- Always assume at the beginning of the command buffer that there is a
pending CP_WAIT_FOR_ME. This will emit an extra CP_WAIT_FOR_ME before
the first workaround-requiring draw in the command buffer, in case
there was a barrier earlier.
The only draws requiring a workaround are currently
vkCmdDraw*IndirectCount(), which we assume are rarer than indirect
command barriers, so we implement the second option. This entails
treating it as a cache invalidate.
This fixes some upcoming dynamic rendering CTS tests that do
vkCmdDrawIndirectCount() in a secondary but put the barrier for it in
the primary.
Fixes: 37939e9c54 ("turnip: Fix the lack of WFM before indirect draws")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
The only thing it's used for is to get the image view, and we can't rely
on it existing anyway. With dynamic rendering, we only have the format
of the attachments and sample count, so moving forward we can't rely on
anything other than that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17378>
No driver doesn't use this option.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17757>
Fixes crash when capturing with RenderDoc.
From VK spec:
descriptorSetLayout [...] This parameter is ignored if templateType
is not VK_DESCRIPTOR_UPDATE_TEMPLATE_TYPE_DESCRIPTOR_SET.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17751>
The code assumed that if the source was d32s8 then the destination would
also be d32s8, in particular that depth_base_addr/stencil_base_addr
would also be filled out. Move the destination and source handling into
two different ifs with different conditions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
This was missed when we added support for VK_KHR_depth_stencil_resolve.
There is a similar feature where the stencil aspect of a D24S8 can be
copied "tightly" in CopyImageToBuffer, but it used the texture swizzle
and so required the 3d path. To get it to work with the 2D path, which
is required for resolves, we have to instead use the A8_UNORM format,
which works for texture sampling even for tiled images. We also have to
reuse the pre-existing image views because subpass resolves work on
image views rather than images, whereas before the fixup was applied
while creating the image view. This means threading through the
corresponding "opposite" format through setup, src, and dst functions,
doing the fixup there (through some shared helpers), and then getting
every user to specify the right format. As a bonus, we no longer need to
force the 3d path for the CopyImageToBuffer and CopyBufferToImage
special cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
primitive_param takes up 2 vec4's. Remove an align that I don't
understand.
The align upset
Test case 'dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_vec4'..
deqp-vk: ../src/freedreno/ir3/ir3_nir.c:1039:
void ir3_setup_const_state(nir_shader *, struct ir3_shader_variant *, struct ir3_const_state *):
Assertion `constoff <= ir3_max_const(v)' failed.
with an older version (android11-tests-dev branch) of deqp-vk. This is
because ir3_nir_opt_preamble uses the function for the worst case but
the function fails to replace the align by the worst case.
No regression with dEQP-VK.*tess*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17570>
Now there are two paths for push constants.
When it's range is under 128b, we can use shared consts.
When it's over 128b, we can instead do loading data through
regular path, which is same as the previous way.
Now we can satisfy emulations like vkd3d that requires 256b for
its root signatures and we think it fairly maps to push constants
rather than inline uniform blocks that requires one indirection.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Follow the way blob is doing for PushConstants though it supports only
128b, same as previous.
v1. Rename tu_push_constant_range.count into dwords to redue confusion.
( Danylo Piliaiev <dpiliaiev@igalia.com> )
v2. Enable shared constants only if necessary.
v3. Merge the two draw states TU_DRAW_STATE_SHADER_GEOM_CONST and
TU_DRAW_STATE_FS_CONST as shared constants are used.
Note that this leaves tu_push_constant_range in tu_shader so we could
use it again in the following patch.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Adds a shared consts base offset and a size of it(dwords) to ir3_compiler
since they might be depending on gpu generations. (Danylo Piliaiev <dpiliaiev@igalia.com> )
Adds a flag to present whether shared consts are enabled to
ir3_shader_options and then it sets to ir3_const_state when creating
an ir3 variant. Although this state is not per-shader state, this is
necessary when figureing out real constlens.
v1. Define a hw quirk for geometry shared const files and use it when
calculating const length.
v2. Don't hardcode when calculating a safe const length.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
According to the observation on a630/a650/a660, max_const_pipeline has
to be 512 when all geometry stages are present. Otherwise a gpu hang
happens. Acoordingly maximum safe size for each stage should be under
(max_const_pipeline / 5 (stages)).
Only when VS and FS stages are present, the limit is 640.
v1. Align max_const_safe to 4 vec4's.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Test runtime has crept up with more CTS tests and more features. The last
vk_full 1/2 run I tried timed out at:
Pass: 268488, Fail: 2, ExpectedFail: 7, Warn: 1, Skip: 602571, Duration: 1:29:29, Remaining: 45
Rude.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17662>
Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
The shared reg usage involved in the subgroup-related macros can cause
trouble for the spiller, and spilling may be implicated in CTS failures
with old versions of the subgroup tests, so let's make sure we get some
coverage. It does seem to catch a couple of failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17642>