if this isn't supported, don't use rect-related sampling
cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17714>
The CTS does a lot of 1x1x1 compute shaders (all that stuff like
dEQP-GLES31.functional.shaders.builtin_functions.precision.mul.highp_compute.scalar)
which finish with store_ssbos. Instead of doing the invocation loop in
that case (which LLVM has to later unroll), just emit the single
invocation's store.
Fixes timeouts running
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, which does
a spectacular number of SSBO stores in a long 1x1x1 compute shader.
Reduces runtime of on llvmpipe from 66s to 29s locally, and virgl from
1:38 to 43s. virgl
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.22
goes down to 7 seconds.
Fixes: #6797
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17730>
Let's be slightly more defensive here. If a texture image doesn't have
an associated pipe_resource allocated, then render_texture() will pass
that along to _mesa_update_renderbuffer_surface(), which will crash on a
NULL pointer dereference. So, if there isn't a pipe_resource, then we
should just skip this altogteher.
Today, this isn't an issue, because each gl_texture_image always
allocates a pipe_resource up front. On a branch of mine, I prototyped
some improvements to the compressed texture fallback handling, where it
would defer resource allocation, examine the source image's block data,
and dynamically select a format based on that, then allocate it later.
With that prototype in place, we saw crashes the Android "My Talking
Tom" series of games, which appear to be attaching ASTC textures to a
framebuffer color attachment. That FBO would be incomplete anyway, as
ASTC textures aren't renderable, but we got into a situation where the
render-to-texture code was crashing due to the lack of pt before it
could properly signal that it was incomplete and bailing.
Technically, we don't need this now, but I figure that being defensive
won't hurt and this would probably save whoever encounters such an issue
in the future a bunch of frustrating debugging.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17508>
Fixes crash when capturing with RenderDoc.
From VK spec:
descriptorSetLayout [...] This parameter is ignored if templateType
is not VK_DESCRIPTOR_UPDATE_TEMPLATE_TYPE_DESCRIPTOR_SET.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17751>
If we don't append subpass->self_dep_info last, other __vk_append_struct()
calls will update its pNext chain which lives in the subpass which
should be treated as immutable. This is easily fixable by just making
it the last thing we append to the chain.
Fixes: 7e11cdc77a ("vulkan/render_pass: Pass sample locations to barriers")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17748>
No shader-db changes on any Intel platform
Fossil-db results:
Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15
Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15
Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout. This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0
total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255
Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337
Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The code assumed that if the source was d32s8 then the destination would
also be d32s8, in particular that depth_base_addr/stencil_base_addr
would also be filled out. Move the destination and source handling into
two different ifs with different conditions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
This was missed when we added support for VK_KHR_depth_stencil_resolve.
There is a similar feature where the stencil aspect of a D24S8 can be
copied "tightly" in CopyImageToBuffer, but it used the texture swizzle
and so required the 3d path. To get it to work with the 2D path, which
is required for resolves, we have to instead use the A8_UNORM format,
which works for texture sampling even for tiled images. We also have to
reuse the pre-existing image views because subpass resolves work on
image views rather than images, whereas before the fixup was applied
while creating the image view. This means threading through the
corresponding "opposite" format through setup, src, and dst functions,
doing the fixup there (through some shared helpers), and then getting
every user to specify the right format. As a bonus, we no longer need to
force the 3d path for the CopyImageToBuffer and CopyBufferToImage
special cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
primitive_param takes up 2 vec4's. Remove an align that I don't
understand.
The align upset
Test case 'dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_vec4'..
deqp-vk: ../src/freedreno/ir3/ir3_nir.c:1039:
void ir3_setup_const_state(nir_shader *, struct ir3_shader_variant *, struct ir3_const_state *):
Assertion `constoff <= ir3_max_const(v)' failed.
with an older version (android11-tests-dev branch) of deqp-vk. This is
because ir3_nir_opt_preamble uses the function for the worst case but
the function fails to replace the align by the worst case.
No regression with dEQP-VK.*tess*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17570>
This is more accurate because it's computed directly in spirv_to_nir and
takes even unused SampleID and SamplePos builtings into account.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17669>
if this is only going to do a sample0 resolve, the functionality is
equivalent to just copying the first sample, and in llvmpipe terms,
this just means doing a direct copy at offset=0
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17705>
RECT textures used to be required to be supported by drivers. But since
the state-tracker learned how to lower these to 2D textures, some
drivers no longer support them.
While we have lowering in place for this, lowering it involves some
needless overhead. So let's just use a 2D texture instead of a RECT
texture.
Because having two versions and switching between them is more
complicated than it needs to be, let's just always use a 2D texture.
Similarly, let's just always multiply the reciprocal here, so we don't
have to test for PIPE_CAP_TGSI_DIV first.
Cc: mesa-stable
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17707>
these states were erroneously assigned to the pre-rasterization stage
for pipeline libraries when they instead belong to the vertex input stage
cc: mesa-stable
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17738>
this adds a mangled variation on nir_inline_uniforms that enables inlining
from any uniform buffer in order to try inlining every possible load
if the shader is too small or the ssa_alloc delta from inlining is too small,
then inlining is disabled for that shader to avoid pointlessly churning
the same shaders for no gain
with certain types of shaders, the speedup is astronomical
before:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (4750.76s)
after:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (0.505s)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17722>
Yes this is a round trip, but X_PresentPixmap is not itself a blocking
operation, it just instructs the server to do the next presentation at
some time. More importantly, if _we_ don't catch the presentation error,
xlib's error queue will, and the calling code is certainly not prepared
to handle errors from Present.
Forcing the round trip here is also a bit more correct semantically.
This is the end of the Vulkan client part of the present queue, and the
X_PresentPixmap request transfers the queue operation to the server, so
we should not return until we are sure the handoff has happened.
Fixes some flakiness with piglit@glx-visuals-* with zink+radv.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17685>
Checking for the extension isn't enough, we also need to check for the
feature-bit.
Fixes: 49a20e0981 ("zink: start a unified driver workarounds struct")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17709>