It can happen that the user reads an input attachment as the first use
of that attachment. In that case there are no subpass dependencies
required at all, because there could be a pipeline barrier before the
renderpass instead, and in any case we assume that dependencies with the
first subpass as a destination can be executed only once outside the
renderpass. The result is that we only do a CACHE_INVALIDATE once
before the entire renderpass, but it's actually required after each GMEM
load, because input attachments read GMEM through UCHE and those writes
to GMEM invalidate UCHE.
While we could add the missing CACHE_INVALIDATE "by hand" somehow, it
turns out it's actually just as easy to do an optimization the blob
does, where it simply doesn't patch those input attachments and reads
them directly instead. This means we can skip allocating memory in GMEM
for them entirely in some circumstances.
This fixes e.g.
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit
with TU_DEBUG=forcebin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12213>
Setting it to GPU can cause an L3$ flush in certain cases. That's not
what we want as we really only care about coherency within the GPU.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12291>
Only GFX10+ is affected because older chips don't support
comp-to-single. For them, we need to implement FCE on compute with DCC
and eventually CMASK.
Fixes the gap between concurrent vs exclusive queue with Scarlet Nexus,
also gives a boost with Doom Eternal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12088>
The code we had for this was a work in progress and not finished. Also,
it was geared towards partial writes caused by output packing (i.e.
fp16) and was ignoring partial updates caused by conditional writes,
which are far more common in our case.
This change provides an implementation for tracking conditional writes
that works in tandem with the previous spill change to narrow liveness
for their spills.
Fixes register allocation failures in:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
We also gain one shader from shader-db:
total instructions in shared programs: 13339969 -> 13338584 (-0.01%)
instructions in affected programs: 185520 -> 184135 (-0.75%)
helped: 375
HURT: 130
Instructions are helped.
total threads in shared programs: 412038 -> 412040 (<.01%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
total uniforms in shared programs: 3746581 -> 3746585 (<.01%)
uniforms in affected programs: 49 -> 53 (8.16%)
helped: 0
HURT: 1
total max-temps in shared programs: 2359960 -> 2359947 (<.01%)
max-temps in affected programs: 289 -> 276 (-4.50%)
helped: 7
HURT: 0
Max-temps are helped.
total sfu-stalls in shared programs: 34351 -> 34359 (0.02%)
sfu-stalls in affected programs: 218 -> 226 (3.67%)
helped: 35
HURT: 37
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13374320 -> 13372943 (-0.01%)
inst-and-stalls in affected programs: 186653 -> 185276 (-0.74%)
helped: 373
HURT: 132
Inst-and-stalls are helped.
LOST: 0
GAINED: 1
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>
A spill of a conditional write generates code like this:
mov.ifa t5000, 0
mov tmud, t5000
nop t5001; ldunif (0x00008100 / 0.000000)
add tmua, t11, t5001
Here, we are spilling t5000, which has a conditional write, and we
produce an inconditional spill for it. This implicitly means that
our spill requires a correct value for all channels of t5000.
If we do a conditional spill, then we emit:
mov.ifa t5000, 0
mov tmud.ifa, t5000
nop t5001; ldunif (0x00008100 / 0.000000)
add tmua.ifa, t11, t5001
Which only uses channels of t5000 that have been written by the
instruction being spilled.
By doing the latter, we can then narrow down the liveness for t5000
more effectively, as we can use this to detect that the block only reads
(in the tmud instruction) the values that have been written previously
in the same block (in the mov instruction). This means that values in
other channels are not used, and therefore, we don't need them to be
alive at the start of the block. This means that if this is the only
write of t5000 in this block, we can consider that the block
completely defines t5000.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>
When an image supports comp-to-single, DCC is cleared to 0x10 (single)
and the clear color value is written to the beginning of each 256B
block in the image.
This allows to skip FCE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
When DCC is cleared with that code, the hardware expects the clear
color value to be stored at the beginning of each 256B block in
the image.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
As we aren't testing LLVMPipe in these jobs, and shader compilation is
currrently the bottleneck.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12196>
Sometimes, the VM powered off before all the output from the guest got
to the console.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12196>
Crosvm deals with virtio-gpu commands sequentially, so parallelization
in the host doesn't help much.
Also, too much parallelization in the guest causes some tests to time
out.
So reduce the number of dEQP instances being run concurrently, make sure
we dont limit the number of CPUs being used in the host and schedule
more jobs in CI to keep the times below 10 minutes.
Closes: #5172
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12196>
dEQP isn't high on rendering, but that is in the critical path as all
dEQP processes are waiting for Crosvm to single-threadedly service their
requests.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12196>
The Vulkan spec requires ~0 for 1x1.
Fixes dEQP-VK.fragment_shading_rate.misc.shading_rates.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
Minimum required value is 16 but we support up to 32
(2x2 VRS with MSAA 8x).
Fixes dEQP-VK.fragment_shading_rate.misc.limits.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
From the Vulkan spec 1.2.187.
"fragmentShadingRateWithCustomSampleLocations specifies whether
custom sample locations are supported for multi-pixel fragments.
It must be VK_FALSE if VK_EXT_sample_locations is not supported."
VK_EXT_sample_locations is disabled on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
By creating the KMS side handles we allow GBM to return the proper KMS
side GEM handles for imported buffers. Always creating the KMS side
handles adds a bit of overhead, as we don't need them on all imported
resources, but seems like the most robust solution for now.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12018>
There are a number of drivers which do a trial-and-error import
of buffers into the KMS side via renderonly. Some of those imports
are expected to fail, so we should not print a error message in
this case. All callers do proper error handling themselves.
Cc: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12018>
It's a RMW operation, also note that DB doesn't use L2 on GFX6-8.
Fixes test_clear_depth_stencil_view() and test_discard_resource() tests
from vkd3d-proton.
Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12223>
It's called anv_image_* so it really should take an anv_image. For the
couple of cases where we really want to pass in a set of aspects, we
leave an anv_aspect_to_plane() helper. anv_image_aspect_to_plane() is
then just a wrapper around it which grabs the aspects from the image.
While we're in the area, sprinkle some const around.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
The new versions should have identical output, just a simpler (and
probably faster) implementation and more/better asserts.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
The Vulkan 1.2.184 spec says:
"When creating a VkImageView, if sampler Y′CBCR conversion is
enabled in the sampler, the aspectMask of a subresourceRange used by
the VkImageView must be VK_IMAGE_ASPECT_COLOR_BIT.
When creating a VkImageView, if sampler Y′CBCR conversion is not
enabled in the sampler and the image format is multi-planar, the
image must have been created with
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT, and the aspectMask of the
VkImageView’s subresourceRange must be VK_IMAGE_ASPECT_PLANE_0_BIT,
VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT."
Previously, for YCbCr images, we were flipping this around. For single-
plane views where VK_IMAGE_ASPECT_PLANE_N_BIT would be passed in by the
app, we would store VK_IMAGE_ASPECT_COLOR_BIT. For multi-plane views
where the client says VK_IMAGE_ASPECT_COLOR_BIT, we would store a all of
the planes. (There was also an extra bit of remapping that would
compact the planes in the non-existent case of a format with a non-
contiguous set of planes.) The idea behind this was that for things
like rendering or single-plane sampling, storage, or compute, we want it
to look as much like a single-plane image as possible but we wanted the
multi-plane case to be the awkward one.
This commit changes it around so that iview->aspects is always exactly
the subset of image->vk.aspects represented by the view. This is
identical to how aspects work for depth/stencil so it gains us some
consistency.
This commit also changes anv_image_view::aspect_mask to aspects to force
a full audit of the field. As can be seen, there are only a few uses of
this field and they're all mostly fine:
- A bunch of them are used to check for depth/stencil. That hasn't
changed.
- Most of the checks for color already used ANY_COLOR_BIT, only one
needed fixing.
- There's a check that both src/depth are color for MSAA resolves.
However, we don't support MSAA on YCbCr so there's no point in
checking for ANY_COLOR_BIT.
There is a hidden usage of planes in anv_descriptor_set_write_image_view
that's not as obvious. However, this function simply looks at
anv_image_view::n_planes and blindly fills out the descriptor
accordingly. As long as image views with a single plane continue to
claim n_planes == 1, this will be fine.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
Previously, we initialized vplane in anv_CreateImageView to 0 and
incremented it every iteration of the aspect loop. This only works
because planes are guaranteed to be in aspect-bit-order which wasn't
documented anywhere. Instead, drop this assumption and burn a couple
CPU cycles properly calculating vplane.
While we're here, make iplane const as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
When creating a single-plane view of a multi-plane image, we were
relying on vplane_aspect to be VK_IMAGE_ASPECT_COLOR_BIT so that
anv_get_format_plane of the single-plane view format would work.
Instead of relying on this quirk, we can drop vplane_aspect and rely
entirely on vplane to only be 0 in this case. In the case of depth or
stencil images, we still need to grab the format aspect but we can use
the actual aspect and don't need the vplane_aspect trickery.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
Once we get past depth/stencil, what we really want is plane 0 not the
color aspect. A bunch of those formats don't have a single color
aspect.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
Unlike anv_get_format_aspect, this takes a plane number which is
relative to the set of aspects on the format. There are a number of
cases where we already have the plane and so re-fetching it is useless.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
The comment about modifiers is bogus because we check the modifier
before this check and return early. Also, there's no reason why we need
to check the requested aspect when we could check the format itself.
anv_image_aspect_to_plane will ensure that the requested aspect is one
that actually exists.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
Vulkan allows us to, in theory, support ycbcr on single-plane formats if
the client really wants it. Also, these functions should work on a
multi-plane color image as long as the client specifies the right
aspect. This gets rid of our usage of can_ycbcr outside of anv_image.c.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12141>
In the compute dispatch path we do not allocate a huge amount
of space to cover everything so the individual functions have to
allocate. This was missing here, causing a hang in Cyberpunk when
accessing the system menu at some locations with thread tracing
enabled.
Fixes: bd1186572f ("radv: add support for push constants inlining when possible")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12271>
Lots of the MAX2 args end up subtracting two unsigned numbers, which
blows up when the result is negative.
Fixes: 4c99d6ff54 ("radv: flush L2 for images affected by the pipe misaligned issue on GFX10+")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12272>
If we return inside a pan_pack() the descriptor packing doesn't happen.
Cc: mesa-stable
Fixes: 8ba2f9f698 ("panfrost: Create a blitter library to replace the existing preload helpers")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12239>
The spill base setting instructions (which includes some uniforms) are
added in the entry block, not in the current block. When ldunif
optimization is applied, the cursor is pointing to instructions in the
entry block, but the current block is a different one. This leads to a
heap-buffer-overflow when going through the list of instructions
(detected by the address sanitizer).
Thus change the current block to entry block, and restore it after the
setup is done.
This fixes
dEQP-VK.ssbo.readonly.layout.single_struct.single_buffer.std430_instance_array_comp_access_store_cols
with address sanitizer enabled.
v2:
- Set current block instead of disabling ldunif optimization (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12221>