Nodes are allocated in order to registers so initially sorting
was used to ensure that nodes with smaller life ranges would
be assigned first and therefore be more likely to get
accumulators.
However, since d81a6e5f1d we no longer rely on order to make
decisions about accumulators and instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.
Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).
Shader-db results show a significant improvement in instruction
counts, due to more optimal accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is to prevent nearby
temps from being assigned to the same accumulator so that QPU scheduling
is more flexible, but if we sort our nodes, we are basically not
assigning temps in program order any more and the round-robin policy
becomes less effective:
total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987
total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4
total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173
total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195
total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12
total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
For us they are basically uniforms too so we want to make their
lifespans short to facilitate allocating them to accumulators.
total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
instructions in affected programs: 8326040 -> 8297840 (-0.34%)
helped: 24939
HURT: 19894
total threads in shared programs: 415860 -> 415858 (<.01%)
threads in affected programs: 4 -> 2 (-50.00%)
helped: 0
HURT: 1
total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
uniforms in affected programs: 96134 -> 94632 (-1.56%)
helped: 744
HURT: 435
total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
max-temps in affected programs: 264598 -> 245427 (-7.25%)
helped: 10858
HURT: 841
total spills in shared programs: 4005 -> 4010 (0.12%)
spills in affected programs: 700 -> 705 (0.71%)
helped: 5
HURT: 10
total fills in shared programs: 5801 -> 5817 (0.28%)
fills in affected programs: 1346 -> 1362 (1.19%)
helped: 6
HURT: 11
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
If we are compiling with a strategy that does not allow TMU spills
we should not allow spilling anything that is not a uniform.
Otherwise the RA cost/benefit algorithm may choose to spill a
temp that is not uniform and that will cause us to immediately
fail the strategy and fall back to the next one, even if we
could've instead chosen to spill more uniforms to compile the
program successfully with that strategy.
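A minimal sketch of the restriction described above, assuming a simplified candidate list. The struct, its fields, the "lower cost is a better candidate" convention and the function name are hypothetical and are not the driver's actual data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a spill candidate. */
struct spill_candidate {
    bool is_uniform; /* value can be reloaded as a uniform */
    float cost;      /* assumption: lower means a better candidate to spill */
};

/* Returns the index of the cheapest candidate we are allowed to spill,
 * or -1 if nothing can be spilled under the current strategy.
 */
int choose_spill_candidate(const struct spill_candidate *cands, size_t count,
                           bool tmu_spills_allowed)
{
    int best = -1;

    assert(cands != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Without TMU spills only uniforms are eligible, so a cheap
         * non-uniform temp can never make the strategy fail outright.
         */
        if (!tmu_spills_allowed && !cands[i].is_uniform)
            continue;
        if (best < 0 || cands[i].cost < cands[best].cost)
            best = (int)i;
    }
    return best;
}
```

With TMU spills disabled, a cheaper non-uniform candidate is skipped in favor of the cheapest uniform, and the strategy only fails when no uniform candidate is left.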
Some relevant shader-db stats:
total instructions in shared programs: 13040711 -> 13043585 (0.02%)
instructions in affected programs: 234238 -> 237112 (1.23%)
helped: 73
HURT: 172
total threads in shared programs: 415664 -> 415860 (0.05%)
threads in affected programs: 196 -> 392 (100.00%)
helped: 98
HURT: 0
total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
uniforms in affected programs: 12831 -> 17518 (36.53%)
helped: 6
HURT: 100
total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
max-temps in affected programs: 4597 -> 3851 (-16.23%)
helped: 79
HURT: 21
total spills in shared programs: 4010 -> 4005 (-0.12%)
spills in affected programs: 55 -> 50 (-9.09%)
helped: 5
HURT: 0
total fills in shared programs: 5820 -> 5801 (-0.33%)
fills in affected programs: 186 -> 167 (-10.22%)
helped: 5
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Our cost was 5 which matches the number of instructions we have to
add for a TMU spill (a fill is 4 instructions).
Uniform spills on the other hand add an extra instruction for each
fill and remove one instruction for the spill itself. These have
a cost of 1.
Therefore, if we have a single spill+fill, we end up with +9
instructions if it is a TMU spill and +0 instructions with a uniform
spill, so making the former only 5 times more costly is probably
not a good idea, and this is without even considering the added
latency of the TMU accesses.
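The arithmetic above can be written down as a small helper. This is only an illustration of the instruction counts quoted in this message (5 per TMU spill, 4 per TMU fill, +1 per uniform fill, -1 per uniform spill); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Net number of instructions added to the program by spilling one temp
 * with the given number of spill (write) and fill (read) points.
 */
int spill_instruction_delta(bool is_uniform, int num_spills, int num_fills)
{
    assert(num_spills >= 0 && num_fills >= 0);

    if (is_uniform) {
        /* Each fill re-emits the uniform load (+1), each spill drops
         * the original load (-1).
         */
        return num_fills - num_spills;
    }

    /* TMU path: 5 instructions per spill, 4 per fill. */
    return 5 * num_spills + 4 * num_fills;
}
```

For a single spill+fill this gives 9 for the TMU case and 0 for the uniform case, which is the gap that a cost ratio of 5:1 understates.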
Relevant shader-db changes show this causes a marginal instruction
count increase in a few shaders but better thread counts and lower
TMU spilling overall:
total instructions in shared programs: 13037315 -> 13040711 (0.03%)
instructions in affected programs: 370106 -> 373502 (0.92%)
helped: 187
HURT: 321
total threads in shared programs: 415090 -> 415664 (0.14%)
threads in affected programs: 574 -> 1148 (100.00%)
helped: 287
HURT: 0
total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
uniforms in affected programs: 63075 -> 73667 (16.79%)
helped: 40
HURT: 395
total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
max-temps in affected programs: 15838 -> 13935 (-12.02%)
helped: 316
HURT: 34
total spills in shared programs: 4247 -> 4010 (-5.58%)
spills in affected programs: 2599 -> 2362 (-9.12%)
helped: 107
HURT: 14
total fills in shared programs: 6121 -> 5820 (-4.92%)
fills in affected programs: 3622 -> 3321 (-8.31%)
helped: 108
HURT: 13
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
This commit adds support for the Vulkan backend on the a630_skqp job.
= Needed changes
- Needed to install libvulkan-dev package on system
- Refactored the way the available skqp reports are printed
tested in development builds with skia tools
Piglit expectations had to be updated in various drivers due to !14750 not
having bumped the tags when it tried to uprev.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14686>
On V3D the quality of the code we generate is significantly affected by
how we decide to assign accumulators during register allocation, which
is determined by liveness, favoring short-lived temps.
There are many shaders that end up doing a whole lot of uniform loads
first, and using them later, which is very inconvenient for our register
allocation process because this increases uniform liveness and causes
us to use accumulators less efficiently, leading to significant churn.
To fix this, we move uniforms right before their first use in the same
block, but we need to do this after NIR scheduling, which means we are
doing it in non-SSA form, since the scheduler has a tendency to undo
this optimization and it is not easy to modify it to avoid it, since it
works in more abstract terms, using instruction dependencies, estimated
register pressure and instruction delay information to do its work,
which are very different concepts.
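A rough sketch of the idea of sinking uniform loads to their first use within a block. It works on a flat array rather than on NIR, assumes each temp is written only once in the block, and leaves loads with no use in the block where they are. All names and the instruction layout are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_TEMP (-1)
#define MAX_BLOCK_INSTRS 64

/* Hypothetical, flattened instruction: one destination, two sources. */
struct instr {
    bool is_uniform_load;
    int dest;
    int src[2];
};

static bool reads_temp(const struct instr *in, int temp)
{
    return temp != NO_TEMP && (in->src[0] == temp || in->src[1] == temp);
}

/* Copies the block from 'in' to 'out', placing every uniform load right
 * before the first instruction in the block that reads its result.
 * Returns the number of instructions written (always 'count').
 */
size_t sink_uniform_loads(const struct instr *in, size_t count,
                          struct instr *out)
{
    bool emitted[MAX_BLOCK_INSTRS] = { false };
    size_t n = 0;

    assert(count <= MAX_BLOCK_INSTRS);

    for (size_t i = 0; i < count; i++) {
        if (in[i].is_uniform_load) {
            bool used_later = false;
            for (size_t k = i + 1; k < count; k++)
                used_later |= reads_temp(&in[k], in[i].dest);
            if (!used_later) {
                /* No use in this block: keep the load in place. */
                out[n++] = in[i];
                emitted[i] = true;
            }
            continue; /* otherwise defer until the first use */
        }

        /* Emit any deferred loads this instruction depends on. */
        for (size_t j = 0; j < i; j++) {
            if (in[j].is_uniform_load && !emitted[j] &&
                reads_temp(&in[i], in[j].dest)) {
                out[n++] = in[j];
                emitted[j] = true;
            }
        }
        out[n++] = in[i];
    }
    return n;
}
```

In a block that loads two uniforms up front and uses the second one first, each load ends up directly ahead of its consumer, which is what shortens the uniform live ranges.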
total instructions in shared programs: 13316738 -> 13033613 (-2.13%)
instructions in affected programs: 10389172 -> 10106047 (-2.73%)
helped: 55442
HURT: 16144
total threads in shared programs: 413722 -> 415048 (0.32%)
threads in affected programs: 1428 -> 2754 (92.86%)
helped: 680
HURT: 17
total loops in shared programs: 1716 -> 1690 (-1.52%)
loops in affected programs: 26 -> 0
helped: 26
HURT: 0
total uniforms in shared programs: 3704313 -> 3705181 (0.02%)
uniforms in affected programs: 687730 -> 688598 (0.13%)
helped: 2920
HURT: 7384
total max-temps in shared programs: 2364785 -> 2175190 (-8.02%)
max-temps in affected programs: 1215387 -> 1025792 (-15.60%)
helped: 49667
HURT: 1556
total spills in shared programs: 4241 -> 4248 (0.17%)
spills in affected programs: 642 -> 649 (1.09%)
helped: 11
HURT: 19
total fills in shared programs: 6115 -> 6125 (0.16%)
fills in affected programs: 1276 -> 1286 (0.78%)
helped: 11
HURT: 21
total sfu-stalls in shared programs: 34381 -> 36578 (6.39%)
sfu-stalls in affected programs: 16055 -> 18252 (13.68%)
helped: 3647
HURT: 5206
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15056>
If we have a postponed spill, the temp we create at ip is no longer
the spilled temp and therefore is affected by the thrsw injection.
Fixes corruption in the additive blending animation demo from
Three.js.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15112>
Not all Vulkan implementations allow rendering to linear images, so in
order to support scanning out from these on Windows we might have to copy
through a buffer like we do in the PRIME path.
To avoid reimplementing the same, let's instead generalize the code a
bit so it doesn't have to specify any PRIME-specific details.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12210>
When we spill we add new temps. We should be careful not to access
liveness for these until we have re-computed it after all spills and
fills for the spilled temp have been processed so as to avoid
out-of-bounds accesses to the c->temp_start and c->temp_end arrays.
This fixes a crash in a Three.js demo when we try to patch register
classes after a TMU spill that was caused because we would incorrectly
try to patch the same temps we had just added for the spill itself,
which is not only unnecessary but also incorrect since these temps
would not have liveness information available yet and thus would
cause out of bounds accesses.
Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15107>
Make piglit test jobs always run, as the piglit testsuite offers more
coverage for the VC4 driver.
On the other hand, make the EGL testing manual, as we don't have
enough devices to execute all the tests fast enough.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15045>
We added spill_count to handle uniform batch spills, which we no longer do.
What we want now is a way to know if we are spilling registers.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).
To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundaries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).
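As an illustration of the register class patching described above, here is a small sketch that uses per-temp live ranges expressed in instruction ips. The class bit names, the struct and the strict "starts before and ends after" crossing test are assumptions made for the example, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register class bits. */
#define CLASS_BITS_PHYS (1u << 0)
#define CLASS_BITS_ACC  (1u << 1)

/* Hypothetical per-temp liveness and class information. */
struct temp_range {
    int start_ip;
    int end_ip;
    uint8_t class_bits;
};

/* Assumption: a temp crosses the thrsw if it is defined before it and
 * still live after it.
 */
static bool range_crosses_ip(const struct temp_range *t, int ip)
{
    return t->start_ip < ip && t->end_ip > ip;
}

/* Removes the accumulator class from every temp that is live across the
 * thrsw at 'thrsw_ip'. Returns how many temps were patched.
 */
size_t drop_acc_across_thrsw(struct temp_range *temps, size_t count,
                             int thrsw_ip)
{
    size_t patched = 0;

    for (size_t i = 0; i < count; i++) {
        if (range_crosses_ip(&temps[i], thrsw_ip) &&
            (temps[i].class_bits & CLASS_BITS_ACC)) {
            temps[i].class_bits &= (uint8_t)~CLASS_BITS_ACC;
            patched++;
        }
    }
    return patched;
}
```

Because only the temps whose ranges straddle the new thrsw need their class changed, this kind of patching is much cheaper than rebuilding the whole interference graph.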
This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:
Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.
A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
We may be pipelining TMU writes and reads, in which case we can
see both TMUWT and LDTMU at the end of a TMU sequence, so we should
not assume that a TMUWT always terminates a sequence.
Also, we had a bug where we were using inst instead of scan_inst
to check if we find another TMUWT after the current instruction.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
Until now we would only allow spilling as a last resort in the
last 2 strategies, however, it is possible that in some cases
earlier strategies may produce fewer spills if we allowed spilling
on them.
Likewise, the fallback scheduler can sometimes produce fewer spills
than 2 threads with optimizations disabled.
With this change, we start allowing all our 2-thread strategies to
spill, and instead of choosing the first strategy that is successful,
we choose the one that doesn't spill or the one with the least amount
of spilling.
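The selection rule can be sketched as follows, assuming every strategy has already been tried and its outcome recorded. The result struct and function name are hypothetical, and ties are resolved in favor of the earlier strategy as an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of compiling with one strategy. */
struct strategy_result {
    bool compiled;   /* register allocation succeeded */
    unsigned spills; /* amount of spilling that was required */
};

/* Returns the index of the strategy to keep, or -1 if none compiled. */
int pick_strategy(const struct strategy_result *results, size_t count)
{
    int best = -1;

    for (size_t i = 0; i < count; i++) {
        if (!results[i].compiled)
            continue;
        /* The first strategy that does not spill at all wins outright. */
        if (results[i].spills == 0)
            return (int)i;
        if (best < 0 || results[i].spills < results[best].spills)
            best = (int)i;
    }
    return best;
}
```

Trying every strategy before picking one is what makes this more expensive than stopping at the first success.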
It should be noted that this may incur a significant increase
in compile times. We will address this in a follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
When I originally added vk_image_view, I was overly clever when it came
to the format field. I decided to make it only contain the bits of the
format contained in the selected aspects. However, this is confusing
(not generally a good thing) and it's also not always what you want.
The Vulkan 1.3.204 spec says:
"When using an image view of a depth/stencil image to populate a
descriptor set (e.g. for sampling in the shader, or for use as an
input attachment), the aspectMask must only include one bit, which
selects whether the image view is used for depth reads (i.e. using a
floating-point sampler or input attachment in the shader) or stencil
reads (i.e. using an unsigned integer sampler or input attachment in
the shader). When an image view of a depth/stencil image is used as
a depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So, while the restricted format makes sense for texturing, it doesn't
for when the image is being used as an attachment. What we probably
actually want is both versions of the format. We'll call the one given
by the VkImageViewCreateInfo vk_image_view::format and the restricted
one vk_image_view::view_format.
This is just the first commit which switches format to view_format so
the compiler will make sure we get them all. The next commit will
re-add vk_image_view::format but this time unmodified.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15007>
The only interesting information stored in v3dv_cmd_pool is the list of
command buffers and that's already tracked by vk_command_pool.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Looks like 3 implementations already have that field in their private
command_buffer struct, and having it at the vk_command_buffer opens the
door for generic (but suboptimal) secondary command buffer support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14917>
Add more failing tests to the expected failures.
These are obtained after executing the full Vulkan CTS.
v2:
- Add comments in the tests (Alejandro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14948>
Use drmSyncobjSignal to signal out_syncobjs when a GPU job submission
ends in the simulator. With this, we can enable multisync support in the
simulator and keep the multisync approach to process fence by submitting
a serialized no-op job that adds the fence to the array of out syncobjs,
i.e. syncobjs to be signaled in the kernel when a job completes (job
post deps).
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14768>
The main thing is VK 1.3 testing, but also includes test bugfixes. The
1.3 CTS required an uprev of deqp-runner to handle a new style of test
output, and that deqp-runner brings in some neat new features, too (piglit
in your deqp-runner suite, and extension list checking).
A bunch of VK tests got renamed, so I replaced panvk's custom test list
with simple include filters on the main test list.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14920>
Meson devenv is a feature added in meson 0.58 (thus the feature is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For Vulkan it is relatively simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstalled vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
After starting to use a new version of the simulator, it got
outdated.
We made some initial effort to update it, but it was not
working. Taking into account that no one is using it, it is better to
just remove it.
We keep the noop drm drivers, as they could have some value for
developers that don't have access to the v3dv3 simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14682>
Semaphores info was stored as an info of event_wait cpu jobs and this
leads to mem leak when the same event_wait job in the same cmd buffer
batch was submitted more than once. As a result,
`dEQP-VK.api.command_buffers.record_simul_use_primary` fails due to a
double-free of sems_info.
In this patch, we no longer use v3dv_event_wait_cpu_job_info to store
semaphores from a submit info, since semaphores are related to a queue
submission and not to the event_wait job type. If we spawn a wait_thread,
we copy semaphores to an auxiliary struct (v3dv_wait_thread_info) that
will be used in wait_thread to get job and semaphores information. When
the spawned thread finishes, it releases the related
v3dv_wait_thread_info and the semaphores copy as well.
Fixes: d5bd18fb ("v3dv: store wait semaphores in event_wait_cpu_job_info")
Suggested-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14736>
Just as with all other TLB operations, we can only use the TLB if the render
area is aligned to tile boundaries. If it is not, then the operation would
overwrite pixels outside the render area, which is not allowed.
In this case, we can't even emit a previous TLB load to fix this because the
TLB has the multisampled attachment, not the resolve attachment, which is
just a destination buffer for the tile store.
Because the condition for tile alignment has to be determined for each
subpass, we handle this by storing this information in the attachment
state of the command buffer with the start of each subpass. We store
whether the attachment is to be resolved and whether it can use the
TLB (considering tile alignment restrictions).
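A minimal sketch of a tile alignment check of this kind. The rectangle type and function name are hypothetical, and treating an area that reaches the framebuffer edge as aligned is an assumption of the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical render area in pixels. */
struct rect {
    uint32_t x, y, width, height;
};

/* True if a TLB operation restricted to 'area' would not touch pixels
 * outside of it. The offset must start on a tile boundary, and each
 * dimension must either cover whole tiles or run up to the framebuffer
 * edge (assumption: partial tiles at the edge are harmless).
 */
bool render_area_is_tile_aligned(const struct rect *area,
                                 uint32_t fb_width, uint32_t fb_height,
                                 uint32_t tile_width, uint32_t tile_height)
{
    assert(tile_width > 0 && tile_height > 0);

    if (area->x % tile_width != 0 || area->y % tile_height != 0)
        return false;

    bool width_ok = area->width % tile_width == 0 ||
                    area->x + area->width >= fb_width;
    bool height_ok = area->height % tile_height == 0 ||
                     area->y + area->height >= fb_height;

    return width_ok && height_ok;
}
```

The result would be computed once at the start of each subpass and stored next to the flag that says whether the attachment has a resolve, so the store path only has to read two booleans.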
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14752>
This is a left over from when we added multi-version support in the
driver, where we turned this helper into a versioned scheme.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14704>
The new runner reduces the runtime by about 1/3 thanks to using rust
instead of python, and includes automatic flake handling so you don't just
have to skip flaky tests. The wrapper script also includes IRC flake
reporting (so one can track and update the flakes list to improve CI
reliability), always uploading results to CI for review (so you can
diagnose flakes and look at timings), has a prettier regressions report
and a helpful timing report, and is the same as what's used by all the HW
runners as well.
The downside is that by dropping the massive list of skips, you no longer
get flagged if Mesa refactors end up accidentally disabling extensions and
thus making tests skip. For that, I've started on
https://gitlab.freedesktop.org/anholt/deqp-runner/-/merge_requests/33 so
that hardware drivers get extension checking coverage too.
Thanks to the perf improvement, we get to drop one of the jobs for
llvmpipe.
xfail lists were mostly sed-jobs from the prior expectations lists. The
exceptions to that you'll find in the form of whitespace around the
affected test group (usually changes of capitalization or
special-characters), or an explanation for the more interesting changes
(which thankfully we can now record in the xfails lists!).
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14604>
Allow vectorizing operations from a smaller bit-size into
scalar operations of a larger bit-size. This allows us to
turn 2x8-bit into an equivalent scalar 16-bit load/store.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This generalizes the support we added for 16-bit to also handle
8-bit loads via ldunifa. The story is the same: we align the address
to 32-bit downwards and we skip any bytes that are not of interest.
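The "align down, then skip bytes" step can be illustrated on the CPU like this. The helpers are hypothetical and assume little-endian byte order within the loaded 32-bit word and naturally aligned 16-bit values:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a byte offset down to the previous 32-bit boundary. */
uint32_t align_down_32bit(uint32_t byte_offset)
{
    return byte_offset & ~3u;
}

/* Extracts the 8-bit or 16-bit value located at 'byte_offset' from the
 * 32-bit word that was loaded from align_down_32bit(byte_offset).
 */
uint32_t extract_from_dword(uint32_t dword, uint32_t byte_offset,
                            unsigned bit_size)
{
    unsigned byte_in_dword = byte_offset & 3u;

    assert(bit_size == 8 || bit_size == 16);
    /* A naturally aligned value never straddles two 32-bit words. */
    assert(byte_in_dword % (bit_size / 8) == 0);

    uint32_t mask = (1u << bit_size) - 1u;

    /* Skip the bytes we are not interested in, then mask. */
    return (dword >> (8 * byte_in_dword)) & mask;
}
```

For a word holding 0xAABBCCDD, the byte at offset 5 is 0xCC and the 16-bit value at offset 6 is 0xAABB, both read through a single load from offset 4.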
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Just like with 16-bit, this mode only supports scalar access, but
we are already lowering all non 32-bit accesses to scalar.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Since ldunif is a 32-bit instruction we need to demote these to
UBO loads, like we do for indirect indexing, with the exception
of scalar 16bit uniforms with an offset that is 32-bit aligned.
For the exception where we can use ldunif we read a 32-bit slot
from memory where the uniform data is in the lower 16-bit and we
will read garbage in the upper 16-bit which we won't use anyway.
It should be noted that by using ldunif, we are consuming
32-bit from the uniform stream, but this is fine because
if there is valid uniform data in the upper 16-bit (i.e.
we had an ivec2 uniform aligned to a 32-bit address), since
we scalarize 16-bit loads, we would see another load uniform
with an unaligned offset for the second component, which we
will demote to UBO.
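The decision described here boils down to a small predicate. The function and parameter names are hypothetical; treating every 32-bit direct load as eligible and every other bit size as not eligible is a simplification made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a uniform load can be served from the uniform stream
 * with ldunif, false if it should be demoted to a UBO load.
 */
bool can_use_ldunif(unsigned bit_size, unsigned num_components,
                    uint32_t byte_offset, bool indirect)
{
    /* Indirect indexing is always demoted. */
    if (indirect)
        return false;

    if (bit_size == 32)
        return true;

    /* The one exception for smaller types: a scalar 16-bit uniform that
     * sits at a 32-bit aligned offset. The upper 16 bits of the slot are
     * read but never used.
     */
    return bit_size == 16 && num_components == 1 && byte_offset % 4 == 0;
}
```

After scalarization, the second component of a 16-bit vec2 shows up as a scalar load at offset + 2, which fails the alignment test and is demoted.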
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
These are required by VK_KHR_16bit_storage. Our hardware, however,
doesn't provide any mechanism to decide on the rounding mode of
the conversion and it seems to be using RTE, so we implement
RTZ in software.
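For reference, a round-toward-zero conversion from 32-bit to 16-bit float can be written on the CPU by truncating the mantissa instead of rounding it. This is a standalone sketch of RTZ semantics, not the lowering the compiler emits, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converts a float to IEEE 754 binary16 bits, rounding toward zero. */
uint16_t f32_to_f16_rtz(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof(x));

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp = (x >> 23) & 0xffu;
    uint32_t man = x & 0x7fffffu;

    if (exp == 0xffu) {
        /* Inf stays Inf, NaN stays a (quiet) NaN. */
        return (uint16_t)(sign | 0x7c00u |
                          (man ? (0x200u | (man >> 13)) : 0u));
    }

    int e = (int)exp - 127 + 15; /* rebias the exponent */

    if (e >= 31) {
        /* Too large for half: RTZ clamps to the largest finite value
         * instead of overflowing to infinity.
         */
        return (uint16_t)(sign | 0x7bffu);
    }

    if (e <= 0) {
        /* Result is subnormal (or zero) in half precision. */
        if (e < -10)
            return sign;
        man |= 0x800000u; /* make the implicit leading 1 explicit */
        return (uint16_t)(sign | (man >> (14 - e)));
    }

    /* Normal case: drop the low 13 mantissa bits (truncation). */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (man >> 13));
}
```

The difference from RTE shows up for values such as 1.0009f, which lies above the midpoint between 1.0 and the next half float: RTE rounds it up to 0x3c01 while RTZ keeps 0x3c00.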
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
If we know we have a load with a constant offset, then even if it
is not aligned to 32-bit we can still produce an aligned offset
and then skip over the bytes we don't need.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Even though ldunifa is strictly 32-bit we may be able to use it
to load 16-bit values that sit at 32-bit aligned addresses.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
The vectorization pass can inject 32_2x16 (un)packing opcodes
upon successful vectorization of 16-bit operations into 32-bit
counterparts, so make sure we lower these to something our
backend can handle.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
This allows us to implement 16-bit access on uniform and
storage buffers.
Notice that V3D hardware can only do general access on scalar
16-bit elements, which we currently enforce by running a lowering
pass during shader compile.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Without this the simulator wrapper will abort upon seeing this
query, rendering the driver unusable in that context.
Also, it seems the simulator environment doesn't quite work with
multisync at present, so do not enable it until we figure out what
the issue is.
Reviewed-by: Melissa Wen <mwen@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14678>
When we create an image view with D24S8 format we reformat it
to RGBA8UI if the aspect selected is just STENCIL. But when
configuring the stores we select the aspects based on the attachment
format. Quoting from cmd_buffer_render_pass_emit_stores:
   /* From the Vulkan spec, VkImageSubresourceRange:
    *
    * "When an image view of a depth/stencil image is used as a
    * depth/stencil framebuffer attachment, the aspectMask is ignored
    * and both depth and stencil image subresources are used."
    *
    * So we ignore the aspects from the subresource range of the image
    * view for the depth/stencil attachment, but we still need to restrict
    * the to aspects compatible with the render pass and the image.
    */
   const VkImageAspectFlags aspects =
      vk_format_aspects(ds_attachment->desc.format);
So we could end up trying to store on a Z+Stencil buffer, using a
RGBA8UI format.
So far this only affected some tests when using the simulator
(assert). Those were working on the real hw, but probably would fail
in other situations, so let's use the original image format in that
case.
v2 (Iago)
* Improve comment grammar
* Do the same on load too (not just store)
v3 (Iago)
* Re-word comments.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14635>
We track last submitted jobs by queue type. After all cmd buffer
batches have been submitted, we emit a noop job that waits for all jobs
submitted to each GPU queue to complete and signals the fence.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphores support, we can use a GPU job to handle
multiple signal semaphores at the end of a cmd buffer batch. It
means, the last job in the last cmd buffer will be in charge of
signalling semaphores as long as it meets some conditions:
1 - A GPU-job signals semaphores whenever we only have submitted
jobs for the same queue (there is no syncobj created for any
other type). Otherwise, we emit a noop job that waits on the
completion of all jobs submitted and then signals semaphores.
2 - A CPU-job is never in charge of signalling semaphores. We
process it first and emit a noop job that depends on all jobs
previously submitted to signal semaphores.
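The two conditions above can be summarized as a small decision function. The enum, the queue bit values and the function name are hypothetical and only model the rule, not the actual submission code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask of GPU queues that received jobs in this batch. */
#define QUEUE_BIT_CL  (1u << 0)
#define QUEUE_BIT_TFU (1u << 1)
#define QUEUE_BIT_CSD (1u << 2)

enum signal_path {
    SIGNAL_FROM_LAST_GPU_JOB, /* last job signals the semaphores itself */
    SIGNAL_FROM_NOOP_JOB,     /* extra noop job waits on everything */
};

enum signal_path choose_signal_path(bool last_job_is_gpu,
                                    unsigned queues_used_mask,
                                    unsigned last_job_queue_bit)
{
    /* Condition 2: a CPU job is never in charge of signalling. */
    if (!last_job_is_gpu)
        return SIGNAL_FROM_NOOP_JOB;

    /* Condition 1: the last GPU job can signal only if every job in the
     * batch went to its own queue, since ordering is only guaranteed
     * within a queue.
     */
    if (queues_used_mask != last_job_queue_bit)
        return SIGNAL_FROM_NOOP_JOB;

    return SIGNAL_FROM_LAST_GPU_JOB;
}
```

A batch that only touched the CL queue lets its last job signal directly, while mixing CL and TFU work, or ending on a CPU job, falls back to the noop job.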
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With multiple semaphore support, we can improve the way we handle
wait semaphores considering different job types and cmd buffer
batch scenarios, that means:
- A GPU job depends on wait semaphores whenever it is the first job
submitted to a queue in this command buffer batch (the `first` flag
for the job's queue type is set).
- For the first CPU job, if there are wait semaphores, we should
wait for the CPU and GPU to be idle before processing the job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
The order in which a GPU job is scheduled is guaranteed within the
same queue type (CL, TFU, CSD), but the order of completion of jobs
from different queues cannot be guaranteed. Since we have multiple
semaphores support now, we can track the completion of the last job
submitted to each queue and therefore better determine when gpu is
idle. We do it using an array of syncobj (last_job_syncs) for each
GPU queue (CL, TFU, CSD). With this, job serialization also becomes
more accurate. We also keep tracking the very last job submitted
(last_job_sync became an element of the last_job_syncs array as
V3DV_QUEUE_ANY) for the case we don't have multisync support.
To help in handling wait semaphores, we set a flag per queue to
indicate we are starting a new cmd buffer batch and a job submitted
to this queue will be the first.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
In addition to keeping a copy of wait semaphores, extend
v3dv_submit_info_semaphores to hold a copy of signal semaphores too.
With a copy of wait and signal semaphores, we can enable GPU jobs to
handle more than one wait and signal semaphore.
For now, we don't change the way we signal semaphores when all
jobs complete, i.e., we still use the master thread to signal
semaphores. In this context, no GPU job is actually in charge of
signalling, but the support for multiple signal semaphores is done
here.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever v3d kernel-driver supports multisync extension, use it to
enable more than one semaphore in a tfu job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Whenever the v3d kernel driver supports the multisync extension, use it
to enable more than one semaphore in CL submission. In CL, we have two
kinds of jobs (bin and render); therefore, we also need to determine
the stage to sync, i.e. the stage to which job dependencies/wait
semaphores are added.
Also, as we currently process all signal semaphores of a cmd buffer
batch together in the submit master thread (when the last wait
thread completes), there is currently no situation in which GPU jobs
need to handle signal VkSemaphores.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of a boolean (sem_wait) in v3dv_event_wait_cpu_job_info,
that is used to determine wait condition for jobs put in a wait
thread, copy the wait semaphores data and store it as struct
v3dv_submit_info_semaphores. In the following patches we enable
multiple semaphores in GPU jobs, and therefore we need this data
to add wait semaphores as job dependencies for pending jobs
submitted from a wait thread.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Instead of passing pSubmit to queue_submit_cmd_buffer, create a struct
v3dv_submit_info_semaphores to wrap semaphore data from VkSubmitInfo.
In the next commit, this struct will help to handle the wait condition
for jobs submitted in a wait event context, since we need to hold this
data when handling wait events and pass it to queue_submit_job() called
from wait threads. The main goal is to allow multiple wait semaphores
in a job submission. Later, this struct will be extended to include a
copy of signal semaphores too.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
is_wait_thread is passed, but not actually used; and cpu_queue_handle_idle
is in charge of handling wait threads spawned before this one.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
With the proper version checking in the common vulkan instance code
(commit 88b9b68) it is now possible to bring the reported interface
version up to v5.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14563>
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
This creates an internal shader_prim enum; I've fixed up most
users to use it instead of GL types.
Don't store the enum in shader_info as it changes size and confuses
other things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
Effectively moves most of v3dv_wsi_can_present_on_device to the
common code to be used in other drivers.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11091>
Double buffer mode splits the tile buffer size in half so we can
start processing the next tile while the current one is being
stored to memory. This mode is available only if MSAA is not enabled
and can, in theory, improve performance by reducing tile store
overhead; however, it comes at the cost of reducing the tile size,
which also causes some overhead of its own.
Testing shows that this helps some cases (e.g. the Vulkan Quake
ports) but hurts others (e.g. Unreal Engine 4), so for the time
being we don't enable this by default but we allow enabling it
selectively by using V3D_DEBUG.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14551>
Because these formats are introduced through an extension, their
enum values are exceedingly large and we cannot use them to index
directly into the format table we had for core formats. Instead,
we put these in a separate table and we always use the
VK_ENUM_OFFSET helper to index into these tables.
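As an illustration of the indexing problem, here is a minimal sketch. It assumes the usual Vulkan numbering, where extension enum values start at 1000000000 and carry a per-extension offset in their last three decimal digits. EXAMPLE_ENUM_OFFSET is a local stand-in modeled on the VK_ENUM_OFFSET helper mentioned above, not a copy of it, and the tables and format values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_EXT_OFFSET 1000000000u
/* Core enums index directly; extension enums index by their last three
 * decimal digits. */
#define EXAMPLE_ENUM_OFFSET(e) \
   ((e) >= EXAMPLE_EXT_OFFSET ? ((e) % 1000u) : (e))
#define EXAMPLE_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical format values, chosen only to show the two ranges. */
#define EXAMPLE_FORMAT_CORE_A   2u
#define EXAMPLE_FORMAT_EXT_BASE 1000340000u
#define EXAMPLE_FORMAT_EXT_A    (EXAMPLE_FORMAT_EXT_BASE + 0u)
#define EXAMPLE_FORMAT_EXT_B    (EXAMPLE_FORMAT_EXT_BASE + 1u)

struct example_format {
   uint8_t supported;
   uint8_t bits_per_pixel;
};

static const struct example_format core_table[] = {
   [EXAMPLE_ENUM_OFFSET(EXAMPLE_FORMAT_CORE_A)] = { 1, 16 },
};

/* Two entries instead of an array a billion elements long. */
static const struct example_format ext_table[] = {
   [EXAMPLE_ENUM_OFFSET(EXAMPLE_FORMAT_EXT_A)] = { 1, 16 },
   [EXAMPLE_ENUM_OFFSET(EXAMPLE_FORMAT_EXT_B)] = { 1, 16 },
};

const struct example_format *
example_get_format(uint32_t format)
{
   const struct example_format *table;
   size_t size;

   if (format < EXAMPLE_EXT_OFFSET) {
      table = core_table;
      size = EXAMPLE_ARRAY_SIZE(core_table);
   } else if (format - EXAMPLE_ENUM_OFFSET(format) == EXAMPLE_FORMAT_EXT_BASE) {
      /* Extension offsets are always below 1000. */
      assert(EXAMPLE_ARRAY_SIZE(ext_table) <= 1000u);
      table = ext_table;
      size = EXAMPLE_ARRAY_SIZE(ext_table);
   } else {
      return NULL; /* an extension this sketch has no table for */
   }

   uint32_t idx = EXAMPLE_ENUM_OFFSET(format);
   if (idx >= size || !table[idx].supported)
      return NULL;
   return &table[idx];
}
```

Using the same offset macro for both tables keeps the lookup code identical for core and extension formats; only the table selection differs.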
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14533>
vk_icdNegotiateLoaderICDInterfaceVersion now correctly identifies the
driver as supporting v4. Before, the driver did support the
functionality but didn't report supporting it through the negotiate
function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14299>
Allow the jobs to be available for MRs.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14361>
Brings in these changes:
af1785f31 occlusion_query_conform: skip GetQueryCounterBits test if needed
dad078717 occlusion_query_conform: convert to pilgit subtests
b52c1c761 glsl-1.30: test nested preprocessor concat
6c4da153b texture-storage: Fix subtest result handling of skips.
4343f19db fbo-integer: Remove the invalid DrawPixels test.
e3842f2fe arb_dsa: exclude stencil8 textures from test sets.
ce8649be7 spec/ext_external_objects: Fix build on Debian systems
4e553838f glsl: add basic tests for desktop GLSL invariant qualifier linking
7e61e5199 Tests for variable in and out of loop scope
f855ad1c8 fbo-mrt-alphatest: Only require GLSL 1.20
9be2fe999 glx: add glx-multi-display-single-pbuffer test
bfe290725 glx: add glx-swap-pbuffer test
efa64335e framework: Fix build on Windows when using waffle
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14468>
This is an actual functional change as we now plumb through the sync FD
instead of doing a vkQueueSubmit and trusting in implicit sync.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14372>
All three implementations are identical.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14372>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. It frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) and we would double-free
self-imported BOs. We tried to fix that in commit d809d9f3
but that broke 2) and we started to leak BOs for some imports.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
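A minimal sketch of the refcounting idea follows. It assumes that a self-import yields the same kernel handle as the original allocation, so a lookup by handle can detect it. The fixed-size table and all example_* names are hypothetical, and a counter stands in for the real call that releases the handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_BOS 16

struct example_bo {
   uint32_t handle; /* kernel handle; 0 marks a free slot */
   uint32_t refcnt;
};

struct example_device {
   struct example_bo bos[EXAMPLE_MAX_BOS];
   unsigned kernel_frees; /* stands in for the real "release handle" call */
};

static struct example_bo *
example_bo_lookup(struct example_device *dev, uint32_t handle)
{
   for (size_t i = 0; i < EXAMPLE_MAX_BOS; i++) {
      if (dev->bos[i].handle == handle)
         return &dev->bos[i];
   }
   return NULL;
}

/* Used for both fresh allocations and imports: if the device already
 * knows the handle (a self-import), take a reference instead of tracking
 * a second BO for it. */
struct example_bo *
example_bo_get(struct example_device *dev, uint32_t handle)
{
   assert(handle != 0);
   struct example_bo *bo = example_bo_lookup(dev, handle);
   if (bo) {
      bo->refcnt++;
      return bo;
   }
   bo = example_bo_lookup(dev, 0); /* find a free slot */
   if (!bo)
      return NULL;
   bo->handle = handle;
   bo->refcnt = 1;
   return bo;
}

/* Returns true only when the last reference is dropped and the handle is
 * actually released. */
bool
example_bo_free(struct example_device *dev, struct example_bo *bo)
{
   assert(bo->refcnt > 0);
   if (--bo->refcnt > 0)
      return false;
   bo->handle = 0;
   dev->kernel_frees++;
   return true;
}
```

With this shape, case 1 (self-import) releases the handle once, on the second free, and case 2 (foreign import) still releases it on its only free.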
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5769
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14392>
Acknowledgements to android-rpi team and lineage-rpi maintainer (KonstaT)
for creating/testing initial vulkan support. Their experience was used as
a baseline for this work.
Most of the code is a copy of turnip and anv.
Improved by cleaning dEQP failures:
- Improved gralloc support (use allocation time stride, size, modifier).
- Fixed some dEQP crashes due to memory allocation issues.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14016>
Sort/rename the files so expected tests are classified by device.
No need to split the tests by driver (e.g., V3D vs V3DV).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13983>
This commit fixes apps using the following sequence:
1. XCreateWindow(dpy) -> win
2. glXCreateContextAttribsARB(dpy, ...) -> ctx
3. glXMakeCurrent(dpy, win, ctx)
4. glXQueryDrawable(dpy, win, GLX_FBCONFIG_ID, ...)
glXQueryDrawable returned 0 (while correctly returning a valid
GLX_FBCONFIG_ID for other types of drawables).
This commit adds the same dance as driInferDrawableConfig to get
the GLX visual from the Window, and then the GLX_FBCONFIG_ID of
this visual.
This fixes:
* piglit: glx-query-drawable --attr=GLX_FBCONFIG_ID --type=WINDOW
* Maya which uses the config ID from step 4 as an input to
glXChooseFBConfig.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14174>
They are being used on integer to integer stores. From the Vulkan spec,
final paragraph of 16.4.4 "Texel Output Format Conversion":
"Each component is converted based on its type and size (as
defined in the Format Definition section for each
VkFormat). ... Integer outputs are converted such that their value
is preserved. The converted value of any integer that cannot be
represented in the target format is undefined."
I didn't find an equivalent quote for OpenGL as all conversion entries
are focused on float to integer, fixed-point to integer, etc, and not
on integer to integer. Didn't find any test failure with this change.
We didn't get any stats change with shader-db (even
overriding to OpenGL 4.4 to get more shaders built), so as a reference
here are the Vulkan shader-db stats with the pattern
dEQP-VK.image.*.with_format.*.*
total instructions in shared programs: 37534 -> 36522 (-2.70%)
instructions in affected programs: 12080 -> 11068 (-8.38%)
helped: 241
HURT: 0
Instructions are helped.
total uniforms in shared programs: 9100 -> 8550 (-6.04%)
uniforms in affected programs: 3004 -> 2454 (-18.31%)
helped: 229
HURT: 0
total max-temps in shared programs: 6110 -> 6014 (-1.57%)
max-temps in affected programs: 402 -> 306 (-23.88%)
helped: 43
HURT: 0
Max-temps are helped.
total nops in shared programs: 1523 -> 1526 (0.20%)
nops in affected programs: 21 -> 24 (14.29%)
helped: 3
HURT: 6
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14194>
This was somehow missed by me and during review.
Fixes fcfc4ddfccd5: ("v3dv: Fix V3DV_HAS_SURFACE preprocessor condition")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14190>
Instead of stopping the merge process when we find an instruction
with an incompatible signal (such as a small immediate), keep
going and see if we can merge the thrsw in a previous instruction
that is compatible.
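The change in search strategy can be sketched as a backwards scan that skips incompatible instructions instead of stopping at the first one. This is only an illustration under simplifying assumptions: the instruction struct, the function name and the max_lookback bound are hypothetical, and the real scheduler has further constraints that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

struct example_inst {
   bool has_incompatible_signal; /* e.g. a small immediate */
};

/* Walks backwards from the last instruction looking for one that can
 * carry the thrsw signal. Returns its index, or -1 if none is found
 * within max_lookback instructions (a hypothetical bound). */
int
example_find_thrsw_slot(const struct example_inst *insts, int count,
                        int max_lookback)
{
   assert(count >= 0 && max_lookback >= 0);
   for (int i = count - 1, seen = 0; i >= 0 && seen < max_lookback;
        i--, seen++) {
      if (insts[i].has_incompatible_signal)
         continue; /* the old behaviour would give up here */
      return i;
   }
   return -1;
}
```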
total instructions in shared programs: 13409835 -> 13356648 (-0.40%)
instructions in affected programs: 3556860 -> 3503673 (-1.50%)
helped: 17457
HURT: 18
Instructions are helped.
total max-temps in shared programs: 2353971 -> 2352956 (-0.04%)
max-temps in affected programs: 13960 -> 12945 (-7.27%)
helped: 703
HURT: 0
Max-temps are helped.
total spills in shared programs: 12301 -> 12301 (0.00%)
total sfu-stalls in shared programs: 32596 -> 32499 (-0.30%)
sfu-stalls in affected programs: 225 -> 128 (-43.11%)
helped: 79
HURT: 3
Sfu-stalls are helped.
total nops in shared programs: 347204 -> 325234 (-6.33%)
nops in affected programs: 99834 -> 77864 (-22.01%)
helped: 11515
HURT: 158
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14172>
Currently V3DV_HAS_SURFACE is always defined.
There is no WSI for Android in mesa3d, therefore WSI related extensions
should not be exposed.
1. Define V3DV_HAS_SURFACE only for platforms which have WSI implemented.
2. Rename V3DV_HAS_SURFACE -> V3DV_USE_WSI_PLATFORM to align naming
with other platforms.
Fixes dEQP-VK.wsi.android.surface#query_protected_capabilities
Fixes: 79e4451430 ("v3dv: move extensions table to v3dv_device")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14144>
This makes it a bit more portable, especially on 32-bit architectures
with 64-bit time_t defaults. On musl in particular it's a must.
Fixes
../mesa-21.3.0/src/broadcom/vulkan/v3dv_bo.c:71:15: error: format specifies type 'long' but the argument has type 'time_t' (aka 'long long') [-Werror,-Wformat]
time.tv_sec);
^~~~~~~~~~~
Also reported here [1]
[1] https://github.com/agherzan/meta-raspberrypi/issues/969
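One portable way to print a time_t, shown here as a sketch and not necessarily the exact form used in the patch, is to widen the value to a fixed 64-bit type and use the matching format macro. The function name is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Formats the seconds field of a timespec into buf. time_t may be 'long'
 * or 'long long' depending on the libc and ABI, so cast it to int64_t
 * and print it with PRId64 instead of assuming "%ld". */
int
example_format_seconds(char *buf, size_t size, const struct timespec *ts)
{
   assert(buf != NULL && ts != NULL);
   return snprintf(buf, size, "%" PRId64, (int64_t)ts->tv_sec);
}
```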
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14118>
When mesa3d is built without VK_USE_PLATFORM_DISPLAY_KHR definition,
dEQP test fails:
dEQP : Test case 'dEQP-VK.info.instance_extensions'..
dEQP : Fail (Extension VK_KHR_get_display_properties2 is missing
dependency: VK_KHR_display)
dEQP : DONE!
Enable KHR_get_display_properties2 only if VK_USE_PLATFORM_DISPLAY_KHR
is enabled.
Fixes: f884c2e3be ("v3dv: implement VK_KHR_get_display_properties2")
Signed-off-by: Roman Stratiienko <roman.o.stratiienko@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14047>
Through their specific PIPE_CAP.
v2 (Iago)
- Add comment in test failure
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
In the V3D driver there is a NIR lowering step for `image_store`
intrinsic, where the image store format is required for doing the proper
lowering.
Thus, let's define it for the download FS instead of
keeping it as NONE.
v2 (Ilia)
- Use format only for drivers not supporting format-less writing.
v4 (Ilia):
- Use PIPE_CAP_IMAGE_STORE_FORMATTED to reduce combinations.
v5 (Ilia):
- Use indirect array for download FS in drivers without
formatless-store support.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13409>
The early-Z test uses Z values produced from FEP, so when
we write Z from a shader we need to disable EZ. However, there
are some instances where we want to write the FEP-Z from the shader,
in which case we would not need to disable EZ.
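Reduced to a predicate, the rule reads as below. This is a sketch of the decision as stated in the text only; the function and parameter names are hypothetical and other reasons to disable EZ are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* EZ tests against the Z produced by the FEP, so a shader that writes a
 * different Z invalidates it. A shader that writes the FEP Z back
 * unchanged does not. */
bool
example_must_disable_ez(bool writes_z, bool writes_z_from_fep)
{
   /* Writing the FEP Z is a particular kind of Z write. */
   assert(writes_z || !writes_z_from_fep);
   return writes_z && !writes_z_from_fep;
}
```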
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
In some cases we need to make the shaders write the Z value produced
from rasterization (FEP). Track these instances because they are relevant
to early EZ setup.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
egl-copy-buffers test has been fixed for dri3. So remove
it from broadcom and freedreno ci fail list to prevent the
gitlab ci test fail:
spec@egl 1.4@egl-copy-buffers,UnexpectedPass
Also remove it from radeonsi ci fail list since I verified
on radeonsi.
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13868>
Even though the option is called shaderdb, it is not really used by
shaderdb (for V3D shaderdb uses the debug option "precompile"). And in
fact, right now the output format is not compatible with shaderdb.
This commit tries to fix that, and as we are here, also try to make
the option more useful for the Vulkan case, as that debug option also
works with v3dv.
We can't really fully imitate shaderdb use with OpenGL (run with a set
of glsl shader tests), but we can at least assign a unique name (the
pipeline sha1 in text format) so we can compare executions of the same
vulkan application. For that remember to disable the on-disk cache.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13938>
If we did, we would have the instruction coming right after ldvary write
to the same implicit destination as ldvary at the same time. We prevent
this when merging instructions, but we should make sure we prevent this
when we move ldvary around for pipelining too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13921>
According to the spec the hardware locks the scoreboard on the first
or last thread switch (selected via shader state) and any TLB accesses
executed before this are not synchronized by hardware.
This change updates the logic to ensure we respect this requirement
and that we don't assume that the lock is acquired automatically
on the first TLB access, which is not valid at least since V3D 4.1+.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
Writes to physical registers are not allowed after thread end. We
were checking this for ALU writes, but we need to check it for
signal writes too.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
This solves a case where a NIR geometry shader was storing the output in
a non-constant:
vec4 32 ssa_1 = load_const (0xc0800000 /* -4.000000 */, 0xc1100000 /* -9.000000 */, 0x40400000 /* 3.000000 */, 0x40e00000 /* 7.000000 */)
vec1 32 ssa_7 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_8 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_9 = iadd ssa_7, ssa_8
vec1 32 ssa_19 = mov ssa_1.x
intrinsic store_output (ssa_19, ssa_9) (1, 1, 0, 160, 288) /* base=1 */ /* wrmask=x */ /* component=0 */ /* src_type=float32 */ /* location=32 slots=2 gs_streams(x=0 y=0 z=0 w=0) */
When lowering the VPM output we check if the destination (ssa_9 in this
case) is a constant to add to the VPM offset. We run a constant folding
optimization in an earlier VS lowering, and we should do the same for
GS.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
While fragment and geometry shader were handling structs as inputs, they
weren't doing it for arrays of structures.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures and
assertions.
v2:
- Fix style (Iago).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
v3dv, radv, and turnip are using several C&P format helpers (most of
them wrappers over util_format_description based helpers).
This commit moves the common helpers to the already existing common
vk_format.h. For the case of v3dv we were able to remove the vk_format
header. For turnip and radv, a local vk_format.h header remains, with
methods that are only used for those drivers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13858>
When early fragment tests are mandated by the shader, we must use
the Z value produced by the FEP even if there are elements that
would typically require late fragment tests (such as discards,
sample to coverage, etc).
This change means we also need to be a bit more careful when
we promote shaders to use early fragment tests so we don't
promote anything with discards for example.
Fixes:
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_depth
dEQP-VK.fragment_operations.early_fragment.discard_early_fragment_tests_stencil
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13837>
We are using the same definitions for both OpenGL and Vulkan, so let's
move it to common.
As we are here we are also adding versioning on the TFU register
definition. Those are basically register bit places, so really likely
to change between versions.
Adding 33 as it is the first version they got defined.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13832>
Depth/stencil formats can, at worst (d32/d24s8), be exactly 32bpp,
which is the minimum we can program for the internal format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13816>
In order to implement GL_PRIMITIVES_GENERATED, v3d allocates a small
resource and adds a command to the job to store the prim counts to it.
However it was only doing this when TF was enabled which meant that if
the query was used with a geometry shader but no TF then the query would
always be zero. This patch makes the driver keep track of how many
PRIMITIVES_GENERATED queries are in flight and then enable writing the
prim count if it's more than zero.
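The bookkeeping can be sketched as a simple counter. The struct and function names below are hypothetical, and the sketch assumes that the pre-existing TF condition still enables the prim count write.

```c
#include <assert.h>
#include <stdbool.h>

struct example_context {
   bool tf_enabled;
   unsigned prims_generated_queries; /* PRIMITIVES_GENERATED in flight */
};

void
example_begin_prims_generated(struct example_context *ctx)
{
   ctx->prims_generated_queries++;
}

void
example_end_prims_generated(struct example_context *ctx)
{
   assert(ctx->prims_generated_queries > 0);
   ctx->prims_generated_queries--;
}

/* Previously only tf_enabled was considered, so a query used with a
 * geometry shader but no TF never saw any counts. */
bool
example_needs_prim_counts(const struct example_context *ctx)
{
   return ctx->tf_enabled || ctx->prims_generated_queries > 0;
}
```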
Fix dEQP-GLES31.functional.geometry_shading.query.primitives_generated_*
v2: Update CI expectations and references to fixed tests in commit log.
v3: - Add comment that GL_PRIMITIVES_GENERATED query is included because
OES_geometry_shader, but it is not part of OpenGL ES 3.1. (Iago)
- Update Fixes to commit introducing geometry shaders. (Iago)
Fixes: a1b7c084 ("v3d: fix primitive queries for geometry shaders")
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13712>
We had been storing pointers to a driver owned swizzle table
rather than storing the actual swizzle value in various shader
and pipeline keys on both GL and Vulkan drivers.
This doesn't look very robust, particularly since we also
compute sha1 hashes from these values and we may store these
hashes to disk (for the disk cache).
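To illustrate why a pointer is a poor thing to hash, here is a minimal sketch with hypothetical key structs. Two tables holding the same swizzle give identical by-value keys, whereas by-pointer keys would contain two different addresses that mean nothing to another process reading the disk cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fragile: the bytes that get hashed are an address, which differs
 * between tables and between runs. Shown only for contrast. */
struct example_key_by_pointer {
   const uint8_t *swizzle;
};

/* Robust: the key carries the four swizzle values themselves. */
struct example_key_by_value {
   uint8_t swizzle[4];
};

void
example_key_init(struct example_key_by_value *key, const uint8_t *swizzle)
{
   assert(swizzle != NULL);
   memset(key, 0, sizeof(*key));
   memcpy(key->swizzle, swizzle, sizeof(key->swizzle));
}

/* Byte-wise comparison, the same view of the key a hash function has. */
bool
example_keys_equal(const struct example_key_by_value *a,
                   const struct example_key_by_value *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```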
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13738>
Typically, optimization passes go through all the blocks in a shader
and make adjustments on the fly, so we always want them to update
the current block or the current block pointer will become outdated.
Also, we don't need to keep track of the previous current block
pointer to restore it, since optimization passes run after we have
completed conversion to VIR, and therefore, anything that comes after
that should always set the current block before emitting code.
Fixes debug assert crashes when running shader-db:
vir.c:1888: try_opt_ldunif: Assertion `found || &c->cur_block->instructions == c->cursor.link' failed
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13625>
It exists precisely to handle this case without the driver looking up
trampolines itself. This is nearly identical to what ANV does.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13156>
The test names are definitely unique (deqp has specific prefixes, piglit
uses '@' as a separator instead of '.'), so we can just have a single file
regardless of test type. Merges the two groups of xfails together so you
can't mix up which file to edit (I certainly have), and so that we don't
need to introduce yet another set of files when we add gtest for libva.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13517>
We have two testsuites with the same format for fails/flakes/skips files,
and test names that are definitely unique. As I'm about to add a third
testsuite (gtest for libva-utils), so let's have just one file each for
fails/flakes/skips instead of one per type of testsuite. This starts the
move with just the bulk rename of deqp.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13517>
This was not quite correct in that our checks for the allowed cases
were not checking that there were no other peripheral accesses other
than the ones allowed.
For example, we allowed wrtmuc signal and TMU write other than
TMUC, and we also allowed TMU read and VPM read/write. But we
cannot allow wrtmuc with TMU write other than TMUC and at the
same time a VPM write for example, so we can't just check if we
have a combination of allowed peripherals, we still need to check
that those are the only ones in use by the combined instructions.
Another example is that even if we allow a TMU write (other than TMUC)
with a wrtmuc signal, the resulting instruction must still have just
one TMU write other than TMUC, but we were allowing the merge if one
instruction signaled wrtmuc and the other wrote to tmu other than tmuc
without testing if the combined result would have 2 tmu writes.
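Both checks can be sketched with bitmasks: the combined set of peripherals must fit entirely inside one allowed combination, and the combined instruction may contain at most one TMU write other than TMUC. The flags, the struct and the allowed list below are hypothetical and cover only the two combinations named in the text; the actual hardware rules are more detailed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
   EXAMPLE_PERIPH_WRTMUC    = 1 << 0, /* wrtmuc signal */
   EXAMPLE_PERIPH_TMU_WRITE = 1 << 1, /* TMU write other than TMUC */
   EXAMPLE_PERIPH_TMU_READ  = 1 << 2,
   EXAMPLE_PERIPH_VPM       = 1 << 3, /* VPM read/write */
};

struct example_inst {
   uint32_t periph;     /* mask of EXAMPLE_PERIPH_* in use */
   unsigned tmu_writes; /* number of TMU writes other than TMUC */
};

bool
example_can_merge(const struct example_inst *a, const struct example_inst *b)
{
   /* The merged instruction may carry at most one non-TMUC TMU write. */
   if (a->tmu_writes + b->tmu_writes > 1)
      return false;

   uint32_t combined = a->periph | b->periph;

   /* Zero or one peripheral in total is always fine. */
   if ((combined & (combined - 1)) == 0)
      return true;

   static const uint32_t allowed[] = {
      EXAMPLE_PERIPH_WRTMUC | EXAMPLE_PERIPH_TMU_WRITE,
      EXAMPLE_PERIPH_TMU_READ | EXAMPLE_PERIPH_VPM,
   };

   /* Every peripheral in use must belong to the same allowed set. */
   for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
      if ((combined & ~allowed[i]) == 0)
         return true;
   }
   return false;
}
```

The subset test is what the earlier logic was missing: checking that an allowed pair is present says nothing about a third peripheral riding along.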
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13527>
We had an implementation for image copies and another for buffer to
image copies. Refactor the code so we have a single implementation
of this.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13481>
This was not considering the possibility that the driver has called
nir_before_block() or nir_after_block() to update the cursor, in which
case the cursor link points to the instruction list header and not
to an actual instruction.
Fixes incorrect debug-assert crash in:
dEQP-VK.graphicsfuzz.cov-increment-vector-component-with-matrix-copy
Fixes: 265515fa62 ("broadcom/compiler: check instruction belongs to current block")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13467>
A TSY barrier becomes effective at the point of the next thread switch,
so if we have one coming after a previous thread switch we need to
be careful not to emit it in its delay slots, or we would be effectively
moving the barrier earlier than intended.
Fixes simulator assert crash in:
dEQP-VK.graphicsfuzz.two-for-loops-with-barrier-function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13468>
This is prettier in the log files, less shell code, and for non-suite mode
adds checking that the driver has the right git sha1. Also, no need for
suites to have a DEQP_VER to say which dEQP we should run for the renderer
check.
The version checks can help us make sure that GL version exposed doesn't
accidentally regress, and the ".*git" checks that we're using a git
version of Mesa rather than something that snuck in through distro
packages.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13372>
Another instance of not taking the Z offset from the right place. We had
not seen this one until now because we typically use the TFU path, where
we also fixed this same issue in commit df1d08533c.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13356>
The basic vertex+fragment shader state uses the packet
GL_SHADER_STATE, but when geometry shaders are involved, the packet
used is GL_SHADER_STATE_INCLUDING_GS.
Without this commit any program using a geometry shader would dump
its shader state (and its shader state record and attributes) as
binaries.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13269>
At that point we didn't call all the v3dv lowerings. So the reference
nir shader used to call the v3d compiler could be different.
Note that at that point the nir shader is only available for internal
shaders (like gs multiview).
This specifically affected multiview tests that wrote gl_PointSize, as
the nir shader for the geometry shader was wrongly exposing
per_vertex_point_size as false, as we were basing our check on the
nir_shader_info, and that was gathered calling nir_shader_gather_info
at pipeline_lower_nir.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13325>
Now that all spirv_to_nir() users take care of converting sysvals to
varyings, we can unconditionally declare FragCoord as a sysval.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
This is an attempt at simplifying the spirv_to_nir() backend when it
comes to choosing between system values and input varyings. Let's patch
drivers to do the sysval to input varying conversion on their own so we
can get rid of the frag_coord_is_varying field in spirv_to_nir_options
and unconditionally create sysvals for FragCoord, FrontFacing and
PointCoord inputs instead of adding new xxx_is_{sysval,varying} flags.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
These were duplicated all over the place, and it's annoying to have to keep
duplicating them any time a new component includes the vulkan header.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13141>
This commit fixes two issues:
* On CreateInstance, we are freeing the instance, and then trying to
use it when calling vk_error. This could be problematic, so let's
just use NULL.
* On CreateDevice, we are getting an unsupported feature error, and
then trying to call vk_error using the instance. That is not
really an instance error, and will assert when the ongoing common
vk_error work lands in mesa. Let's use NULL instead, as the object it
applies to, the device, was not created.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13219>
This is already handled by vk_device_init(); drivers no longer
need to do it themselves.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12867>
The core ones have some nifty stuff like asserts that it's a valid
vk_object_base and has the right type. We don't have real type safety
with Vulkan handles but this is as close as we can get. The core ones
also track when we've started handing out handles for logging purposes
which we want.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13101>
We had some with unlikely, some without it. Let's just add unlikely to
all of them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13071>
It is really small, and used just twice, so we just call qpu_magic.
We also update how it is used:
* QFILE_NULL is an undef so we can just load anything. Previously we
were using accumulator 0, but there isn't any real reason to use
an accumulator for this. Use reg 0 instead.
* QFILE_LOAD_IMM: it seems that we don't use it at all right now, so
let's add an assert
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13008>
Switch to using common structure.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13000>
Switch to using common structure.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13000>
All our GLES2-only drivers were failing these KHR tests because we were
missing new OES_required_internalformat internalformats for CopyTexImage.
Cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12962>
The negative API tests ask to transpose a non-matrix uniform, and expect
the transpose error rather than the non-matrix error. This may be a test
bug about ambiguous results, but since every other driver is presumably
doing this too, just follow along.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12924>
Instead of creating our own based on the V3D version. CTS waivers
are registered using a combination of VendorID and DeviceID, so if
we want to reuse any waivers filed by Broadcom we want to use the
same identifiers. We are already using the Broadcom VendorId, so
let's start using the same deviceID as well.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12865>
dEQP-VK.api.external.fence.opaque_fd.signal_export_import_wait_permanent
became a flake in latest kernel (5.10.60-v8+)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12832>
Dumps the command list, excluding the binary resources.
v2 (Juan):
- Make this option independent from `cl`
v3 (Iago):
- Rename option name
- Fix style issues
- Do not print BO ranges
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12803>
Brings in these changes:
99be1b06ff36 framework/replay: Display the image differences if any
3074b9c72b3d glsl-predication-on-large-array: Test predication on values from large array
c97da22d35b4 cmake: Fix gbm test compiling
0cbccd68c3c1 piglit: Find our data directory when we're invoked through a symlink
4eb71fc10bbe arb_sso: add test that has explicit locations and array fields in ifc
fa9c82380273 glsl-1.30: test shadow var in a switch
aa7f042b0417 glsl-1.30: add tests for incorrect "compare to 0" optimizations
60138ef32ec1 add explicit tests for GetFragDataLocation/Index(gl_Frag*)
4a8806696b90 egl: add test for EGL_KHR_display_reference
d6b7053b4e52 glsl-1.30: test that switch expression is evaluated once
8023a3c945c3 arb_shader_storage_buffer_object: Require extension on the new test
8820cac60827 pbobench: Fix sometimes-uninitialized warning.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12675>
As we don't know if we are going to have spilling or not, always emit a
last thrsw at the end of the shader.
If later we don't have spillings and we don't need that last thrsw, we
remove it and switch back to the previous one.
This way we ensure all the spilling happens always before the last
thrsw.
v2 (Juan):
- Rework the code to force a last thrsw and remove later if no spilling
v3:
- Merge functionality inside vir_emit_last_thrsw (Iago)
- Add vir_restore_last_thrsw (Juan)
v4 (Iago):
- Fix/add new comments
- Rename variables/parameters
v5 (Iago):
- Fix comments
- Add assertion
Cc: mesa-stable
Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4760
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>
This function is only used in v3d_nir_to_vir(), so make it private.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>
We had an optimization to auto-enable early fragment tests when a shader
didn't have side effects, but of course, we cannot do this if the
shader writes Z, as in that case the fragment tests need to use the
value written from the shader.
Also, if the shader enables early fragment tests, then any shader Z
writes should be ignored.
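The rule described above can be sketched as a small decision function.
This is only an illustration: the struct, its fields and both function
names are hypothetical and are not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a fragment shader; not the driver's real struct. */
struct fs_info {
        bool has_side_effects;     /* e.g. image/SSBO stores, atomics */
        bool writes_z;             /* shader writes fragment depth */
        bool requests_early_tests; /* shader explicitly enables early tests */
};

/* Returns true if fragment tests may run before the shader executes. */
bool
use_early_fragment_tests(const struct fs_info *fs)
{
        assert(fs != NULL);

        /* An explicit request from the shader always wins. */
        if (fs->requests_early_tests)
                return true;

        /* Auto-enable only when the shader has no side effects and does
         * not write Z, since a Z write must feed the fragment tests.
         */
        return !fs->has_side_effects && !fs->writes_z;
}

/* Returns true if the Z value written by the shader must be honored. */
bool
shader_z_write_is_used(const struct fs_info *fs)
{
        assert(fs != NULL);

        /* With early tests explicitly enabled, shader Z writes are ignored. */
        return fs->writes_z && !fs->requests_early_tests;
}
```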
Fixes:
dEQP-VK.spirv_assembly.instruction.graphics.early_fragment.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12736>
I saw an unrelated marge pipeline fail on
dEQP-VK.robustness.buffer_access.through_pointers.graphics.reads.fragment.4B_out_of_memory_with_vec4_f32,
and it sure looks like all of these are flaky.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12681>
Fix defect reported by Coverity Scan.
Same on both sides (CONSTANT_EXPRESSION_RESULT)
pointless_expression: The expression inst->qpu.flags.auf !=
V3D_QPU_UF_NONE || inst->qpu.flags.auf != V3D_QPU_UF_NONE does not
accomplish anything because it evaluates to either of its identical
operands, inst->qpu.flags.auf != V3D_QPU_UF_NONE.
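A reduced illustration of this defect class follows. The types are
simplified stand-ins, and the idea that the second operand was meant to
test the mul ALU flag update (muf) is an assumption made for the
example; this message does not state what the corrected expression is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; not the real v3d types. */
enum uf_mode { UF_NONE = 0, UF_PUSHZ, UF_PUSHN };

struct flags {
        enum uf_mode auf; /* flag update from the add ALU */
        enum uf_mode muf; /* flag update from the mul ALU (assumed intent) */
};

/* The flagged pattern: both operands are identical, so muf is never
 * examined and the expression reduces to a single comparison.
 */
bool
updates_flags_buggy(const struct flags *f)
{
        assert(f != NULL);
        return f->auf != UF_NONE || f->auf != UF_NONE;
}

/* Assumed intent: check the flag update of each ALU once. */
bool
updates_flags_fixed(const struct flags *f)
{
        assert(f != NULL);
        return f->auf != UF_NONE || f->muf != UF_NONE;
}
```

With only muf set, the first version reports no flag update while the
second one does.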
Fixes: 3f2c54a27f ("broadcom/compiler: rewrite partial update liveness tracking")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12385>
Prior to this commit, the stride, offset and modifier were fetched
via WINSYS_HANDLE_TYPE_KMS. However we can't make such a query
succeed if the buffer couldn't be imported to the KMS device.
Instead, implement the resource_get_param hook to allow users to
fetch this information without WINSYS_HANDLE_TYPE_KMS.
A tiny helper function is introduced to compute the modifier of a
resource.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 7bcb223639 ("v3d, vc4: Fix dmabuf import for non-scanout buffers")
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12370>
Glue together all the GLES related jobs using the suites feature.
This allows us to reduce the total number of devices required, moving
some of them to help in other jobs, and the remaining free for other
pipelines in parallel.
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12453>
Right now the opcode_desc struct, used to define data for all the
operations to pack/unpack, includes a version field. In theory that
could be used to check if we are retrieving an opcode valid for our hw
version, or to get the correct opcode if a given one changed across hw
versions, or just the same if it didn't change.
In practice that field was not used. So for example, if by mistake we
asked for an opcode defined at version 41, while being on version 33
hardware, we would still get that opcode description.
This commit fixes that, and while we are here we expand the functionality
to allow defining version ranges, in case a given opcode number
and its description are only valid for a given range.
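A minimal sketch of version-range filtering during lookup, assuming a
table where 0 means "no bound". The struct, its field names, the table
contents and lookup_opcode() are all hypothetical and only illustrate
the idea; they are not the actual definitions in the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified opcode table entry. */
struct opcode_desc_sketch {
        uint8_t opcode;
        const char *name;
        uint8_t first_ver; /* first hw version where valid, 0 = any */
        uint8_t last_ver;  /* last hw version where valid, 0 = no limit */
};

/* Made-up entries: opcode 30 changes meaning from version 41 onwards. */
static const struct opcode_desc_sketch table[] = {
        { 10, "op_all_versions", 0, 0 },
        { 20, "op_since_41", 41, 0 },
        { 30, "op_33_to_40", 33, 40 },
        { 30, "op_from_41", 41, 0 },
};

static int
desc_valid_for_version(const struct opcode_desc_sketch *d, uint8_t ver)
{
        if (d->first_ver != 0 && ver < d->first_ver)
                return 0;
        if (d->last_ver != 0 && ver > d->last_ver)
                return 0;
        return 1;
}

/* Returns the description matching both opcode and hw version, or NULL. */
const struct opcode_desc_sketch *
lookup_opcode(uint8_t opcode, uint8_t ver)
{
        assert(ver != 0);

        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
                if (table[i].opcode == opcode &&
                    desc_valid_for_version(&table[i], ver))
                        return &table[i];
        }
        return NULL;
}
```

In this sketch, asking for opcode 20 on version 33 returns NULL instead
of a description that only exists from version 41.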
v2 (from Iago feedback):
* Fixed some comment typos
* Simplified filtering opcode method
* Rename filtering opcode method
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12301>
Right now there is a helper to get the opcode description from a
packed instruction, used on unpack related instructions. This commit
adds a helper that refactors the equivalent that is already in use on
pack related instructions.
Right now the helper is small, but we plan to extend it on following
commits in order to use the opcode description version field.
To avoid any possible confusion we rename the existing lookup helper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12301>
* Remove one about waddr 6 being reserved, when at some point it
became NOP
* Fix one comment about reserved signals on v41 map, as 24 and 25
are in fact defined. This seems a C&P issue (see v40 map).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12301>
This fixes an issue seen when testing with a kernel that has the
performance counters enabled: it introduces a "pad" field in the CL
submission structure that was not being initialized.
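One common way to guard against this class of problem is to zero the
whole structure before filling in the fields that are used, so any
field added later starts out as 0. The sketch below uses a hypothetical
struct and function name; it is not the real uapi structure nor
necessarily how this fix is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical submit struct standing in for a kernel uapi structure. */
struct submit_sketch {
        uint32_t bcl_start;
        uint32_t bcl_end;
        uint32_t perfmon_id;
        uint32_t pad; /* newly added field that callers never set */
};

/* Zero everything first, then set only the fields we know about. */
void
submit_init(struct submit_sketch *s, uint32_t start, uint32_t end)
{
        assert(s != NULL);

        memset(s, 0, sizeof(*s));
        s->bcl_start = start;
        s->bcl_end = end;
}
```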
Fixes: ca13868098 ("drm-uapi: add v3d performance counters")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12390>
When incrementing the unifa address in the DCE optimization, ensure that
we correctly set up the current block, so the ldunif optimization is also
executed correctly.
This fixes
dEQP-VK.graphicsfuzz.cov-struct-float-array-mix-uniform-vectors
heap-buffer overflow with address sanitizer enabled.
v2 (Iago):
- Save and restore current block
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12339>
While this feature is optional in Vulkan 1.1 and we don't currently
expose it, the CTS still requires that the entry points exist.
From the Vulkan 1.1 spec:
"If the VK_KHR_sampler_ycbcr_conversion extension is not supported,
support for the samplerYcbcrConversion feature is optional."
(...)
"samplerYcbcrConversion specifies whether the implementation supports
sampler YCBCR conversion. If samplerYcbcrConversion is VK_FALSE,
sampler YCBCR conversion is not supported, and samplers using sampler
YCBCR conversion must not be used."
Fixes (with Vulkan 1.1 exposed):
dEQP-VK.api.version_check.entry_points
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12338>
The code we had for this was a work in progress and not finished. Also,
it was geared towards partial writes caused by output packing (i.e.
fp16) and was ignoring partial updates caused by conditional writes,
which are far more common in our case.
This change provides an implementation for tracking conditional writes
that works in tandem with the previous spill change to narrow liveness
for their spills.
Fixes register allocation failures in:
dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite
We also gain one shader from shader-db:
total instructions in shared programs: 13339969 -> 13338584 (-0.01%)
instructions in affected programs: 185520 -> 184135 (-0.75%)
helped: 375
HURT: 130
Instructions are helped.
total threads in shared programs: 412038 -> 412040 (<.01%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
total uniforms in shared programs: 3746581 -> 3746585 (<.01%)
uniforms in affected programs: 49 -> 53 (8.16%)
helped: 0
HURT: 1
total max-temps in shared programs: 2359960 -> 2359947 (<.01%)
max-temps in affected programs: 289 -> 276 (-4.50%)
helped: 7
HURT: 0
Max-temps are helped.
total sfu-stalls in shared programs: 34351 -> 34359 (0.02%)
sfu-stalls in affected programs: 218 -> 226 (3.67%)
helped: 35
HURT: 37
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13374320 -> 13372943 (-0.01%)
inst-and-stalls in affected programs: 186653 -> 185276 (-0.74%)
helped: 373
HURT: 132
Inst-and-stalls are helped.
LOST: 0
GAINED: 1
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>
A spill of a conditional write generates code like this:
mov.ifa t5000, 0
mov tmud, t5000
nop t5001; ldunif (0x00008100 / 0.000000)
add tmua, t11, t5001
Here, we are spilling t5000, which has a conditional write, and we
produce an unconditional spill for it. This implicitly means that
our spill requires a correct value for all channels of t5000.
If we do a conditional spill, then we emit:
mov.ifa t5000, 0
mov tmud.ifa, t5000
nop t5001; ldunif (0x00008100 / 0.000000)
add tmua.ifa, t11, t5001
Which only uses channels of t5000 that have been written by the
instruction being spilled.
By doing the latter, we can then narrow down the liveness for t5000
more effectively, as we can use this to detect that the block only reads
(in the tmud instruction) the values that have been written previously
in the same block (in the mov instruction). This means that values in
other channels are not used, and therefore, we don't need them to be
alive at the start of the block. This means that if this is the only
write of t5000 in this block, we can consider that the block
completely defines t5000.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>
The spill base setting instructions (which include some uniforms) are
added in the entry block, not in the current block. When ldunif
optimization is applied, the cursor is pointing to instructions in the
entry block, but the current block is a different one. This leads to a
heap-buffer-overflow when going through the list of instructions
(detected by the address sanitizer).
Thus change the current block to entry block, and restore it after the
setup is done.
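The save/restore pattern described above can be sketched as follows.
All type and function names here are hypothetical placeholders; the
real compiler structures are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the compiler's block and context. */
struct block_sketch {
        int index;
        int num_insts;
};

struct compile_sketch {
        struct block_sketch *entry_block;
        struct block_sketch *cur_block;
};

/* Stand-in for emitting an instruction into whatever block the
 * compiler currently considers "current".
 */
static void
emit_into_current(struct compile_sketch *c)
{
        c->cur_block->num_insts++;
}

/* Emit the spill setup in the entry block, keeping cur_block consistent
 * with where instructions are actually inserted, then restore it.
 */
void
emit_spill_base_setup(struct compile_sketch *c)
{
        assert(c != NULL && c->entry_block != NULL && c->cur_block != NULL);

        struct block_sketch *saved = c->cur_block;
        c->cur_block = c->entry_block;

        emit_into_current(c);

        c->cur_block = saved;
}
```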
This fixes
dEQP-VK.ssbo.readonly.layout.single_struct.single_buffer.std430_instance_array_comp_access_store_cols
with address sanitizer enabled.
v2:
- Set current block instead of disabling ldunif optimization (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12221>
Brings in duplicate subtest fixes, gpu_shader4 tests, and more. This
shuffles the radeonsi fractional test run, so we get to catch up with more
failing subtests.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12110>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
Add support for performance counters when using the simulator.
v2 (Iago):
- Remove brackets from single-line conditionals
- Rename channel to channels
- Ensure perfmon start/stop function is implemented in all versions
- Use an array for perfmons instead of hash table
- Implement performance counters in CSD
v3 (Iago):
- Rename PERFMON_CHUNKS to PERFMONS_ALLOC_SIZE.
- Merge increasing lastid and ensuring space in a single function.
v4 (Iago):
- Assert perfid <= perfmons_size.
v7 (Iago):
- Do not stop perfmon on each submission
v8 (Iago):
- Add comment about stopping the perfmon when retrieving values.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10666>
v2:
- Read mustpass files from vk-default.txt (Matt)
- Remove freedreno atomic geom tests from fail list (Emma)
- Move freedreno flake to separated line (Emma)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12069>
Failure to create a buffer for scanout should not be fatal when
importing a buffer. Buffers allocated from a render-only device may not
be able to be scanned out directly but can still be used for other
rendering purposes (e.g. as a texture).
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12081>
All vulkan drivers have been copying anv's code to convert
VkSpecializationInfo into nir_spirv_specialization.
Recently there was a Vulkan spec change on allowed values for
VkSpecializationInfo, and all drivers got affected.
This commit creates a new helper, and uses it on all Vulkan Mesa
drivers.
v2: use (uint8_t*) casts, instead of void*, to avoid C2036 with
MSVC (detected by the CI, inspired by what radv was doing)
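As an illustration of the v2 note: arithmetic on a void pointer is a
GNU extension, and MSVC rejects it with C2036 because void has no size.
Casting to a byte pointer first is portable. The helper below is a
hypothetical example, not the helper added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value located at a byte offset inside an opaque blob. */
uint32_t
read_u32_at(const void *data, size_t data_size, size_t offset)
{
        uint32_t value;

        assert(data != NULL);
        assert(offset + sizeof(value) <= data_size);

        /* `data + offset` would not compile on MSVC; cast to a byte
         * pointer before doing the arithmetic.
         */
        memcpy(&value, (const uint8_t *)data + offset, sizeof(value));
        return value;
}
```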
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12047>
When multiview is enabled, queries must begin and end in the
same subpass and N consecutive queries are implicitly used,
where N is the number of views enabled in the subpass.
Implementations decide how results are split across queries.
In our case, only one query is really used, but we still need
to flag all N queries as available by the time we flag the one
we use so that the application doesn't receive unexpected errors
when trying to retrieve values from them.
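A minimal sketch of the behaviour described above, assuming the number
of views is the number of bits set in the subpass view mask and that a
mask of 0 means multiview is disabled. The pool struct, MAX_QUERIES and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUERIES 16

/* Hypothetical query pool that only tracks availability. */
struct query_pool_sketch {
        bool available[MAX_QUERIES];
};

/* Counts the bits set in the view mask. */
static uint32_t
count_views(uint32_t view_mask)
{
        uint32_t n = 0;
        for (; view_mask; view_mask &= view_mask - 1)
                n++;
        return n;
}

/* Only the first query holds results, but all N consecutive queries
 * are flagged as available at the same time.
 */
void
mark_queries_available(struct query_pool_sketch *pool,
                       uint32_t first_query, uint32_t view_mask)
{
        uint32_t n = view_mask ? count_views(view_mask) : 1;

        assert(first_query + n <= MAX_QUERIES);

        for (uint32_t i = 0; i < n; i++)
                pool->available[first_query + i] = true;
}
```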
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>