This avoids re-calculating the exec mask for ES vertices,
and makes it unnecessary to count the number of vertices left.
Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 58239 (45.27% of 128647) affected shaders:
CodeSize: 166521108 -> 166356072 (-0.10%); split: -0.10%, +0.00%
Instrs: 31961308 -> 31920041 (-0.13%); split: -0.13%, +0.00%
Latency: 138820463 -> 138815742 (-0.00%); split: -0.04%, +0.04%
InvThroughput: 22460177 -> 22459553 (-0.00%); split: -0.00%, +0.00%
SClause: 753744 -> 753746 (+0.00%)
Copies: 3093140 -> 3226647 (+4.32%); split: -0.03%, +4.34%
No Fossil DB changes with NGGC off.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.
Fossil DB results on Sienna Cichlid (with NGGC on):
Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%
No Fossil DB changed with NGGC off.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
We were trying to calculate how much space they need, That was already
difficult and one of the most opaque and hard-to-verify uses of sub_cs,
but it will become even more difficult with the 3D path. What's worse is
that sometimes we have to touch that path when we start touching
registers that would affect rasterization, and there's no indication
that you have to then recalculate the size etc. Just rip this out and
start keeping a separate CS for it instead. Note that this adds a small
amount of memory wastage and extra buffers (at worst one buffer per
command buffer).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12102>
Now that the vertices attributes are in RAM, we can easily
compute their hash and compare to earlier vertices (in the
same list so they have compatible vertex_size).
We can't do that for list that will be executed using
loopback because the replay code ignore the index buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11927>
Previously vertices were uploaded on-the-fly: each time
the position attribute was set, the newly added vertex
was copied to the mapped bo.
Replace this with a plain RAM buffer, and do the upload
at the end of compile_vertex_list.
This allows to remove the we-need-to-unmap-the-buffer-
before-drawing special case, but more importantly it
will allow to implement vertices deduplication in the
next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11927>
Two threads enter and see *id == 0. Both threads update the value.
Upon returning, one of the threads might see the overwritten value some
of the time and the updated value other times. Use cmpxchg to ensure
that there's only ever one value written to *id.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
There are two problems with the current architecture.
In OpenGL, the id is supposed to be a unique identifier for a particular
log source. This is done so that applications can (theoretically)
filter particular log messages. The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in. This causes
the id to get set once to a unique value for each message.
By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.
When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread. This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(
I have not observed any crashes related to this particular issue.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
There are two problems with the current architecture.
In OpenGL, the id is supposed to be a unique identifier for a particular
log source. This is done so that applications can (theoretically)
filter particular log messages. The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in. This causes
the id to get set once to a unique value for each message.
By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.
When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread. This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(
This fixes shader-db crashes of various kinds on Iris with threaded
shader compiles enabled.
Fixes: 42c34e1ac8 ("iris: Enable threaded shader compilation")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
Similar to the fix in 6bf8e960fa ("pan/bi: Do helper termination
analysis on clauses")
Though apparently a "theoretical issue only", fixes artefacts in
DarkPlaces with both D3D9 and GL renderers.
Fixes: 9a7f0e268b ("pan/mdg: Use the helper invo analyze passes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12156>
While char is signed on macOS AArch64, on Linux it is unsigned. This
means it cannot represent the -1 return value of getopt_long.
Change the type of `c` to int, the type that getopt_long returns, so
that the -1 will be kept intact and can be checked for.
Fixes: c6be4f85e3 ("pan/bi: Use getopt for bifrost_compiler")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12153>
Both Panfrost and the DDK add padding zero bytes to the end of
shaders, so we can use this instead of the end-of-shader clause for
checking whether to stop disassembling.
Shaders can have end-of-shader clauses partway through; these shaders
will now be completely disassembled instead of cut off at the first
end-of-shader clause.
A tag byte of zero is an invalid encoding, so unlike the previous
version of this test only check the first word.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12153>
Rather than just printing an offset such as '(pc + 192)', print the
target of branches as a clause number that matches up with the clause
headers printed by disassemble_bifrost.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12153>
v2:
- Read mustpass files from vk-default.txt (Matt)
- Remove freedreno atomic geom tests from fail list (Emma)
- Move freedreno flake to separated line (Emma)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12069>
Otherwise, it might not be correctly initialized for Android.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12103>
Mapping lentgh must be a multiple of 'nonCoherentAtomSize' bytes
when using VK_WHOLE_SIZE in vkFlushMappedMemoryRanges.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12131>
There's no need for us to do this lowering ourselves while emitting
code, when there's already a helper that can do this for us that we're
even using. Let's just set the right flag, and not worry about
projectors any more.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12126>
The existing code handled the case where the new definition of the
same field was larger than the old one.
This commit adds a check to handle the reverse case: the new def
is smaller than the old one (= so writing using the merged macro
would affect the next fields).
The affected fields are:
* LGKM_CNT (in SQ_WAVE_IB_STS)
* DONUT_SPLIT (in VGT_TESS_DISTRIBUTION)
* HEAD_QUEUE (in GDS_GWS_RESOURCE)
DONUT_SPLIT is the only one used by radeonsi/radv.
Fixes: e6184b0892 ("amd/registers: scripts for processing register descriptions in JSON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12063>
Add 1 to the key index otherwise we hit the following assert
in hash_table_insert:
assert(!key_pointer_is_reserved(ht, key));
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12105>
GitLab doesn't merge the rules array from a job that is extended, so we
were missing the changes rules.
To avoid this, create a .freedreno-rules-restricted job that includes
the changes rules and the restricted user checks.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Fixes: 92f9141f00 ("ci/freedreno: Test with non-redistributable traces")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5139
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12122>
Otherwise there is an uninitialized read, and the register allocation
will fail. (In the sense of failing a precondition. This manifests as
synthetic interference leading to higher register pressure and useless
moves. The allocation itself is ok, but it indicates a real bug.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12130>
When Valhall lands, we'll need to be more methodical about this. In the
mean time, this gets validation passing on
KHR-GLES31.core.compute_shader.atomic-case3 which was crashing in RA and
now again passes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12130>
This is much more maintainable, extensible, and easy to read than
hand-rolled structs approximating assembly. This also removes the last
use of the old hand-written packing structs. There are a few minor
differences:
- The shaders are larger because ir3 currently doesn't support (rpt),
which means that some shaders are larger than one instrlen and the
current logic has to be extended to allow for that. This seems a small
price to pay, ir3 will gain support for (rpt) eventually, and we
shouldn't have limitations like this baked in anyway. For example some
GL blob r8g8 <-> r16 copy shaders are apparently quite large.
- Due to the inability to switch inputs/outputs on the fly, we need to
split the VS into two variants. I made the layer-writing variant also
used for other clears, because the old method of overloading c0.z/c1.z
to mean both "src x coordinate" and "z clear value" in the same shader
seemed too clever and I didn't want to add yet another variant. This
means that non-layered clears will also write the layer (to 0), but
that shouldn't be a big deal performance-wise.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12079>
If the passed VkPipelineRasterizationLineStateCreateInfoEXT wasn't zero
initialized, we copy garbage values that are later on used to set the
state and may end up crashing when they are beyond the limits of the HW.
v2 (Lionel): Simplify if condition
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12121>
Specify gpu id with --gpu-id or marketing name with --gpu. Still have
compile/disasm as commands, but allow -v for verbose printing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12134>
Failure to create a buffer for scanout should not be fatal when
importing a buffer. Buffers allocated from a render-only device may not
be able to scanned out directly but can still be used for other
rendering purposes (e.g. as a texture).
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12081>