Commit Graph

2034 Commits

Author SHA1 Message Date
Timur Kristóf abcd0aa9e5 aco: Remove trailing whitespace.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Martin Roukala <martin.roukala@mupuf.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16702>
2022-05-25 12:29:30 +00:00
Timur Kristóf 7761b4d89e aco: Fix scratch with task shaders.
Task shaders work like compute shaders, their scratch pointer
is currently located at the first two user SGPRs.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16692>
2022-05-24 12:33:49 +00:00
Rhys Perry 4513cb8d41 aco: only add/subtract low bits of program addresses
fossil-db (Sienna Cichlid):
Totals from 4007 (2.47% of 162293) affected shaders:
Instrs: 3733239 -> 3728018 (-0.14%)
CodeSize: 20770340 -> 20749456 (-0.10%)
Latency: 46883958 -> 46872764 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 10550392 -> 10548698 (-0.02%); split: -0.02%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16460>
2022-05-23 11:52:54 +00:00
Rhys Perry 69d1f4186a aco/tests: add test for p_constaddr with a non-zero offset
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16460>
2022-05-23 11:52:54 +00:00
Rhys Perry bd8f8dda8c aco: fix p_constaddr with a non-zero offset
Seems this broke a while ago and we never noticed.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 0af7ff49fd ("aco: lower p_constaddr into separate instructions earlier")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16460>
2022-05-23 11:52:54 +00:00
Samuel Pitoiset 95d4e5435b radv: export implicit primitive ID in NIR for legacy VS or TES
It's implicit for VS or TES, while it's required for GS or MS.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16404>
2022-05-20 14:55:05 +00:00
Marek Olšák 2443054932 amd: rename fishes to Navi21, Navi22, Navi23, Navi24, and Rembrandt
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16604>
2022-05-19 11:55:50 +00:00
Samuel Pitoiset 0c8a07f25d aco: remove unnecessary intrinsics that are lowered at the ABI level
Fixes: f553076eaf ("aco: Remove now-superfluous intrinsics.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16577>
2022-05-19 06:49:07 +00:00
Samuel Pitoiset 8510d5daa3 aco: use ac_is_llvm_processor_supported() for checking LLVM asm support
It seems more universal but it's needed to create a temporary TM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16494>
2022-05-17 17:14:21 +00:00
Dave Airlie a179e1aede aco/radv: drop radv_nir_compiler_options from aco.
Add a new aco input and options structs, then convert from radv
pieces on submit.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16521>
2022-05-17 06:15:25 +00:00
Samuel Pitoiset 534cc99081 aco: do not emit the primitive ID twice for NGG VS or TES with GS
The primitive ID is required to be exported by the GS stage, so this
should only be needed for NGG VS or TES without a GS stage. Otherwise,
it's exported twice.

No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16498>
2022-05-16 09:08:49 +00:00
Marek Olšák 39800f0fa3 amd: change chip_class naming to "enum amd_gfx_level gfx_level"
This aligns the naming with PAL.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pellou-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16469>
2022-05-13 14:56:22 -04:00
Samuel Pitoiset 6d53922863 radv,aco: add a workaround for binding 2D views of a 3D image on GFX9
The hardware can't bind a slice of a 3D image as a 2D image, GFX10+
introduced ARRAY_PITCH to allow this without a shader workaround.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16294>
2022-05-13 17:53:25 +00:00
Samuel Pitoiset 76356ed208 aco: remove unreachable code about viewport index/layer and mesh shaders
If the mesh shaders exports the viewport index or the layer, the value
can't be NULL, and it should be implicitly zero.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16438>
2022-05-13 14:01:54 +00:00
Samuel Pitoiset 27f1da8215 radv,aco: do not implicitly export the primitive ID for mesh shaders
From the Vulkan spec:

    "VUID-VkGraphicsPipelineCreateInfo-PrimitiveId-06264
     If the pipeline is being created with pre-rasterization shader
     state, it includes a mesh shader and the fragment shader code
     reads from an input variable that is decorated with PrimitiveId,
     then the mesh shader code must write to a matching output variable,
     decorated with PrimitiveId, in all execution paths"

So, if PS uses PrimitiveID, MS must export it (like GS).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16438>
2022-05-13 14:01:54 +00:00
Samuel Pitoiset 07954a8fd6 aco: only retrieve the scratch offset when it's declared
This allows to run most of the fossils we have right now. I will fix
up scratch in upcoming patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 5a119f15aa radv,aco: export alpha-to-coverage via MRTZ on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset b6ba4cb407 aco: do not set COMPR for exports but use 0x3 channel mask on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset a6e1445d5f aco: do not set RESOURCE_LEVEL for buffer descriptors on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 7647097d3d aco: update waitcnt on GFX11
Not sure if the vmcnt field can use more than 0x3f bits.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 765aa36b9d aco: update LDS allocation granularity for PS on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset a284b677ba aco: do not set GLC stores on GFX11
It has no effect.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset eea15d6706 aco: do not set DLC for loads on GFX11
It means something different.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset bc8da20dda aco: export MRT0 instead of NULL on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 42ef89e8db radv,aco: use the new TCS WaveID SGPR to compute vs_rel_patch_id on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 432cde7f00 radv,aco: add support for packed threadID VGPRs on GFX11
Thread ID are packed in one VGPR with 10 bits each.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 52952f51cd aco: do not align VGPRS to 8 or 16 on GFX11
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Samuel Pitoiset 0cb1b12ec0 aco: recognize GFX11 in few places
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>
2022-05-12 15:46:20 +00:00
Konstantin Seurer b30f96dd93 radv,aco: Use ray_launch_size_addr
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15712>
2022-05-12 15:04:31 +00:00
Dave Airlie 24176cae55 aco: drop unused radv include
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie c44d5d61ce aco: remove radv vs prolog key from aco internals.
This creates an aco specific key, and converts radv to it.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 04c07a2413 aco/radv: convert to aco shader info at the radv level.
This removes the radv shader info type from aco completely.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 199edce84d aco/info: add some more fields.
These fields are also used in aco.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 8cfd8420ab aco: convert vs and so info over to aco structs.
This renames the vs to vp (vertex pipeline) on the way past.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 3dba3458e9 aco: remove radv specific streamout info
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 9bd89af1bc aco/info: reduce the gs ring info to what is needed.
Only one member was being used.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie 87df607ff5 aco: move to a minimal aco shader info struct.
This should be kept to only things aco uses, and expanded when
radeonsi support is added. Things should be removed if lowered in NIR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Dave Airlie a2701bfdb8 aco: move info pointer to a copy.
This is just setup to move this to a different struct later.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Timur Kristóf f553076eaf aco: Remove now-superfluous intrinsics.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13155>
2022-05-10 17:16:03 +00:00
Rhys Perry cc410dd4d1 aco: fix cmpswap global atomic definition on GFX6
Missed this one.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 2f0bb39e16 ("aco: ensure that definitions fixed to operands have matching regclasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16367>
2022-05-10 01:01:13 +00:00
Rhys Perry 152092b8ea aco: skip s_barrier if TCS patches are within subgroup
fossil-db (Sienna Cichlid):
Totals from 518 (0.32% of 162293) affected shaders:
Instrs: 124943 -> 123908 (-0.83%)
CodeSize: 708764 -> 704624 (-0.58%)
Latency: 618380 -> 618279 (-0.02%)
InvThroughput: 214061 -> 214051 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16356>
2022-05-09 16:30:27 +00:00
Rhys Perry 359e60cf5e aco: split load_sbt_amd result
fossil-db (Sienna Cichlid):
Totals from 11 (0.01% of 162293) affected shaders:
Instrs: 47857 -> 47738 (-0.25%)
CodeSize: 261556 -> 261080 (-0.18%)
Latency: 1176822 -> 1176245 (-0.05%)
InvThroughput: 784549 -> 784165 (-0.05%)
Copies: 5959 -> 5840 (-2.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16203>
2022-05-06 15:15:13 +00:00
Daniel Schürmann d70688492c aco/optimizer: re-combine and copy-propagate p_create_vector(p_split_vector)
Totals from 309 (0.23% of 134913) affected shaders: (GFX10.3)
CodeSize: 1853812 -> 1857732 (+0.21%); split: -0.05%, +0.27%
Instrs: 340810 -> 341789 (+0.29%); split: -0.07%, +0.36%
Latency: 3301814 -> 3301774 (-0.00%); split: -0.02%, +0.02%
InvThroughput: 590473 -> 590914 (+0.07%); split: -0.00%, +0.08%
Copies: 28751 -> 29731 (+3.41%); split: -0.87%, +4.28%
PreSGPRs: 14010 -> 14028 (+0.13%); split: -0.01%, +0.14%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Daniel Schürmann 5e6e47ecea aco/ra: improve split_vector register assignment if the operand is not killed
This allows for more coalescing when lowering the copies.

Totals from 44801 (33.21% of 134913) affected shaders: (GFX10.3)
VGPRs: 1513264 -> 1513248 (-0.00%)
CodeSize: 113354240 -> 113172872 (-0.16%); split: -0.16%, +0.00%
Instrs: 21648793 -> 21603397 (-0.21%); split: -0.21%, +0.00%
Latency: 95762290 -> 95757403 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 15427354 -> 15427341 (-0.00%); split: -0.00%, +0.00%
Copies: 2065330 -> 2019933 (-2.20%); split: -2.20%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Daniel Schürmann 499dc20e6a aco: don't re-create vectors for load_barycentric_* intrinsics
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Rhys Perry 2f0bb39e16 aco: ensure that definitions fixed to operands have matching regclasses
If the operand is not killed, the definition needs to be large enough so
that the new location for the operand does not intersect with the old
location.

Fixes with zink:
KHR-GL45.shader_image_load_store.basic-allTargets-atomicCS
KHR-GL45.shader_image_load_store.basic-allTargets-atomicGS
KHR-GL45.shader_image_load_store.basic-allTargets-atomicVS

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6276
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16106>
2022-05-05 19:56:48 +01:00
Rhys Perry 1b639a0ce5 aco/ra: fix vgpr_limit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: b98a4d4dd7 ("aco: refactor GPR limit calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16297>
2022-05-04 11:12:13 +00:00
Georg Lehmann 69cceab718 aco: Remove D16 zero components from image stores.
No foz-db changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Georg Lehmann 7a6dbe0c77 aco: Implement image_load d16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Georg Lehmann 7ffcaf9187 aco: Implement image_store d16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Joshua Ashton ce7966fcb4 aco: Use movk for AddressHi bits in vertex prolog
Eliminates a useless extra dword by transforming
        s_mov_b32 s47, 0xffff8000                                   ; beaf03ff ffff8000
to
        s_movk_i32 s47, 0x8000                                      ; b02f8000

which does the same thing.

Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15415>
2022-05-03 10:15:36 +00:00
Daniel Schürmann 58bd9a379e aco/ra: fix live-range splits of phi definitions
It could happen that at the time of a live-range split,
a phi was not yet placed in the new instruction vector,
and thus, instead of renamed, a new phi was created.

Fixes: dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16248>
2022-05-03 09:54:30 +00:00
Daniel Schürmann 12d7f911c9 aco/optimizer: prevent any overflow between SGPR and const offset on MUBUF
Apparently, if the SGPR offset + const offset overflows,
it doesn't work.

Totals from 145 (0.11% of 134913) affected shaders: (GFX10.3)
SpillSGPRs: 134 -> 104 (-22.39%)
CodeSize: 1632676 -> 1645916 (+0.81%); split: -0.03%, +0.84%
Instrs: 316920 -> 320252 (+1.05%); split: -0.01%, +1.07%
Latency: 1456285 -> 1459686 (+0.23%); split: -0.02%, +0.25%
InvThroughput: 165785 -> 166086 (+0.18%); split: -0.02%, +0.20%
VClause: 6815 -> 6875 (+0.88%); split: -0.03%, +0.91%
SClause: 19089 -> 19079 (-0.05%); split: -0.06%, +0.01%
PreSGPRs: 7302 -> 7304 (+0.03%); split: -0.01%, +0.04%

Fixes: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15866>
2022-04-29 15:57:57 +00:00
Daniel Schürmann d5dc0c0392 aco: adjust num_waves for LDS before scheduling
Totals from 67 (0.05% of 134913) affected shaders: (GFX10.3)
VGPRs: 2024 -> 2136 (+5.53%); split: -0.40%, +5.93%
CodeSize: 162364 -> 162348 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 1882 -> 1816 (-3.51%); split: +0.11%, -3.61%
Instrs: 29176 -> 29162 (-0.05%); split: -0.09%, +0.04%
Latency: 329984 -> 327272 (-0.82%); split: -0.88%, +0.06%
InvThroughput: 54653 -> 54672 (+0.03%); split: -0.01%, +0.04%
VClause: 782 -> 761 (-2.69%); split: -2.81%, +0.13%
SClause: 833 -> 824 (-1.08%); split: -2.28%, +1.20%
Copies: 1872 -> 1873 (+0.05%); split: -0.37%, +0.43%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann 8d8c59b4cd aco: split num_waves adjustment into separate function
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann 6220046ad1 aco: remove 'max_waves' and use 'num_waves' to adjust for LDS and workgroup size
Totals from 21 (0.02% of 134913) affected shaders: (GFX10.3)
VGPRs: 1024 -> 1176 (+14.84%)
CodeSize: 127824 -> 127664 (-0.13%); split: -0.17%, +0.04%
MaxWaves: 416 -> 378 (-9.13%)
Instrs: 22521 -> 22502 (-0.08%); split: -0.17%, +0.09%
Latency: 146386 -> 143154 (-2.21%); split: -2.21%, +0.00%
InvThroughput: 28379 -> 28944 (+1.99%); split: -0.23%, +2.22%
VClause: 575 -> 579 (+0.70%); split: -0.87%, +1.57%
SClause: 692 -> 645 (-6.79%)
Copies: 780 -> 747 (-4.23%); split: -4.74%, +0.51%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Samuel Pitoiset 6873da0e42 aco: fix load_barycentric_at_{sample,offset} on GFX6-7
The computation was wrong.

Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
with Zink on GFX6 (Pitcairn).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16099>
2022-04-25 07:17:39 +00:00
Jason Ekstrand 1b8a43a0ba util: Remove util_cpu_detect
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:

...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.

As a consequence, this unit test:

...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.

This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.

Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called.  This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.

Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
2022-04-20 18:44:35 +00:00
Georg Lehmann d12b5e7633 aco: Reuse previous -1 result in find_msb to avoid using VOP3.
Totals:
CodeSize: 388934388 -> 388933712 (-0.00%)

Totals from 208 (0.15% of 134913) affected shaders:
CodeSize: 2008016 -> 2007340 (-0.03%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16011>
2022-04-19 15:18:58 +00:00
Georg Lehmann a8b29094c2 aco: Remove some old comments in aco_opcodes.py.
s_cmovk_i32 isn't GFX8_GFX9 only and s_version doesn't need a comment to say
it's GFX10+ exclusive. The encoding list is enough to provide this information,
as for other GFX10+ instructions.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16006>
2022-04-18 15:59:38 +00:00
Rhys Perry 63e40adf8c aco: fix disassembly of SMEM with both SGPR and constant offset
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15890>
2022-04-14 20:58:36 +00:00
Rhys Perry c883abda76 aco: implement load_shared2_amd/store_shared2_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Rhys Perry 5aa5af7776 aco: handle read2st64/write2st64 in optimizer
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Rhys Perry 2135c88d9c aco: fix signedness of DS_instruction::offset0/1
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Daniel Schürmann d703a0e808 aco: remove register hints entirely
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 2fe005a3fe aco: remove occurences of VCC hint
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann b10c4d7dee aco: make program->needs_vcc independent of VCC hints
Totals from 5 (0.00% of 135048) affected shaders: (GFX9)
SGPRs: 208 -> 160 (-23.08%)
CodeSize: 2700 -> 2692 (-0.30%)
Instrs: 533 -> 531 (-0.38%)
Latency: 41688 -> 41680 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 415a3820fc aco/ra: omit VCC affinity on VOPC_SDWA for GFX9+
VOPC_SDWA can also use arbitrary SGPR pairs on GFX9+.

Totals from 5607 (4.16% of 134913) affected shaders: (GFX10.3)
CodeSize: 42470760 -> 42452988 (-0.04%)
Instrs: 7943174 -> 7942883 (-0.00%)
Latency: 102887029 -> 102886305 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 20454456 -> 20454338 (-0.00%); split: -0.00%, +0.00%
Copies: 376818 -> 376865 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 6ebc61d71b aco/ra: create VCC-affinities during RA
instead of using register hints.

Totals from 88367 (65.50% of 134913) affected shaders: (GFX10.3)
CodeSize: 322492184 -> 322252912 (-0.07%); split: -0.08%, +0.01%
Instrs: 60615809 -> 60541260 (-0.12%); split: -0.12%, +0.00%
Latency: 557067980 -> 557009210 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 109676757 -> 109674804 (-0.00%); split: -0.00%, +0.00%
SClause: 1939703 -> 1939924 (+0.01%); split: -0.01%, +0.02%
Copies: 4557567 -> 4487530 (-1.54%); split: -1.54%, +0.00%
Branches: 1941123 -> 1937453 (-0.19%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 44fb9ba84a aco/ra: only use VCC if program->needs_vcc == true
A future commit will make VCC register assignment independent
from register hints. Up to GFX9, VCC can alternatively be used
as regular SGPR, so prevent overlap.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Rhys Perry 7478b00c7c aco: remove old global access intrinsics
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 9d1bab3615 aco: increase global_load_params.max_const_offset_plus_one
The callback now supports this. This shouldn't have any effect yet except
on GFX6 with 12 byte loads.

fossil-db (Pitcairn):
Totals from 246 (0.18% of 135668) affected shaders:
VGPRs: 14684 -> 14768 (+0.57%); split: -0.44%, +1.01%
CodeSize: 1765792 -> 1738040 (-1.57%)
Instrs: 344605 -> 340055 (-1.32%)
Latency: 4892904 -> 4861942 (-0.63%)
InvThroughput: 2479599 -> 2446070 (-1.35%)
VClause: 8782 -> 8735 (-0.54%)
SClause: 9854 -> 9853 (-0.01%)
Copies: 47327 -> 45401 (-4.07%); split: -4.08%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 3e9517c757 aco: implement _amd global access intrinsics
fossil-db (Sienna Cichlid):
Totals from 7 (0.01% of 134621) affected shaders:
VGPRs: 760 -> 776 (+2.11%)
CodeSize: 222000 -> 222044 (+0.02%); split: -0.01%, +0.03%
Instrs: 40959 -> 40987 (+0.07%); split: -0.01%, +0.08%
Latency: 874811 -> 886609 (+1.35%); split: -0.00%, +1.35%
InvThroughput: 437405 -> 443303 (+1.35%); split: -0.00%, +1.35%
VClause: 1242 -> 1240 (-0.16%)
SClause: 1050 -> 1049 (-0.10%); split: -0.19%, +0.10%
Copies: 4953 -> 4973 (+0.40%); split: -0.04%, +0.44%
Branches: 1947 -> 1957 (+0.51%); split: -0.05%, +0.56%
PreVGPRs: 741 -> 747 (+0.81%)

fossil-db changes seem to be noise.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 391bf3ea30 aco: don't expand smem/mubuf global loads
For example, dwordx3->dwordx4 or ubyte3->dwordx2.

Global loads don't have the bounds checking that buffer loads have that
makes this safe.

The alignment checks are added to global_load_callback() in case
byte_align_loads=false, align=1 and bytes_needed=3. Without them, the
callback will create a dword load.

fossil-db (Sienna Cichlid):
Totals from 267 (0.20% of 134621) affected shaders:
CodeSize: 1603352 -> 1606568 (+0.20%)
Instrs: 294946 -> 295482 (+0.18%); split: -0.00%, +0.18%
Latency: 2997003 -> 2997052 (+0.00%); split: -0.02%, +0.02%
InvThroughput: 526645 -> 526659 (+0.00%)
SClause: 9179 -> 9185 (+0.07%); split: -0.02%, +0.09%
Copies: 25363 -> 25375 (+0.05%); split: -0.08%, +0.13%
Branches: 8298 -> 8299 (+0.01%)

fossil-db (Polaris10):
Totals from 267 (0.20% of 135668) affected shaders:
CodeSize: 1636672 -> 1638756 (+0.13%); split: -0.00%, +0.13%
Instrs: 308484 -> 308733 (+0.08%); split: -0.01%, +0.09%
Latency: 3446045 -> 3446904 (+0.02%); split: -0.00%, +0.03%
InvThroughput: 1206722 -> 1206828 (+0.01%); split: -0.00%, +0.01%
SClause: 9308 -> 9311 (+0.03%); split: -0.08%, +0.11%
Copies: 36933 -> 36921 (-0.03%); split: -0.08%, +0.05%

fossil-db (Pitcairn):
Totals from 275 (0.20% of 135668) affected shaders:
SGPRs: 17616 -> 17520 (-0.54%); split: -0.64%, +0.09%
VGPRs: 15428 -> 15540 (+0.73%); split: -0.23%, +0.96%
CodeSize: 1885792 -> 1929120 (+2.30%); split: -0.00%, +2.30%
MaxWaves: 1284 -> 1285 (+0.08%)
Instrs: 368963 -> 376095 (+1.93%); split: -0.00%, +1.94%
Latency: 5122922 -> 5168398 (+0.89%); split: -0.01%, +0.90%
InvThroughput: 2562866 -> 2604279 (+1.62%)
VClause: 9268 -> 9296 (+0.30%); split: -0.13%, +0.43%
SClause: 10702 -> 10705 (+0.03%); split: -0.05%, +0.07%
Copies: 48620 -> 50629 (+4.13%); split: -0.08%, +4.21%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 6baad09711 aco: use saddr for global access with sgpr address
fossil-db (Sienna Cichlid):
Totals from 38 (0.03% of 134621) affected shaders:
CodeSize: 237196 -> 237060 (-0.06%); split: -0.09%, +0.03%
Instrs: 43895 -> 43894 (-0.00%); split: -0.02%, +0.01%
Latency: 914633 -> 916263 (+0.18%); split: -0.01%, +0.19%
InvThroughput: 468215 -> 468971 (+0.16%); split: -0.02%, +0.18%
SClause: 1239 -> 1242 (+0.24%)
PreSGPRs: 997 -> 1003 (+0.60%)
PreVGPRs: 936 -> 923 (-1.39%); split: -1.50%, +0.11%

Regression seems to be RA noise, creating a waitcnt.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry d957730b9b aco: use vcc for 64-bit vgpr addition
fossil-db (Sienna Cichlid):
Totals from 229 (0.17% of 134621) affected shaders:
CodeSize: 1520192 -> 1517644 (-0.17%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Timur Kristóf b874a2ed61 aco: Fix VOP2 instruction format in visit_tex.
There was a v_or_b32 that accidentally used SOP2.
It should use VOP2.

Issue found by looking at a gfxreconstruct trace posted by a user
in this bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5838

Cc: mesa-stable
Fixes: 93c8ebfa78 "aco: Initial commit of independent AMD compiler"

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15923>
2022-04-13 13:06:49 +00:00
Rhys Perry 773c7cbcbc radv,aco: implement 64-bit inline push constants
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 1932 -> 1560 (-19.25%)
Instrs: 357 -> 303 (-15.13%)
Latency: 6576 -> 5883 (-10.54%)
InvThroughput: 26304 -> 23532 (-10.54%)
SClause: 42 -> 24 (-42.86%)
Copies: 90 -> 105 (+16.67%); split: -10.00%, +26.67%
PreSGPRs: 144 -> 201 (+39.58%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Rhys Perry 7f6262bb85 radv: allow holes in inline push constants
Use a dword mask instead of a range to track which push constants to
inline.

fossil-db (Sienna Cichlid):
Totals from 5724 (4.25% of 134621) affected shaders:
CodeSize: 20894044 -> 20815748 (-0.37%); split: -0.39%, +0.02%
Instrs: 4002568 -> 3988385 (-0.35%); split: -0.38%, +0.02%
Latency: 29285060 -> 29224414 (-0.21%); split: -0.22%, +0.01%
InvThroughput: 5529700 -> 5526893 (-0.05%); split: -0.05%, +0.00%
VClause: 78093 -> 78240 (+0.19%); split: -0.23%, +0.41%
SClause: 135495 -> 131027 (-3.30%); split: -3.30%, +0.00%
Copies: 330856 -> 324552 (-1.91%); split: -2.37%, +0.46%
PreSGPRs: 226031 -> 224778 (-0.55%); split: -0.61%, +0.05%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Erik Faye-Lund 28dbabec8e aco: do not use designated initializers
Designated initializers are a C++20 feature, but we don't use C++20. In
fact, enabling C++20 for ACO triggers new compiler errors due to some
equality semantics details.

So let's instead stop using designated initializers here.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
2022-04-05 16:58:56 +00:00
Rhys Perry 5b4e41e4db aco: don't use v_mad_mix on GFX9 if 16-bit denormals must be preserved
This probably effectively disables the v_mad_mix optimization on GFX9.

fossil-db (Vega):
Totals from 11545 (7.15% of 161366) affected shaders:
MaxWaves: 43025 -> 42780 (-0.57%); split: +0.06%, -0.63%
Instrs: 18571635 -> 18734201 (+0.88%); split: -0.00%, +0.88%
CodeSize: 96483568 -> 96611012 (+0.13%); split: -0.11%, +0.24%
SGPRs: 1079056 -> 1077616 (-0.13%); split: -0.14%, +0.01%
VGPRs: 819248 -> 821868 (+0.32%); split: -0.04%, +0.36%
SpillSGPRs: 13313 -> 12464 (-6.38%)
Latency: 293804093 -> 295046122 (+0.42%); split: -0.09%, +0.51%
InvThroughput: 110002239 -> 110994978 (+0.90%); split: -0.03%, +0.93%
VClause: 342458 -> 342596 (+0.04%); split: -0.12%, +0.16%
SClause: 648566 -> 648046 (-0.08%); split: -0.12%, +0.04%
Copies: 1728225 -> 1726679 (-0.09%); split: -0.66%, +0.57%
Branches: 552973 -> 552963 (-0.00%); split: -0.02%, +0.02%
PreSGPRs: 862360 -> 856820 (-0.64%); split: -0.69%, +0.05%
PreVGPRs: 773689 -> 776818 (+0.40%); split: -0.02%, +0.42%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15718>
2022-04-04 19:27:12 +00:00
Daniel Schürmann 6d383159d4 aco/optimizer: check recursively if we can eliminate s_and exec
Totals from 2860 (2.12% of 134913) affected shaders: (GFX10.3)
CodeSize: 5990728 -> 5979164 (-0.19%); split: -0.20%, +0.01%
Instrs: 1094562 -> 1091653 (-0.27%); split: -0.28%, +0.01%
Latency: 8689841 -> 8684523 (-0.06%); split: -0.07%, +0.00%
InvThroughput: 1840533 -> 1840527 (-0.00%); split: -0.00%, +0.00%
SClause: 51437 -> 51439 (+0.00%)
Copies: 82461 -> 82472 (+0.01%)
PreSGPRs: 83136 -> 83172 (+0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15675>
2022-04-01 15:35:26 +02:00
Daniel Schürmann db8c401f71 aco/ra: fix stride check on subdword parallelcopies for create_vector
On GFX6/7, info.rc is in full dwords.

Fixes: 9476986e6f ('aco/ra: special-case get_reg_for_create_vector_copy()')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15393>
2022-03-31 08:55:07 +00:00
Samuel Pitoiset d32656bc65 radv: lower has_multiview_view_index in NIR
This lowering is done in a new NIR pass where the layer is written
before emit_vertex_with_counter for geometry shaders and after the
position for other vertex stages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15456>
2022-03-31 07:34:21 +00:00
Georg Lehmann 141ca78634 radv, aco: Packed iadd_sat/uadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 50f585254c aco: Implement scalar iadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 989f2b1785 aco: Implement 64bit uadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 01fad6aa72 aco: Remove 0 data components from image stores.
Image stores always write a full texel.
The hardware writes zero for components not included in dmask.

Totals from 387 (0.29% of 134913) affected shaders: (GFX10.3)
VGPRs: 17216 -> 17136 (-0.46%)
CodeSize: 1987652 -> 1981504 (-0.31%)
MaxWaves: 9054 -> 9058 (+0.04%)
Instrs: 361883 -> 361115 (-0.21%); split: -0.21%, +0.00%
Latency: 5383187 -> 5381227 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 1373830 -> 1372097 (-0.13%)
VClause: 9031 -> 9038 (+0.08%); split: -0.04%, +0.12%
Copies: 25923 -> 25153 (-2.97%); split: -2.98%, +0.01%
PreVGPRs: 14643 -> 14555 (-0.60%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14986>
2022-03-28 19:43:18 +00:00
Samuel Pitoiset 4cfb5332d6 radv: lower adjusting gl_FragCoord.z for VRS in NIR
fossils-db (Sienna Cichlid):
Totals from 4432 (3.29% of 134913) affected shaders:
VGPRs: 231232 -> 231880 (+0.28%)
CodeSize: 24738224 -> 24718008 (-0.08%); split: -0.08%, +0.00%
MaxWaves: 93120 -> 93000 (-0.13%)
Instrs: 4540970 -> 4541062 (+0.00%); split: -0.01%, +0.01%
Latency: 49658353 -> 49641444 (-0.03%); split: -0.05%, +0.01%
InvThroughput: 9604328 -> 9603041 (-0.01%); split: -0.02%, +0.01%
VClause: 66497 -> 66498 (+0.00%)
SClause: 209530 -> 209532 (+0.00%); split: -0.01%, +0.01%
Copies: 276135 -> 276249 (+0.04%); split: -0.14%, +0.18%
PreSGPRs: 189409 -> 189415 (+0.00%)
PreVGPRs: 207368 -> 207458 (+0.04%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15450>
2022-03-28 16:17:37 +02:00
Samuel Pitoiset a42b6a4d39 radv: lower load_sample_mask_in in NIR
No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15450>
2022-03-28 16:17:34 +02:00
Samuel Pitoiset 8c51874af4 radv,aco: lower color exports in NIR
fossils-db (Sienna Cichlid):
Totals from 27108 (20.09% of 134913) affected shaders:
VGPRs: 1260608 -> 1261424 (+0.06%); split: -0.00%, +0.07%
CodeSize: 112795868 -> 112785892 (-0.01%); split: -0.05%, +0.04%
MaxWaves: 628608 -> 628448 (-0.03%); split: +0.00%, -0.03%
Instrs: 20750003 -> 20749314 (-0.00%); split: -0.01%, +0.00%
Latency: 288088081 -> 288015865 (-0.03%); split: -0.06%, +0.04%
InvThroughput: 53944847 -> 53961693 (+0.03%); split: -0.01%, +0.04%
VClause: 396463 -> 396467 (+0.00%); split: -0.02%, +0.02%
SClause: 842088 -> 842150 (+0.01%); split: -0.03%, +0.04%
Copies: 1244982 -> 1259026 (+1.13%); split: -0.01%, +1.14%
PreSGPRs: 1251949 -> 1251909 (-0.00%)
PreVGPRs: 1099647 -> 1100879 (+0.11%); split: -0.03%, +0.14%

fossils-db (Polaris10):
Totals from 23928 (17.60% of 135960) affected shaders:
SGPRs: 1751792 -> 1751024 (-0.04%); split: -0.05%, +0.01%
VGPRs: 1098964 -> 1098556 (-0.04%); split: -0.13%, +0.09%
CodeSize: 99893472 -> 99837940 (-0.06%); split: -0.06%, +0.00%
MaxWaves: 138322 -> 138306 (-0.01%); split: +0.03%, -0.04%
Instrs: 19213995 -> 19211980 (-0.01%); split: -0.02%, +0.01%
Latency: 273026926 -> 273109402 (+0.03%); split: -0.01%, +0.04%
InvThroughput: 111160907 -> 111195187 (+0.03%); split: -0.04%, +0.07%
VClause: 343058 -> 343097 (+0.01%); split: -0.02%, +0.03%
SClause: 802756 -> 802884 (+0.02%); split: -0.04%, +0.06%
Copies: 1729387 -> 1739208 (+0.57%); split: -0.04%, +0.61%
PreSGPRs: 1090264 -> 1090303 (+0.00%); split: -0.00%, +0.01%
PreVGPRs: 959490 -> 960600 (+0.12%); split: -0.04%, +0.15%

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15263>
2022-03-28 11:47:35 +00:00
Rhys Perry 1ead285d92 aco: fix RA validation of 16-bit fma_mix operands
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15562>
2022-03-28 11:05:25 +01:00
Daniel Schürmann 007cb02db9 aco: use branch definition as scratch register for SSA lowering
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15505>
2022-03-28 07:36:46 +00:00
Daniel Schürmann 2d1e6b756e aco: remove 'high' parameter from can_use_opsel()
No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15551>
2022-03-25 22:02:50 +00:00
Daniel Schürmann b98a9dcc36 aco/optimizer: fix call to can_use_opsel() in apply_insert()
The definition index is -1.

Fixes: 54292e99c7 ('aco: optimize 32-bit extracts and inserts using SDWA ')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15551>
2022-03-25 22:02:50 +00:00
Boris Brezillon de039537da aco: Fix an MSVC warning
'warning C4804: '<<': unsafe use of type 'bool' in operation'

Fixes: 9934c86761 ("aco: use v_fma_mix to combine mul/add/fma input conversions")
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15518>
2022-03-24 09:11:13 +00:00
Rhys Perry 543a5487cf radv,aco: lower image descriptor loads in NIR
fossil-db (Sienna Cichlid):
Totals from 2926 (1.80% of 162293) affected shaders:
Instrs: 2315110 -> 2306644 (-0.37%); split: -0.37%, +0.00%
CodeSize: 12581592 -> 12546588 (-0.28%); split: -0.28%, +0.00%
VGPRs: 130216 -> 130208 (-0.01%)
SpillSGPRs: 477 -> 474 (-0.63%); split: -5.03%, +4.40%
Latency: 29686188 -> 29678804 (-0.02%); split: -0.05%, +0.02%
InvThroughput: 6926545 -> 6926286 (-0.00%); split: -0.02%, +0.02%
SClause: 73761 -> 72996 (-1.04%); split: -1.16%, +0.12%
Copies: 144068 -> 137279 (-4.71%); split: -4.78%, +0.07%
Branches: 47466 -> 47483 (+0.04%); split: -0.01%, +0.04%
PreSGPRs: 118042 -> 117377 (-0.56%); split: -1.34%, +0.77%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry 15640e58d9 radv,aco: lower texture descriptor loads in NIR
fossil-db (Sienna Cichlid):
Totals from 39445 (24.30% of 162293) affected shaders:
MaxWaves: 875988 -> 875972 (-0.00%)
Instrs: 35372561 -> 35234909 (-0.39%); split: -0.41%, +0.03%
CodeSize: 190237480 -> 189379240 (-0.45%); split: -0.47%, +0.02%
VGPRs: 1889856 -> 1889928 (+0.00%); split: -0.00%, +0.01%
SpillSGPRs: 10764 -> 10857 (+0.86%); split: -2.04%, +2.91%
SpillVGPRs: 1891 -> 1907 (+0.85%); split: -0.32%, +1.16%
Scratch: 260096 -> 261120 (+0.39%)
Latency: 477701150 -> 477578466 (-0.03%); split: -0.06%, +0.03%
InvThroughput: 87819847 -> 87830346 (+0.01%); split: -0.03%, +0.04%
VClause: 673353 -> 673829 (+0.07%); split: -0.04%, +0.11%
SClause: 1385396 -> 1366478 (-1.37%); split: -1.65%, +0.29%
Copies: 2327965 -> 2229134 (-4.25%); split: -4.58%, +0.34%
Branches: 906707 -> 906434 (-0.03%); split: -0.13%, +0.10%
PreSGPRs: 1874153 -> 1862698 (-0.61%); split: -1.34%, +0.73%
PreVGPRs: 1691382 -> 1691383 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry 52f850238a radv,aco: lower buffer descriptor loads in NIR
fossil-db (Sienna Cichlid):
Totals from 75420 (46.47% of 162293) affected shaders:
MaxWaves: 1878200 -> 1879228 (+0.05%); split: +0.06%, -0.00%
Instrs: 54021103 -> 54141370 (+0.22%); split: -0.04%, +0.26%
CodeSize: 287813520 -> 288293352 (+0.17%); split: -0.04%, +0.21%
VGPRs: 3267576 -> 3266296 (-0.04%); split: -0.04%, +0.00%
SpillSGPRs: 10445 -> 10904 (+4.39%); split: -0.31%, +4.70%
SpillVGPRs: 1818 -> 1811 (-0.39%); split: -1.05%, +0.66%
Scratch: 955392 -> 954368 (-0.11%)
Latency: 563477854 -> 562131282 (-0.24%); split: -0.31%, +0.08%
InvThroughput: 111860104 -> 111553968 (-0.27%); split: -0.30%, +0.02%
VClause: 958432 -> 961415 (+0.31%); split: -0.34%, +0.65%
SClause: 1917415 -> 1926952 (+0.50%); split: -0.69%, +1.19%
Copies: 3812945 -> 3916758 (+2.72%); split: -0.27%, +2.99%
Branches: 1611235 -> 1612022 (+0.05%); split: -0.04%, +0.08%
PreSGPRs: 3095505 -> 3126580 (+1.00%); split: -0.06%, +1.07%
PreVGPRs: 2773011 -> 2773013 (+0.00%)

Most regressions seem to be because ACO's convert_pointer_to_64_bit()
can't be CSE'd with radv_nir_apply_pipeline_layout()'s
convert_pointer_to_64_bit(). This should be improved by later commits.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry 22050fcf9d radv,aco: lower vulkan_resource_index in NIR
fossil-db (Sienna Cichlid):
Totals from 31338 (19.31% of 162293) affected shaders:
MaxWaves: 758634 -> 758616 (-0.00%)
Instrs: 26398289 -> 26378282 (-0.08%); split: -0.09%, +0.01%
CodeSize: 141048208 -> 140971060 (-0.05%); split: -0.07%, +0.01%
VGPRs: 1373656 -> 1373736 (+0.01%)
SpillSGPRs: 9944 -> 9924 (-0.20%); split: -0.24%, +0.04%
SpillVGPRs: 1892 -> 1898 (+0.32%); split: -0.95%, +1.27%
Latency: 308570144 -> 308528462 (-0.01%); split: -0.03%, +0.02%
InvThroughput: 57698072 -> 57684901 (-0.02%); split: -0.07%, +0.04%
VClause: 440357 -> 440602 (+0.06%); split: -0.02%, +0.08%
SClause: 974724 -> 973315 (-0.14%); split: -0.18%, +0.04%
Copies: 1944925 -> 1945103 (+0.01%); split: -0.11%, +0.12%
Branches: 799444 -> 799461 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1619860 -> 1619233 (-0.04%); split: -0.05%, +0.02%
PreVGPRs: 1252813 -> 1252863 (+0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry e6672f6fd1 radv: move radv_declare_shader_args() out of shader_variant_compile()
Declaring them earlier will allow us to access them in NIR.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry 53996cf979 aco: implement load_{scalar,vector}_arg_amd and load_smem_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
2022-03-22 16:33:27 +00:00
Rhys Perry 177b54ebe9 aco/tests: add v_fma_mix tests
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry 1092f37805 aco: use v_fma_mix to combine mul/add/fma output conversions
fossil-db (Sienna Cichlid):
Totals from 42 (0.03% of 134913) affected shaders:
CodeSize: 596904 -> 596332 (-0.10%); split: -0.10%, +0.00%
Instrs: 110194 -> 109902 (-0.26%)
Latency: 1205239 -> 1204915 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 189697 -> 189375 (-0.17%)
VClause: 1365 -> 1366 (+0.07%)
Copies: 5429 -> 5414 (-0.28%); split: -0.33%, +0.06%
Branches: 4034 -> 4026 (-0.20%)

fossil-db (Navi):
Totals from 42 (0.03% of 134913) affected shaders:
CodeSize: 596044 -> 595488 (-0.09%); split: -0.10%, +0.00%
Instrs: 110845 -> 110540 (-0.28%)
Latency: 1206131 -> 1205747 (-0.03%)
InvThroughput: 190178 -> 189809 (-0.19%)
VClause: 1372 -> 1370 (-0.15%); split: -0.29%, +0.15%
Copies: 5671 -> 5641 (-0.53%); split: -0.56%, +0.04%
Branches: 4033 -> 4025 (-0.20%)

fossil-db (Vega):
Totals from 42 (0.03% of 135048) affected shaders:
CodeSize: 605824 -> 605352 (-0.08%); split: -0.08%, +0.00%
Instrs: 115975 -> 115706 (-0.23%)
Latency: 1399845 -> 1398912 (-0.07%)
InvThroughput: 489901 -> 489442 (-0.09%)
VClause: 1314 -> 1311 (-0.23%); split: -0.38%, +0.15%
Copies: 9673 -> 9666 (-0.07%); split: -0.12%, +0.05%
Branches: 4025 -> 4024 (-0.02%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry 21304b772c aco: apply clamp to v_fma_mix
fossil-db (Sienna Cichlid):
Totals from 2536 (1.88% of 134913) affected shaders:
CodeSize: 17314568 -> 17282960 (-0.18%)
Instrs: 3191438 -> 3187487 (-0.12%)
Latency: 59465090 -> 59407885 (-0.10%)
InvThroughput: 10271466 -> 10260512 (-0.11%)

fossil-db (Navi):
Totals from 2512 (1.86% of 134913) affected shaders:
CodeSize: 17194700 -> 17173396 (-0.12%)
Instrs: 3215093 -> 3212430 (-0.08%)
Latency: 60174315 -> 60142593 (-0.05%)
InvThroughput: 9491103 -> 9483979 (-0.08%)

fossil-db (Vega):
Totals from 2512 (1.86% of 135048) affected shaders:
CodeSize: 17186776 -> 17165472 (-0.12%)
Instrs: 3311166 -> 3308503 (-0.08%)
Latency: 65737409 -> 65716096 (-0.03%)
InvThroughput: 21735857 -> 21719792 (-0.07%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry 35196b6d89 aco: combine add/mul as v_fma_mix into fma
fossil-db (Sienna Cichlid):
Totals from 7345 (5.44% of 134913) affected shaders:
CodeSize: 73840060 -> 73768936 (-0.10%); split: -0.10%, +0.00%
Instrs: 13701603 -> 13684183 (-0.13%); split: -0.13%, +0.00%
Latency: 185389373 -> 185306538 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 33785020 -> 33757593 (-0.08%); split: -0.08%, +0.00%
VClause: 237337 -> 237338 (+0.00%)
SClause: 485728 -> 485720 (-0.00%)
Copies: 935900 -> 935279 (-0.07%); split: -0.07%, +0.00%
Branches: 480721 -> 480722 (+0.00%)

fossil-db (Navi):
Totals from 10649 (7.89% of 134913) affected shaders:
VGPRs: 756624 -> 756516 (-0.01%); split: -0.02%, +0.01%
CodeSize: 92156580 -> 91707900 (-0.49%); split: -0.49%, +0.00%
MaxWaves: 159402 -> 159476 (+0.05%); split: +0.07%, -0.02%
Instrs: 17155827 -> 17070449 (-0.50%); split: -0.50%, +0.00%
Latency: 246296456 -> 245487120 (-0.33%); split: -0.33%, +0.00%
InvThroughput: 41438159 -> 41117424 (-0.77%); split: -0.77%, +0.00%
VClause: 323790 -> 323867 (+0.02%); split: -0.00%, +0.03%
SClause: 612077 -> 612034 (-0.01%); split: -0.01%, +0.00%
Copies: 1103012 -> 1102775 (-0.02%); split: -0.03%, +0.01%
Branches: 555893 -> 555896 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 824372 -> 824378 (+0.00%)
PreVGPRs: 740390 -> 740363 (-0.00%); split: -0.01%, +0.01%

fossil-db (Vega):
Totals from 10950 (8.11% of 135048) affected shaders:
SGPRs: 1034528 -> 1034560 (+0.00%)
VGPRs: 794092 -> 794104 (+0.00%); split: -0.01%, +0.01%
CodeSize: 94409768 -> 93955568 (-0.48%); split: -0.48%, +0.00%
MaxWaves: 38950 -> 38939 (-0.03%); split: +0.00%, -0.03%
Instrs: 18162637 -> 18070934 (-0.50%); split: -0.51%, +0.00%
Latency: 291718455 -> 290772451 (-0.32%); split: -0.32%, +0.00%
InvThroughput: 109114674 -> 108489767 (-0.57%); split: -0.57%, +0.00%
VClause: 334498 -> 334579 (+0.02%); split: -0.01%, +0.03%
SClause: 628871 -> 628825 (-0.01%); split: -0.01%, +0.00%
Copies: 1674477 -> 1674850 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 834800 -> 834802 (+0.00%)
PreVGPRs: 750460 -> 750415 (-0.01%); split: -0.01%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry 9934c86761 aco: use v_fma_mix to combine mul/add/fma input conversions
fossil-db (Sienna Cichlid):
Totals from 11558 (8.57% of 134913) affected shaders:
VGPRs: 829392 -> 825200 (-0.51%); split: -0.52%, +0.02%
SpillSGPRs: 7845 -> 8399 (+7.06%)
CodeSize: 101822704 -> 101677172 (-0.14%); split: -0.25%, +0.11%
MaxWaves: 172216 -> 173182 (+0.56%); split: +0.59%, -0.03%
Instrs: 19061343 -> 18883450 (-0.93%); split: -0.93%, +0.00%
Latency: 256011590 -> 255177378 (-0.33%); split: -0.39%, +0.06%
InvThroughput: 46104438 -> 45604059 (-1.09%); split: -1.12%, +0.04%
VClause: 352211 -> 351948 (-0.07%); split: -0.21%, +0.13%
SClause: 676506 -> 676961 (+0.07%); split: -0.04%, +0.11%
Copies: 1246571 -> 1237745 (-0.71%); split: -0.97%, +0.26%
Branches: 626229 -> 626241 (+0.00%); split: -0.02%, +0.03%
PreSGPRs: 882176 -> 888853 (+0.76%); split: -0.00%, +0.76%
PreVGPRs: 796705 -> 792304 (-0.55%); split: -0.56%, +0.00%

fossil-db (Navi):
Totals from 11558 (8.57% of 134913) affected shaders:
VGPRs: 803900 -> 798660 (-0.65%); split: -0.73%, +0.08%
SpillSGPRs: 7894 -> 8492 (+7.58%); split: -0.10%, +7.68%
CodeSize: 96892596 -> 97134716 (+0.25%); split: -0.05%, +0.29%
MaxWaves: 181454 -> 183014 (+0.86%); split: +0.94%, -0.08%
Instrs: 18186813 -> 18093994 (-0.51%); split: -0.56%, +0.05%
Latency: 253385909 -> 253325528 (-0.02%); split: -0.15%, +0.12%
InvThroughput: 43315355 -> 42805541 (-1.18%); split: -1.33%, +0.15%
VClause: 338755 -> 338535 (-0.06%); split: -0.16%, +0.10%
SClause: 656561 -> 656829 (+0.04%); split: -0.07%, +0.11%
Copies: 1162235 -> 1153558 (-0.75%); split: -1.07%, +0.32%
Branches: 588536 -> 588542 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 854849 -> 861640 (+0.79%); split: -0.00%, +0.80%
PreVGPRs: 783401 -> 779031 (-0.56%); split: -0.56%, +0.00%

fossil-db (Vega):
Totals from 11516 (8.53% of 135048) affected shaders:
SGPRs: 1072128 -> 1076288 (+0.39%); split: -0.01%, +0.40%
VGPRs: 821312 -> 818124 (-0.39%); split: -0.43%, +0.04%
SpillSGPRs: 11952 -> 12677 (+6.07%)
CodeSize: 96378496 -> 96707596 (+0.34%); split: -0.04%, +0.38%
MaxWaves: 42614 -> 42883 (+0.63%); split: +0.68%, -0.04%
Instrs: 18672844 -> 18600274 (-0.39%); split: -0.44%, +0.05%
Latency: 296658786 -> 296338296 (-0.11%); split: -0.21%, +0.10%
InvThroughput: 111665547 -> 111283559 (-0.34%); split: -0.40%, +0.06%
VClause: 343001 -> 342826 (-0.05%); split: -0.14%, +0.09%
SClause: 646684 -> 646657 (-0.00%); split: -0.05%, +0.04%
Copies: 1715316 -> 1712895 (-0.14%); split: -0.53%, +0.39%
PreSGPRs: 850737 -> 856543 (+0.68%); split: -0.04%, +0.72%
PreVGPRs: 775293 -> 772215 (-0.40%); split: -0.41%, +0.02%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry eeef1bbe65 aco: refactor selection of mad/fma
In the future, whether we need to use fma will depend on which
multiplication is chosen.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry e12bee3cb7 aco: improve support for v_fma_mix
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry 79c8740c6e aco: fix fp16 opcode definitions
The v_fma_mix optimizations assume v_cvt_f16_f32 and v_mul_f16 use a v2b
definition.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14769>
2022-03-17 19:04:17 +00:00
Rhys Perry c4cf92cad7 radv,aco,ac/llvm: fix indirect dispatches on the compute queue on GFX7-10
Since neither PKT3_LOAD_SH_REG_INDEX nor PKT3_COPY_DATA work with compute
queues on GFX7-10, we have to load the dispatch size from memory in the
shader.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15064>
2022-03-14 19:54:36 +00:00
Rhys Perry 973967c49d aco: split and recombine unaligned sgpr inputs
An example is the num_work_groups argument. Fixes invalid assembly with
func.compute.num-workgroups.basic.q0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15064>
2022-03-14 19:54:36 +00:00
Daniel Schürmann 70aea6b41a aco/ra: refactor collect_vars() to return a sorted vector
The vector of IDs is sorted with decreasing sizes,
and by increasing assigned registers.
This decouples register assingment from ssa IDs.

Totals from 12694 (9.41% of 134913) affected shaders: (GFX10.3)
VGPRs: 757864 -> 757848 (-0.00%); split: -0.00%, +0.00%
CodeSize: 72350540 -> 72348688 (-0.00%); split: -0.02%, +0.02%
MaxWaves: 237018 -> 237020 (+0.00%); split: +0.00%, -0.00%
Instrs: 13545494 -> 13544699 (-0.01%); split: -0.03%, +0.02%
Latency: 148539203 -> 148533292 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 30319086 -> 30320382 (+0.00%); split: -0.01%, +0.01%
VClause: 326875 -> 327028 (+0.05%); split: -0.05%, +0.09%
SClause: 479833 -> 479837 (+0.00%); split: -0.00%, +0.00%
Copies: 862152 -> 860914 (-0.14%); split: -0.43%, +0.28%
Branches: 317775 -> 317777 (+0.00%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11526>
2022-03-14 08:32:10 +00:00
Daniel Schürmann 61c36b6dc0 aco/ra: refactor find_vars() to return a vector
instead of std::set<>

No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11526>
2022-03-14 08:32:10 +00:00
Daniel Schürmann 9476986e6f aco/ra: special-case get_reg_for_create_vector_copy()
This function implements separate handling for
p_create_vector during get_regs_for_copies().
This simplifies some code and lets more precisely select
swap instructions if possible.

Totals from 876 (0.65% of 134913) affected shaders: (GFX10.3)
VGPRs: 53312 -> 53336 (+0.05%)
CodeSize: 3792936 -> 3788160 (-0.13%); split: -0.15%, +0.03%
MaxWaves: 16084 -> 16078 (-0.04%)
Instrs: 707449 -> 706385 (-0.15%); split: -0.19%, +0.04%
Latency: 6288293 -> 6286677 (-0.03%); split: -0.03%, +0.01%
InvThroughput: 4264450 -> 4263671 (-0.02%); split: -0.02%, +0.00%
VClause: 18655 -> 18679 (+0.13%); split: -0.20%, +0.33%
Copies: 55397 -> 54353 (-1.88%); split: -2.45%, +0.57%
Branches: 12426 -> 12415 (-0.09%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11526>
2022-03-14 08:32:10 +00:00
Daniel Schürmann 9181e8ceba aco/ra: count constant moves in get_reg_create_vector()
Also implements a correct_pos_mask to keep track which
operands are already at the right target position.
To mitigate some regressions, call get_reg_impl() less often.

Totals from 1229 (0.91% of 134913) affected shaders: (GFX10.3)
VGPRs: 60216 -> 59848 (-0.61%)
CodeSize: 3716496 -> 3711268 (-0.14%); split: -0.19%, +0.05%
MaxWaves: 27952 -> 28004 (+0.19%)
Instrs: 685983 -> 685035 (-0.14%); split: -0.20%, +0.06%
Latency: 6727587 -> 6725340 (-0.03%); split: -0.06%, +0.02%
InvThroughput: 9289043 -> 9289866 (+0.01%); split: -0.02%, +0.03%
VClause: 17730 -> 17740 (+0.06%); split: -0.25%, +0.30%
Copies: 54352 -> 53420 (-1.71%); split: -2.46%, +0.75%
Branches: 12122 -> 12121 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11526>
2022-03-14 08:32:10 +00:00
Samuel Pitoiset 6c1c9067d9 aco: always emit vk_cvt_pkrtz_f16_f32 for nir_op_pack_half_2x16_split
From the VK_KHR_shader_float_controls extension:

    "5) Do any of the “Pack” GLSL.std.450 instructions count as
     conversion instructions and have the rounding mode applied?"

    "RESOLVED: No, only instructions listed in “section 3.32.11.
     Conversion Instructions” of the SPIR-V specification count as
     conversion instructions."

This is also the same logic as the LLVM backend.

No fossils-db changes on Sienna Cichlid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301>
2022-03-09 16:24:20 +00:00
Samuel Pitoiset 342e6f8332 radv,aco,llvm: lower post shuffle vertex in NIR
fossils-db (Sienna Cichlid):
Totals from 774 (0.57% of 134913) affected shaders:
VGPRs: 26496 -> 26312 (-0.69%)
CodeSize: 1825936 -> 1828812 (+0.16%); split: -0.04%, +0.20%
MaxWaves: 22046 -> 22062 (+0.07%)
Instrs: 347634 -> 347975 (+0.10%); split: -0.05%, +0.15%
Latency: 1363949 -> 1356426 (-0.55%); split: -0.59%, +0.04%
InvThroughput: 221529 -> 221380 (-0.07%); split: -0.10%, +0.04%
VClause: 5682 -> 5676 (-0.11%); split: -1.46%, +1.36%
SClause: 7485 -> 7411 (-0.99%); split: -1.48%, +0.49%
Copies: 30481 -> 30420 (-0.20%); split: -0.51%, +0.31%
PreVGPRs: 19717 -> 19656 (-0.31%)

fossil-db (Polaris10):
Totals from 896 (0.66% of 135960) affected shaders:
SGPRs: 49824 -> 49648 (-0.35%); split: -0.39%, +0.03%
VGPRs: 31040 -> 29948 (-3.52%); split: -3.62%, +0.10%
CodeSize: 875960 -> 875920 (-0.00%); split: -0.06%, +0.05%
MaxWaves: 6380 -> 6429 (+0.77%)
Instrs: 171522 -> 171482 (-0.02%); split: -0.07%, +0.05%
Latency: 1356082 -> 1334386 (-1.60%); split: -1.61%, +0.01%
InvThroughput: 553389 -> 552957 (-0.08%); split: -0.08%, +0.00%
VClause: 4317 -> 4244 (-1.69%); split: -2.41%, +0.72%
SClause: 6157 -> 6139 (-0.29%); split: -0.45%, +0.16%
Copies: 9340 -> 9235 (-1.12%); split: -1.24%, +0.12%
PreVGPRs: 22366 -> 22116 (-1.12%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15113>
2022-03-08 19:18:01 +00:00
Rhys Perry d068eb53e8 aco/insert_exec_mask: optimize top-level transition to exact before demote
fossil-db (Sienna Cichlid):
Totals from 5767 (3.55% of 162293) affected shaders:
Instrs: 3264949 -> 3257527 (-0.23%); split: -0.23%, +0.00%
CodeSize: 17835692 -> 17806004 (-0.17%); split: -0.17%, +0.00%
Latency: 45990060 -> 45987924 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 7643850 -> 7643835 (-0.00%); split: -0.00%, +0.00%
Copies: 193641 -> 186219 (-3.83%); split: -3.84%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry 42a5be975a aco/insert_exec_mask: use get_exec_op
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry aa55ecc296 aco/insert_exec_mask: fix top-level to-exact with non-global exact mask
After transitioning to exact after a discard, the exec stack might be:
[exact|global, wqm, exact]

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry d28b6b6856 aco: rework removal of jumps over branches
Only allow this in situations where we know it's safe. In particular, this
stops removal of unconditional branches like with
block_kind_continue_or_break.

Fixes dEQP-VK.graphicsfuzz.fragcoord-control-flow hang.

fossil-db (Sienna Cichlid):
Totals from 34 (0.02% of 162293) affected shaders:
Instrs: 84115 -> 84178 (+0.07%); split: -0.00%, +0.08%
CodeSize: 463372 -> 463624 (+0.05%); split: -0.00%, +0.06%
Latency: 3467316 -> 3467652 (+0.01%)
InvThroughput: 3085493 -> 3085578 (+0.00%)
Branches: 3221 -> 3284 (+1.96%); split: -0.03%, +1.99%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f030b75b7d ("aco: relax condition to remove branches in case of few instructions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15214>
2022-03-04 12:32:36 +00:00
Samuel Pitoiset 9b113f1b6c aco: implement nir_op_pack_{uint,sint}_2x16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15231>
2022-03-04 08:06:56 +00:00
Daniel Schürmann ccf4bcd162 aco/ra: don't immediately assign a register for p_branch
These get now assigned after handling phis.

Totals from 564 (0.42% of 134913) affected shaders: (GFX10.3)
CodeSize: 5519744 -> 5515308 (-0.08%)
Instrs: 1063045 -> 1061936 (-0.10%)
Latency: 11880452 -> 11875904 (-0.04%)
InvThroughput: 2259933 -> 2259581 (-0.02%); split: -0.02%, +0.00%
Copies: 86908 -> 85799 (-1.28%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry c3070773f8 aco/tests: add test for branch definition RA
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry 32d0bae8ec aco: fix branch definition validation
Like how they have to be register allocated differently, branch
definitions at merge block predecessors need to be validated differently.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry bed5a31005 aco: add validate_instr_defs()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry d5349a99c2 aco/ra: fix register allocation of branch definitions
fossil-db (Sienna Cichlid):
Totals from 704 (0.52% of 134913) affected shaders:
CodeSize: 7177288 -> 7182072 (+0.07%); split: -0.00%, +0.07%
Instrs: 1371781 -> 1372977 (+0.09%); split: -0.00%, +0.09%
Latency: 17993572 -> 18001344 (+0.04%); split: -0.00%, +0.04%
InvThroughput: 4198996 -> 4199569 (+0.01%); split: -0.00%, +0.01%
Copies: 122456 -> 123516 (+0.87%); split: -0.01%, +0.88%
Branches: 43815 -> 43818 (+0.01%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry 608d48b787 aco/ra: add get_reg_phi() helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Rhys Perry ceca5e68c4 aco: remove vcc hint from branch definitions
This doesn't seem to have much benefit anymore.

fossil-db (Sienna Cichlid):
Totals from 198 (0.15% of 134913) affected shaders:
CodeSize: 2610536 -> 2610872 (+0.01%); split: -0.01%, +0.02%
Instrs: 479001 -> 479085 (+0.02%); split: -0.01%, +0.03%
Latency: 7310684 -> 7300735 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 2439084 -> 2437446 (-0.07%); split: -0.07%, +0.00%
SClause: 14760 -> 14722 (-0.26%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Samuel Pitoiset 516aee64cc radv,aco: do not lower nir_op_pack_{unorm,snorm}_2x16
v_cvt_pknorm_{u16,i16}_f32 can be emitted instead, it's supported on
all generations.

No fossils-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15215>
2022-03-03 14:54:12 +01:00
Timur Kristóf 11957d3863 aco: Remove superfluous code for mesh shader workgroup ID.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15199>
2022-03-01 15:37:12 +00:00
Daniel Schürmann f030b75b7d aco: relax condition to remove branches in case of few instructions
This patch relaxes the conditions under which
we remove branch instructions.

Totals from 27246 (20.20% of 134913) affected shaders: (GFX10.3)
CodeSize: 193413312 -> 192924928 (-0.25%)
Instrs: 36146788 -> 36024692 (-0.34%)
Latency: 528374112 -> 528469044 (+0.02%); split: -0.01%, +0.02%
InvThroughput: 106198759 -> 106216583 (+0.02%); split: -0.00%, +0.02%
Branches: 1040640 -> 918543 (-11.73%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8647>
2022-02-25 15:38:08 +00:00
Timur Kristóf 1ca6b2f216 aco: Support memory modes properly with load/store_buffer_amd.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15161>
2022-02-25 14:08:39 +01:00
Timur Kristóf ba4b48e787 aco: Support task_payload with barriers, refactor allowed storage class.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15161>
2022-02-25 14:08:36 +01:00
Timur Kristóf cd0dd5d6b7 aco: Add storage class for Task Shader payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15161>
2022-02-25 13:20:08 +01:00
Timur Kristóf 9cc9cf77a8 aco: Fix multiview view index for mesh shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
2022-02-25 06:31:33 +00:00
Timur Kristóf 082b691141 aco: Fix workgroup_id.y and .z for NV_mesh_shader.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
2022-02-25 06:31:33 +00:00
Timur Kristóf 10ebfb3bf2 aco: Allow 1-byte loads and stores with load/store_buffer_amd
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15034>
2022-02-25 06:31:33 +00:00
Georg Lehmann d223b7f096 radv, aco: Add u_foreach_bit to .clang-format.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15083>
2022-02-22 14:57:29 +00:00
Samuel Pitoiset 369b8cffea radv,aco,llvm: lower adjusting vertex alpha in NIR
Instead of duplicating the same lowering in both compiler backends.
This pass will be used to do more VS input lowering.

fossils-db (Polaris10):
Totals from 48 (0.04% of 135960) affected shaders:
VGPRs: 1692 -> 1684 (-0.47%)
CodeSize: 54016 -> 53964 (-0.10%); split: -0.11%, +0.01%
MaxWaves: 339 -> 341 (+0.59%)
Instrs: 11260 -> 11247 (-0.12%); split: -0.13%, +0.02%
Latency: 88165 -> 88113 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 36153 -> 36093 (-0.17%)
Copies: 583 -> 568 (-2.57%)

fossils-db (Pitcairn):
Totals from 43 (0.03% of 135960) affected shaders:
VGPRs: 1548 -> 1552 (+0.26%)
CodeSize: 47900 -> 47820 (-0.17%)
Instrs: 10751 -> 10731 (-0.19%)
Latency: 83029 -> 82873 (-0.19%)
VClause: 168 -> 164 (-2.38%)
SClause: 393 -> 391 (-0.51%)
Copies: 705 -> 685 (-2.84%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15076>
2022-02-22 08:08:42 +00:00
Samuel Pitoiset cbd5724a6d aco: implement nir_intrinsic_load_vrs_rates_amd
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14713>
2022-02-16 08:11:19 +01:00
Daniel Schürmann 1bbbabedb7 aco/insert_exec_mask: refactor and remove some unnecessary WQM handling code
Some cases cannot happen and don't need to be handled anymore.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann d7d7b9974a aco/insert_exec_mask: refactor and simplify get_block_needs()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann fcc5dec8d6 aco/insert_exec_mask: remove ever_again_needs and Exact_Branch
This information is not required anymore.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann cbb1b095ca aco/insert_exec_mask: remove some unnecessary WQM loop handling code
These workarounds are were necessary to prevent infinite loops
with helper lane registers containing wrong data.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann 580a63b4ac aco/insert_exec_mask: remove Preserve_WQM flag
If WQM is needed anywhere after discard_if(), it will also
be flagged as WQM. We can rely on that to preserve the WQM mask.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann a5a40beefa aco: don't emit WQM for bool_to_scalar_condition
This was only necessary to ensure that the source is computed in WQM
if the current exec mask is in WQM.

Totals from 23170 (17.17% of 134913) affected shaders: (GFX10.3)
VGPRs: 1384464 -> 1383400 (-0.08%); split: -0.08%, +0.01%
SpillSGPRs: 7575 -> 7574 (-0.01%)
CodeSize: 146444752 -> 146317104 (-0.09%); split: -0.13%, +0.04%
MaxWaves: 429870 -> 429868 (-0.00%)
Instrs: 27202586 -> 27170316 (-0.12%); split: -0.17%, +0.05%
Latency: 379488313 -> 379335412 (-0.04%); split: -0.07%, +0.03%
InvThroughput: 69500561 -> 69487704 (-0.02%); split: -0.03%, +0.01%
VClause: 473080 -> 473038 (-0.01%); split: -0.02%, +0.01%
SClause: 1080576 -> 1080571 (-0.00%); split: -0.06%, +0.06%
Copies: 1492865 -> 1504604 (+0.79%); split: -0.40%, +1.19%
Branches: 711295 -> 716849 (+0.78%); split: -0.01%, +0.79%
PreSGPRs: 1331362 -> 1328402 (-0.22%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann f816dd1be7 aco: don't propagate WQM for p_as_uniform
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.

Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann 825cd696dc aco/insert_exec_mask: stay in WQM while helper lanes are still needed
This patch flags all instructions WQM which don't require
Exact mode, but depend on the exec mask as long as WQM
is needed on any control flow path afterwards.
This will mostly prevent accidental copies of WQM values
within Exact mode, and also makes a lot of other workarounds
unnecessary.

Totals from 17374 (12.88% of 134913) affected shaders: (GFX10.3)
VGPRs: 526952 -> 527384 (+0.08%); split: -0.01%, +0.09%
CodeSize: 33740512 -> 33766636 (+0.08%); split: -0.06%, +0.14%
MaxWaves: 488166 -> 488108 (-0.01%); split: +0.00%, -0.02%
Instrs: 6254240 -> 6260557 (+0.10%); split: -0.08%, +0.18%
Latency: 66497580 -> 66463472 (-0.05%); split: -0.15%, +0.10%
InvThroughput: 13265741 -> 13264036 (-0.01%); split: -0.03%, +0.01%
VClause: 122962 -> 122975 (+0.01%); split: -0.01%, +0.02%
SClause: 334805 -> 334405 (-0.12%); split: -0.51%, +0.39%
Copies: 275728 -> 282341 (+2.40%); split: -0.91%, +3.31%
Branches: 92546 -> 90990 (-1.68%); split: -1.68%, +0.00%
PreSGPRs: 504119 -> 504352 (+0.05%); split: -0.00%, +0.05%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00