Currently without alignment check, so that
we can only use the _byte and _short versions
and multi-component stores are split.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
To not having to split the register file into single bytes,
we maintain a map with registers which contain subdword variables.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
This commit completely rewrites the way we extract a structured CFG from
SPIR-V. The new approach is different in a few ways:
1. It does a breadth-first search instead of depth-first. This means
that we've visited the merge node for a construct before we visit
any of the nodes inside the construct. This makes it easier to
validate things like loop and switch nesting.
2. We record more information in the CFG. Earlier commits added a
parent pointer to vtn_cf_node but we now record all of the merge and
other special blocks for each CFG node. This lets us validate
things more precisely.
3. It makes heavy use of merge blocks for walking the CFG. Previously,
we sort of used them as hints for trying to guess the CFG structure
but things got dicey whenever a merge was missing. We had some
heuristics for how to handle short-circuiting if statements but it
was a bunch of special cases.
Now, we make them a fundamental part of walking the CFG. When we
encounter a control-flow construct, we add the body components of
the construct to the BFS work list and then jump to the merge block
if one exists to continue scanning the current CFG nesting level.
If no merge block exists, we assume that means that control-flow
never re-converges in a normal way and that the only way to get back
to normality is with a direct jump such as a loop break or continue.
This should make things far more robust when trying to deal with the
more creative placement (or lack thereof) of merge instructions.
Reviewed-by: Alan Baker <alanbaker@google.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
This commit enables the I/O vectorization pass that was originally
written for ACO for Intel drivers. We enable it for UBOs, SSBOs, global
memory, and SLM. We only enable vectorization for the scalar back-end
because it vec4 makes certain alignment assumptions.
Shader-db results with iris on ICL:
total instructions in shared programs: 16077927 -> 16068236 (-0.06%)
instructions in affected programs: 199839 -> 190148 (-4.85%)
helped: 324
HURT: 0
helped stats (abs) min: 2 max: 458 x̄: 29.91 x̃: 4
helped stats (rel) min: 0.11% max: 38.94% x̄: 4.32% x̃: 1.64%
95% mean confidence interval for instructions value: -37.02 -22.80
95% mean confidence interval for instructions %-change: -5.07% -3.58%
Instructions are helped.
total cycles in shared programs: 336806135 -> 336151501 (-0.19%)
cycles in affected programs: 16009735 -> 15355101 (-4.09%)
helped: 458
HURT: 154
helped stats (abs) min: 1 max: 77812 x̄: 1542.50 x̃: 75
helped stats (rel) min: <.01% max: 34.46% x̄: 5.16% x̃: 2.01%
HURT stats (abs) min: 1 max: 22800 x̄: 336.55 x̃: 20
HURT stats (rel) min: <.01% max: 17.11% x̄: 2.12% x̃: 1.00%
95% mean confidence interval for cycles value: -1596.83 -542.49
95% mean confidence interval for cycles %-change: -3.83% -2.82%
Cycles are helped.
total sends in shared programs: 814177 -> 809049 (-0.63%)
sends in affected programs: 15422 -> 10294 (-33.25%)
helped: 324
HURT: 0
helped stats (abs) min: 1 max: 256 x̄: 15.83 x̃: 2
helped stats (rel) min: 1.33% max: 67.90% x̄: 21.21% x̃: 15.38%
95% mean confidence interval for sends value: -19.67 -11.98
95% mean confidence interval for sends %-change: -23.03% -19.39%
Sends are helped.
LOST: 7
GAINED: 2
Most of the helped shaders were in the following titles:
- Doom
- Deus Ex: Mankind Divided
- Aztec Ruins
- Shadow of Mordor
- DiRT Showdown
- Tomb Raider (Rise, I think)
Five of the lost programs are SIMD16 shaders we lost from dirt showdown.
The other two are compute shaders in Aztec Ruins which switched from
SIMD8 to SIMD16.
Vulkan pipeline-db stats on ICL:
Instructions in all programs: 296780486 -> 293493363 (-1.1%)
Loops in all programs: 149669 -> 149669 (+0.0%)
Cycles in all programs: 90999206722 -> 88513844563 (-2.7%)
Spills in all programs: 1710217 -> 1730691 (+1.2%)
Fills in all programs: 1931235 -> 1958138 (+1.4%)
By far the most help was in the Tomb Raider games. A couple of Batman
games with DXVK were also helped. In Shadow of the Tomb Raider:
Instructions in all programs: 41614336 -> 39408023 (-5.3%)
Loops in all programs: 32200 -> 32200 (+0.0%)
Cycles in all programs: 1875498485 -> 1667034831 (-11.1%)
Spills in all programs: 196307 -> 214945 (+9.5%)
Fills in all programs: 282736 -> 307113 (+8.6%)
Benchmarks of real games I've done on this patch:
- Rise of the Tomb Raider: +3%
- Shadow of the Tomb Raider: +10%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
These were clearly copied and pasted from SSBOs. The shared atomics
don't have an SSBO index so their offset is src0 and data is src1.
Fixes: ce9205c03b "nir: add a load/store vectorization pass"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
We're about to do load/store vectorization right before this but we need
that to happen after we've done a round of optimization. Otherwise,
we'll be getting unoptimized NIR in from ANV and the vectorizer won't be
able to do anything with it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
This commit makes us take both bit size and alignment into account so
that we can properly handle cases such as when we have a 32-bit store
to an 8-bit-aligned address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Thanks to the NIR vectorizing pass, we're about to see alignments that
are higher than the bit size. Previously, we could use either and we
just happened to choose alignment (probably the wrong choice) so it's
harmless to switch to detecting based on bit size. This commit changes
things to take both into account which is more accurate to what the
messages we're using do. We also beef up the asserts and make them more
consistent, more accurate, and more complete.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>