Commit Graph

596 Commits

Author SHA1 Message Date
Alyssa Rosenzweig 90beea75f6 pan/bi: Don't reorder push with no_ubo_to_push
Otherwise, load_push_constant won't work properly. This could probably be made
to work if we tried hard enough, but we still don't want reordering for internal
(meta) shaders which are layed out deliberately.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig 17ea1642e2 pan/bi: Implement load_push_constant
Bifrost supports "fast access uniforms" loaded from a single contiguous buffer.
This maps directly to Vulkan push constants, with some caveats:

* No indirect access. Indirects need to be lowered to a UBO pull.
* Strict alignment requirements. These will be met in practice.

Implement the NIR intrinsic and map it to the native hardware construct.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig e57dfed419 pan/bi: Implement b2i with MUX
The result_type modifier propagation looks for MUX instructions, so using this
canonical b2i implementation allows the sequence b2i(cmp) to be fused.
It's also faster on its own: on Valhall, MUX may be implemented as CSEL on the
CVT unit, while AND may only be implemented on the SFU unit. So in case this
doesn't get fused, we expect 4x better throughput for b2i with this
implementation. Similarly, on Bifrost, MUX may be scheduled to either unit (as
CSEL on FMA or MUX on ADD), whereas AND may only be scheduled to FMA.

Results on Mali-G52:

total instructions in shared programs: 2419171 -> 2414814 (-0.18%)
instructions in affected programs: 272203 -> 267846 (-1.60%)
helped: 767
HURT: 0
helped stats (abs) min: 1.0 max: 138.0 x̄: 5.68 x̃: 2
helped stats (rel) min: 0.12% max: 15.57% x̄: 2.09% x̃: 0.68%
95% mean confidence interval for instructions value: -6.68 -4.68
95% mean confidence interval for instructions %-change: -2.37% -1.82%
Instructions are helped.

total tuples in shared programs: 1932822 -> 1929234 (-0.19%)
tuples in affected programs: 76485 -> 72897 (-4.69%)
helped: 380
HURT: 3
helped stats (abs) min: 1.0 max: 138.0 x̄: 9.46 x̃: 1
helped stats (rel) min: 0.14% max: 15.96% x̄: 3.81% x̃: 0.92%
HURT stats (abs)   min: 1.0 max: 6.0 x̄: 2.67 x̃: 1
HURT stats (rel)   min: 0.38% max: 8.57% x̄: 3.80% x̃: 2.44%
95% mean confidence interval for tuples value: -11.30 -7.44
95% mean confidence interval for tuples %-change: -4.27% -3.22%
Tuples are helped.

total clauses in shared programs: 356094 -> 355992 (-0.03%)
clauses in affected programs: 3264 -> 3162 (-3.12%)
helped: 80
HURT: 0
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.27 x̃: 1
helped stats (rel) min: 0.81% max: 50.00% x̄: 4.83% x̃: 3.39%
95% mean confidence interval for clauses value: -1.49 -1.06
95% mean confidence interval for clauses %-change: -6.23% -3.43%
Clauses are helped.

total cycles in shared programs: 167337.10 -> 167329.19 (<.01%)
cycles in affected programs: 510.08 -> 502.17 (-1.55%)
helped: 80
HURT: 2
helped stats (abs) min: 0.041665999999999315 max: 0.7916659999999993 x̄: 0.10 x̃: 0
helped stats (rel) min: 0.51% max: 13.64% x̄: 2.12% x̃: 1.34%
HURT stats (abs)   min: 0.041665999999999315 max: 0.0416669999999999 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.39% max: 2.78% x̄: 1.58% x̃: 1.58%
95% mean confidence interval for cycles value: -0.12 -0.07
95% mean confidence interval for cycles %-change: -2.59% -1.48%
Cycles are helped.

total arith in shared programs: 73819.54 -> 73669.25 (-0.20%)
arith in affected programs: 2840.54 -> 2690.25 (-5.29%)
helped: 383
HURT: 3
helped stats (abs) min: 0.041665999999999315 max: 5.75 x̄: 0.39 x̃: 0
helped stats (rel) min: 0.33% max: 18.81% x̄: 4.39% x̃: 0.98%
HURT stats (abs)   min: 0.041665999999999315 max: 0.25 x̄: 0.11 x̃: 0
HURT stats (rel)   min: 0.39% max: 8.96% x̄: 4.04% x̃: 2.78%
95% mean confidence interval for arith value: -0.47 -0.31
95% mean confidence interval for arith %-change: -4.93% -3.71%
Arith are helped.

total quadwords in shared programs: 1679798 -> 1676259 (-0.21%)
quadwords in affected programs: 72826 -> 69287 (-4.86%)
helped: 381
HURT: 15
helped stats (abs) min: 1.0 max: 142.0 x̄: 9.35 x̃: 1
helped stats (rel) min: 0.25% max: 18.87% x̄: 4.33% x̃: 1.13%
HURT stats (abs)   min: 1.0 max: 6.0 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.30% max: 6.25% x̄: 0.77% x̃: 0.35%
95% mean confidence interval for quadwords value: -10.76 -7.11
95% mean confidence interval for quadwords %-change: -4.71% -3.56%
Quadwords are helped.

Results on Mali-G57:

total instructions in shared programs: 2704193 -> 2699317 (-0.18%)
instructions in affected programs: 293366 -> 288490 (-1.66%)
helped: 758
HURT: 5
helped stats (abs) min: 1.0 max: 151.0 x̄: 6.45 x̃: 2
helped stats (rel) min: 0.11% max: 22.22% x̄: 2.05% x̃: 0.64%
HURT stats (abs)   min: 1.0 max: 7.0 x̄: 2.20 x̃: 1
HURT stats (rel)   min: 0.22% max: 1.69% x̄: 0.87% x̃: 1.08%
95% mean confidence interval for instructions value: -7.42 -5.36
95% mean confidence interval for instructions %-change: -2.27% -1.79%
Instructions are helped.

total cycles in shared programs: 141711.73 -> 141711.84 (<.01%)
cycles in affected programs: 214.36 -> 214.47 (0.05%)
helped: 4
HURT: 42
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.20 x̃: 0
helped stats (rel) min: 1.85% max: 12.78% x̄: 9.12% x̃: 10.93%
HURT stats (abs)   min: 0.015625 max: 0.09375 x̄: 0.02 x̃: 0
HURT stats (rel)   min: 0.17% max: 17.65% x̄: 0.84% x̃: 0.34%
95% mean confidence interval for cycles value: -0.02 0.03
95% mean confidence interval for cycles %-change: -1.23% 1.17%
Inconclusive result (value mean confidence interval includes 0).

total cvt in shared programs: 14479.14 -> 14474.19 (-0.03%)
cvt in affected programs: 2877.05 -> 2872.09 (-0.17%)
helped: 508
HURT: 209
helped stats (abs) min: 0.015625 max: 0.453125 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.25% max: 16.67% x̄: 1.23% x̃: 0.37%
HURT stats (abs)   min: 0.015625 max: 0.296875 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 0.15% max: 18.18% x̄: 1.70% x̃: 0.34%
95% mean confidence interval for cvt value: -0.01 -0.00
95% mean confidence interval for cvt %-change: -0.57% -0.18%
Cvt are helped.

total sfu in shared programs: 7875.69 -> 7590.75 (-3.62%)
sfu in affected programs: 1567.38 -> 1282.44 (-18.18%)
helped: 906
HURT: 0
helped stats (abs) min: 0.0625 max: 8.625 x̄: 0.31 x̃: 0
helped stats (rel) min: 2.38% max: 100.00% x̄: 16.80% x̃: 5.63%
95% mean confidence interval for sfu value: -0.37 -0.26
95% mean confidence interval for sfu %-change: -18.43% -15.17%
Sfu are helped.

total quadwords in shared programs: 1468152 -> 1465800 (-0.16%)
quadwords in affected programs: 37104 -> 34752 (-6.34%)
helped: 161
HURT: 2
helped stats (abs) min: 8.0 max: 80.0 x̄: 14.71 x̃: 8
helped stats (rel) min: 1.67% max: 20.00% x̄: 8.05% x̃: 7.69%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 3.57% max: 3.85% x̄: 3.71% x̃: 3.71%
95% mean confidence interval for quadwords value: -16.29 -12.57
95% mean confidence interval for quadwords %-change: -8.58% -7.22%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig 3b3cd59fb8 panfrost: Launch transform feedback shaders
We now have infrastructure in place to generate variants of vertex shaders
specialized for transform feedback. All that's left is launching these
compute-like kernels before the IDVS job, implementing both the
transform feedback and the regular rasterization pipeline. This implements
transform feedback on Valhall, passing the relevant GLES3.1 tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig 4e341e70d8 pan/bi: Handle transform feedback intrinsics
Translate the intrinsics we introduced to lower away transform feedback into
Panfrost system values which the GL driver can handle.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig ae3fa6cc1d pan/bi: Add transform feedback lowering pass
Add a simple NIR-based implementation of transform feedback, appropriate for
OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders).
Stores to varyings that will be captured are replaced by stores to transform
feedback buffers and some addressing math. This allows implementing the semantic
of transform feedback in a compute-like stage.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig 7535362204 pan/bi: Fix clper_xor on Mali-G31
Mali-G31 has the old CLPER instruction, not the new one, which means we don't
get to specify a custom lane op. But the clper_xor helper incorrectly checked
the arch, not the implementation quirk.

Fixes: c00e7b729f ("pan/bi: Optimize abs(derivative)")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16846>
2022-06-02 20:32:43 -04:00
Alyssa Rosenzweig bc4d42023d pan/bi: Respect swizzles in nir_op_pack_64_2x32_split
Triggered a BIR validation error, which made debugging a breeze. That validation
pass (dimensionality checks) gets a lot of use, it seems :-)

Fixes:

   dEQP-VK.ssbo.layout.2_level_array.std430.row_major_mat4x2_comp_access_store_cols

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
2022-06-01 20:08:42 +00:00
Alyssa Rosenzweig 5067a26f44 pan/bi: Use flow control lowering on Valhall
Logically at the same part of the compile pipeline as clause scheduling on
Bifrost. Lots of similarities, too. Now that we generate flow control only as a
late pass, various hacks in the compiler are no longer necessary and are
dropped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig c0180f6bd3 pan/bi: Export helper termination analysis
The current helper termination analysis code is hardwired for clauses, so it
won't work for Valhall. However, the bulk of it is dataflow analysis which is
portable between Bifrost and Valhall. Export the interesting bits so we can
reuse them on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 7bb635316b pan/bi: Export bi_block_add_successor
For use in unit tests that need to create blocks.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 4627cd99de pan/bi: Preserve flow control for non-psiz variant
Otherwise we will get INSTR_INVALID_ENC faults when deleting the final STORE.end
instruction, after we rework our flow control code.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Daniel Schürmann bd151a256e nir/opt_vectorize: add callback for max vectorization width
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.

This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().

Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
2022-06-01 11:41:44 +00:00
Alyssa Rosenzweig 0170500627 pan/bi: Interpolate varyings at 16-bit
On Bifrost, we have a single "load float varying" instruction that controls the
bit size of the result, allowing us to fold a f2f16 into the load. However, the
larger benefit is that 16-bit varying loads are interpolated at 16-bit. Arm
claims that the varying unit has a 32-bit data path, allowing 16-bit varyings to
be interpolated in half the cycles from 32-bit. This change should therefore
improve performance for workloads that are varying units. This means we want to
be aggressive about 16-bit varying loads, even if it costs some extra f2f32
instructions.

glmark2 total score on Mali-G52 up from 1173fps to 1218fps with particular wins
in -brefract, -bshadow, -bjellyfish, and -bshading.

total instructions in shared programs: 2432246 -> 2423668 (-0.35%)
instructions in affected programs: 516056 -> 507478 (-1.66%)
helped: 3641
HURT: 432
helped stats (abs) min: 1.0 max: 12.0 x̄: 2.91 x̃: 2
helped stats (rel) min: 0.08% max: 54.55% x̄: 9.88% x̃: 5.71%
HURT stats (abs)   min: 1.0 max: 42.0 x̄: 4.71 x̃: 4
HURT stats (rel)   min: 0.23% max: 200.00% x̄: 12.58% x̃: 6.37%
95% mean confidence interval for instructions value: -2.21 -2.00
95% mean confidence interval for instructions %-change: -7.92% -7.07%
Instructions are helped.

total tuples in shared programs: 1941309 -> 1934647 (-0.34%)
tuples in affected programs: 353169 -> 346507 (-1.89%)
helped: 3233
HURT: 453
helped stats (abs) min: 1.0 max: 14.0 x̄: 2.46 x̃: 2
helped stats (rel) min: 0.12% max: 50.00% x̄: 9.90% x̃: 5.56%
HURT stats (abs)   min: 1.0 max: 25.0 x̄: 2.85 x̃: 2
HURT stats (rel)   min: 0.22% max: 150.00% x̄: 8.96% x̃: 5.26%
95% mean confidence interval for tuples value: -1.89 -1.72
95% mean confidence interval for tuples %-change: -8.01% -7.15%
Tuples are helped.

total clauses in shared programs: 357354 -> 356610 (-0.21%)
clauses in affected programs: 25794 -> 25050 (-2.88%)
helped: 994
HURT: 317
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.16 x̃: 1
helped stats (rel) min: 1.49% max: 33.33% x̄: 10.78% x̃: 10.00%
HURT stats (abs)   min: 1.0 max: 4.0 x̄: 1.31 x̃: 1
HURT stats (rel)   min: 1.19% max: 50.00% x̄: 13.56% x̃: 8.33%
95% mean confidence interval for clauses value: -0.63 -0.50
95% mean confidence interval for clauses %-change: -5.63% -4.16%
Clauses are helped.

total cycles in shared programs: 167697.96 -> 167431.15 (-0.16%)
cycles in affected programs: 12638.29 -> 12371.48 (-2.11%)
helped: 2652
HURT: 350
helped stats (abs) min: 0.04166399999999726 max: 0.75 x̄: 0.11 x̃: 0
helped stats (rel) min: 0.12% max: 100.00% x̄: 14.39% x̃: 5.04%
HURT stats (abs)   min: 0.041665999999999315 max: 0.5833329999999997 x̄: 0.11 x̃: 0
HURT stats (rel)   min: 0.00% max: 75.00% x̄: 7.90% x̃: 4.71%
95% mean confidence interval for cycles value: -0.09 -0.08
95% mean confidence interval for cycles %-change: -12.56% -11.02%
Cycles are helped.

total arith in shared programs: 74169.46 -> 73891.71 (-0.37%)
arith in affected programs: 13885.87 -> 13608.12 (-2.00%)
helped: 3215
HURT: 445
helped stats (abs) min: 0.04166399999999726 max: 0.5416680000000014 x̄: 0.10 x̃: 0
helped stats (rel) min: 0.12% max: 100.00% x̄: 14.16% x̃: 6.67%
HURT stats (abs)   min: 0.041665999999999315 max: 1.125 x̄: 0.12 x̃: 0
HURT stats (rel)   min: 0.00% max: 100.00% x̄: 9.76% x̃: 5.49%
95% mean confidence interval for arith value: -0.08 -0.07
95% mean confidence interval for arith %-change: -11.91% -10.59%
Arith are helped.

total texture in shared programs: 11936 -> 11931 (-0.04%)
texture in affected programs: 20 -> 15 (-25.00%)
helped: 10
HURT: 0
helped stats (abs) min: 0.5 max: 0.5 x̄: 0.50 x̃: 0
helped stats (rel) min: 14.29% max: 100.00% x̄: 45.71% x̃: 33.33%
95% mean confidence interval for texture value: -0.50 -0.50
95% mean confidence interval for texture %-change: -73.16% -18.26%
Texture are helped.

total vary in shared programs: 4180.88 -> 3447.19 (-17.55%)
vary in affected programs: 2109.88 -> 1376.19 (-34.77%)
helped: 2202
HURT: 39
helped stats (abs) min: 0.0625 max: 1.4375 x̄: 0.34 x̃: 0
helped stats (rel) min: 2.38% max: 66.67% x̄: 40.43% x̃: 50.00%
HURT stats (abs)   min: 0.125 max: 0.375 x̄: 0.26 x̃: 0
HURT stats (rel)   min: 0.00% max: 300.00% x̄: 92.54% x̃: 23.08%
95% mean confidence interval for vary value: -0.34 -0.32
95% mean confidence interval for vary %-change: -39.22% -37.01%
Vary are helped.

total quadwords in shared programs: 1689664 -> 1684852 (-0.28%)
quadwords in affected programs: 265522 -> 260710 (-1.81%)
helped: 2864
HURT: 447
helped stats (abs) min: 1.0 max: 14.0 x̄: 2.10 x̃: 2
helped stats (rel) min: 0.15% max: 31.58% x̄: 6.05% x̃: 4.65%
HURT stats (abs)   min: 1.0 max: 22.0 x̄: 2.67 x̃: 2
HURT stats (rel)   min: 0.27% max: 38.46% x̄: 6.79% x̃: 4.55%
95% mean confidence interval for quadwords value: -1.54 -1.37
95% mean confidence interval for quadwords %-change: -4.55% -4.08%
Quadwords are helped.

total threads in shared programs: 53656 -> 53688 (0.06%)
threads in affected programs: 32 -> 64 (100.00%)
helped: 32
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

total preloads in shared programs: 116212 -> 103476 (-10.96%)
preloads in affected programs: 45222 -> 32486 (-28.16%)
helped: 3022
HURT: 11
helped stats (abs) min: 1.0 max: 11.0 x̄: 4.23 x̃: 4
helped stats (rel) min: 7.14% max: 68.75% x̄: 30.39% x̃: 25.00%
HURT stats (abs)   min: 2.0 max: 4.0 x̄: 3.45 x̃: 4
HURT stats (rel)   min: 14.29% max: 50.00% x̄: 25.93% x̃: 25.00%
95% mean confidence interval for preloads value: -4.26 -4.14
95% mean confidence interval for preloads %-change: -30.68% -29.69%
Preloads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Chris Healy cphealy@gmail.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig 93f69e4b1c pan/bi: Model Valhall source formats
LD_VAR_BUF instructions on Valhall take a source format, indicating the
in-memory format of the varying independent from the register format, which we
still model within the compiler for compatibility with Bifrost. (Prior to
Valhall, source format is specified in the attribute descriptor as a physical
pixel format.)

Model this information, allowing us to generate fp16 LD_VAR_BUF instructions
correctly on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig 67f5721349 panfrost: Set allow_rotating_primitives
On Valhall, the driver should set this flag if the hardware may rotate
primitives. This happens if:

1. The rasterization of lines does not matter, AND
2. The provoking vertex does not matter.

The first condition we may satisfy by checking for LINES and the second by
checking for flat shading. Otherwise, we should set this flag to allow
optimizations. This may be more efficient for tiling.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
2022-05-30 14:00:55 +00:00
Alyssa Rosenzweig 569e5dc745 pan/bi: Schedule for pressure pre-RA
Add a bottom-up pre-RA list scheduler that aims to reduce register pressure,
roughly the same as we use on Midgard to great effect. It uses a simple
heuristic: greedily select instructions that have reduce liveness.  To avoid
regressions, the algorithm throws away schedules that increase maximum number of
lives (used as an estimate of register pressure -- if we had SSA form, this
would be exact).

We might be better off using Sarkar. But for something I could type out in an
afternoon, I'll happily accept a >50% reduction in spills. Instruction count is
regressed due to extra moves around the blend shader ABI in some cases, at least
on Bifrost this is mostly hidden by the clause scheduler. Thread count and
spills/fills are both much improved here.

There are numerous opportunities for future improvements to pre-RA scheduling:

* Better heuristics? (Something more global than liveness alone)
* Reducing false dependencies with memory access
* Improve ILP for message-passing instructions? This is a tradeoff.
* Simplify the code if we have SSA in the future.

But for now, I think this is well worth it already.

v2: Various clean-ups and memory leak fix (Icecream95). Reduce false
dependencies to eliminate spilling in more shaders.

shader-db stats on Mali-G52:

total instructions in shared programs: 2438841 -> 2439698 (0.04%)
instructions in affected programs: 1206421 -> 1207278 (0.07%)
helped: 3113
HURT: 4011
helped stats (abs) min: 1.0 max: 50.0 x̄: 3.25 x̃: 2
helped stats (rel) min: 0.13% max: 44.83% x̄: 4.09% x̃: 2.11%
HURT stats (abs)   min: 1.0 max: 18.0 x̄: 2.73 x̃: 2
HURT stats (rel)   min: 0.11% max: 57.14% x̄: 3.86% x̃: 2.07%
95% mean confidence interval for instructions value: 0.02 0.22
95% mean confidence interval for instructions %-change: 0.23% 0.54%
Instructions are HURT.

total tuples in shared programs: 1927077 -> 1946583 (1.01%)
tuples in affected programs: 1118627 -> 1138133 (1.74%)
helped: 2874
HURT: 6295
helped stats (abs) min: 1.0 max: 82.0 x̄: 3.51 x̃: 2
helped stats (rel) min: 0.17% max: 33.33% x̄: 4.60% x̃: 3.57%
HURT stats (abs)   min: 1.0 max: 47.0 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.20% max: 50.00% x̄: 5.16% x̃: 4.32%
95% mean confidence interval for tuples value: 2.00 2.25
95% mean confidence interval for tuples %-change: 1.97% 2.23%
Tuples are HURT.

total clauses in shared programs: 356053 -> 357793 (0.49%)
clauses in affected programs: 151578 -> 153318 (1.15%)
helped: 2196
HURT: 3813
helped stats (abs) min: 1.0 max: 49.0 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.18% max: 69.01% x̄: 10.26% x̃: 8.33%
HURT stats (abs)   min: 1.0 max: 25.0 x̄: 1.70 x̃: 1
HURT stats (rel)   min: 0.57% max: 66.67% x̄: 10.64% x̃: 8.33%
95% mean confidence interval for clauses value: 0.22 0.36
95% mean confidence interval for clauses %-change: 2.68% 3.33%
Clauses are HURT.

total cycles in shared programs: 167761.17 -> 167922.04 (0.10%)
cycles in affected programs: 24494.21 -> 24655.08 (0.66%)
helped: 862
HURT: 3054
helped stats (abs) min: 0.041665999999999315 max: 53.0 x̄: 0.69 x̃: 0
helped stats (rel) min: 0.28% max: 76.81% x̄: 5.65% x̃: 3.03%
HURT stats (abs)   min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.25 x̃: 0
HURT stats (rel)   min: 0.26% max: 41.18% x̄: 4.91% x̃: 3.92%
95% mean confidence interval for cycles value: -0.04 0.12
95% mean confidence interval for cycles %-change: 2.36% 2.81%
Inconclusive result (value mean confidence interval includes 0).

total arith in shared programs: 73875.37 -> 74393.17 (0.70%)
arith in affected programs: 43142.42 -> 43660.21 (1.20%)
helped: 3632
HURT: 5443
helped stats (abs) min: 0.041665999999999315 max: 1.2083360000000027 x̄: 0.15 x̃: 0
helped stats (rel) min: 0.22% max: 100.00% x̄: 6.70% x̃: 4.76%
HURT stats (abs)   min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.19 x̃: 0
HURT stats (rel)   min: 0.00% max: 166.67% x̄: 5.91% x̃: 4.08%
95% mean confidence interval for arith value: 0.05 0.06
95% mean confidence interval for arith %-change: 0.65% 1.07%
Arith are HURT.

total texture in shared programs: 11936 -> 11936 (0.00%)
texture in affected programs: 0 -> 0
helped: 0
HURT: 0

total vary in shared programs: 4180.88 -> 4180.88 (0.00%)
vary in affected programs: 0 -> 0
helped: 0
HURT: 0

total ldst in shared programs: 137551 -> 137028 (-0.38%)
ldst in affected programs: 834 -> 311 (-62.71%)
helped: 13
HURT: 0
helped stats (abs) min: 15.0 max: 53.0 x̄: 40.23 x̃: 53
helped stats (rel) min: 19.15% max: 100.00% x̄: 68.11% x̃: 76.81%
95% mean confidence interval for ldst value: -50.49 -29.98
95% mean confidence interval for ldst %-change: -84.37% -51.84%
Ldst are helped.

total quadwords in shared programs: 1684883 -> 1692021 (0.42%)
quadwords in affected programs: 949463 -> 956601 (0.75%)
helped: 3981
HURT: 5098
helped stats (abs) min: 1.0 max: 86.0 x̄: 3.53 x̃: 3
helped stats (rel) min: 0.18% max: 33.33% x̄: 5.82% x̃: 4.48%
HURT stats (abs)   min: 1.0 max: 50.0 x̄: 4.15 x̃: 3
HURT stats (rel)   min: 0.17% max: 50.00% x̄: 5.11% x̃: 3.85%
95% mean confidence interval for quadwords value: 0.67 0.90
95% mean confidence interval for quadwords %-change: 0.17% 0.47%
Quadwords are HURT.

total threads in shared programs: 53276 -> 53653 (0.71%)
threads in affected programs: 581 -> 958 (64.89%)
helped: 445
HURT: 68
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 0.68 0.79
95% mean confidence interval for threads %-change: 75.70% 84.53%
Threads are helped.

total preloads in shared programs: 116312 -> 116312 (0.00%)
preloads in affected programs: 0 -> 0
helped: 0
HURT: 0

total loops in shared programs: 128 -> 128 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 92 -> 37 (-59.78%)
spills in affected programs: 55 -> 0
helped: 13
HURT: 0

total fills in shared programs: 658 -> 190 (-71.12%)
fills in affected programs: 468 -> 0
helped: 13
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>
2022-05-25 14:40:12 +00:00
Icecream95 f1a226dd24 pan/bi: Read base for combined stores
Fixes depth/stencil writes with MRT.

Fixes: 996645e479 ("pan/bi: Don't read base for combined stores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
2022-05-24 16:13:33 +00:00
Alyssa Rosenzweig 3df5446cbd pan/bi: Simplify register precolouring in the IR
In the current IR, any register may be preloaded by reading it anywhere, and any
register may be precoloured by writing it anywhere. This is convenient for
instruction selection, but requires the register allocator to do considerable
gymnastics to ensure it doesn't clobber precoloured registers. It also breaks
the purity of our SSA representation, which complicates optimization passes
(e.g. copyprop).

Let's trade some instruction selection complexity for simplifying register
allocation by constraining how register precolouring works. Under the new model:

* Registers may only be preloaded at the start of the program.
* Precoloured destinations are handled explicitly by RA.

Internally, a stronger invariant is placed for preloading: registers may only be
preloaded by MOV.i32 instructions at the beginning of the block, and these moves
must be unique. These invariants ensure RA can trivially coalesce the moves.

A bi_preload helper is added as a safe version of bi_register respecting these
invariants, allowing a smooth transition for instruction selection.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 5febeae58e pan/bi: Emit collect and split
..Rather than using offsets during instruction selection.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 4731e9e55a pan/bi: Simplfy BLEND emit
We don't need to collect anything, now that Valhall handles this case correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 7bfaa119f4 pan/bi: Lift split/collect cache from AGX
Design based on ACO (and fruitful discussions with Daniel).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 8fdb01b96f pan/bi: Create COLLECT during isel
This transitions us away from the fake SSA we currently use for vectors.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 9924e6f291 pan/bi: Fix mov and pack_32_2x16
Move can take in a vector and write a scalar, depending on the swizzle. We need
to handle this case. Split out mov and pack_32_2x16 so we can specify correct
behaviour for both. Also drop unused 1-bit boolean stuff which obscured the fix.

Fixes: 76cea8e27b ("panfrost: Fix pack_32_2x16 implementation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 896dc63623 pan/bi: Lower phis to scalar
If we don't lower phis to scalar, when we go out of SSA, we can get vector
nir_registers. In particular, we can get code like:

   r0 = vec2 r0.y, r0.x

This code looks like a move, but is in fact a swap. The trivial lowering of vec2
would not work -- the following fails to swap correctly:

   r0.x = r0.y
   r0.y = r0.x

Currently, we generate temporaries to handle these cases. It's easy to move the
complexity to NIR, though, and we'll want to scalarize phis for SSA-based RA
anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig c387096eca pan/va: Use 64-bit lowering for texturing
Texture instructions on Valhall take 64-bit sources. Now that we have
infrastructure to handle this properly, we don't need to use a non-SSA node to
hack around the optimization.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Jason Ekstrand 36bb62139e bifrost: Run nir_lower_global_vars_to_local before nir_lower_vars_to_scratch
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16483>
2022-05-16 21:43:47 +00:00
Jason Ekstrand f0a47d8602 bifrost,midgard: Allow providing a fixed sysval layout
Vulkan doesn't need nearly as many system values and would like to bake
its layout up-front instead of having it provided by the back-end
compiler.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 4e60f0655a panfrost,panvk: Make fixed_sysval_ubo < 0 mean compiler-assigned
In 3559efb9bf ("panfrost: Allow passing an explicit UBO index for the
sysval UBO"), an explicit UBO index was added and it was implicitly
assumed that it would be > num_ubos.  This was convenient because it
meant 0, the default for designated initializers, implicitly meant
compiler-assigned.  However, we're about to move the sysval UBO to 0
which breaks this assumption.   Also, we don't want the back-end
compiler to even look at num_ubos since it's meaningless in Vulkan.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00
Alyssa Rosenzweig 5cfae66cde pan/bi: Ensure the end NOP isn't eliminated
Otherwise the lowering doesn't work.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig 6d41a28a40 pan/bi: Support atomics on Valhall
Atomics on Valhall work basically the same as on Bifrost, however the
instruction selection is simplified as there are no clauses. Support the
simplified set of atomic instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig 21900ec8b0 pan/bi: Handle shared/scratch on Valhall
There's no .seg modifier, so we have some easy lowering to do ourselves.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig b683a67328 pan/bi: Handle shared atomic exchange on Valhall
Need to lower the WLS into a segment addition, since the .seg modifier was
dropped on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig 20f92871d8 pan/bi: Support image loads on Valhall
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig e53f44a4b8 pan/bi: Emit LEA_TEX on Valhall
As opposed to LEA_ATTR_TEX. In principle we could do this for Bifrost too, but
let's keep the Midgard compatible path for now.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig 423773faa9 pan/bi: Don't analyze td on Valhall
The implementation is based on clauses, so it won't work on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>
2022-05-11 21:57:13 +00:00
Jason Ekstrand 3c07c3e16d shader_info: Make images_used a bitset
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15988>
2022-05-10 11:23:11 -05:00
Alyssa Rosenzweig 0fcddd4d2c pan/bi: Rework varying linking on Valhall
Valhall introduces hardware-allocated varyings. Instead of allocating varying
descriptors on the CPU with a slot based interface, the driver just tells the
hardware how many bytes to allocate per vertex and loads/stores with byte
offsets. This is much nicer!

However, this requires us to rework our linking code to account for separable
shaders. With separable shaders, we can't rely on driver_location matching
between stages, and unlike on Midgard, we can't resolve the differences with
curated command stream descriptors. However, we *can* rely on slots matching. So
we should "just" determine the byte offsets based on the slot, and then
separable shaders work.

For GLES, it really is that easy.

For desktop GL, it's not -- desktop GL brings unpredictable extra varyings like
COL1 and TEX2. Allocating space for all of these unconditionally would hamper
performance. To cope, we key fragment shaders to the set of non-GLES varyings
written by the linked vertex shader. Then we may define an efficient ABI, where
only apps only pay for what they use.

Fixes various tests in dEQP-GLES31.functional.separate_shader.random.* on
Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16310>
2022-05-04 13:07:59 +00:00
Alyssa Rosenzweig 6b6ace5199 pan/bi: Add option to test spilling
BIFROST_MESA_DEBUG=spill now restricts the register file to 1/4 its usual size,
useful for testing register spilling (e.g. running CTS) as well as debugging
spilling on small shaders.

Note blend shaders are exempt, as we don't allow blend shaders to spill.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16314>
2022-05-04 12:48:27 +00:00
Alyssa Rosenzweig 80f8e9da16 pan/bi: Use a dynarray for predecessors
This is deterministic, unlike a set. Note we need the extra dereferencing to
keep the macro safe, simple, and standards compliant:

1. Nesting two for-loops would cause break/continue to fail.
2. Declaring variables outside the loop would pollute the namespace.
3. Declaring an anonymous struct is not conformant and doesn't compile in clang.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>
2022-05-03 17:56:16 +00:00
Alyssa Rosenzweig d496fe153a pan/bi: Count blocks
For u_worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>
2022-05-03 17:56:16 +00:00
Alyssa Rosenzweig eb0001bf2b pan/bi: Rename bi_block->name to bi_block->index
This is consistent with nir_block and (IMO) less confusing.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>
2022-05-03 17:56:16 +00:00
Alyssa Rosenzweig 54412afadc pan/bi: Handle texture offset + index
Fixes dEQP-VK.glsl.opaque_type_indexing.sampler.uniform.vertex.sampler1d

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16283>
2022-05-02 20:28:48 +00:00
Emma Anholt 536c8ee96d nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional.
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.

All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
  shadow2darray with NIR, since the shading languages don't expose txl of
  those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
2022-04-28 21:26:08 +00:00
Icecream95 76cea8e27b panfrost: Fix pack_32_2x16 implementation
Fixes: 6f0eff548c ("pan/bi: Implement packing ops between 32-bit vec1 and 16-bit vec2")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16181>
2022-04-27 15:30:09 +00:00
Alyssa Rosenzweig 2ca8b014d1 pan/bi: Implement pack_uvec[24]_to_uint
This maps nicely to Mali's weirdo MKVEC, so implement it rather than
scalarizing. The scalarization wants an extract implemented which we don't have.
Fixes dEQP-VK.glsl.builtin.function.pack_unpack.*

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16120>
2022-04-26 00:18:19 +00:00
Alyssa Rosenzweig c9b33fe7dc pan/bi: Implement fquantize2f16
Implement as f2f32(f2f16(x)) with the conversions in flush-to-zero mode.
Accessing flush-to-zero mode on Bifrost is nontrivial: it is specified
per-clause, rather than per-instruction. I've opted to pipe support for ftz
clauses through the scheduler. This solution has two nice properties:

* It uses the native hardware for flushing subnormals, avoiding extra lowering.
* It's "smart" about scheduling around FTZ requirements, meaning we get good
code generated even for a shader that e.g. quantizes a vector.

With an unrelated scheduler fix, the *V2F32_TO_V2F16/+F16_TO_F32 operation fits
in a single tuple, minimizing the overhead of the special FTZ clause.

We'll have to do something a bit different for Valhall (FLUSH.f32), but we'll
worry about when we actually have PanVK brought up on Valhall.

Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.*

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16123>
2022-04-25 16:29:31 +00:00
Alyssa Rosenzweig 1fb4427a7a pan/bi: Imply round mode most of the time
Much less noisy, and provides a path to further improvements. There is a slight
behaviour change: int-to-float conversions now use RTE instead of RTZ. For
32-bit opcodes, this affects conversions of integers with magnitude greater than
2^23 by at most 1 ulp. As this behaviour is unspecified in GLSL, this change is
believed to be acceptable.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>
2022-04-07 18:03:57 +00:00
Alyssa Rosenzweig 6e69c3369c pan/bi: Don't lower vertex_id for malloc IDVS
Based on hardware behaviour, it appears vertex_id is zero-based with the legacy
geometry flow but not with the new malloc IDVS flow. Since the geometry flow is
per-shader (not per-machine), there's not a good way to communicate this to NIR.
Rather than trying to shoehorn this obscure detail into NIR, just do the
lowering ourselves instead of in NIR. It's not much more code anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
2022-04-07 14:20:45 +00:00
Alyssa Rosenzweig ccdec68aee pan/bi: Report whether workgroups can be merged
This flag gates a Valhall hardware optimization for compute shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>
2022-04-07 14:20:45 +00:00