KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Alyssa Rosenzweig	90beea75f6	pan/bi: Don't reorder push with no_ubo_to_push Otherwise, load_push_constant won't work properly. This could probably be made to work if we tried hard enough, but we still don't want reordering for internal (meta) shaders, which are laid out deliberately. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig	17ea1642e2	pan/bi: Implement load_push_constant Bifrost supports "fast access uniforms" loaded from a single contiguous buffer. This maps directly to Vulkan push constants, with some caveats: * No indirect access. Indirects need to be lowered to a UBO pull. * Strict alignment requirements. These will be met in practice. Implement the NIR intrinsic and map it to the native hardware construct. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>	2022-06-08 13:42:42 +00:00
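As a rough illustration of the two caveats above, the choice between a fast access uniform and a UBO pull could look like the sketch below. The names (`lower_push_access`, `PUSH_FAU`, `PUSH_UBO_PULL`) and the 4-byte word granularity are assumptions made for illustration; this is not the Panfrost implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of lowering one push constant load. */
enum push_access_kind { PUSH_FAU, PUSH_UBO_PULL };

struct push_access {
   enum push_access_kind kind;
   uint32_t fau_word; /* 32-bit word in the uniform buffer, PUSH_FAU only */
};

static struct push_access
lower_push_access(bool offset_is_constant, uint32_t byte_offset)
{
   struct push_access acc = { PUSH_UBO_PULL, 0 };

   /* Fast access uniforms are addressed statically, so an indirect
    * (non-constant) offset has to fall back to pulling from a UBO. */
   if (!offset_is_constant)
      return acc;

   /* Assumed alignment rule: word-aligned offsets only. The commit says the
    * real requirements are met in practice, so a violation is treated as a
    * bug here rather than being lowered. */
   assert(byte_offset % 4 == 0);

   acc.kind = PUSH_FAU;
   acc.fau_word = byte_offset / 4;
   return acc;
}
```

Asserting on misalignment instead of falling back keeps the sketch close to the commit's claim that alignment "will be met in practice".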
Alyssa Rosenzweig	e57dfed419	pan/bi: Implement b2i with MUX The result_type modifier propagation looks for MUX instructions, so using this canonical b2i implementation allows the sequence b2i(cmp) to be fused. It's also faster on its own: on Valhall, MUX may be implemented as CSEL on the CVT unit, while AND may only be implemented on the SFU unit. So in case this doesn't get fused, we expect 4x better throughput for b2i with this implementation. Similarly, on Bifrost, MUX may be scheduled to either unit (as CSEL on FMA or MUX on ADD), whereas AND may only be scheduled to FMA. Results on Mali-G52: total instructions in shared programs: 2419171 -> 2414814 (-0.18%) instructions in affected programs: 272203 -> 267846 (-1.60%) helped: 767 HURT: 0 helped stats (abs) min: 1.0 max: 138.0 x̄: 5.68 x̃: 2 helped stats (rel) min: 0.12% max: 15.57% x̄: 2.09% x̃: 0.68% 95% mean confidence interval for instructions value: -6.68 -4.68 95% mean confidence interval for instructions %-change: -2.37% -1.82% Instructions are helped. total tuples in shared programs: 1932822 -> 1929234 (-0.19%) tuples in affected programs: 76485 -> 72897 (-4.69%) helped: 380 HURT: 3 helped stats (abs) min: 1.0 max: 138.0 x̄: 9.46 x̃: 1 helped stats (rel) min: 0.14% max: 15.96% x̄: 3.81% x̃: 0.92% HURT stats (abs) min: 1.0 max: 6.0 x̄: 2.67 x̃: 1 HURT stats (rel) min: 0.38% max: 8.57% x̄: 3.80% x̃: 2.44% 95% mean confidence interval for tuples value: -11.30 -7.44 95% mean confidence interval for tuples %-change: -4.27% -3.22% Tuples are helped. total clauses in shared programs: 356094 -> 355992 (-0.03%) clauses in affected programs: 3264 -> 3162 (-3.12%) helped: 80 HURT: 0 helped stats (abs) min: 1.0 max: 9.0 x̄: 1.27 x̃: 1 helped stats (rel) min: 0.81% max: 50.00% x̄: 4.83% x̃: 3.39% 95% mean confidence interval for clauses value: -1.49 -1.06 95% mean confidence interval for clauses %-change: -6.23% -3.43% Clauses are helped. 
total cycles in shared programs: 167337.10 -> 167329.19 (<.01%) cycles in affected programs: 510.08 -> 502.17 (-1.55%) helped: 80 HURT: 2 helped stats (abs) min: 0.041665999999999315 max: 0.7916659999999993 x̄: 0.10 x̃: 0 helped stats (rel) min: 0.51% max: 13.64% x̄: 2.12% x̃: 1.34% HURT stats (abs) min: 0.041665999999999315 max: 0.0416669999999999 x̄: 0.04 x̃: 0 HURT stats (rel) min: 0.39% max: 2.78% x̄: 1.58% x̃: 1.58% 95% mean confidence interval for cycles value: -0.12 -0.07 95% mean confidence interval for cycles %-change: -2.59% -1.48% Cycles are helped. total arith in shared programs: 73819.54 -> 73669.25 (-0.20%) arith in affected programs: 2840.54 -> 2690.25 (-5.29%) helped: 383 HURT: 3 helped stats (abs) min: 0.041665999999999315 max: 5.75 x̄: 0.39 x̃: 0 helped stats (rel) min: 0.33% max: 18.81% x̄: 4.39% x̃: 0.98% HURT stats (abs) min: 0.041665999999999315 max: 0.25 x̄: 0.11 x̃: 0 HURT stats (rel) min: 0.39% max: 8.96% x̄: 4.04% x̃: 2.78% 95% mean confidence interval for arith value: -0.47 -0.31 95% mean confidence interval for arith %-change: -4.93% -3.71% Arith are helped. total quadwords in shared programs: 1679798 -> 1676259 (-0.21%) quadwords in affected programs: 72826 -> 69287 (-4.86%) helped: 381 HURT: 15 helped stats (abs) min: 1.0 max: 142.0 x̄: 9.35 x̃: 1 helped stats (rel) min: 0.25% max: 18.87% x̄: 4.33% x̃: 1.13% HURT stats (abs) min: 1.0 max: 6.0 x̄: 1.47 x̃: 1 HURT stats (rel) min: 0.30% max: 6.25% x̄: 0.77% x̃: 0.35% 95% mean confidence interval for quadwords value: -10.76 -7.11 95% mean confidence interval for quadwords %-change: -4.71% -3.56% Quadwords are helped. 
Results on Mali-G57: total instructions in shared programs: 2704193 -> 2699317 (-0.18%) instructions in affected programs: 293366 -> 288490 (-1.66%) helped: 758 HURT: 5 helped stats (abs) min: 1.0 max: 151.0 x̄: 6.45 x̃: 2 helped stats (rel) min: 0.11% max: 22.22% x̄: 2.05% x̃: 0.64% HURT stats (abs) min: 1.0 max: 7.0 x̄: 2.20 x̃: 1 HURT stats (rel) min: 0.22% max: 1.69% x̄: 0.87% x̃: 1.08% 95% mean confidence interval for instructions value: -7.42 -5.36 95% mean confidence interval for instructions %-change: -2.27% -1.79% Instructions are helped. total cycles in shared programs: 141711.73 -> 141711.84 (<.01%) cycles in affected programs: 214.36 -> 214.47 (0.05%) helped: 4 HURT: 42 helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.20 x̃: 0 helped stats (rel) min: 1.85% max: 12.78% x̄: 9.12% x̃: 10.93% HURT stats (abs) min: 0.015625 max: 0.09375 x̄: 0.02 x̃: 0 HURT stats (rel) min: 0.17% max: 17.65% x̄: 0.84% x̃: 0.34% 95% mean confidence interval for cycles value: -0.02 0.03 95% mean confidence interval for cycles %-change: -1.23% 1.17% Inconclusive result (value mean confidence interval includes 0). total cvt in shared programs: 14479.14 -> 14474.19 (-0.03%) cvt in affected programs: 2877.05 -> 2872.09 (-0.17%) helped: 508 HURT: 209 helped stats (abs) min: 0.015625 max: 0.453125 x̄: 0.02 x̃: 0 helped stats (rel) min: 0.25% max: 16.67% x̄: 1.23% x̃: 0.37% HURT stats (abs) min: 0.015625 max: 0.296875 x̄: 0.03 x̃: 0 HURT stats (rel) min: 0.15% max: 18.18% x̄: 1.70% x̃: 0.34% 95% mean confidence interval for cvt value: -0.01 -0.00 95% mean confidence interval for cvt %-change: -0.57% -0.18% Cvt are helped. 
total sfu in shared programs: 7875.69 -> 7590.75 (-3.62%) sfu in affected programs: 1567.38 -> 1282.44 (-18.18%) helped: 906 HURT: 0 helped stats (abs) min: 0.0625 max: 8.625 x̄: 0.31 x̃: 0 helped stats (rel) min: 2.38% max: 100.00% x̄: 16.80% x̃: 5.63% 95% mean confidence interval for sfu value: -0.37 -0.26 95% mean confidence interval for sfu %-change: -18.43% -15.17% Sfu are helped. total quadwords in shared programs: 1468152 -> 1465800 (-0.16%) quadwords in affected programs: 37104 -> 34752 (-6.34%) helped: 161 HURT: 2 helped stats (abs) min: 8.0 max: 80.0 x̄: 14.71 x̃: 8 helped stats (rel) min: 1.67% max: 20.00% x̄: 8.05% x̃: 7.69% HURT stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8 HURT stats (rel) min: 3.57% max: 3.85% x̄: 3.71% x̃: 3.71% 95% mean confidence interval for quadwords value: -16.29 -12.57 95% mean confidence interval for quadwords %-change: -8.58% -7.22% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>	2022-06-06 16:08:25 +00:00
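The equivalence this commit relies on can be modelled in plain C. The sketch below assumes booleans are 32-bit masks (0 for false, ~0 for true) and uses hypothetical helper names. It shows only the semantics of the AND and select forms and of the fused b2i(cmp); it says nothing about Bifrost or Valhall encodings or which unit executes each form.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: booleans are 32-bit masks, 0 for false and ~0 for true. */

/* b2i as a bitwise AND: ~0 & 1 == 1, 0 & 1 == 0. */
static uint32_t b2i_and(uint32_t b)
{
   return b & 1u;
}

/* b2i as a select: choose 1 or 0 depending on whether b is nonzero.
 * A select of this shape is what a fusion pass can pattern-match. */
static uint32_t b2i_mux(uint32_t b)
{
   return b != 0 ? 1u : 0u;
}

/* Hypothetical comparison producing a 0/~0 mask. */
static uint32_t cmp_lt_mask(int32_t a, int32_t b)
{
   return a < b ? ~0u : 0u;
}

/* What b2i(cmp) becomes once fused: the comparison is asked for an
 * integer 0/1 result directly, so no separate b2i instruction remains. */
static uint32_t cmp_lt_int(int32_t a, int32_t b)
{
   return a < b ? 1u : 0u;
}

/* Both lowerings agree on every well-formed boolean. */
static void check_b2i(uint32_t b)
{
   assert(b == 0u || b == ~0u);
   assert(b2i_and(b) == b2i_mux(b));
}
```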
Alyssa Rosenzweig	3b3cd59fb8	panfrost: Launch transform feedback shaders We now have infrastructure in place to generate variants of vertex shaders specialized for transform feedback. All that's left is launching these compute-like kernels before the IDVS job, implementing both the transform feedback and the regular rasterization pipeline. This implements transform feedback on Valhall, passing the relevant GLES3.1 tests. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	4e341e70d8	pan/bi: Handle transform feedback intrinsics Translate the intrinsics we introduced to lower away transform feedback into Panfrost system values which the GL driver can handle. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig	ae3fa6cc1d	pan/bi: Add transform feedback lowering pass Add a simple NIR-based implementation of transform feedback, appropriate for OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders). Stores to varyings that will be captured are replaced by stores to transform feedback buffers and some addressing math. This allows implementing the semantics of transform feedback in a compute-like stage. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>	2022-06-04 14:35:56 +00:00
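The "addressing math" mentioned above amounts to computing, for each captured output, a byte address in the bound buffer. The sketch below makes several assumptions: a zero-based vertex index, a per-buffer stride, records laid out instance by instance, and no decomposition of strip primitives into lists. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one captured output. */
struct xfb_output {
   unsigned buffer;  /* transform feedback binding index */
   uint32_t offset;  /* byte offset within the per-vertex record */
};

/* Address where `out` of the given vertex is stored. Assumes vertices are
 * captured instance by instance, each instance contributing vertex_count
 * records of stride[buffer] bytes. */
static uint64_t
xfb_address(const uint64_t *base, const uint32_t *stride,
            struct xfb_output out, uint32_t vertex_id,
            uint32_t instance_id, uint32_t vertex_count)
{
   assert(vertex_id < vertex_count);
   assert(out.offset % 4 == 0); /* captured components are 32-bit */

   uint64_t index = (uint64_t)instance_id * vertex_count + vertex_id;
   return base[out.buffer] + index * stride[out.buffer] + out.offset;
}
```

In a compute-like stage, each invocation would evaluate this once per captured output and replace the original varying store with a global store to the resulting address.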
Alyssa Rosenzweig	7535362204	pan/bi: Fix clper_xor on Mali-G31 Mali-G31 has the old CLPER instruction, not the new one, which means we don't get to specify a custom lane op. But the clper_xor helper incorrectly checked the arch, not the implementation quirk. Fixes: `c00e7b729f` ("pan/bi: Optimize abs(derivative)") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reported-by: Icecream95 <ixn@disroot.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16846>	2022-06-02 20:32:43 -04:00
Alyssa Rosenzweig	bc4d42023d	pan/bi: Respect swizzles in nir_op_pack_64_2x32_split Triggered a BIR validation error, which made debugging a breeze. That validation pass (dimensionality checks) gets a lot of use, it seems :-) Fixes: dEQP-VK.ssbo.layout.2_level_array.std430.row_major_mat4x2_comp_access_store_cols Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>	2022-06-01 20:08:42 +00:00
Alyssa Rosenzweig	5067a26f44	pan/bi: Use flow control lowering on Valhall Logically at the same part of the compile pipeline as clause scheduling on Bifrost. Lots of similarities, too. Now that we generate flow control only as a late pass, various hacks in the compiler are no longer necessary and are dropped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>	2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig	c0180f6bd3	pan/bi: Export helper termination analysis The current helper termination analysis code is hardwired for clauses, so it won't work for Valhall. However, the bulk of it is dataflow analysis which is portable between Bifrost and Valhall. Export the interesting bits so we can reuse them on Valhall. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>	2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig	7bb635316b	pan/bi: Export bi_block_add_successor For use in unit tests that need to create blocks. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>	2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig	4627cd99de	pan/bi: Preserve flow control for non-psiz variant Otherwise we will get INSTR_INVALID_ENC faults when deleting the final STORE.end instruction, after we rework our flow control code. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>	2022-06-01 16:14:38 +00:00
Daniel Schürmann	bd151a256e	nir/opt_vectorize: add callback for max vectorization width The callback allows requesting different vectorization factors per instruction, depending on e.g. bit size or opcode. This patch also stops nir_opt_vectorize() from using the vectorize_vec2_16bit option. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>	2022-06-01 11:41:44 +00:00
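To show what a per-instruction width callback makes possible, here is a standalone model. In NIR the callback receives the instruction itself; this sketch simplifies that by passing only its bit size. `max_width_cb`, `width_vec2_16bit` and `can_merge` are hypothetical names, and the policy (vec2 for 16-bit, scalar otherwise) is only an example of the old option expressed as a callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a per-instruction max-width callback. */
typedef unsigned (*max_width_cb)(unsigned bit_size, const void *data);

/* Example policy: pair up 16-bit ALU ops (vec2), keep 32-bit ops scalar. */
static unsigned width_vec2_16bit(unsigned bit_size, const void *data)
{
   (void)data;
   return bit_size == 16 ? 2 : 1;
}

/* May an op of width_a components merge with one of width_b components? */
static bool can_merge(max_width_cb cb, const void *data, unsigned bit_size,
                      unsigned width_a, unsigned width_b)
{
   unsigned max = cb(bit_size, data);
   assert(max >= 1);
   return width_a + width_b <= max;
}
```

Because the limit comes from a function rather than a single boolean, a backend can pick different widths per bit size or opcode without adding new shader options.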
Alyssa Rosenzweig	0170500627	pan/bi: Interpolate varyings at 16-bit On Bifrost, we have a single "load float varying" instruction that controls the bit size of the result, allowing us to fold an f2f16 into the load. However, the larger benefit is that 16-bit varying loads are interpolated at 16-bit. Arm claims that the varying unit has a 32-bit data path, allowing 16-bit varyings to be interpolated in half the cycles of 32-bit varyings. This change should therefore improve performance for workloads that are bound by the varying unit. This means we want to be aggressive about 16-bit varying loads, even if it costs some extra f2f32 instructions. glmark2 total score on Mali-G52 up from 1173fps to 1218fps with particular wins in -brefract, -bshadow, -bjellyfish, and -bshading. total instructions in shared programs: 2432246 -> 2423668 (-0.35%) instructions in affected programs: 516056 -> 507478 (-1.66%) helped: 3641 HURT: 432 helped stats (abs) min: 1.0 max: 12.0 x̄: 2.91 x̃: 2 helped stats (rel) min: 0.08% max: 54.55% x̄: 9.88% x̃: 5.71% HURT stats (abs) min: 1.0 max: 42.0 x̄: 4.71 x̃: 4 HURT stats (rel) min: 0.23% max: 200.00% x̄: 12.58% x̃: 6.37% 95% mean confidence interval for instructions value: -2.21 -2.00 95% mean confidence interval for instructions %-change: -7.92% -7.07% Instructions are helped. total tuples in shared programs: 1941309 -> 1934647 (-0.34%) tuples in affected programs: 353169 -> 346507 (-1.89%) helped: 3233 HURT: 453 helped stats (abs) min: 1.0 max: 14.0 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.12% max: 50.00% x̄: 9.90% x̃: 5.56% HURT stats (abs) min: 1.0 max: 25.0 x̄: 2.85 x̃: 2 HURT stats (rel) min: 0.22% max: 150.00% x̄: 8.96% x̃: 5.26% 95% mean confidence interval for tuples value: -1.89 -1.72 95% mean confidence interval for tuples %-change: -8.01% -7.15% Tuples are helped. 
total clauses in shared programs: 357354 -> 356610 (-0.21%) clauses in affected programs: 25794 -> 25050 (-2.88%) helped: 994 HURT: 317 helped stats (abs) min: 1.0 max: 3.0 x̄: 1.16 x̃: 1 helped stats (rel) min: 1.49% max: 33.33% x̄: 10.78% x̃: 10.00% HURT stats (abs) min: 1.0 max: 4.0 x̄: 1.31 x̃: 1 HURT stats (rel) min: 1.19% max: 50.00% x̄: 13.56% x̃: 8.33% 95% mean confidence interval for clauses value: -0.63 -0.50 95% mean confidence interval for clauses %-change: -5.63% -4.16% Clauses are helped. total cycles in shared programs: 167697.96 -> 167431.15 (-0.16%) cycles in affected programs: 12638.29 -> 12371.48 (-2.11%) helped: 2652 HURT: 350 helped stats (abs) min: 0.04166399999999726 max: 0.75 x̄: 0.11 x̃: 0 helped stats (rel) min: 0.12% max: 100.00% x̄: 14.39% x̃: 5.04% HURT stats (abs) min: 0.041665999999999315 max: 0.5833329999999997 x̄: 0.11 x̃: 0 HURT stats (rel) min: 0.00% max: 75.00% x̄: 7.90% x̃: 4.71% 95% mean confidence interval for cycles value: -0.09 -0.08 95% mean confidence interval for cycles %-change: -12.56% -11.02% Cycles are helped. total arith in shared programs: 74169.46 -> 73891.71 (-0.37%) arith in affected programs: 13885.87 -> 13608.12 (-2.00%) helped: 3215 HURT: 445 helped stats (abs) min: 0.04166399999999726 max: 0.5416680000000014 x̄: 0.10 x̃: 0 helped stats (rel) min: 0.12% max: 100.00% x̄: 14.16% x̃: 6.67% HURT stats (abs) min: 0.041665999999999315 max: 1.125 x̄: 0.12 x̃: 0 HURT stats (rel) min: 0.00% max: 100.00% x̄: 9.76% x̃: 5.49% 95% mean confidence interval for arith value: -0.08 -0.07 95% mean confidence interval for arith %-change: -11.91% -10.59% Arith are helped. 
total texture in shared programs: 11936 -> 11931 (-0.04%) texture in affected programs: 20 -> 15 (-25.00%) helped: 10 HURT: 0 helped stats (abs) min: 0.5 max: 0.5 x̄: 0.50 x̃: 0 helped stats (rel) min: 14.29% max: 100.00% x̄: 45.71% x̃: 33.33% 95% mean confidence interval for texture value: -0.50 -0.50 95% mean confidence interval for texture %-change: -73.16% -18.26% Texture are helped. total vary in shared programs: 4180.88 -> 3447.19 (-17.55%) vary in affected programs: 2109.88 -> 1376.19 (-34.77%) helped: 2202 HURT: 39 helped stats (abs) min: 0.0625 max: 1.4375 x̄: 0.34 x̃: 0 helped stats (rel) min: 2.38% max: 66.67% x̄: 40.43% x̃: 50.00% HURT stats (abs) min: 0.125 max: 0.375 x̄: 0.26 x̃: 0 HURT stats (rel) min: 0.00% max: 300.00% x̄: 92.54% x̃: 23.08% 95% mean confidence interval for vary value: -0.34 -0.32 95% mean confidence interval for vary %-change: -39.22% -37.01% Vary are helped. total quadwords in shared programs: 1689664 -> 1684852 (-0.28%) quadwords in affected programs: 265522 -> 260710 (-1.81%) helped: 2864 HURT: 447 helped stats (abs) min: 1.0 max: 14.0 x̄: 2.10 x̃: 2 helped stats (rel) min: 0.15% max: 31.58% x̄: 6.05% x̃: 4.65% HURT stats (abs) min: 1.0 max: 22.0 x̄: 2.67 x̃: 2 HURT stats (rel) min: 0.27% max: 38.46% x̄: 6.79% x̃: 4.55% 95% mean confidence interval for quadwords value: -1.54 -1.37 95% mean confidence interval for quadwords %-change: -4.55% -4.08% Quadwords are helped. total threads in shared programs: 53656 -> 53688 (0.06%) threads in affected programs: 32 -> 64 (100.00%) helped: 32 HURT: 0 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for threads value: 1.00 1.00 95% mean confidence interval for threads %-change: 100.00% 100.00% Threads are helped. 
total preloads in shared programs: 116212 -> 103476 (-10.96%) preloads in affected programs: 45222 -> 32486 (-28.16%) helped: 3022 HURT: 11 helped stats (abs) min: 1.0 max: 11.0 x̄: 4.23 x̃: 4 helped stats (rel) min: 7.14% max: 68.75% x̄: 30.39% x̃: 25.00% HURT stats (abs) min: 2.0 max: 4.0 x̄: 3.45 x̃: 4 HURT stats (rel) min: 14.29% max: 50.00% x̄: 25.93% x̃: 25.00% 95% mean confidence interval for preloads value: -4.26 -4.14 95% mean confidence interval for preloads %-change: -30.68% -29.69% Preloads are helped. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Tested-by: Chris Healy cphealy@gmail.com Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>	2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig	93f69e4b1c	pan/bi: Model Valhall source formats LD_VAR_BUF instructions on Valhall take a source format, indicating the in-memory format of the varying independent from the register format, which we still model within the compiler for compatibility with Bifrost. (Prior to Valhall, source format is specified in the attribute descriptor as a physical pixel format.) Model this information, allowing us to generate fp16 LD_VAR_BUF instructions correctly on Valhall. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>	2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig	67f5721349	panfrost: Set allow_rotating_primitives On Valhall, the driver may set this flag to let the hardware rotate primitives. That is only safe if: 1. The rasterization of lines does not matter, AND 2. The provoking vertex does not matter. We may satisfy the first condition by checking for LINES and the second by checking for flat shading. When both conditions hold, we should set this flag to allow optimizations, which may make tiling more efficient. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>	2022-05-30 14:00:55 +00:00
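The two conditions combine into a single predicate. A minimal sketch, assuming the draw's primitive has already been reduced to points, lines or triangles (so strips and loops count as their base type); the enum and function names are hypothetical, not Panfrost's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced primitive type of a draw. */
enum reduced_prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

static bool
allow_rotating_primitives(enum reduced_prim prim, bool flat_shading)
{
   assert(prim == PRIM_POINTS || prim == PRIM_LINES || prim == PRIM_TRIANGLES);

   /* Rotating a line's vertices could change how it is rasterized. */
   bool lines_matter = (prim == PRIM_LINES);

   /* With flat shading, which vertex provokes the attributes is visible. */
   bool provoking_vertex_matters = flat_shading;

   return !lines_matter && !provoking_vertex_matters;
}
```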
Alyssa Rosenzweig	569e5dc745	pan/bi: Schedule for pressure pre-RA Add a bottom-up pre-RA list scheduler that aims to reduce register pressure, similar to the one we use on Midgard to great effect. It uses a simple heuristic: greedily select instructions that reduce liveness. To avoid regressions, the algorithm throws away schedules that increase the maximum number of live values (used as an estimate of register pressure; if we had SSA form, this would be exact). We might be better off using Sarkar. But for something I could type out in an afternoon, I'll happily accept a >50% reduction in spills. Instruction count is regressed in some cases due to extra moves around the blend shader ABI; at least on Bifrost, this is mostly hidden by the clause scheduler. Thread count and spills/fills are both much improved here. There are numerous opportunities for future improvements to pre-RA scheduling: * Better heuristics? (Something more global than liveness alone) * Reducing false dependencies with memory access * Improving ILP for message-passing instructions? This is a tradeoff. * Simplifying the code if we have SSA in the future. But for now, I think this is well worth it already. v2: Various clean-ups and memory leak fix (Icecream95). Reduce false dependencies to eliminate spilling in more shaders. shader-db stats on Mali-G52: total instructions in shared programs: 2438841 -> 2439698 (0.04%) instructions in affected programs: 1206421 -> 1207278 (0.07%) helped: 3113 HURT: 4011 helped stats (abs) min: 1.0 max: 50.0 x̄: 3.25 x̃: 2 helped stats (rel) min: 0.13% max: 44.83% x̄: 4.09% x̃: 2.11% HURT stats (abs) min: 1.0 max: 18.0 x̄: 2.73 x̃: 2 HURT stats (rel) min: 0.11% max: 57.14% x̄: 3.86% x̃: 2.07% 95% mean confidence interval for instructions value: 0.02 0.22 95% mean confidence interval for instructions %-change: 0.23% 0.54% Instructions are HURT. 
total tuples in shared programs: 1927077 -> 1946583 (1.01%) tuples in affected programs: 1118627 -> 1138133 (1.74%) helped: 2874 HURT: 6295 helped stats (abs) min: 1.0 max: 82.0 x̄: 3.51 x̃: 2 helped stats (rel) min: 0.17% max: 33.33% x̄: 4.60% x̃: 3.57% HURT stats (abs) min: 1.0 max: 47.0 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.20% max: 50.00% x̄: 5.16% x̃: 4.32% 95% mean confidence interval for tuples value: 2.00 2.25 95% mean confidence interval for tuples %-change: 1.97% 2.23% Tuples are HURT. total clauses in shared programs: 356053 -> 357793 (0.49%) clauses in affected programs: 151578 -> 153318 (1.15%) helped: 2196 HURT: 3813 helped stats (abs) min: 1.0 max: 49.0 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.18% max: 69.01% x̄: 10.26% x̃: 8.33% HURT stats (abs) min: 1.0 max: 25.0 x̄: 1.70 x̃: 1 HURT stats (rel) min: 0.57% max: 66.67% x̄: 10.64% x̃: 8.33% 95% mean confidence interval for clauses value: 0.22 0.36 95% mean confidence interval for clauses %-change: 2.68% 3.33% Clauses are HURT. total cycles in shared programs: 167761.17 -> 167922.04 (0.10%) cycles in affected programs: 24494.21 -> 24655.08 (0.66%) helped: 862 HURT: 3054 helped stats (abs) min: 0.041665999999999315 max: 53.0 x̄: 0.69 x̃: 0 helped stats (rel) min: 0.28% max: 76.81% x̄: 5.65% x̃: 3.03% HURT stats (abs) min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.25 x̃: 0 HURT stats (rel) min: 0.26% max: 41.18% x̄: 4.91% x̃: 3.92% 95% mean confidence interval for cycles value: -0.04 0.12 95% mean confidence interval for cycles %-change: 2.36% 2.81% Inconclusive result (value mean confidence interval includes 0). 
total arith in shared programs: 73875.37 -> 74393.17 (0.70%) arith in affected programs: 43142.42 -> 43660.21 (1.20%) helped: 3632 HURT: 5443 helped stats (abs) min: 0.041665999999999315 max: 1.2083360000000027 x̄: 0.15 x̃: 0 helped stats (rel) min: 0.22% max: 100.00% x̄: 6.70% x̃: 4.76% HURT stats (abs) min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.19 x̃: 0 HURT stats (rel) min: 0.00% max: 166.67% x̄: 5.91% x̃: 4.08% 95% mean confidence interval for arith value: 0.05 0.06 95% mean confidence interval for arith %-change: 0.65% 1.07% Arith are HURT. total texture in shared programs: 11936 -> 11936 (0.00%) texture in affected programs: 0 -> 0 helped: 0 HURT: 0 total vary in shared programs: 4180.88 -> 4180.88 (0.00%) vary in affected programs: 0 -> 0 helped: 0 HURT: 0 total ldst in shared programs: 137551 -> 137028 (-0.38%) ldst in affected programs: 834 -> 311 (-62.71%) helped: 13 HURT: 0 helped stats (abs) min: 15.0 max: 53.0 x̄: 40.23 x̃: 53 helped stats (rel) min: 19.15% max: 100.00% x̄: 68.11% x̃: 76.81% 95% mean confidence interval for ldst value: -50.49 -29.98 95% mean confidence interval for ldst %-change: -84.37% -51.84% Ldst are helped. total quadwords in shared programs: 1684883 -> 1692021 (0.42%) quadwords in affected programs: 949463 -> 956601 (0.75%) helped: 3981 HURT: 5098 helped stats (abs) min: 1.0 max: 86.0 x̄: 3.53 x̃: 3 helped stats (rel) min: 0.18% max: 33.33% x̄: 5.82% x̃: 4.48% HURT stats (abs) min: 1.0 max: 50.0 x̄: 4.15 x̃: 3 HURT stats (rel) min: 0.17% max: 50.00% x̄: 5.11% x̃: 3.85% 95% mean confidence interval for quadwords value: 0.67 0.90 95% mean confidence interval for quadwords %-change: 0.17% 0.47% Quadwords are HURT. 
total threads in shared programs: 53276 -> 53653 (0.71%) threads in affected programs: 581 -> 958 (64.89%) helped: 445 HURT: 68 helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1 HURT stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00% 95% mean confidence interval for threads value: 0.68 0.79 95% mean confidence interval for threads %-change: 75.70% 84.53% Threads are helped. total preloads in shared programs: 116312 -> 116312 (0.00%) preloads in affected programs: 0 -> 0 helped: 0 HURT: 0 total loops in shared programs: 128 -> 128 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 92 -> 37 (-59.78%) spills in affected programs: 55 -> 0 helped: 13 HURT: 0 total fills in shared programs: 658 -> 190 (-71.12%) fills in affected programs: 468 -> 0 helped: 13 HURT: 0 Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>	2022-05-25 14:40:12 +00:00
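The heuristic described in this commit can be sketched on a toy SSA basic block that has only true data dependencies: schedule bottom-up, always pick the ready instruction that grows the live set least, and keep the result only if the maximum live count does not increase. The IR, limits and names here are hypothetical, and memory dependencies are ignored. This sketch is not the Bifrost scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INSTRS 32
#define MAX_VALUES 64

/* Hypothetical SSA instruction: one destination, up to two sources (-1 = none). */
struct instr { int dest; int src[2]; };

/* Maximum number of values live at any point of the block in `order`. */
static int max_live(const struct instr *ins, const int *order, int n)
{
   bool live[MAX_VALUES] = { false };
   int count = 0, max = 0;
   for (int k = n - 1; k >= 0; --k) {          /* walk backwards */
      const struct instr *I = &ins[order[k]];
      if (I->dest >= 0 && live[I->dest]) { live[I->dest] = false; count--; }
      for (int s = 0; s < 2; ++s) {
         int v = I->src[s];
         if (v >= 0 && !live[v]) { live[v] = true; count++; }
      }
      if (count > max) max = count;
   }
   return max;
}

/* Bottom-up, an instruction is ready once every user of its result is scheduled. */
static bool ready(const struct instr *ins, int n, const bool *done, int i)
{
   int d = ins[i].dest;
   for (int j = 0; d >= 0 && j < n; ++j)
      if (!done[j] && j != i && (ins[j].src[0] == d || ins[j].src[1] == d))
         return false;
   return true;
}

/* Fill slots from last to first, each time taking the ready instruction with
 * the smallest growth in live values. Ties go to the later original
 * instruction, so the source order is kept when nothing helps. */
static void schedule_pressure(const struct instr *ins, int n, int *out)
{
   bool done[MAX_INSTRS] = { false }, live[MAX_VALUES] = { false };
   assert(n <= MAX_INSTRS);
   for (int slot = n - 1; slot >= 0; --slot) {
      int best = -1, best_delta = 0;
      for (int i = n - 1; i >= 0; --i) {
         if (done[i] || !ready(ins, n, done, i)) continue;
         const struct instr *I = &ins[i];
         int delta = (I->dest >= 0 && live[I->dest]) ? -1 : 0;
         for (int s = 0; s < 2; ++s) {
            int v = I->src[s];
            if (v >= 0 && !live[v] && !(s == 1 && v == I->src[0])) delta++;
         }
         if (best < 0 || delta < best_delta) { best = i; best_delta = delta; }
      }
      assert(best >= 0); /* a dependency DAG always has a ready instruction */
      const struct instr *B = &ins[best];
      if (B->dest >= 0) live[B->dest] = false;
      for (int s = 0; s < 2; ++s)
         if (B->src[s] >= 0) live[B->src[s]] = true;
      done[best] = true;
      out[slot] = best;
   }
}

/* Keep the new schedule only if it doesn't raise the pressure estimate. */
static void schedule_block(const struct instr *ins, int n, int *order)
{
   int identity[MAX_INSTRS], candidate[MAX_INSTRS];
   for (int i = 0; i < n; ++i) identity[i] = i;
   schedule_pressure(ins, n, candidate);
   const int *pick = max_live(ins, candidate, n) <= max_live(ins, identity, n)
                        ? candidate : identity;
   memcpy(order, pick, (size_t)n * sizeof(int));
}
```

For a block that loads v0, v1 and v2 up front and then computes v3 = v0 + v1 and v4 = v3 + v2, this moves the load of v2 after the first add. The peak live count drops from 3 to 2.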
Icecream95	f1a226dd24	pan/bi: Read base for combined stores Fixes depth/stencil writes with MRT. Fixes: `996645e479` ("pan/bi: Don't read base for combined stores") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>	2022-05-24 16:13:33 +00:00
Alyssa Rosenzweig	3df5446cbd	pan/bi: Simplify register precolouring in the IR In the current IR, any register may be preloaded by reading it anywhere, and any register may be precoloured by writing it anywhere. This is convenient for instruction selection, but requires the register allocator to do considerable gymnastics to ensure it doesn't clobber precoloured registers. It also breaks the purity of our SSA representation, which complicates optimization passes (e.g. copyprop). Let's trade some instruction selection complexity for simplifying register allocation by constraining how register precolouring works. Under the new model: * Registers may only be preloaded at the start of the program. * Precoloured destinations are handled explicitly by RA. Internally, a stronger invariant is placed for preloading: registers may only be preloaded by MOV.i32 instructions at the beginning of the block, and these moves must be unique. These invariants ensure RA can trivially coalesce the moves. A bi_preload helper is added as a safe version of bi_register respecting these invariants, allowing a smooth transition for instruction selection. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
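The stronger internal invariant for preloading can be expressed as a simple validation over the entry block. The sketch below uses a hypothetical IR where an instruction either reads an SSA value or a hardware register. It checks three things: only MOV.i32 reads hardware registers, those moves form a prefix of the block, and no register is preloaded twice.

```c
#include <assert.h>
#include <stdbool.h>

enum opcode { OP_MOV_I32, OP_FADD_F32, OP_OTHER };

/* Hypothetical IR instruction: the source is either an SSA value or,
 * for a preload, a hardware register. */
struct ir_instr {
   enum opcode op;
   bool reads_hw_reg; /* true if the source is a hardware register */
   int hw_reg;        /* register number when reads_hw_reg */
};

/* Returns true if the entry block satisfies the preload invariants. */
static bool validate_preloads(const struct ir_instr *block, int n)
{
   bool seen[64] = { false };
   bool in_prefix = true;

   for (int i = 0; i < n; ++i) {
      const struct ir_instr *I = &block[i];
      if (!I->reads_hw_reg) {
         in_prefix = false; /* preloads must all come first */
         continue;
      }
      if (I->op != OP_MOV_I32 || !in_prefix)
         return false;
      assert(I->hw_reg >= 0 && I->hw_reg < 64);
      if (seen[I->hw_reg])
         return false;      /* each register preloaded at most once */
      seen[I->hw_reg] = true;
   }
   return true;
}
```

With moves this constrained, each preload is a unique copy from a fixed register at the top of the program, which is why RA can coalesce them trivially.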
Alyssa Rosenzweig	5febeae58e	pan/bi: Emit collect and split ...rather than using offsets during instruction selection. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig	4731e9e55a	pan/bi: Simplify BLEND emit We don't need to collect anything, now that Valhall handles this case correctly. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig	7bfaa119f4	pan/bi: Lift split/collect cache from AGX Design based on ACO (and fruitful discussions with Daniel). Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig	8fdb01b96f	pan/bi: Create COLLECT during isel This transitions us away from the fake SSA we currently use for vectors. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig	9924e6f291	pan/bi: Fix mov and pack_32_2x16 Move can take in a vector and write a scalar, depending on the swizzle. We need to handle this case. Split out mov and pack_32_2x16 so we can specify correct behaviour for both. Also drop unused 1-bit boolean stuff which obscured the fix. Fixes: `76cea8e27b` ("panfrost: Fix pack_32_2x16 implementation") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig	896dc63623	pan/bi: Lower phis to scalar If we don't lower phis to scalar, when we go out of SSA, we can get vector nir_registers. In particular, we can get code like: r0 = vec2 r0.y, r0.x This code looks like a move, but is in fact a swap. The trivial lowering of vec2 would not work -- the following fails to swap correctly: r0.x = r0.y r0.y = r0.x Currently, we generate temporaries to handle these cases. It's easy to move the complexity to NIR, though, and we'll want to scalarize phis for SSA-based RA anyway. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
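The swap hazard described above is easy to reproduce with ordinary variables. In the sketch below, the two-element array stands in for the vector register r0. The naive lowering writes r0.x before reading it, while the temporary-based version (the approach the commit says is currently used) reads the old value first. Function names are illustrative.

```c
#include <assert.h>

/* Naive lowering of r0 = vec2 r0.y, r0.x as two sequential moves. */
static void swap_naive(int r0[2])
{
   r0[0] = r0[1];
   r0[1] = r0[0]; /* reads the already-overwritten x, so the swap is lost */
}

/* Correct lowering: save the old x in a temporary before overwriting it. */
static void swap_with_temp(int r0[2])
{
   int tmp = r0[0];
   r0[0] = r0[1];
   r0[1] = tmp;
}
```

After phis are scalarized, each component lives in its own register, so the vector-level swap no longer arises when going out of SSA.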
Alyssa Rosenzweig	c387096eca	pan/va: Use 64-bit lowering for texturing Texture instructions on Valhall take 64-bit sources. Now that we have infrastructure to handle this properly, we don't need to use a non-SSA node to hack around the optimization. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>	2022-05-19 16:08:26 +00:00
Jason Ekstrand	36bb62139e	bifrost: Run nir_lower_global_vars_to_local before nir_lower_vars_to_scratch Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16483>	2022-05-16 21:43:47 +00:00
Jason Ekstrand	f0a47d8602	bifrost,midgard: Allow providing a fixed sysval layout Vulkan doesn't need nearly as many system values and would like to bake its layout up-front instead of having it provided by the back-end compiler. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>	2022-05-12 10:53:16 +00:00
Jason Ekstrand	4e60f0655a	panfrost,panvk: Make fixed_sysval_ubo < 0 mean compiler-assigned In `3559efb9bf` ("panfrost: Allow passing an explicit UBO index for the sysval UBO"), an explicit UBO index was added and it was implicitly assumed that it would be > num_ubos. This was convenient because it meant 0, the default for designated initializers, implicitly meant compiler-assigned. However, we're about to move the sysval UBO to 0 which breaks this assumption. Also, we don't want the back-end compiler to even look at num_ubos since it's meaningless in Vulkan. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>	2022-05-12 10:53:15 +00:00
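The new convention can be summarised in a few lines. In this sketch, `struct compile_inputs` and `sysval_ubo_index` are hypothetical names. So is the choice of placing a compiler-assigned sysval UBO right after the application's UBOs; the commit does not specify it. Under the new convention, a zero-initialized field means "use UBO 0", so callers that want compiler assignment must pass a negative value explicitly.

```c
#include <assert.h>

/* Hypothetical compiler inputs. */
struct compile_inputs {
   int fixed_sysval_ubo; /* < 0: compiler-assigned; >= 0: use this index */
};

/* UBO index used for system values. num_ubos is only consulted when the
 * compiler assigns the slot, which is the GL case. */
static unsigned
sysval_ubo_index(const struct compile_inputs *in, unsigned num_ubos)
{
   if (in->fixed_sysval_ubo >= 0)
      return (unsigned)in->fixed_sysval_ubo;

   /* Assumed placement: right after the application's UBOs. */
   return num_ubos;
}
```

A Vulkan-style caller can then request UBO 0 with `{ .fixed_sysval_ubo = 0 }`, while GL passes `{ .fixed_sysval_ubo = -1 }`.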
Alyssa Rosenzweig	5cfae66cde	pan/bi: Ensure the end NOP isn't eliminated Otherwise the lowering doesn't work. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	6d41a28a40	pan/bi: Support atomics on Valhall Atomics on Valhall work basically the same as on Bifrost, however the instruction selection is simplified as there are no clauses. Support the simplified set of atomic instructions. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	21900ec8b0	pan/bi: Handle shared/scratch on Valhall There's no .seg modifier, so we have some easy lowering to do ourselves. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	b683a67328	pan/bi: Handle shared atomic exchange on Valhall Need to lower the WLS into a segment addition, since the .seg modifier was dropped on Valhall. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	20f92871d8	pan/bi: Support image loads on Valhall Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	e53f44a4b8	pan/bi: Emit LEA_TEX on Valhall As opposed to LEA_ATTR_TEX. In principle we could do this for Bifrost too, but let's keep the Midgard compatible path for now. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Alyssa Rosenzweig	423773faa9	pan/bi: Don't analyze td on Valhall The implementation is based on clauses, so it won't work on Valhall. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16410>	2022-05-11 21:57:13 +00:00
Jason Ekstrand	3c07c3e16d	shader_info: Make images_used a bitset Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15988>	2022-05-10 11:23:11 -05:00
Alyssa Rosenzweig	0fcddd4d2c	pan/bi: Rework varying linking on Valhall Valhall introduces hardware-allocated varyings. Instead of allocating varying descriptors on the CPU with a slot based interface, the driver just tells the hardware how many bytes to allocate per vertex and loads/stores with byte offsets. This is much nicer! However, this requires us to rework our linking code to account for separable shaders. With separable shaders, we can't rely on driver_location matching between stages, and unlike on Midgard, we can't resolve the differences with curated command stream descriptors. However, we can rely on slots matching. So we should "just" determine the byte offsets based on the slot, and then separable shaders work. For GLES, it really is that easy. For desktop GL, it's not -- desktop GL brings unpredictable extra varyings like COL1 and TEX2. Allocating space for all of these unconditionally would hamper performance. To cope, we key fragment shaders to the set of non-GLES varyings written by the linked vertex shader. Then we may define an efficient ABI, where apps only pay for what they use. Fixes various tests in dEQP-GLES31.functional.separate_shader.random.* on Valhall. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16310>	2022-05-04 13:07:59 +00:00
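A sketch of the slot-based idea, under hypothetical assumptions: each slot occupies 16 bytes, slot numbers are below 64, and both stages see the same bitmask of written slots (for desktop GL, the fragment shader key would supply the non-GLES part of that mask). A slot's byte offset is then the number of present slots below it times the slot size, so separable shaders agree without relying on driver_location. The helper names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical: every varying slot occupies a vec4 of 32-bit values. */
#define BYTES_PER_SLOT 16u

/* Portable population count. */
static unsigned count_bits(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Byte offset of `slot` (assumed < 64) in the per-vertex varying buffer,
 * given the set of present slots (bit i = slot i). Both stages compute it
 * from the same mask, so they agree even if driver_location differs. */
static uint32_t varying_offset(uint64_t slots_mask, unsigned slot)
{
    uint64_t below = slots_mask & ((UINT64_C(1) << slot) - 1);
    return count_bits(below) * BYTES_PER_SLOT;
}

/* Bytes the hardware would be told to allocate per vertex. */
static uint32_t varying_stride(uint64_t slots_mask)
{
    return count_bits(slots_mask) * BYTES_PER_SLOT;
}
```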
Alyssa Rosenzweig	6b6ace5199	pan/bi: Add option to test spilling BIFROST_MESA_DEBUG=spill now restricts the register file to 1/4 its usual size, useful for testing register spilling (e.g. running CTS) as well as debugging spilling on small shaders. Note blend shaders are exempt, as we don't allow blend shaders to spill. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16314>	2022-05-04 12:48:27 +00:00
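The debug option can be pictured roughly as follows. This is a hypothetical sketch: the helper names are invented, the environment parsing is a naive substring match rather than Mesa's own debug-flag machinery, and the full register-file size is left as a parameter.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: naive check for "spill" in BIFROST_MESA_DEBUG. */
static bool debug_spill_enabled(void)
{
    const char *env = getenv("BIFROST_MESA_DEBUG");
    return env != NULL && strstr(env, "spill") != NULL;
}

/* Registers handed to the allocator: a quarter of the full file when
 * testing spilling, except for blend shaders, which may not spill. */
static unsigned usable_registers(unsigned full_file, bool is_blend, bool force_spill)
{
    if (force_spill && !is_blend)
        return full_file / 4;
    return full_file;
}
```

A caller would pass `debug_spill_enabled()` as `force_spill`; keeping the two separate makes the size computation testable without touching the environment.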
Alyssa Rosenzweig	80f8e9da16	pan/bi: Use a dynarray for predecessors This is deterministic, unlike a set. Note we need the extra dereferencing to keep the macro safe, simple, and standards compliant: 1. Nesting two for-loops would cause break/continue to fail. 2. Declaring variables outside the loop would pollute the namespace. 3. Declaring an anonymous struct is not conformant and doesn't compile in clang. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>	2022-05-03 17:56:16 +00:00
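The macro constraints listed above can be illustrated with a simplified stand-in for Mesa's `util_dynarray`; the `struct dynarray`, `struct block`, and `foreach_pred` names here are hypothetical. A single `for` header declares only the loop variable, so `break` and `continue` behave normally. Because that variable points into an array of `struct block *`, its type is `struct block **` and the body needs one extra dereference.

```c
#include <stddef.h>

struct block { unsigned index; };

/* Hypothetical minimal growable array: `size` is in bytes. */
struct dynarray { void *data; size_t size; };

/* One for-loop, one declared name: break/continue work and nothing
 * leaks into the enclosing scope. `it` is a struct block **. */
#define foreach_pred(arr, it)                                       \
    for (struct block **it = (struct block **)(arr)->data;          \
         it < (struct block **)((char *)(arr)->data + (arr)->size); \
         ++it)

static unsigned sum_pred_indices(const struct dynarray *preds)
{
    unsigned sum = 0;
    foreach_pred(preds, p)
        sum += (*p)->index; /* extra dereference to reach the block */
    return sum;
}
```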
Alyssa Rosenzweig	d496fe153a	pan/bi: Count blocks For u_worklist. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>	2022-05-03 17:56:16 +00:00
Alyssa Rosenzweig	eb0001bf2b	pan/bi: Rename bi_block->name to bi_block->index This is consistent with nir_block and (IMO) less confusing. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16279>	2022-05-03 17:56:16 +00:00
Alyssa Rosenzweig	54412afadc	pan/bi: Handle texture offset + index Fixes dEQP-VK.glsl.opaque_type_indexing.sampler.uniform.vertex.sampler1d Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16283>	2022-05-02 20:28:48 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Icecream95	76cea8e27b	panfrost: Fix pack_32_2x16 implementation Fixes: `6f0eff548c` ("pan/bi: Implement packing ops between 32-bit vec1 and 16-bit vec2") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16181>	2022-04-27 15:30:09 +00:00
Alyssa Rosenzweig	2ca8b014d1	pan/bi: Implement pack_uvec[24]_to_uint This maps nicely to Mali's weirdo MKVEC, so implement it rather than scalarizing. The scalarization wants an extract implemented which we don't have. Fixes dEQP-VK.glsl.builtin.function.pack_unpack.* Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16120>	2022-04-26 00:18:19 +00:00
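For reference, a scalar C model of the two opcodes, assuming NIR semantics in which `pack_uvec2_to_uint` places two 16-bit values and `pack_uvec4_to_uint` places four 8-bit values into one 32-bit word, lowest component first. The masks only make explicit the assumption that each component already fits in its lane.

```c
#include <stdint.h>

/* x in bits 0-15, y in bits 16-31. */
static uint32_t pack_uvec2_to_uint(uint32_t x, uint32_t y)
{
    return (x & 0xffffu) | (y << 16);
}

/* x in the lowest byte, w in the highest. */
static uint32_t pack_uvec4_to_uint(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    return (x & 0xffu) | ((y & 0xffu) << 8) | ((z & 0xffu) << 16) | (w << 24);
}
```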
Alyssa Rosenzweig	c9b33fe7dc	pan/bi: Implement fquantize2f16 Implement as f2f32(f2f16(x)) with the conversions in flush-to-zero mode. Accessing flush-to-zero mode on Bifrost is nontrivial: it is specified per-clause, rather than per-instruction. I've opted to pipe support for ftz clauses through the scheduler. This solution has two nice properties: * It uses the native hardware for flushing subnormals, avoiding extra lowering. * It's "smart" about scheduling around FTZ requirements, meaning we get good code generated even for a shader that e.g. quantizes a vector. With an unrelated scheduler fix, the V2F32_TO_V2F16/+F16_TO_F32 operation fits in a single tuple, minimizing the overhead of the special FTZ clause. We'll have to do something a bit different for Valhall (FLUSH.f32), but we'll worry about that when we actually have PanVK brought up on Valhall. Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16123>	2022-04-25 16:29:31 +00:00
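A bit-level C model of `f2f32(f2f16(x))` with flush-to-zero, useful for checking expected results on the CPU. This is a sketch under stated assumptions: inputs whose magnitude is below the smallest normal half (2^-14) are flushed to a signed zero before rounding, finite values are rounded to 10 mantissa bits with ties-to-even, and anything that rounds above 65504 becomes infinity. It does not model the Bifrost clause mechanism at all.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

static float quantize_f16_ftz(float x)
{
    if (isnan(x) || isinf(x))
        return x; /* NaN and infinity pass through */

    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t sign = u & 0x80000000u;
    uint32_t mag = u & 0x7fffffffu;

    if (mag < 0x38800000u) { /* |x| < 2^-14: half subnormal, flush */
        u = sign;
    } else {
        /* Round 23 mantissa bits to 10, ties to even. */
        mag += 0x0fffu + ((mag >> 13) & 1u);
        mag &= ~0x1fffu;
        if (mag > 0x477fe000u) /* beyond 65504, the largest finite half */
            mag = 0x7f800000u; /* infinity */
        u = sign | mag;
    }
    memcpy(&x, &u, sizeof x);
    return x;
}
```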
Alyssa Rosenzweig	1fb4427a7a	pan/bi: Imply round mode most of the time Much less noisy, and provides a path to further improvements. There is a slight behaviour change: int-to-float conversions now use RTE instead of RTZ. For 32-bit opcodes, this affects conversions of integers with magnitude greater than 2^24 by at most 1 ulp. As this behaviour is unspecified in GLSL, this change is believed to be acceptable. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15187>	2022-04-07 18:03:57 +00:00
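To make the 1 ulp difference concrete: above 2^24 consecutive floats are 2 apart, so 16777219 (2^24 + 3) lies exactly halfway between 16777218 and 16777220. The sketch below compares a manual round-toward-zero conversion with C's ordinary cast, assuming the default round-to-nearest-even floating-point environment; the helper names are hypothetical.

```c
#include <stdint.h>

/* int32 -> float32, round toward zero: drop the bits that do not fit in
 * the 24-bit significand, then convert exactly. */
static float i32_to_f32_rtz(int32_t v)
{
    int64_t wide = v; /* avoids overflow when negating INT32_MIN */
    uint64_t mag = (uint64_t)(wide < 0 ? -wide : wide);
    unsigned bits = 0;
    while ((mag >> bits) != 0)
        bits++;
    if (bits > 24) {
        unsigned drop = bits - 24;
        mag = (mag >> drop) << drop; /* truncate toward zero */
    }
    float f = (float)mag; /* exact: at most 24 significant bits */
    return wide < 0 ? -f : f;
}

/* int32 -> float32, round to nearest even (default FP environment). */
static float i32_to_f32_rte(int32_t v)
{
    return (float)v;
}
```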
Alyssa Rosenzweig	6e69c3369c	pan/bi: Don't lower vertex_id for malloc IDVS Based on hardware behaviour, it appears vertex_id is zero-based with the legacy geometry flow but not with the new malloc IDVS flow. Since the geometry flow is per-shader (not per-machine), there's not a good way to communicate this to NIR. Rather than trying to shoehorn this obscure detail into NIR, just do the lowering ourselves instead of in NIR. It's not much more code anyway. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>	2022-04-07 14:20:45 +00:00
Alyssa Rosenzweig	ccdec68aee	pan/bi: Report whether workgroups can be merged This flag gates a Valhall hardware optimization for compute shaders. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15793>	2022-04-07 14:20:45 +00:00

596 Commits