4240 Commits

Author SHA1 Message Date
Alyssa Rosenzweig ed5a5a9d6d panfrost: Wire up transform feedback sysvals
Wire the Gallium interface for transform feedback up to the system values that
will be fed into our lowering code. This is based on our existing transform
feedback implementation for Midgard.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig 4e341e70d8 pan/bi: Handle transform feedback intrinsics
Translate the intrinsics we introduced to lower away transform feedback into
Panfrost system values which the GL driver can handle.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig ae3fa6cc1d pan/bi: Add transform feedback lowering pass
Add a simple NIR-based implementation of transform feedback, appropriate for
OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders).
Stores to varyings that will be captured are replaced by stores to transform
feedback buffers and some addressing math. This allows implementing the semantics
of transform feedback in a compute-like stage.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
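The "addressing math" mentioned in the transform feedback lowering commit above can be illustrated with a minimal sketch. It assumes that each vertex writes one fixed-size record into the buffer and that records are laid out back to back; the struct, its fields, and the function name are hypothetical and are not taken from the Mesa source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one captured output; not a Mesa structure. */
struct xfb_output {
   uint32_t stride; /* bytes written per vertex into this buffer */
   uint32_t offset; /* byte offset of this output within a vertex's record */
};

/* Byte address at which a given vertex's captured output would be stored.
 * Assumes vertex records are laid out back to back starting at `base`. */
static uint64_t
xfb_store_address(uint64_t base, const struct xfb_output *out,
                  uint32_t vertex_index)
{
   assert(out->offset < out->stride);
   return base + (uint64_t)vertex_index * out->stride + out->offset;
}
```

With a 32-byte stride and a 16-byte offset, vertex 3 would store at base + 112 under these assumptions. A real driver would also have to account for instancing and buffer bounds, which this sketch leaves out.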
Alyssa Rosenzweig ed4bd8738d panfrost/ci: Mark draw_buffers_indexed.* as flakes
These keep flaking. Icecream95 observes the issue relates to AFBC in the
discussion of the flake in issue 6604. Until the root cause can be identified
and fixed, mark the tests as known flakes for CI.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16855>
2022-06-03 21:05:22 +00:00
Alyssa Rosenzweig 7535362204 pan/bi: Fix clper_xor on Mali-G31
Mali-G31 has the old CLPER instruction, not the new one, which means we don't
get to specify a custom lane op. But the clper_xor helper incorrectly checked
the arch, not the implementation quirk.

Fixes: c00e7b729f ("pan/bi: Optimize abs(derivative)")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16846>
2022-06-02 20:32:43 -04:00
Alyssa Rosenzweig ad5c84999b pan/bi: Rework Valhall register alignment
Because we lower SPLIT and COLLECT before RA, we need to consider offsets when
determining the dimensions of vectors, in order to align properly. Lowering
COLLECT post-RA would avoid this special case.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
Alyssa Rosenzweig 0770e7a90c pan/bi: Align 64-bit register sources
Similar idea to aligning staging register sources.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
Alyssa Rosenzweig 8553dd97ad pan/bi: Allow vec6 for collects
Hit for some Valhall texturing instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
Icecream95 1bfff407b9 pan/bi: Use nodearrays for linear constraints
Speeds up compiling shaders/skia/781.shader_test in shader-db by 8x
(Icecream95).

...At least it did before I extended it to support register allocation of vec8. On
Valhall, texture instructions require up to 8 consecutive registers. To handle
this, provide for vec8 register allocation. Liveness was already (accidentally?)
vec8. The increased memory requirement is acceptable given that the interference
matrix is now stored sparsely (Alyssa).

Icecream95 reports the vec8 changes hurt RA performance by about 1% on average.
I consider this acceptable for now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
Icecream95 c70daa74f0 pan/bi: Add nodearray datastructure
This is an array which can either be sparse or dense, and was designed
to be used to track liveness and interference information.

Either a sparse array with sorted indices or dense array is used.
Other data structures were tried, such as red-black trees or hash
tables, but they were slower. When used for storing constraints, the
indices do not have to be sorted as duplicating elements is okay, but
the speedup from that was not enough to justify the extra complexity.

v2: Add a comment about how to potentially speed it up. But it seems
  fast enough even without this change.
v3: Use a custom struct rather than relying on util_dynarray.
v4: Split out functions only used for liveness analysis, rather than the simpler
  data structure needed for the register interference matrix. If we need to
  optimize liveness, that can follow on after. Also make it for vec8 (Alyssa).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
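The sparse-or-dense idea described in the nodearray commit above might look roughly like the following. This is an illustrative sketch only: the names, the tiny sparse limit, and the 8-bit values are assumptions made for brevity and do not reflect the actual layout in Mesa.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SPARSE_LIMIT 4 /* assumed threshold before going dense */

struct sketch_elem {
   uint32_t key;
   uint8_t value;
};

struct sketch_array {
   uint8_t *dense;                                 /* non-NULL once dense */
   struct sketch_elem sparse[SKETCH_SPARSE_LIMIT]; /* sorted by key */
   unsigned count;                                 /* used sparse entries */
   unsigned size;                                  /* valid keys: 0..size-1 */
};

static void
sketch_init(struct sketch_array *a, unsigned size)
{
   memset(a, 0, sizeof(*a));
   a->size = size;
}

static void
sketch_free(struct sketch_array *a)
{
   free(a->dense);
   a->dense = NULL;
}

/* Read a value; keys that were never written read as zero. */
static uint8_t
sketch_get(const struct sketch_array *a, uint32_t key)
{
   assert(key < a->size);
   if (a->dense)
      return a->dense[key];

   /* Binary search over the sorted sparse entries. */
   unsigned lo = 0, hi = a->count;
   while (lo < hi) {
      unsigned mid = lo + (hi - lo) / 2;
      if (a->sparse[mid].key < key)
         lo = mid + 1;
      else
         hi = mid;
   }
   return (lo < a->count && a->sparse[lo].key == key) ? a->sparse[lo].value : 0;
}

/* OR bits into the value at key, converting to dense when sparse is full. */
static void
sketch_or(struct sketch_array *a, uint32_t key, uint8_t bits)
{
   assert(key < a->size);
   if (!a->dense) {
      unsigned i = 0;
      while (i < a->count && a->sparse[i].key < key)
         i++;
      if (i < a->count && a->sparse[i].key == key) {
         a->sparse[i].value |= bits;
         return;
      }
      if (a->count < SKETCH_SPARSE_LIMIT) {
         /* Shift the tail up by one to keep the entries sorted. */
         memmove(&a->sparse[i + 1], &a->sparse[i],
                 (a->count - i) * sizeof(a->sparse[0]));
         a->sparse[i] = (struct sketch_elem){ key, bits };
         a->count++;
         return;
      }
      /* Sparse storage is full: switch to a dense array. */
      a->dense = calloc(a->size, 1);
      assert(a->dense);
      for (unsigned j = 0; j < a->count; j++)
         a->dense[a->sparse[j].key] = a->sparse[j].value;
   }
   a->dense[key] |= bits;
}
```

Keeping the sparse entries sorted allows lookups by binary search, which matches the "sparse array with sorted indices" wording in the commit message. The conversion threshold here is arbitrary.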
Icecream95 c24b78cceb pan/bi: Reverse linear constraint bits
This will make it simpler to implement parallel RA where multiple
possible registers for a node are tested at once.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16780>
2022-06-02 17:13:16 +00:00
Alyssa Rosenzweig bc4d42023d pan/bi: Respect swizzles in nir_op_pack_64_2x32_split
Triggered a BIR validation error, which made debugging a breeze. That validation
pass (dimensionality checks) gets a lot of use, it seems :-)

Fixes:

   dEQP-VK.ssbo.layout.2_level_array.std430.row_major_mat4x2_comp_access_store_cols

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
2022-06-01 20:08:42 +00:00
Alyssa Rosenzweig 7831508740 panvk: Use vk_image_subresource_*_count for clears
This handles VK_REMAINING_* for us, instead of underflowing and clearing no
levels/layers.

Fixes dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.*

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16724>
2022-06-01 20:08:42 +00:00
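To illustrate the VK_REMAINING_* problem fixed by the panvk clear commit above, here is a small self-contained sketch. It does not use the Vulkan headers or Mesa's vk_image helpers: SKETCH_REMAINING stands in for the all-ones sentinel value, and both functions are hypothetical. The naive loop is an assumption about how such a bug can arise, not a copy of the old panvk code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VK_REMAINING_MIP_LEVELS / VK_REMAINING_ARRAY_LAYERS (~0U). */
#define SKETCH_REMAINING (~0u)

/* Resolve a possibly "remaining" count against the image's total. */
static uint32_t
sketch_subresource_count(uint32_t total, uint32_t base, uint32_t count)
{
   assert(base < total);
   return (count == SKETCH_REMAINING) ? (total - base) : count;
}

/* Naive loop that uses the raw count: with the sentinel and a nonzero base,
 * base + count wraps around and the loop body never runs. */
static uint32_t
sketch_naive_levels_cleared(uint32_t base, uint32_t count)
{
   uint32_t cleared = 0;
   for (uint32_t level = base; level < base + count; ++level)
      cleared++;
   return cleared;
}
```

For an image with 5 levels and a base level of 2, the resolved count is 3, while the naive loop clears nothing when handed the sentinel.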
Alyssa Rosenzweig 82d3eb7f18 panfrost: Handle texturing from AFBC on Valhall
We need to pack special AFBC-specific plane descriptors instead of the generic
plane descriptor. Nothing too fancy here, though.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
2022-06-01 19:44:31 +00:00
Alyssa Rosenzweig 9afa8cc555 panfrost: Support rendering to AFBC on Valhall
Add the required handling when packing render target and depth buffer
descriptors on Valhall. This is mostly equivalent to Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
2022-06-01 19:44:31 +00:00
Alyssa Rosenzweig c2207d27c2 panfrost: Add pan_afbc_compression_mode on Valhall
Map a canonical format (a hardware-independent pipe_format) to a compression
mode (Valhall-specific hardware enum defined in GenXML). To be used for packing
plane descriptors and render target descriptors when AFBC is in use on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
2022-06-01 19:44:31 +00:00
Alyssa Rosenzweig 87dcdbdad6 panfrost: Pass arch instead of dev into afbc_format
For callers that have a device object, it's easy to pass dev->arch instead of
dev. But taking dev requires callers to have a reference to the device, which is
tricky for callers that only have the arch via PAN_ARCH. Pass dev->arch instead
of dev to accommodate them.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
2022-06-01 19:44:31 +00:00
Alyssa Rosenzweig 2cc2f217d4 panfrost: Fix XML for AFBC header on v9
Misnamed field due to copy/paste fail from Bifrost.

Fixes: c011ea6c26 ("panfrost: Shuffle render target AFBC for Valhall")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16800>
2022-06-01 19:44:31 +00:00
Alyssa Rosenzweig e596a0423b pan/mdg: Print outmods when printing IR
In particular, this lets us distinguish mul_high from regular mul.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig a099834b97 pan/mdg: Distinguish SSA vs reg when printing IR
This makes it easy to match the printed IR with the indices in the NIR.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig 520204ae18 pan/mdg: Only print 1 source for moves
This makes the printed IR easier to read at a glance.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig 0ee24c46e0 pan/mdg: Only print 2 sources for ALU
..and assert the other sources are null. The one place this might fail in the
future is for real FMA, but we don't support that for GL.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig 9c9db27e3c pan/mdg: Only print masked components of swizzle
This matches the IR printer with the disassembler, making the output of the IR
printer much easier to parse at a glance.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig c9093554d0 pan/mdg: Use "<<" instead of "lsl"
Easier to read and consistent with C code.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig 8c11f4809b pan/mdg: Remove uppercase write masks
These do not convey any additional information, and fail to account for
shrinking. In particular, a 64-bit writemask with .keephi would fail to
disassemble and instead trip the assertion, since that would be the ZW
components. Just delete the broken code.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:10 -04:00
Alyssa Rosenzweig 9e4b457958 pan/mdg: Scalarize with 64-bit sources
Otherwise, we can get vec3 with u2u32 with 64-bit sources which we need lowered.
Since our current approach is "scalarize all 64-bit ops", we need to check for
conversions too.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16798>
2022-06-01 14:24:05 -04:00
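The check described in the "Scalarize with 64-bit sources" commit above can be sketched as a filter that looks at source sizes as well as the destination size. The struct below is a hypothetical stand-in for an ALU instruction, not a NIR type, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_SRCS 4

/* Hypothetical ALU instruction summary; not a NIR structure. */
struct sketch_alu_op {
   unsigned dest_bits;
   unsigned num_srcs;
   unsigned src_bits[SKETCH_MAX_SRCS];
};

/* Scalarize if anything 64-bit is involved, including conversions whose
 * destination is narrower than 64 bits (e.g. u2u32 of a 64-bit source). */
static bool
sketch_should_scalarize(const struct sketch_alu_op *op)
{
   assert(op->num_srcs <= SKETCH_MAX_SRCS);

   if (op->dest_bits == 64)
      return true;

   for (unsigned i = 0; i < op->num_srcs; ++i) {
      if (op->src_bits[i] == 64)
         return true;
   }

   return false;
}
```

A filter that only inspected dest_bits would let a 64-bit to 32-bit conversion through as a vector, which is the case the commit message describes.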
Alyssa Rosenzweig 5067a26f44 pan/bi: Use flow control lowering on Valhall
Logically at the same part of the compile pipeline as clause scheduling on
Bifrost. Lots of similarities, too. Now that we generate flow control only as a
late pass, various hacks in the compiler are no longer necessary and are
dropped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig a394c32cd2 pan/va: Unit test flow control merging
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 4b06e7f5b6 pan/va: Unit test flow control insertion
Test that we correctly track the scoreboard, helper invocations, reconvergence,
and ends and insert NOPs to effect this expected flow control.

As the pass inserts NOPs but does not otherwise modify the shader, this is easy
to test with well-defined behaviour of the pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 0fa9204049 pan/va: Respect assigned slots
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 492f4055dd pan/va: Assign slots roundrobin
This should reduce false dependencies with asynchronous instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
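Round-robin slot assignment, as named in the commit above, can be sketched in a few lines. The slot count of 3 and all identifiers are assumptions for illustration; the commit message does not state how many slots are available.

```c
#include <assert.h>

#define SKETCH_NUM_SLOTS 3 /* assumed number of assignable slots */

struct sketch_slot_state {
   unsigned next; /* slot to hand out to the next asynchronous instruction */
};

/* Hand out slots in rotation so that consecutive asynchronous instructions
 * do not wait on the same slot unless they have to. */
static unsigned
sketch_assign_slot(struct sketch_slot_state *s)
{
   unsigned slot = s->next;
   assert(slot < SKETCH_NUM_SLOTS);
   s->next = (slot + 1) % SKETCH_NUM_SLOTS;
   return slot;
}
```

If every instruction used slot 0, waiting on any one of them would also wait on all the others, which is the kind of false dependency the commit aims to reduce.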
Alyssa Rosenzweig aa7393f81a pan/va: Add flow control merging pass
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 03d8439c0a pan/va: Terminate helper threads
On Bifrost, to terminate helper threads we set the td bit on the clause. On
Valhall, we need to use the .discard flow control. Extend the flow control NOP
insertion to insert NOP.discard where necessary to terminate helper threads.
This should reduce wasted work in fragment shaders.

This requires fairly involved data flow analysis, but the handling here should
be optimal.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 41b39d6d5d pan/va: Do scoreboard analysis
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 7e3b9cf754 pan/va: Add pass to insert flow control
To set flow control modifiers correctly and efficiently, we need a pass that
runs after register allocation and scheduling, but before packing. Add such a
pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 82b1897900 pan/bi: Print flow control on instructions
This helps debug the flow control lowering passes on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig c0180f6bd3 pan/bi: Export helper termination analysis
The current helper termination analysis code is hardwired for clauses, so it
won't work for Valhall. However, the bulk of it is dataflow analysis which is
portable between Bifrost and Valhall. Export the interesting bits so we can
reuse them on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 7bb635316b pan/bi: Export bi_block_add_successor
For use in unit tests that need to create blocks.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig d7c6b7c9d2 pan/bi: Extract bit_block helper
Convenience for unit tests which need to create multiple blocks, to test global
passes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig b0edd92156 pan/bi: Add a trivial ctx->inputs for unit tests
So we can unit test the flow control insertion which needs to gate some
behaviour on not being in a blend shader.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 218148d38a pan/bi: Add ASSERT_SHADER_EQUAL macro
Useful for whole-program unit tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 4627cd99de pan/bi: Preserve flow control for non-psiz variant
Otherwise we will get INSTR_INVALID_ENC faults when deleting the final STORE.end
instruction, after we rework our flow control code.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig c846e0812b pan/bi: Add slot to bi_instr
For better handling of message-passing instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Alyssa Rosenzweig 616df0e97d pan/bi: Extend bi_scoreboard_state for finer tracking
We need to insert dependencies for varyings and memory access. Currently, the
Bifrost scoreboarding pass just treats these as barriers, but this is too
heavy-handed. Extend the scoreboard data structure so we can do better.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16804>
2022-06-01 16:14:38 +00:00
Daniel Schürmann bd151a256e nir/opt_vectorize: add callback for max vectorization width
The callback allows requesting different vectorization factors
per instruction depending on e.g. bitsize or opcode.

This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().

Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
2022-06-01 11:41:44 +00:00
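A per-instruction width callback of the kind described in the nir/opt_vectorize commit above might be shaped like this. The types here are hypothetical stand-ins rather than the NIR definitions, and the policy (pack sub-32-bit operations into 32 bits) is only one assumed example of what a backend could request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ALU instruction; not nir_instr. */
struct sketch_alu {
   unsigned bit_size;
};

/* Callback type: returns the maximum vector width for this instruction. */
typedef uint8_t (*sketch_vectorize_cb)(const struct sketch_alu *alu,
                                       const void *data);

/* Example policy: assume 32-bit registers, so 16-bit ops may become vec2,
 * 8-bit ops may become vec4, and anything 32-bit or wider stays scalar. */
static uint8_t
sketch_width_by_bitsize(const struct sketch_alu *alu, const void *data)
{
   (void)data; /* unused in this sketch */
   assert(alu->bit_size != 0);
   return alu->bit_size >= 32 ? 1 : (uint8_t)(32 / alu->bit_size);
}
```

Returning a width rather than a yes/no answer is what lets one pass serve both vec2 16-bit and wider 8-bit vectorization.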
Emma Anholt 7ae206d76e panfrost: always print the bad ALU op if we're failing to translate.
CI failure could have told me what needed fixing, but no...

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16437>
2022-06-01 10:56:35 +00:00
Emma Anholt 7472bb4bad glsl,nir: Move i/umulExtended lowering to NIR.
NIR already has the necessary lowering, and the GLSL lowering violates
GLSL IR validation rules.  Once quadop lowering was turned off, the IR
validation at the end of the compile path on DEBUG builds caught the
problem.

In order to move the lowering to NIR, though, we need to make sure that
drivers supporting these functions actually have the lowering flag set.

xfails added for t860, where apparently this tickles a variety of existing
64-bit bugs in the backend.

Fixes: #6461
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16437>
2022-06-01 10:56:35 +00:00
Juan A. Suarez Romero 836ce97f5e ci: bump VK-GL-CTS to 1.3.2.0
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16689>
2022-05-31 15:02:08 +00:00
Alyssa Rosenzweig 0170500627 pan/bi: Interpolate varyings at 16-bit
On Bifrost, we have a single "load float varying" instruction that controls the
bit size of the result, allowing us to fold a f2f16 into the load. However, the
larger benefit is that 16-bit varying loads are interpolated at 16-bit. Arm
claims that the varying unit has a 32-bit data path, allowing 16-bit varyings to
be interpolated in half the cycles of 32-bit ones. This change should therefore
improve performance for workloads that are bound by the varying unit. This means
we want to be aggressive about 16-bit varying loads, even if it costs some extra
f2f32 instructions.

glmark2 total score on Mali-G52 up from 1173fps to 1218fps with particular wins
in -brefract, -bshadow, -bjellyfish, and -bshading.

total instructions in shared programs: 2432246 -> 2423668 (-0.35%)
instructions in affected programs: 516056 -> 507478 (-1.66%)
helped: 3641
HURT: 432
helped stats (abs) min: 1.0 max: 12.0 x̄: 2.91 x̃: 2
helped stats (rel) min: 0.08% max: 54.55% x̄: 9.88% x̃: 5.71%
HURT stats (abs)   min: 1.0 max: 42.0 x̄: 4.71 x̃: 4
HURT stats (rel)   min: 0.23% max: 200.00% x̄: 12.58% x̃: 6.37%
95% mean confidence interval for instructions value: -2.21 -2.00
95% mean confidence interval for instructions %-change: -7.92% -7.07%
Instructions are helped.

total tuples in shared programs: 1941309 -> 1934647 (-0.34%)
tuples in affected programs: 353169 -> 346507 (-1.89%)
helped: 3233
HURT: 453
helped stats (abs) min: 1.0 max: 14.0 x̄: 2.46 x̃: 2
helped stats (rel) min: 0.12% max: 50.00% x̄: 9.90% x̃: 5.56%
HURT stats (abs)   min: 1.0 max: 25.0 x̄: 2.85 x̃: 2
HURT stats (rel)   min: 0.22% max: 150.00% x̄: 8.96% x̃: 5.26%
95% mean confidence interval for tuples value: -1.89 -1.72
95% mean confidence interval for tuples %-change: -8.01% -7.15%
Tuples are helped.

total clauses in shared programs: 357354 -> 356610 (-0.21%)
clauses in affected programs: 25794 -> 25050 (-2.88%)
helped: 994
HURT: 317
helped stats (abs) min: 1.0 max: 3.0 x̄: 1.16 x̃: 1
helped stats (rel) min: 1.49% max: 33.33% x̄: 10.78% x̃: 10.00%
HURT stats (abs)   min: 1.0 max: 4.0 x̄: 1.31 x̃: 1
HURT stats (rel)   min: 1.19% max: 50.00% x̄: 13.56% x̃: 8.33%
95% mean confidence interval for clauses value: -0.63 -0.50
95% mean confidence interval for clauses %-change: -5.63% -4.16%
Clauses are helped.

total cycles in shared programs: 167697.96 -> 167431.15 (-0.16%)
cycles in affected programs: 12638.29 -> 12371.48 (-2.11%)
helped: 2652
HURT: 350
helped stats (abs) min: 0.04166399999999726 max: 0.75 x̄: 0.11 x̃: 0
helped stats (rel) min: 0.12% max: 100.00% x̄: 14.39% x̃: 5.04%
HURT stats (abs)   min: 0.041665999999999315 max: 0.5833329999999997 x̄: 0.11 x̃: 0
HURT stats (rel)   min: 0.00% max: 75.00% x̄: 7.90% x̃: 4.71%
95% mean confidence interval for cycles value: -0.09 -0.08
95% mean confidence interval for cycles %-change: -12.56% -11.02%
Cycles are helped.

total arith in shared programs: 74169.46 -> 73891.71 (-0.37%)
arith in affected programs: 13885.87 -> 13608.12 (-2.00%)
helped: 3215
HURT: 445
helped stats (abs) min: 0.04166399999999726 max: 0.5416680000000014 x̄: 0.10 x̃: 0
helped stats (rel) min: 0.12% max: 100.00% x̄: 14.16% x̃: 6.67%
HURT stats (abs)   min: 0.041665999999999315 max: 1.125 x̄: 0.12 x̃: 0
HURT stats (rel)   min: 0.00% max: 100.00% x̄: 9.76% x̃: 5.49%
95% mean confidence interval for arith value: -0.08 -0.07
95% mean confidence interval for arith %-change: -11.91% -10.59%
Arith are helped.

total texture in shared programs: 11936 -> 11931 (-0.04%)
texture in affected programs: 20 -> 15 (-25.00%)
helped: 10
HURT: 0
helped stats (abs) min: 0.5 max: 0.5 x̄: 0.50 x̃: 0
helped stats (rel) min: 14.29% max: 100.00% x̄: 45.71% x̃: 33.33%
95% mean confidence interval for texture value: -0.50 -0.50
95% mean confidence interval for texture %-change: -73.16% -18.26%
Texture are helped.

total vary in shared programs: 4180.88 -> 3447.19 (-17.55%)
vary in affected programs: 2109.88 -> 1376.19 (-34.77%)
helped: 2202
HURT: 39
helped stats (abs) min: 0.0625 max: 1.4375 x̄: 0.34 x̃: 0
helped stats (rel) min: 2.38% max: 66.67% x̄: 40.43% x̃: 50.00%
HURT stats (abs)   min: 0.125 max: 0.375 x̄: 0.26 x̃: 0
HURT stats (rel)   min: 0.00% max: 300.00% x̄: 92.54% x̃: 23.08%
95% mean confidence interval for vary value: -0.34 -0.32
95% mean confidence interval for vary %-change: -39.22% -37.01%
Vary are helped.

total quadwords in shared programs: 1689664 -> 1684852 (-0.28%)
quadwords in affected programs: 265522 -> 260710 (-1.81%)
helped: 2864
HURT: 447
helped stats (abs) min: 1.0 max: 14.0 x̄: 2.10 x̃: 2
helped stats (rel) min: 0.15% max: 31.58% x̄: 6.05% x̃: 4.65%
HURT stats (abs)   min: 1.0 max: 22.0 x̄: 2.67 x̃: 2
HURT stats (rel)   min: 0.27% max: 38.46% x̄: 6.79% x̃: 4.55%
95% mean confidence interval for quadwords value: -1.54 -1.37
95% mean confidence interval for quadwords %-change: -4.55% -4.08%
Quadwords are helped.

total threads in shared programs: 53656 -> 53688 (0.06%)
threads in affected programs: 32 -> 64 (100.00%)
helped: 32
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

total preloads in shared programs: 116212 -> 103476 (-10.96%)
preloads in affected programs: 45222 -> 32486 (-28.16%)
helped: 3022
HURT: 11
helped stats (abs) min: 1.0 max: 11.0 x̄: 4.23 x̃: 4
helped stats (rel) min: 7.14% max: 68.75% x̄: 30.39% x̃: 25.00%
HURT stats (abs)   min: 2.0 max: 4.0 x̄: 3.45 x̃: 4
HURT stats (rel)   min: 14.29% max: 50.00% x̄: 25.93% x̃: 25.00%
95% mean confidence interval for preloads value: -4.26 -4.14
95% mean confidence interval for preloads %-change: -30.68% -29.69%
Preloads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig 93f69e4b1c pan/bi: Model Valhall source formats
LD_VAR_BUF instructions on Valhall take a source format, indicating the
in-memory format of the varying independent from the register format, which we
still model within the compiler for compatibility with Bifrost. (Prior to
Valhall, source format is specified in the attribute descriptor as a physical
pixel format.)

Model this information, allowing us to generate fp16 LD_VAR_BUF instructions
correctly on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig 06886c3861 pan/bi: Make LD_VAR w=format instead of w=vecsize
Fixes a vector dimension validation failure in
dEQP-GLES3.functional.shaders.indexing.varying_array.vec4_static_write_dynamic_read
after we enable fp16 varyings.

No shader-db changes, as we don't yet support fp16 varyings.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig a9b13a1867 pan/va: Fill in missing src_flat16 enum
Valhall gains(?) the ability to flatshade 16-bit varyings; this is indicated by
a particular source format.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:49:44 -04:00
Alyssa Rosenzweig e898e2466b pan/bi: Add VAR_TEX fusing unit test
As fusing VAR_TEX is an optimization, it's helpful to have unit tests since
functional tests won't check that the optimization triggers when expected.
Originally written when I was touching the VAR_TEX code. Those changes have
since been dropped but the unit test remains useful.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16752>
2022-05-30 17:48:59 -04:00
Alyssa Rosenzweig 42a4a123a6 pan/bi: Don't allow spilling coverage mask writes
The register precolouring logic assumes that coverage masks are always in R60,
so spilling them causes incorrect results. We could do better. Fixes on Valhall:

   dEQP-GLES3.functional.ubo.random.all_per_block_buffers.28

Fixes: 3df5446cbd ("pan/bi: Simplify register precolouring in the IR")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
2022-05-30 14:00:55 +00:00
Alyssa Rosenzweig 67f5721349 panfrost: Set allow_rotating_primitives
On Valhall, the driver should set this flag if the hardware may rotate
primitives. This happens if:

1. The rasterization of lines does not matter, AND
2. The provoking vertex does not matter.

We check the first condition by testing for LINES and the second by testing for
flat shading. If neither is in use, we set this flag to allow optimizations.
This may be more efficient for tiling.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16748>
2022-05-30 14:00:55 +00:00
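The two conditions listed in the allow_rotating_primitives commit above reduce to a simple predicate. The sketch below uses a hypothetical primitive enum and function name, and it takes the commit message's own tests (LINES and flat shading) as the definition of "matters".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primitive classes; not the Gallium or hardware enums. */
enum sketch_prim {
   SKETCH_POINTS,
   SKETCH_LINES,
   SKETCH_TRIANGLES,
};

static bool
sketch_allow_rotating_primitives(enum sketch_prim prim, bool uses_flat_shading)
{
   assert(prim <= SKETCH_TRIANGLES);

   /* 1. Line rasterization only matters when drawing lines. */
   bool line_raster_matters = (prim == SKETCH_LINES);

   /* 2. The provoking vertex only matters with flat shading. */
   bool provoking_vertex_matters = uses_flat_shading;

   return !line_raster_matters && !provoking_vertex_matters;
}
```

Under these assumptions, smooth-shaded triangles allow rotation, while lines or any flat-shaded draw do not.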
Jason Ekstrand 0eee071038 panvk: Use the vk_buffer base struct
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16607>
2022-05-27 18:39:00 -05:00
Alyssa Rosenzweig 01ba3460a9 pan/bi: Test CMP result_type optimization
Add unit tests ensuring the optimization applies in all the cases we care about,
as functional integration tests (CTS and Piglit) won't test this. Also add unit
tests for a few cases where we specifically cannot fuse, in case these cases are
missed by the tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16725>
2022-05-27 12:14:22 +00:00
Alyssa Rosenzweig 501a66cb5c pan/bi: Fuse result types
In NIR, comparison instructions always produce 0/~0 results. For other result
types, a separate b2f32 or b2i32 instruction is used to transform the result.
However, Mali's comparison instructions have modifiers for these alternate
result types, so we can implement expressions like int(a < b) and float(a ==
b) in a single instruction. Add a peephole optimization to fuse comparisons
with result type transformations.

Results on Mali-G52:

total instructions in shared programs: 2439696 -> 2434339 (-0.22%)
instructions in affected programs: 418703 -> 413346 (-1.28%)
helped: 1630
HURT: 0
helped stats (abs) min: 1.0 max: 28.0 x̄: 3.29 x̃: 2
helped stats (rel) min: 0.11% max: 19.35% x̄: 1.64% x̃: 1.39%
95% mean confidence interval for instructions value: -3.44 -3.13
95% mean confidence interval for instructions %-change: -1.72% -1.56%
Instructions are helped.

total tuples in shared programs: 1946581 -> 1943005 (-0.18%)
tuples in affected programs: 251742 -> 248166 (-1.42%)
helped: 1113
HURT: 11
helped stats (abs) min: 1.0 max: 32.0 x̄: 3.23 x̃: 2
helped stats (rel) min: 0.17% max: 15.38% x̄: 1.80% x̃: 1.38%
HURT stats (abs)   min: 1.0 max: 2.0 x̄: 1.45 x̃: 1
HURT stats (rel)   min: 0.21% max: 3.12% x̄: 1.23% x̃: 0.89%
95% mean confidence interval for tuples value: -3.35 -3.01
95% mean confidence interval for tuples %-change: -1.88% -1.66%
Tuples are helped.

total clauses in shared programs: 357791 -> 357349 (-0.12%)
clauses in affected programs: 15879 -> 15437 (-2.78%)
helped: 371
HURT: 3
helped stats (abs) min: 1.0 max: 8.0 x̄: 1.20 x̃: 1
helped stats (rel) min: 0.80% max: 33.33% x̄: 3.85% x̃: 2.17%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 2.94% max: 5.26% x̄: 4.49% x̃: 5.26%
95% mean confidence interval for clauses value: -1.27 -1.09
95% mean confidence interval for clauses %-change: -4.21% -3.36%
Clauses are helped.

total cycles in shared programs: 167922.04 -> 167810.71 (-0.07%)
cycles in affected programs: 6772.08 -> 6660.75 (-1.64%)
helped: 655
HURT: 12
helped stats (abs) min: 0.041665999999999315 max: 1.3333319999999986 x̄: 0.17 x̃: 0
helped stats (rel) min: 0.18% max: 20.00% x̄: 2.02% x̃: 1.60%
HURT stats (abs)   min: 0.041665999999999315 max: 0.125 x̄: 0.05 x̃: 0
HURT stats (rel)   min: 0.21% max: 3.80% x̄: 1.23% x̃: 0.88%
95% mean confidence interval for cycles value: -0.18 -0.16
95% mean confidence interval for cycles %-change: -2.10% -1.81%
Cycles are helped.

total arith in shared programs: 74393.17 -> 74243.08 (-0.20%)
arith in affected programs: 10157.50 -> 10007.42 (-1.48%)
helped: 1129
HURT: 12
helped stats (abs) min: 0.041665999999999315 max: 1.3333319999999986 x̄: 0.13 x̃: 0
helped stats (rel) min: 0.18% max: 50.00% x̄: 1.94% x̃: 1.40%
HURT stats (abs)   min: 0.041665999999999315 max: 0.125 x̄: 0.05 x̃: 0
HURT stats (rel)   min: 0.21% max: 3.80% x̄: 1.23% x̃: 0.88%
95% mean confidence interval for arith value: -0.14 -0.12
95% mean confidence interval for arith %-change: -2.06% -1.76%
Arith are helped.

total quadwords in shared programs: 1692019 -> 1688164 (-0.23%)
quadwords in affected programs: 216669 -> 212814 (-1.78%)
helped: 1148
HURT: 11
helped stats (abs) min: 1.0 max: 41.0 x̄: 3.37 x̃: 2
helped stats (rel) min: 0.17% max: 17.24% x̄: 2.25% x̃: 1.73%
HURT stats (abs)   min: 1.0 max: 2.0 x̄: 1.09 x̃: 1
HURT stats (rel)   min: 0.60% max: 1.32% x̄: 0.85% x̃: 0.83%
95% mean confidence interval for quadwords value: -3.49 -3.16
95% mean confidence interval for quadwords %-change: -2.33% -2.10%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16725>
2022-05-27 12:14:22 +00:00
David Heidelberg c9f0a511e0 ci/panfrost: add RoR and Nheko traces
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16633>
2022-05-27 06:51:38 +00:00
Alyssa Rosenzweig 0255f554f3 panfrost: Advertise 16x16 tiled AFBC
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 3fbfd356af panfrost: Add helper checking tiled AFBC support
Tiled AFBC support was introduced with v7. Add a helper encoding this fact.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 5fa274fee4 panfrost: Handle AFBC Tiled
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig b63dad3ce5 panfrost: Put comment in correct #ifdef
Minor fix to make the code less confusing.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig bd529b7983 panfrost: Fix AFBC flags on v6
Tiled headers and bounds checking were introduced with v7. The flags don't exist
on v6. Fix the XML accordingly so we don't accidentally use features too new for
the hardware.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 166d879ff0 panfrost: Add 1x1 layout unit tests
These check the alignments are correct. Of course, ideally these cases aren't
hit in practice, since it's a waste of memory.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 65ba39f84c panfrost: Add a tiled 16x16 layout unit test
To exercise the layout code introduced in this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig d11945cd85 panfrost: Calculate header_size based on row_stride
The header size is the header stride times the number of rows in the header
(number of tiles of superblocks). We already calculate the header stride, so
eliminate the separate header size calculation.

Delete the old header size calculation. It has no notion of wide blocks, let
alone tiled AFBC headers.
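
As a rough illustration of the relationship described above (header size = header row stride times the number of header rows), here is a minimal sketch. The function and parameter names are hypothetical and are not the names used in pan_layout; the tile height is left as an input because it depends on the AFBC modifier.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def afbc_header_size(height_px, tile_height_px, row_stride_bytes):
    """Header size as row stride times the number of header rows.

    height_px        -- image height in pixels
    tile_height_px   -- height in pixels of one tile of superblocks
                        (assumed to be supplied by the caller)
    row_stride_bytes -- header bytes per row of tiles, already computed
    """
    rows = ceil_div(height_px, tile_height_px)
    return row_stride_bytes * rows
```

For example, a 100 pixel tall image with 16 pixel tall tiles has 7 header rows, so a 256 byte row stride gives a 1792 byte header. Deriving the size from the stride means any change to the stride helper (wide blocks, tiled headers) carries over without a second calculation to keep in sync.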

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 0cf6091bd0 panfrost: Add 3D texture layout unit test
3D AFBC is pretty subtle, let's make sure we have adequate unit test coverage.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 5944bbfa94 panfrost: Add AFBC stride unit tests
Demonstrating correctness of the low level calculations.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 544a8894fc panfrost: Align layouts to tiles of superblocks
Required to satisfy the alignment constraints on tiled AFBC.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 9c9b7f7a42 panfrost: Support tiled AFBC in stride helpers
Part 1 of tiled AFBC. This requires modifier information.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 5c86f53112 panfrost: Add pan_afbc_tile_size helper
To unify calculations with linear and tiled AFBC formats.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig b7c18160d3 panfrost: Fix is_wide return type
By inspection.

Fixes: e4ee2c213a ("panfrost: Extract panfrost_afbc_is_wide helper")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 6b0ff7da48 panfrost: Extract pan_afbc_row_stride helper
Extract a helper for calculating AFBC strides. This is used in two places in
pan_layout. It will need extension for tiled AFBC, and the extended version
could benefit from unit testing.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig d8a4c9b505 panfrost: Extract afbc_stride_blocks helper
Let's keep all the AFBC computations inside the layout code, to keep pan_cs
dumb. This helper will need some extension for tiled AFBC.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16697>
2022-05-26 15:56:32 +00:00
Alyssa Rosenzweig 96d9093c19 pan/bi: Allow CSEing LEA_BUF_IMM
Cleans up the code gen a lot in varying shaders. Instruction count regression
due to how we handle 64-bit on Valhall. (TODO: A better solution for that...)

total instructions in shared programs: 2730186 -> 2736193 (0.22%)
instructions in affected programs: 775825 -> 781832 (0.77%)
helped: 2010
HURT: 4433
helped stats (abs) min: 1.0 max: 18.0 x̄: 2.16 x̃: 2
helped stats (rel) min: 0.16% max: 26.67% x̄: 3.75% x̃: 2.22%
HURT stats (abs)   min: 1.0 max: 10.0 x̄: 2.33 x̃: 2
HURT stats (rel)   min: 0.20% max: 23.08% x̄: 4.79% x̃: 2.79%
95% mean confidence interval for instructions value: 0.87 1.00
95% mean confidence interval for instructions %-change: 1.98% 2.27%
Instructions are HURT.

total cycles in shared programs: 161178.77 -> 144303.77 (-10.47%)
cycles in affected programs: 85720 -> 68845 (-19.69%)
helped: 6910
HURT: 0
helped stats (abs) min: 1.0 max: 18.0 x̄: 2.44 x̃: 2
helped stats (rel) min: 1.05% max: 41.18% x̄: 19.72% x̃: 20.00%
95% mean confidence interval for cycles value: -2.48 -2.41
95% mean confidence interval for cycles %-change: -19.86% -19.58%
Cycles are helped.

total cvt in shared programs: 13655.45 -> 14013 (2.62%)
cvt in affected programs: 2978.06 -> 3335.61 (12.01%)
helped: 381
HURT: 5242
helped stats (abs) min: 0.015625 max: 0.0625 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.37% max: 50.00% x̄: 7.61% x̃: 3.85%
HURT stats (abs)   min: 0.015625 max: 0.296875 x̄: 0.07 x̃: 0
HURT stats (rel)   min: 0.00% max: 400.00% x̄: 28.51% x̃: 16.00%
95% mean confidence interval for cvt value: 0.06 0.06
95% mean confidence interval for cvt %-change: 25.13% 27.00%
Cvt are HURT.

total ls in shared programs: 147856 -> 130980 (-11.41%)
ls in affected programs: 85725 -> 68849 (-19.69%)
helped: 6911
HURT: 0
helped stats (abs) min: 1.0 max: 18.0 x̄: 2.44 x̃: 2
helped stats (rel) min: 1.05% max: 41.18% x̄: 19.72% x̃: 20.00%
95% mean confidence interval for ls value: -2.48 -2.41
95% mean confidence interval for ls %-change: -19.86% -19.58%
Ls are helped.

total quadwords in shared programs: 1483576 -> 1486872 (0.22%)
quadwords in affected programs: 73816 -> 77112 (4.47%)
helped: 286
HURT: 698
helped stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
helped stats (rel) min: 2.38% max: 50.00% x̄: 16.83% x̃: 16.67%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 2.78% max: 100.00% x̄: 37.38% x̃: 16.67%
95% mean confidence interval for quadwords value: 2.89 3.80
95% mean confidence interval for quadwords %-change: 19.02% 24.22%
Quadwords are HURT.

total threads in shared programs: 53186 -> 53189 (<.01%)
threads in affected programs: 3 -> 6 (100.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%

total fills in shared programs: 2172 -> 2163 (-0.41%)
fills in affected programs: 11 -> 2 (-81.82%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16710>
2022-05-25 15:51:15 +00:00
Alyssa Rosenzweig 569e5dc745 pan/bi: Schedule for pressure pre-RA
Add a bottom-up pre-RA list scheduler that aims to reduce register pressure,
roughly the same as we use on Midgard to great effect. It uses a simple
heuristic: greedily select instructions that reduce liveness. To avoid
regressions, the algorithm throws away schedules that increase the maximum
number of live values (used as an estimate of register pressure -- if we had SSA
form, this would be exact).
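
The heuristic above can be sketched roughly as follows. This is not the code from the commit: it is a simplified model that assumes each value is written exactly once inside the block and tracks only data dependencies (no memory or side-effect ordering). `Instr`, `max_live`, `liveness_delta` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]          # value written, or None (e.g. a store)
    srcs: Tuple[str, ...] = ()   # values read


def max_live(order, live_out=()):
    """Peak number of simultaneously live values, walking bottom-up."""
    live = set(live_out)
    peak = len(live)
    for ins in reversed(order):
        live.discard(ins.dest)
        live.update(ins.srcs)
        peak = max(peak, len(live))
    return peak


def liveness_delta(ins, live):
    """Change in live-set size if `ins` is placed next (bottom-up)."""
    killed = 1 if ins.dest in live else 0
    added = len(set(ins.srcs) - live)
    return added - killed


def schedule(block, live_out=()):
    """Greedy bottom-up list scheduling that tries to shrink liveness."""
    writer = {ins.dest: i for i, ins in enumerate(block) if ins.dest is not None}
    # Number of not-yet-scheduled readers of each instruction's result.
    pending_users = [0] * len(block)
    for ins in block:
        for s in set(ins.srcs):
            if s in writer:
                pending_users[writer[s]] += 1

    live = set(live_out)
    ready = [i for i in range(len(block)) if pending_users[i] == 0]
    bottom_up = []
    while ready:
        # Smallest liveness increase wins; ties keep the original order.
        best = min(ready, key=lambda i: (liveness_delta(block[i], live), -i))
        ready.remove(best)
        ins = block[best]
        bottom_up.append(ins)
        live.discard(ins.dest)
        live.update(ins.srcs)
        for s in set(ins.srcs):
            if s in writer:
                pending_users[writer[s]] -= 1
                if pending_users[writer[s]] == 0:
                    ready.append(writer[s])

    candidate = bottom_up[::-1]
    # Throw the new schedule away if it raises the pressure estimate.
    if max_live(candidate, live_out) > max_live(block, live_out):
        return list(block)
    return candidate
```

On a block that issues four loads up front and then combines them pairwise, this moves each pair of loads next to its consumer and lowers the peak live count from 4 to 3. The final comparison is what keeps the pass from regressing shaders where the greedy choice turns out worse than the input order.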

We might be better off using Sarkar. But for something I could type out in an
afternoon, I'll happily accept a >50% reduction in spills. Instruction count
regresses in some cases due to extra moves around the blend shader ABI, though at
least on Bifrost this is mostly hidden by the clause scheduler. Thread count and
spills/fills are both much improved here.

There are numerous opportunities for future improvements to pre-RA scheduling:

* Better heuristics? (Something more global than liveness alone)
* Reducing false dependencies with memory access
* Improve ILP for message-passing instructions? This is a tradeoff.
* Simplify the code if we have SSA in the future.

But for now, I think this is well worth it already.

v2: Various clean-ups and memory leak fix (Icecream95). Reduce false
dependencies to eliminate spilling in more shaders.

shader-db stats on Mali-G52:

total instructions in shared programs: 2438841 -> 2439698 (0.04%)
instructions in affected programs: 1206421 -> 1207278 (0.07%)
helped: 3113
HURT: 4011
helped stats (abs) min: 1.0 max: 50.0 x̄: 3.25 x̃: 2
helped stats (rel) min: 0.13% max: 44.83% x̄: 4.09% x̃: 2.11%
HURT stats (abs)   min: 1.0 max: 18.0 x̄: 2.73 x̃: 2
HURT stats (rel)   min: 0.11% max: 57.14% x̄: 3.86% x̃: 2.07%
95% mean confidence interval for instructions value: 0.02 0.22
95% mean confidence interval for instructions %-change: 0.23% 0.54%
Instructions are HURT.

total tuples in shared programs: 1927077 -> 1946583 (1.01%)
tuples in affected programs: 1118627 -> 1138133 (1.74%)
helped: 2874
HURT: 6295
helped stats (abs) min: 1.0 max: 82.0 x̄: 3.51 x̃: 2
helped stats (rel) min: 0.17% max: 33.33% x̄: 4.60% x̃: 3.57%
HURT stats (abs)   min: 1.0 max: 47.0 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.20% max: 50.00% x̄: 5.16% x̃: 4.32%
95% mean confidence interval for tuples value: 2.00 2.25
95% mean confidence interval for tuples %-change: 1.97% 2.23%
Tuples are HURT.

total clauses in shared programs: 356053 -> 357793 (0.49%)
clauses in affected programs: 151578 -> 153318 (1.15%)
helped: 2196
HURT: 3813
helped stats (abs) min: 1.0 max: 49.0 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.18% max: 69.01% x̄: 10.26% x̃: 8.33%
HURT stats (abs)   min: 1.0 max: 25.0 x̄: 1.70 x̃: 1
HURT stats (rel)   min: 0.57% max: 66.67% x̄: 10.64% x̃: 8.33%
95% mean confidence interval for clauses value: 0.22 0.36
95% mean confidence interval for clauses %-change: 2.68% 3.33%
Clauses are HURT.

total cycles in shared programs: 167761.17 -> 167922.04 (0.10%)
cycles in affected programs: 24494.21 -> 24655.08 (0.66%)
helped: 862
HURT: 3054
helped stats (abs) min: 0.041665999999999315 max: 53.0 x̄: 0.69 x̃: 0
helped stats (rel) min: 0.28% max: 76.81% x̄: 5.65% x̃: 3.03%
HURT stats (abs)   min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.25 x̃: 0
HURT stats (rel)   min: 0.26% max: 41.18% x̄: 4.91% x̃: 3.92%
95% mean confidence interval for cycles value: -0.04 0.12
95% mean confidence interval for cycles %-change: 2.36% 2.81%
Inconclusive result (value mean confidence interval includes 0).

total arith in shared programs: 73875.37 -> 74393.17 (0.70%)
arith in affected programs: 43142.42 -> 43660.21 (1.20%)
helped: 3632
HURT: 5443
helped stats (abs) min: 0.041665999999999315 max: 1.2083360000000027 x̄: 0.15 x̃: 0
helped stats (rel) min: 0.22% max: 100.00% x̄: 6.70% x̃: 4.76%
HURT stats (abs)   min: 0.041665999999999315 max: 2.0416659999999993 x̄: 0.19 x̃: 0
HURT stats (rel)   min: 0.00% max: 166.67% x̄: 5.91% x̃: 4.08%
95% mean confidence interval for arith value: 0.05 0.06
95% mean confidence interval for arith %-change: 0.65% 1.07%
Arith are HURT.

total texture in shared programs: 11936 -> 11936 (0.00%)
texture in affected programs: 0 -> 0
helped: 0
HURT: 0

total vary in shared programs: 4180.88 -> 4180.88 (0.00%)
vary in affected programs: 0 -> 0
helped: 0
HURT: 0

total ldst in shared programs: 137551 -> 137028 (-0.38%)
ldst in affected programs: 834 -> 311 (-62.71%)
helped: 13
HURT: 0
helped stats (abs) min: 15.0 max: 53.0 x̄: 40.23 x̃: 53
helped stats (rel) min: 19.15% max: 100.00% x̄: 68.11% x̃: 76.81%
95% mean confidence interval for ldst value: -50.49 -29.98
95% mean confidence interval for ldst %-change: -84.37% -51.84%
Ldst are helped.

total quadwords in shared programs: 1684883 -> 1692021 (0.42%)
quadwords in affected programs: 949463 -> 956601 (0.75%)
helped: 3981
HURT: 5098
helped stats (abs) min: 1.0 max: 86.0 x̄: 3.53 x̃: 3
helped stats (rel) min: 0.18% max: 33.33% x̄: 5.82% x̃: 4.48%
HURT stats (abs)   min: 1.0 max: 50.0 x̄: 4.15 x̃: 3
HURT stats (rel)   min: 0.17% max: 50.00% x̄: 5.11% x̃: 3.85%
95% mean confidence interval for quadwords value: 0.67 0.90
95% mean confidence interval for quadwords %-change: 0.17% 0.47%
Quadwords are HURT.

total threads in shared programs: 53276 -> 53653 (0.71%)
threads in affected programs: 581 -> 958 (64.89%)
helped: 445
HURT: 68
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 0.68 0.79
95% mean confidence interval for threads %-change: 75.70% 84.53%
Threads are helped.

total preloads in shared programs: 116312 -> 116312 (0.00%)
preloads in affected programs: 0 -> 0
helped: 0
HURT: 0

total loops in shared programs: 128 -> 128 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 92 -> 37 (-59.78%)
spills in affected programs: 55 -> 0
helped: 13
HURT: 0

total fills in shared programs: 658 -> 190 (-71.12%)
fills in affected programs: 468 -> 0
helped: 13
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>
2022-05-25 14:40:12 +00:00
Alyssa Rosenzweig 2fb5ceab7a pan/bi: Recoalesce tied operands after spilling
Otherwise we can fail to allocate tied operands if we spill the tied operand.
Seen in shaders/android/com.miHoYo.GenshinImpact/16.shader_test with a
particularly bad scheduling causing excessive spilling.

No shader-db changes.

Fixes: bc17288697 ("pan/bi: Lower split/collect before RA")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16378>
2022-05-25 14:40:12 +00:00
Icecream95 a4323b0979 panfrost: Only write depth / stencil once if MRT is used
We can't assume that RT0 will be written, so this has to be based on
whether a combined store has already been emitted, not the location of
the store.

Emit a non-special combined_store intrinsic that only writes colour
for the other RTs, as reordering stores breaks the Midgard compiler.
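
A minimal sketch of the idea, under the assumption that the lowering sees the colour stores in program order: depth/stencil is attached to the first combined store that is emitted, whichever render target it belongs to, and later stores carry colour only. The dictionary layout and the name `lower_stores` are hypothetical, and the case of a shader with no colour stores at all is not covered here.

```python
def lower_stores(colour_stores, depth=None, stencil=None):
    """colour_stores: list of (render_target, colour) in program order."""
    has_zs = depth is not None or stencil is not None
    zs_emitted = False
    out = []
    for rt, colour in colour_stores:
        if has_zs and not zs_emitted:
            # First combined store carries depth/stencil, even if rt != 0.
            out.append({"rt": rt, "colour": colour,
                        "depth": depth, "stencil": stencil})
            zs_emitted = True
        else:
            # Remaining stores write colour only and keep their order.
            out.append({"rt": rt, "colour": colour,
                        "depth": None, "stencil": None})
    return out
```

With stores to RT1 and RT2 only, the depth value travels with the RT1 store; keying the decision on "RT0" would never have written it.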

Fixes: d37e901e35 ("pan/mdg: Add new depth store lowering")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6527
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
2022-05-24 16:13:33 +00:00
Icecream95 0a53ebabcd pan/mdg: Read base for combined stores
Fixes depth/stencil writes with MRT.

Fixes: b3d7272753 ("pan/mdg: Don't read base for combined stores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
2022-05-24 16:13:33 +00:00
Icecream95 f1a226dd24 pan/bi: Read base for combined stores
Fixes depth/stencil writes with MRT.

Fixes: 996645e479 ("pan/bi: Don't read base for combined stores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
2022-05-24 16:13:33 +00:00
Icecream95 9f9ed959bd nir: Add store_combined_output_pan BASE back
It's meaningful for this intrinsic and so does not add noise to the
lowering pass.

(Although dual-source writes must be to RT 0, depth and stencil
writes, which store_combined_output_pan is also used for, can still be
done with MRT enabled.)

Fixes: 5c168f09eb ("nir: Eliminate store_combined_output_pan BASE")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16685>
2022-05-24 16:13:33 +00:00
Icecream95 2f2ddfa0ac panfrost: Move patched_s out of the pan_blitter_views struct
The struct is returned from a function, so in debug builds the address
may change after returning, and pointers to patched_s will be broken.

Pass the pointer to the patched stencil view as a parameter to
pan_preload_get_views to avoid this.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>
2022-05-20 23:17:07 +00:00
Icecream95 f1f39fa645 panfrost: Increase the limit for blend shader variants
Qt uses blend constants to set text colour, this will allow more
colours onscreen before thrashing happens.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>
2022-05-20 23:17:07 +00:00
Icecream95 80404c8b64 panfrost: Copy blend constant into variant even when reusing it
Otherwise future lookups will match searches for the old constant.
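
To illustrate the failure mode, here is a small model of a variant cache with a fixed number of slots. It is a sketch only: the class, the eviction order and `compile_fn` are assumptions for illustration, not the driver's data structures.

```python
class BlendVariantCache:
    def __init__(self, max_variants):
        self.max_variants = max_variants
        self.variants = []  # each entry: {"constant": ..., "binary": ...}

    def get(self, constant, compile_fn):
        for v in self.variants:
            if v["constant"] == constant:
                return v["binary"]
        if len(self.variants) < self.max_variants:
            v = {}
        else:
            # Cache is full: reuse the oldest slot.
            v = self.variants.pop(0)
        self.variants.append(v)
        # The key must be updated on reuse as well. If it were left alone,
        # a later search for the old constant would hit this slot and get
        # a binary compiled for the new constant.
        v["constant"] = constant
        v["binary"] = compile_fn(constant)
        return v["binary"]
```

With a single slot, requesting constants 1.0, 2.0 and then 1.0 again recompiles for 1.0 on the third call instead of returning the shader built for 2.0.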

Fixes: bbff09b952 ("panfrost: Move the blend shader cache at the device level")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6355
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16343>
2022-05-20 23:17:07 +00:00
Alyssa Rosenzweig d6ece34d0c pan/va: Use ^ instead of ` to indicate last-use
This syncs the ISA syntax with other Valhall ISA users. It's also somewhat
easier to read.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 9fb8ca1851 pan/va: Remove DISCARD.f32 destination
It doesn't actually write anything. This is a pointless divergence from Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 444469d64e pan/va: Handle 2-src blend in lower_split_src
Fixes assertion fail in shaders/dolphin/smg.1.shader_test

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 0576cad958 pan/bi: Validate vector widths
Now that our IR is much more strongly typed, and RA code quality depends on
correct typing, add a validation pass to make sure we didn't screw it up. This
pass found a massive number of bugs in early versions of this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 4c1bb23a86 pan/bi: Validate preload constraints are satisfied
We tightened the rules around preloading substantially and take advantage of the
rules in RA. The safe helpers it introduced should ensure the rules are
followed, but just in case, add a validation pass to check our work. This pass
found (multiple) bugs in early versions of this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 3636cddde1 pan/bi: See through splits for var_tex fusion
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 1f25f78a9f pan/bi: Optimize split of collect
Required to get decent codegen from UBO pushing.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 4a8bde2190 pan/bi: Don't propagate discard
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig d81b872465 pan/bi: Remove liveness metadata tracking
We don't use it for anything, and with no pass infrastructure it's just an
accident waiting to happen.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 3df5446cbd pan/bi: Simplify register precolouring in the IR
In the current IR, any register may be preloaded by reading it anywhere, and any
register may be precoloured by writing it anywhere. This is convenient for
instruction selection, but requires the register allocator to do considerable
gymnastics to ensure it doesn't clobber precoloured registers. It also breaks
the purity of our SSA representation, which complicates optimization passes
(e.g. copyprop).

Let's trade some instruction selection complexity for simplifying register
allocation by constraining how register precolouring works. Under the new model:

* Registers may only be preloaded at the start of the program.
* Precoloured destinations are handled explicitly by RA.

Internally, a stronger invariant is placed for preloading: registers may only be
preloaded by MOV.i32 instructions at the beginning of the block, and these moves
must be unique. These invariants ensure RA can trivially coalesce the moves.

A bi_preload helper is added as a safe version of bi_register respecting these
invariants, allowing a smooth transition for instruction selection.
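
The preload invariants can be checked mechanically. The sketch below uses a toy IR, assumed purely for illustration: an instruction is an `(op, dest, srcs)` tuple, machine registers are spelled `r<N>`, and SSA values are spelled `%<N>`. It takes "the block" to mean the first block of the program, and `validate_preloads` is a hypothetical name.

```python
def validate_preloads(first_block):
    """Return True if every register read obeys the preload invariants."""
    seen = set()
    in_prologue = True
    for op, dest, srcs in first_block:
        regs = [s for s in srcs if s.startswith("r")]
        if not regs:
            # First instruction that is not a preload ends the prologue.
            in_prologue = False
            continue
        # Registers may only be read by MOV.i32 at the start of the block.
        if op != "MOV.i32" or not in_prologue:
            return False
        for r in regs:
            # Each register may be preloaded by one move only.
            if r in seen:
                return False
            seen.add(r)
    return True
```

Because every preload is a unique move at the top of the program, a register allocator can coalesce each move with its register without searching the rest of the shader for other reads.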

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig dab5b62ecf pan/bi: Remove bi_word and bi_word_node
They are no longer used, as offsets are no longer used for normal values (only for
FAU). Keep it like that.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig f0184cf218 pan/bi: Scalarize copyprop
Reduces memory footprint.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig c6349278f9 pan/bi: Scalarize modifier propagation
Reduces memory footprint.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig e332e2edc1 pan/bi: Scalarize bi_opt_cse
Reduces memory footprint.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 187dd382cb pan/bi: Scalarize bi_lower_swizzle
Reduces memory footprint.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 5b1c642cee pan/va: Don't use bi_word in FAU unit test
It will be removed shortly, as the FAU construction helper should be used
instead.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 67569b3c23 pan/va: Use split for 64-bit lowering
Written in this way, this pass looks pretty silly...

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 5febeae58e pan/bi: Emit collect and split
...rather than using offsets during instruction selection.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 4731e9e55a pan/bi: Simplify BLEND emit
We don't need to collect anything, now that Valhall handles this case correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 7bfaa119f4 pan/bi: Lift split/collect cache from AGX
Design based on ACO (and fruitful discussions with Daniel).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 8fdb01b96f pan/bi: Create COLLECT during isel
This transitions us away from the fake SSA we currently use for vectors.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 5c0977d230 pan/bi: Expand MAX_DESTS to 4
For splits.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 9924e6f291 pan/bi: Fix mov and pack_32_2x16
Move can take in a vector and write a scalar, depending on the swizzle. We need
to handle this case. Split out mov and pack_32_2x16 so we can specify correct
behaviour for both. Also drop unused 1-bit boolean stuff which obscured the fix.

Fixes: 76cea8e27b ("panfrost: Fix pack_32_2x16 implementation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig bc17288697 pan/bi: Lower split/collect before RA
For transitioning to the new scalarized IR.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 0c7f126277 pan/bi: Add bi_before_block cursor
Useful for preloading.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 298d20f805 pan/bi: Add collect and split instructions
These move-like instructions will be generated during instruction selection and
lowered before/after register allocation.

These need special printer support until we get dynamic sources/destinations.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig afd88d1380 pan/bi: Add source/destination counts
In preparation for dynamic allocation, as needed for phi nodes and parallel
copies. For now, it just serves to simplify the semantics of splits and
collects.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 0523b6b89b pan/bi: Use value-based interference with LCRA
"Revisiting Out-of-SSA Translation for Correctness, Code Quality, and
Efficiency" discusses "value-based interference": two variables interfere if and
only if there exists a point in the program where they are both live *with
different values*. In particular, the source and destination of a move do not
interfere a priori, because they have the same value at that point in the
program. (If a later instruction overwrites one, the required interference will
be added there).

We can use this idea to avoid some extra interferences, avoiding a regression in
moves from split/collect.
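
A minimal sketch of the difference on straight-line code, assuming a toy IR of `(op, dest, srcs)` tuples and a backward liveness walk. This illustrates the idea from the paper rather than the LCRA integration itself; `interference` is a hypothetical name.

```python
def interference(block, live_out=(), value_based=True):
    """Return the set of interfering pairs as frozensets."""
    live = set(live_out)
    edges = set()
    for op, dest, srcs in reversed(block):
        if dest is not None:
            for other in live:
                if other == dest:
                    continue
                # A move's source holds the same value as its destination
                # at this point, so no edge is needed here.
                if value_based and op == "mov" and other == srcs[0]:
                    continue
                edges.add(frozenset((dest, other)))
        live.discard(dest)
        live.update(srcs)
    return edges
```

For `a = load; b = mov a; c = add a, b`, the classic rule makes `a` and `b` interfere, while the value-based rule adds no edge, so the move can be coalesced. If `a` is later overwritten while `b` is still live, the edge is added at that overwriting instruction, as the message describes.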

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 896dc63623 pan/bi: Lower phis to scalar
If we don't lower phis to scalar, when we go out of SSA, we can get vector
nir_registers. In particular, we can get code like:

   r0 = vec2 r0.y, r0.x

This code looks like a move, but is in fact a swap. The trivial lowering of vec2
would not work -- the following fails to swap correctly:

   r0.x = r0.y
   r0.y = r0.x

Currently, we generate temporaries to handle these cases. It's easy to move the
complexity to NIR, though, and we'll want to scalarize phis for SSA-based RA
anyway.
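
The swap hazard is easy to reproduce in a small model. In this sketch, registers are a dictionary of component values, and the two functions (hypothetical names) lower a vector write either component by component or through temporaries.

```python
def lower_naive(regs, dsts, srcs):
    """Write components in order, reading each source when it is needed."""
    regs = dict(regs)
    for d, s in zip(dsts, srcs):
        regs[d] = regs[s]
    return regs


def lower_with_temps(regs, dsts, srcs):
    """Read every source into a temporary first, then write."""
    regs = dict(regs)
    temps = [regs[s] for s in srcs]
    for d, t in zip(dsts, temps):
        regs[d] = t
    return regs


before = {"r0.x": 1, "r0.y": 2}
# r0 = vec2 r0.y, r0.x
dsts, srcs = ("r0.x", "r0.y"), ("r0.y", "r0.x")
naive = lower_naive(before, dsts, srcs)        # r0.y is read after r0.x changed
swapped = lower_with_temps(before, dsts, srcs)
```

Here `naive` ends up with both components equal to 2, while `swapped` holds the exchanged values 2 and 1. Scalar phis avoid the problem at the source, since no vector register with overlapping reads and writes is created when going out of SSA.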

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig c8882ee115 pan/bi: +JUMP can't read same-cycle temp
Minor ISA detail missed in the Bifrost scheduler. I hit this in an early version
of this series (where a move feeding into a blend shader return was not
coalesced). Let's get it fixed in the scheduler.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig c387096eca pan/va: Use 64-bit lowering for texturing
Texture instructions on Valhall take 64-bit sources. Now that we have
infrastructure to handle this properly, we don't need to use a non-SSA node to
hack around the optimization.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 89a3746bc1 pan/va: Lower split 64-bit sources
This ensures Valhall 64-bit constraints are respected in a simple way. It's not
the most efficient, though. Optimization is deferred until full Valhall support
is upstreamed and the RA is overhauled.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 65839d8c3c pan/va: Mark more source sizes
This source size information will be consumed by the 64-bit lowering pass, so
ensure it's accurate. That means marking 32-bit and 64-bit sources explicitly on
message passing where it wouldn't match up with the type size suffix of the
instruction.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Alyssa Rosenzweig 04a1df8c65 pan/bi: Update bi_count_write_registers for Valhall
We add some new instructions on Valhall with special register requirements
(texturing, atomics). Handle these appropriately so we can do RA on Valhall.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16585>
2022-05-19 16:08:26 +00:00
Jason Ekstrand fc8d2543fc vulkan,v3dv: Add a driver_internal flag to vk_image_view_init/create
We already had a little workaround for v3dv where, for some of its meta
ops, it had to bind a depth/stencil image as color.  Instead of
special-casing binding depth/stencil as color, let's flip on the
driver_internal flag and get rid of most of the checks in that case.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16376>
2022-05-17 18:14:55 +00:00
Tomeu Vizoso 9e031426be panvk/ci: Disable CI for a while
We have been hitting OOM conditions quite often and this is making it
hard to get stuff merged.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16554>
2022-05-17 09:16:21 +00:00
Timothy Arceri d7a071a28f gallium/drivers: set force_indirect_unrolling_sampler for all required drivers
This is set to true for all drivers that have a GLSL level
of support lower than 4.00. This matches the rule for setting the
GLSL IR option EmitNoIndirectSampler.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
2022-05-17 02:12:21 +00:00
Jason Ekstrand 9e22e2ac88 panvk: Lower blending after lower_var_copies
nir_lower_blend needs store_deref as does
io_arrays_to_elements_no_indirects.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16483>
2022-05-16 21:43:47 +00:00
Jason Ekstrand 4050697a8f panvk: Do more nir_lower_tex before descriptor lowering
Some texture lowering generates more txs which means it needs to happen
before we lower descriptors because descriptor lowering is where txs is
actually handled in panvk.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16483>
2022-05-16 21:43:47 +00:00
Jason Ekstrand 36bb62139e bifrost: Run nir_lower_global_vars_to_local before nir_lower_vars_to_scratch
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16483>
2022-05-16 21:43:47 +00:00
Timothy Arceri 7647023f3b glsl: enable the use of the nir based varying linker
Here, as well as calling the pass, we need to switch the order of
some of the information gathering and optimisation calls. We also
need to create a custom callback for the dead variables removal
pass to clean up dead builtin varyings in SSO programs without
causing piglit regressions.

shader-db results IRIS (BDW):

total instructions in shared programs: 17487900 -> 17477072 (-0.06%)
instructions in affected programs: 128682 -> 117854 (-8.41%)
helped: 587
HURT: 82
helped stats (abs) min: 1 max: 145 x̄: 18.82 x̃: 20
helped stats (rel) min: 0.21% max: 77.78% x̄: 17.41% x̃: 8.85%
HURT stats (abs)   min: 1 max: 6 x̄: 2.68 x̃: 2
HURT stats (rel)   min: 0.25% max: 9.76% x̄: 2.94% x̃: 2.16%
95% mean confidence interval for instructions value: -17.71 -14.66
95% mean confidence interval for instructions %-change: -16.40% -13.42%
Instructions are helped.

total cycles in shared programs: 857442520 -> 857170199 (-0.03%)
cycles in affected programs: 112252720 -> 111980399 (-0.24%)
helped: 13733
HURT: 13349
helped stats (abs) min: 1 max: 7293 x̄: 81.44 x̃: 10
helped stats (rel) min: <.01% max: 90.32% x̄: 3.30% x̃: 0.62%
HURT stats (abs)   min: 1 max: 7424 x̄: 63.38 x̃: 8
HURT stats (rel)   min: <.01% max: 192.23% x̄: 3.28% x̃: 0.54%
95% mean confidence interval for cycles value: -14.01 -6.10
95% mean confidence interval for cycles %-change: -0.17% 0.06%
Inconclusive result (%-change mean confidence interval includes 0).

total sends in shared programs: 971443 -> 970010 (-0.15%)
sends in affected programs: 4596 -> 3163 (-31.18%)
helped: 446
HURT: 39
helped stats (abs) min: 1 max: 6 x̄: 3.40 x̃: 4
helped stats (rel) min: 3.03% max: 85.71% x̄: 46.48% x̃: 50.00%
HURT stats (abs)   min: 1 max: 3 x̄: 2.15 x̃: 2
HURT stats (rel)   min: 6.67% max: 25.00% x̄: 15.16% x̃: 10.53%
95% mean confidence interval for sends value: -3.13 -2.78
95% mean confidence interval for sends %-change: -44.16% -38.88%
Sends are helped.

LOST:   235
GAINED: 262

Shader-db results radeonsi (RX580):

169505 shaders in 102144 tests
Totals:
SGPRS: 7698832 -> 7696552 (-0.03 %)
VGPRS: 5547296 -> 5545280 (-0.04 %)
Spilled SGPRs: 14795 -> 14773 (-0.15 %)
Spilled VGPRs: 3782 -> 3782 (0.00 %)
Private memory VGPRs: 1152 -> 1152 (0.00 %)
Scratch size: 3872 -> 3872 (0.00 %) dwords per thread
Code Size: 162946528 -> 162895264 (-0.03 %) bytes
Max Waves: 2449334 -> 2449736 (0.02 %)

Totals from affected shaders:
SGPRS: 215024 -> 212744 (-1.06 %)
VGPRS: 151976 -> 149960 (-1.33 %)
Spilled SGPRs: 162 -> 140 (-13.58 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5249916 -> 5198652 (-0.98 %) bytes
Max Waves: 54588 -> 54990 (0.74 %)

Panfrost trace checksum is updated as per discussion in:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6343

Some virpipe tess shader piglit tests are added as failures to CI.
These failures are not a regression but an existing bug that was
exposed because the linker no longer sorts internally facing
shader interfaces in alphabetical order. See details in:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6481

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15731>
2022-05-16 03:33:18 +00:00
Jason Ekstrand 5ef9bd5ff2 panvk: Round FillBuffer sizes down to a multiple of 4
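
For context, the Vulkan specification says that when vkCmdFillBuffer is
called with VK_WHOLE_SIZE and the remaining range is not a multiple of 4,
the nearest smaller multiple is used. A minimal sketch of that rounding
follows. The names `example_fill_size` and `EXAMPLE_WHOLE_SIZE` are
hypothetical and chosen for illustration; this is not the panvk code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for VK_WHOLE_SIZE so the sketch needs no Vulkan headers. */
#define EXAMPLE_WHOLE_SIZE (~(uint64_t)0)

/* Resolve the number of bytes to fill, rounded down to a multiple of 4. */
static uint64_t
example_fill_size(uint64_t buffer_size, uint64_t offset, uint64_t size)
{
   assert(offset <= buffer_size);

   /* "Whole size" means everything from offset to the end of the buffer. */
   if (size == EXAMPLE_WHOLE_SIZE)
      size = buffer_size - offset;

   /* Clear the low two bits: the fill writes whole 32-bit words only. */
   return size & ~(uint64_t)3;
}
```

A 10-byte buffer filled from offset 0 with the whole-size sentinel would
therefore fill 8 bytes and leave the trailing 2 bytes untouched.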
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand ad05bc9315 panvk: Drop panvk_descriptor
The API-style representation of descriptors is no longer used by
anything so let's get rid of it.  All we really need is the data in the
descriptor set itself.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand d783f8949e panvk: Implement descriptor copies properly
All we were doing was copying panvk_descriptor structs around which
don't actually contain data that's used by anything interesting.  We
need to copy the actual data around.  Annoyingly, that means we need a
descriptor copy function per descriptor type.  Woo!

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand f6268220c2 panvk: Set immutable samplers properly up-front
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 935fd18bc3 panvk: Rewrite the write portion of vkUpdateDescriptorSets
The new design is based on the ANV code which I massively cleaned up
some time ago.  Each descriptor type has a write function and they have
consistent prototypes.  This makes it all much easier to read and figure
out what's going on.  It also makes it easier to make changes going
forward because you aren't re-plumbing function arguments if you ever
change the type of data in any given descriptor type.  You just change
the write function.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 53f53b577f panvk: Re-arrange descriptor set functions
Put them in the order we call them which is also roughly descriptor type
enum order.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 28333e039c FIXUP: Use 16-bit things for texture sizes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 38a0742f6a panvk: Implement texture/image queries
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 714e125ae4 panvk: Pass bind layouts to texture and image descriptor helpers
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 6ed298dce7 panvk: Add an elems field to panvk_buffer_view
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 6621ab8bf9 panvk: Advertise VK_KHR_variable_pointers
Now that our SSBO descriptor handling code no longer crawls deref chains
back to the variable, we should be handling variable pointers properly.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand d9f9955f9e panvk: Enable robustBufferAccess
It should already work for UBOs.  This should do everything we need for
SSBOs.  Not sure about vertex and index buffers but we can deal with
those later.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand a463c58e22 panvk: Put SSBO addresses in the descriptor buffer
Instead of storing SSBO pointers in the very limited sysval space, store
them in the UBO we've attached to the descriptor set.  This gives us a
virtually unlimited number of SSBOs.  Dynamic SSBOs still live in the
sysval space so we can update them as part of vkCmdBindDescriptorSets().
Also, the new code (based on the code in ANV) loads those SSBO addresses
in a way that never chases the deref chain back to the variable so we
should now be able to handle all of variable pointers.  The code as
written in this patch is a bit overly generic because it switches on
address modes a bit more than panvk needs but we ended up needing all
that flexibility in ANV so we may as well leave hooks for it in panvk.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand e265583ee1 panvk: Interleave UBOs with multiple descriptor sets
The original intention was to put all the non-dynamic UBOs first
followed by all the dynamic ones.  However, we got the calculations
wrong and, once you went above one descriptor set, things started stomping
on each other.

Also, the whole strategy is a bit busted.  Vulkan pipeline layout
compatibility rules say that it's ok to create a pipeline with one
layout and then bind with another so long as the bottom N descriptor set
layouts match and the pipeline uses at most N descriptors.  This means
that, while it's safe to have each subsequent set add onto a given pool
of descriptors, if you're going to combine two of those pools, you need
to be careful that the position of descriptors in set N only depends on
the layouts of sets M <= N.  The easy way to do this is to interleave:
UBOs for set 0, then dynamic UBOs for set 0, then UBOs for set 1, then
dynamic UBOs for set 1, etc.
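
The interleaved ordering described above can be sketched as a running
counter over the sets. This is an illustration under assumed names
(`example_set_counts`, `example_set_offsets`,
`example_interleave_ubos`), not the layout code from this commit.

```c
#include <assert.h>

/* Hypothetical per-set inputs: how many UBOs of each kind the set declares. */
struct example_set_counts {
   unsigned num_ubos;
   unsigned num_dyn_ubos;
};

/* Hypothetical per-set outputs: first index of each kind in the flat UBO table. */
struct example_set_offsets {
   unsigned ubo_base;
   unsigned dyn_ubo_base;
};

/* Assign indices as: set 0 UBOs, set 0 dynamic UBOs, set 1 UBOs, and so on.
 * The bases for set N are computed only from the counts of sets before N. */
static unsigned
example_interleave_ubos(const struct example_set_counts *sets,
                        unsigned num_sets, unsigned first_index,
                        struct example_set_offsets *out)
{
   assert(sets && out);

   unsigned next = first_index;
   for (unsigned i = 0; i < num_sets; i++) {
      out[i].ubo_base = next;
      next += sets[i].num_ubos;
      out[i].dyn_ubo_base = next;
      next += sets[i].num_dyn_ubos;
   }

   /* Total number of indices consumed, including first_index. */
   return next;
}
```

Because each base is a prefix sum, two pipeline layouts that agree on
their first N set layouts get identical indices for those sets, which
is the property the compatibility rules require.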

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 6d15d65e19 panvk: Put the sysval and push const UBOs at fixed indices
In theory, this may cost us a tiny bit of descriptor space but in
practice, given that the viewport transform is a sysval, we'll always
need it for 3D and given that SSBO pointers live there, we'll basically
always need it for compute.  It also makes a lot of things simpler.
We're about to start using the sysval UBO directly in our descriptor set
code and knowing the index up-front is really nice.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand 744b977963 panvk: Stop calling lower_uniforms_to_ubo
We don't need it because Vulkan doesn't have GL-style uniforms.  It
*shouldn't* be doing anything but sometimes it inserts an extra UBO
binding and adds 1 to all our UBO indices for no good reason.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand c32ddb5e77 panvk: Use a flat sysvals struct
PanVK uses fewer sysvals than the GLES driver, as some data that would
be a sysval in GLES is instead part of the descriptor set or the pipeline
state in Vulkan. Therefore, it is simpler and more efficient to use a
flat, fixed layout provided by the driver for our sysvals, rather than
the compiler choosing a layout.

This commit switches to a flat sysval layout.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand e6091cc578 panvk: Get rid of the per-pipeline sysvals BO
This is a micro-optimization and probably not a correct one at that.
The cost involved in re-uploading the viewport is tiny compared to the
mental overhead from trying to do this juggle.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand f0a47d8602 bifrost,midgard: Allow providing a fixed sysval layout
Vulkan doesn't need nearly as many system values and would like to bake
its layout up-front instead of having it provided by the back-end
compiler.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:16 +00:00
Jason Ekstrand e07a296398 panfrost: Add some sanity checking for sysvals
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00
Jason Ekstrand 4e60f0655a panfrost,panvk: Make fixed_sysval_ubo < 0 mean compiler-assigned
In 3559efb9bf ("panfrost: Allow passing an explicit UBO index for the
sysval UBO"), an explicit UBO index was added and it was implicitly
assumed that it would be > num_ubos.  This was convenient because it
meant 0, the default for designated initializers, implicitly meant
compiler-assigned.  However, we're about to move the sysval UBO to 0
which breaks this assumption.   Also, we don't want the back-end
compiler to even look at num_ubos since it's meaningless in Vulkan.
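
A minimal sketch of the new convention follows. Only the field name
`fixed_sysval_ubo` comes from this commit; the struct
`example_compile_inputs` and the helper `example_sysval_ubo_index` are
hypothetical stand-ins for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the compiler inputs; only this field matters here. */
struct example_compile_inputs {
   int fixed_sysval_ubo; /* < 0: compiler-assigned; >= 0: use this UBO index */
};

/* Pick the sysval UBO index, falling back to the compiler's own choice. */
static unsigned
example_sysval_ubo_index(const struct example_compile_inputs *inputs,
                         unsigned compiler_assigned)
{
   assert(inputs);

   if (inputs->fixed_sysval_ubo >= 0)
      return (unsigned)inputs->fixed_sysval_ubo;

   return compiler_assigned;
}
```

One consequence of this convention is that a zero-initialized struct now
selects UBO 0, so a caller that wants the compiler to choose has to set
the field to a negative value explicitly.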

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00
Jason Ekstrand 42aca84704 panvk: Add a buffer to each descriptor set
Later in the series, we will map descriptor sets to driver-internal
buffers bound as UBOs. These buffers will contain various internal data,
like buffer and texture sizes. Resource access will be lowered to pull
from this UBO in the shader. To prepare, create a backing buffer when
creating a descriptor set and emit a UBO record so we can bind it.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00
Jason Ekstrand bcea5ed2b6 panvk: Break descriptor lowering into its own file
It's about to get a lot more complicated so let's split it out.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00
Jason Ekstrand 8af805a475 panvk: Move CreateDescriptorSetLayout to per-arch
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16276>
2022-05-12 10:53:15 +00:00