Commit Graph

1779 Commits

Author SHA1 Message Date
Eric Engestrom 2c67457e5e util/list: rename LIST_ENTRY() to list_entry()
This follows the Linux kernel convention, and avoids collision with
macOS header macro.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6751
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6840
Cc: mesa-stable
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17772>
2022-07-28 10:10:44 +00:00
Alyssa Rosenzweig 29c33f75d3 pan/va: Stall after ATEST
In theory this wait is required for correct behaviour of discarded threads with
ATEST. Mesa usually waits before the instruction after ATEST, so this wait will
get optimized out by va_merge_flow, but as our scheduler gets more sophisticated
this could become an issue.

Let's stay on the safe side and insert the recommended wait.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
2022-07-13 21:05:35 +00:00
Alyssa Rosenzweig db2bdc1dc3 pan/bi: Require ATEST coverage mask input in R60
In theory, ATEST can take any combination of registers for inputs.
Experimentally, however, ATEST requires the coverage mask in R60. This avoids
regressing the following dEQP tests, which write their coverage mask with
pixel-frequency-shading but without writing to the depth/stencil buffer.

dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.*

This issue is known to affect both Mali-G52 (v7) and Mali-G57 (v9). I am unsure
if this is a silicon bug or just an obscure implementation detail.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
2022-07-13 21:05:35 +00:00
Alyssa Rosenzweig d458384883 pan/va: Handle BIFROST_MESA_DEBUG=nosb
For debugging flakes that might be caused due to wrong scoreboarding.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17223>
2022-07-13 18:46:15 +00:00
Alyssa Rosenzweig 2171412c66 pan/va: Print instructions with pack assert fails
va_pack asserts a large number of invariants about the instruction being packed.
If one of these fails (due to an invalid instruction), it's helpful to inspect
the failing instruction, as it may not be apparent in a large shader. Pass the
instruction through with all the assertions in va_pack for easier debugging.

Now assertion failures when packing are easier to debug:

   Invalid invariant lo.type == hi.type:
       = STORE.i32.flow2.wls br5, br2, wls_ptr[1], byte_offset:0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
2022-07-13 18:01:56 +00:00
Alyssa Rosenzweig cfeafef755 pan/va: Use invalid_instruction in more places
By passing the instruction pointer through the packer, we can get better error
messages with invalid_instruction.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
2022-07-13 18:01:56 +00:00
Alyssa Rosenzweig cc94409d70 pan/va: Dump unencodable instructions
When we assert out due to certain invalid encoding, it's helpful to know what
instruction is causing the failure, since it may not be obvious from the
assembly for large shaders. Now we get nice errors when failing:

   Invalid opcode:
      br0 = VAR_TEX.f32.flow8.store.skip.lod_mode.center , texture_index:0, varying_index:0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
2022-07-13 18:01:56 +00:00
Icecream95 bcd85a74e8 pan/va: Use the _safe iterator when adding blend shader calls
Otherwise the list 'next' changing will cause the assertion in
list_for_each_entry to be hit.

This was not hit before because list_assert is defined for debug
builds but not debugoptimized.

Fixes: 5067a26f44 ("pan/bi: Use flow control lowering on Valhall")
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
2022-07-08 01:30:23 +00:00
Alyssa Rosenzweig fbe430fae9 panfrost: Move bifrost_lanes_per_warp to common
Whereas the compiler needs to know the warp size for lowering divergent
indirects, the driver needs to know it to report the subgroup size. Move the
Bifrost-specific helper to common and add the trivial implementation for
Midgard.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
2022-07-08 01:14:55 +00:00
Alyssa Rosenzweig 5aa740bc8e pan/bi: Implement f2f16{_rtz, _rtne}
Float conversions with explicit rounding modes are required for OpenCL,
as well as for Vulkan with the VK_KHR_16bit_storage extension (mandatory
in Vulkan 1.1). Since the hardware conversion instructions allow
configuring the round mode, this is easy to support :-)

Fixes test_half.vstore_half_rtz.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>
2022-07-08 00:57:18 +00:00
Alyssa Rosenzweig 5f599fdef6 pan/va: Add missing <roundmode/> to V2F32_TO_V2F16
So we can implement f2f16_rtz.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>
2022-07-08 00:57:18 +00:00
Alyssa Rosenzweig 9bd7570e96 pan/bi: Fix unpack_32_2x16 definition
This got messed up when scalarizing the IR. Fix the definition of the opcode to
return (instead of break, asserting out) and to respect the swizzle (instead of
failing validation). Noticed when bringing up OpenCL on Valhall.

Fixes: 5febeae58e ("pan/bi: Emit collect and split")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17222>
2022-07-07 23:16:39 +00:00
Alyssa Rosenzweig 154929d731 pan/va: Handle terminal barriers
If a shader ends with a workgroup barrier, it must wait for slot #7 at the end
to finish the barrier. After inserting flow control, we get:

   BARRIER
   NOP.wait
   NOP.end

Currently, the flow control pass assumes that .end implies all other control
flow, and will merge this down to

   BARRIER.end

However, this is incorrect. Slot #7 is no longer waited on. In theory, this
cannot affect the correctness of the shader. In practice, the hardware checks
that all barriers are reached. Terminating without waiting on slot #7 first
raises an INSTR_BARRIER_FAULT. We need to weaken the flow control merging
slightly to avoid this incorrect merge, instead emitting:

   BARRIER.wait
   NOP.end

Of course, all of these cases are inefficient: terminal barriers shouldn't be
emitted in the first place. I wrote out an optimization for this. We can merge
it if we find a workload that it actually helps.

Fixes test_half.vstore_half.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17264>
2022-07-05 14:48:09 +00:00
Alyssa Rosenzweig 3fedf22b60 pan/bi: Tune lower_vars_to_scratch
Increase the threshold to lower indirect indexing of arrays to scratch memory
all the way up to 256 bytes, which was the lowest power-of-two threshold for
which enabling the pass on Mali-G57 was a win in shaderdb.

It's difficult to tell what threshold is optimal here. The shader-db stats are
based on a rough cycle model that assumes a 16:1 ratio between CVT and
load/store on Valhall, and a 24:1 ratio between arithmetic and load/store on
Bifrost. Those ratios are at most rules of thumb, as the number of cycles
required by a load/store instruction will vary tremendously based on caching and
the memory controller. However, they may well be lower bounds (if those are the
upper bounds on instruction issuing in the Mali shader cores). As such, a large
threshold seems well motivated.

shader-db results on Mali-G52 follow, results on Mali-G57 were similar. Note the
shader that's hurt for spills/fills is *helped* for load/store overall.

cycles helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))
ldst helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))

total instructions in shared programs: 2415410 -> 2415372 (<.01%)
instructions in affected programs: 1041 -> 1003 (-3.65%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 31.0 x̄: 12.67 x̃: 5
helped stats (rel) min: 2.08% max: 6.02% x̄: 3.90% x̃: 3.60%

total tuples in shared programs: 1928558 -> 1928527 (<.01%)
tuples in affected programs: 826 -> 795 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 6.0 max: 26.0 x̄: 16.00 x̃: 16
helped stats (rel) min: 3.72% max: 9.68% x̄: 6.70% x̃: 6.70%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.54% max: 1.54% x̄: 1.54% x̃: 1.54%

total clauses in shared programs: 355013 -> 354981 (<.01%)
clauses in affected programs: 220 -> 188 (-14.55%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 27.0 x̄: 10.67 x̃: 3
helped stats (rel) min: 13.99% max: 21.43% x̄: 16.93% x̃: 15.38%

total cycles in shared programs: 166610.27 -> 166574.90 (-0.02%)
cycles in affected programs: 138 -> 102.62 (-25.63%)
helped: 3
HURT: 0
helped stats (abs) min: 0.4583330000000001 max: 31.0 x̄: 11.79 x̃: 3
helped stats (rel) min: 15.28% max: 65.28% x̄: 34.86% x̃: 24.03%

total arith in shared programs: 73690.13 -> 73690.58 (<.01%)
arith in affected programs: 29.71 -> 30.17 (1.54%)
helped: 1
HURT: 2
helped stats (abs) min: 0.0833339999999998 max: 0.0833339999999998 x̄: 0.08 x̃: 0
helped stats (rel) min: 3.85% max: 3.85% x̄: 3.85% x̃: 3.85%
HURT stats (abs)   min: 0.125 max: 0.4166659999999993 x̄: 0.27 x̃: 0
HURT stats (rel)   min: 1.66% max: 5.17% x̄: 3.42% x̃: 3.42%

total ldst in shared programs: 135611 -> 135571 (-0.03%)
ldst in affected programs: 138 -> 98 (-28.99%)
helped: 3
HURT: 0
helped stats (abs) min: 3.0 max: 31.0 x̄: 13.33 x̃: 6
helped stats (rel) min: 24.03% max: 100.00% x̄: 74.68% x̃: 100.00%

total quadwords in shared programs: 1674599 -> 1674523 (<.01%)
quadwords in affected programs: 838 -> 762 (-9.07%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 65.0 x̄: 25.33 x̃: 9
helped stats (rel) min: 3.39% max: 15.00% x̄: 9.14% x̃: 9.04%

total spills in shared programs: 37 -> 40 (8.11%)
spills in affected programs: 17 -> 20 (17.65%)
helped: 0
HURT: 1

total fills in shared programs: 190 -> 196 (3.16%)
fills in affected programs: 34 -> 40 (17.65%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig fd021a618f pan/va: Replace MKVEC.v4i8 with MKVEC.v2i8
This is the instruction that the hardware actually supports. Do the rename, use
the more specific accurate model in the IR, and rework the Valhall texturing
code to emit MKVEC.v2i8 instead of MKVEC.v4i8.

Will fix:

   dEQP-GLES31.functional.texture.gather.offset_dynamic.implementation_offset.*

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig c570693c19 pan/va: Pack MKVEC.v2i8 byte lanes
They are in a different place, but the encoding is otherwise as usual. This will
be required for texture gathers with dynamic offsets.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 10301885ab pan/bi: Constant fold MKVEC.v2i8
Constant MKVEC.v2i8 will be generated during texturing on Valhall, just like
constant MKVEC.v4i8 is currently generated.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 2833d0472a pan/bi: Model MKVEC.v2i8
Valhall does not have Bifrost's 4-source MKVEC.v4i8. Instead, it has a (somewhat
limtied) 3-source MKVEC.v2i8. The full MKVEC.v4i8 may be lowered to a pair of
MKVEC.v2i8 instructions.

For good code quality on both Bifrost and Valhall, we need to model both
instructions in their full generality.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 6792b15971 pan/bi: Remove FRSCALE from IR
It's just LDEXP in different clothing.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 21bedd2c97 pan/va: Rename RSCALE to LDEXP
This avoids needless variation from Bifrost. While at it, fix the opcode
definition: there are no abs/neg/swizzle modifiers on the signed integer source,
and there's no clamp. However, there are round and infinity modes, like on
Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 0da28ee2c7 pan/va: Implement sample positions FAU packing
This will fix:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.at_sample_position.default_framebuffer

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 9dd0bc92b5 pan/va: Lower FADD_RSCALE.f32 to FMA_RSCALE.f32
We generate FADD_RSCALE.f32 in our sample variables implementations. Valhall
doesn't have a dedicated FADD_RSCALE.f32 implementation, it should be aliased to
FMA_RSCALE.f32. Handle that alias in isel lowering. This will fix:

   dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 1a882ecdab pan/bi: Align accesses with packed TLS
When lowering vars to scratch, we need to be careful with alignment on Valhall,
where packed TLS access must not straddle a 16-byte boundary. Fixes regressions
when enabling indirect access to temps on Valhall.

Fixes: 6761dbf891 ("panfrost: Use packed TLS on Valhall")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 5ee1179c94 pan/bi: Fix LD_BUFFER.i16 definition
This was missing the message, breaking UBO-to-push and who-knows-what-else, when
enabling fp16 const buffers.

Fixes: 3dc2095b07 ("pan/bi: Model LD_BUFFER instructions")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17101>
2022-06-21 22:42:34 +00:00
Alyssa Rosenzweig 40accfd3b7 pan/va: Unit test va_mark_last
This pass is super easy to unit test, so we have no excuse not to test
thoroughly. va_mark_last only inserts annotations in a shader without any
annotations, so our test cases are simply annotated shaders. The CASE macro just
has to compare the case against the case with the annotations stripped and added
back with va_mark_last.

In retrospect, I should have used that technique for the flow control insertion
tests too.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 4b7e337b45 pan/va: Mark last register reads
On Valhall, register reads may be marked as "last" [1]. Setting the last flag
promises the hardware that the value of the register is no longer required. This
may enable hardware optimizations. In particular, it may permit the hardware to
avoid register file writes if a write to the marked register is still in the
forwarding buffer. This may improve power efficiency.

In principle, this is trivial: run liveness analysis and mark killed sources,
like we would in an SSA-based register allocator. In practice, there are a few
wrinkles to avoid hazards around staging registers and 64-bit register pairs,
requiring some additional data flow analysis and fix ups. However, nothing here
is particularly "hard", and all the ideas are already in use for the Bifrost
scheduler and the Bifrost/Valhall scoreboard analyses.

[1] In Mesa's compiler, this is called discard for historical reasons.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig d4377e1255 pan/va: Use validate_register_pair for BLEND pack
Instead of open-coding. Noticed by inspection.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig b48933d641 pan/va: Include BLEND for va_swap_12
This helps "contain the crazy" and avoids special casing BLEND in compiler
passes. The Valhall instruction is roughly the same as its Bifrost counterpart,
as long as we fix up the source order (as we already do for bitwise operations)
everything works out.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 738a1572d2 pan/va: Move va_flow_is_wait_or_none to common
We want to use this helper in the "mark last" pass too.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 1b29a99b7b pan/va: Add header guards to valhall_enums.h
Otherwise we can't #include in multiple places.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig c5a8736552 pan/bi: Constify bi_is_staging_src argument
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 2075bff4e8 pan/bi: Mark bi_postra_liveness_ins as MUST_CHECK
Post-RA liveness relies on the caller updating the live variable with the
results of bi_postra_liveness_ins. It is not automatic, as with regular
liveness. This means ignoring the result of bi_postra_liveness_ins is surely an
error. Mark it as MUST_CHECK to catch that error at compile time.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 43d00c2971 pan/va: Unit test barrier handling
Add a unit test for the quirk discovered in the previos commit, because this
will cause flakes (instead of fails) if we get it wrong. Better have a
deterministic fail mode.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 8c6b9b9c92 pan/va: Workaround quirk of barrier handling
For some unknown reason, waiting for general slots (at least for memory stores)
doesn't work properly on a BARRIER instruction. We need to wait for all general
slots right before issuing the BARRIER in addition to the general wait on the
BARRIER itself. I don't know if this is a hardware bug or some hideous
gate-saving quirk, but I observe the Mali-G78 DDK using the same workaround,
which implies this really is necessary.

Fixes rare flakes in:

   dEQP-GLES31.functional.compute.shared_var.work_group_size.float_128_1_1

Note that the flakes from that test are extremely timing dependent. Without this
change, that test is racy but we almost always win the race. Reproducing the
issue reliably requires high system load (e.g. running the CTS in the
background) and simultaneously running that test a large number of times.

Minimal shader-db impact. In particular, no cycle count regressions.

total instructions in shared programs: 2699419 -> 2699458 (<.01%)
instructions in affected programs: 22014 -> 22053 (0.18%)
helped: 2
HURT: 25
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
HURT stats (abs)   min: 1.0 max: 3.0 x̄: 1.64 x̃: 1
HURT stats (rel)   min: 0.07% max: 2.82% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: 1.01 1.87
95% mean confidence interval for instructions %-change: 0.38% 0.88%
Instructions are HURT.

total cvt in shared programs: 14468.81 -> 14469.42 (<.01%)
cvt in affected programs: 221.33 -> 221.94 (0.28%)
helped: 2
HURT: 25
helped stats (abs) min: 0.015625 max: 0.015625 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%
HURT stats (abs)   min: 0.015625 max: 0.046875 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 0.10% max: 4.44% x̄: 1.06% x̃: 0.79%
95% mean confidence interval for cvt value: 0.02 0.03
95% mean confidence interval for cvt %-change: 0.57% 1.36%
Cvt are HURT.

total quadwords in shared programs: 1462496 -> 1462528 (<.01%)
quadwords in affected programs: 4632 -> 4664 (0.69%)
helped: 0
HURT: 4
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.35% max: 7.69% x̄: 4.03% x̃: 4.03%
95% mean confidence interval for quadwords value: 8.00 8.00
95% mean confidence interval for quadwords %-change: -2.71% 10.76%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 7fa545528d pan/va: Simplify insert flow tests
Test cases for insert flow are necessarily the reference test cases with the
NOPs stripped out. That means we don't need to duplicate the test bodies.
Deduplicate.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 35fcf8d3d7 pan/va: Move VA_NUM_GENERAL_SLOTS to common
This definition is a hardware property. It's not specific to the flow control
insertion pass, so move it to common code where other passes can use it.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17091>
2022-06-21 22:19:59 +00:00
Alyssa Rosenzweig 90beea75f6 pan/bi: Don't reorder push with no_ubo_to_push
Otherwise, load_push_constant won't work properly. This could probably be made
to work if we tried hard enough, but we still don't want reordering for internal
(meta) shaders which are layed out deliberately.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig 17ea1642e2 pan/bi: Implement load_push_constant
Bifrost supports "fast access uniforms" loaded from a single contiguous buffer.
This maps directly to Vulkan push constants, with some caveats:

* No indirect access. Indirects need to be lowered to a UBO pull.
* Strict alignment requirements. These will be met in practice.

Implement the NIR intrinsic and map it to the native hardware construct.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16916>
2022-06-08 13:42:42 +00:00
Alyssa Rosenzweig 28801cfba0 pan/va: Unit test constant lowering pass
Like other optimizations, breaking this pass may not affect functional
correctness. It's also dead simple to unit test the pass, so we have no excuse
not to. Add unit tests for the functionality we currently support, since we just
extended it and want to make sure everything still works.

This includes tests for use of modifiers to get more small constants. There are
lots of subtle gotchas there, so let's add lots of unit tests to make sure we
got it right.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
2022-06-06 18:10:24 +00:00
Alyssa Rosenzweig 9cfafbb09b pan/va: Try widening small constants
Many small integers are availabled as small constants, but the table of small
constants is tightly packed. Zero and sign extensions are usually required to
access small integers. When packing constants, try zero/sign extension for
unsigned/signed integer instructions respectively.

total instructions in shared programs: 2716912 -> 2707795 (-0.34%)
instructions in affected programs: 1045609 -> 1036492 (-0.87%)
helped: 4460
HURT: 125
helped stats (abs) min: 1.0 max: 58.0 x̄: 2.14 x̃: 1
helped stats (rel) min: 0.14% max: 23.85% x̄: 1.35% x̃: 0.88%
HURT stats (abs)   min: 1.0 max: 68.0 x̄: 3.41 x̃: 1
HURT stats (rel)   min: 0.34% max: 3.88% x̄: 0.93% x̃: 0.70%
95% mean confidence interval for instructions value: -2.09 -1.89
95% mean confidence interval for instructions %-change: -1.33% -1.25%
Instructions are helped.

total cycles in shared programs: 141984.06 -> 141932.42 (-0.04%)
cycles in affected programs: 552.08 -> 500.44 (-9.35%)
helped: 18
HURT: 0
helped stats (abs) min: 0.015625 max: 11.0 x̄: 2.87 x̃: 0
helped stats (rel) min: 0.50% max: 19.64% x̄: 5.36% x̃: 1.53%
95% mean confidence interval for cycles value: -5.17 -0.56
95% mean confidence interval for cycles %-change: -9.28% -1.44%
Cycles are helped.

total cvt in shared programs: 13805.05 -> 13663.39 (-1.03%)
cvt in affected programs: 6127.45 -> 5985.80 (-2.31%)
helped: 4460
HURT: 125
helped stats (abs) min: 0.015625 max: 0.90625 x̄: 0.03 x̃: 0
helped stats (rel) min: 0.35% max: 50.00% x̄: 5.19% x̃: 4.00%
HURT stats (abs)   min: 0.015625 max: 1.0625 x̄: 0.05 x̃: 0
HURT stats (rel)   min: 0.77% max: 9.30% x̄: 3.40% x̃: 2.78%
95% mean confidence interval for cvt value: -0.03 -0.03
95% mean confidence interval for cvt %-change: -5.10% -4.81%
Cvt are helped.

total ls in shared programs: 129545 -> 129494 (-0.04%)
ls in affected programs: 495 -> 444 (-10.30%)
helped: 6
HURT: 0
helped stats (abs) min: 2.0 max: 11.0 x̄: 8.50 x̃: 11
helped stats (rel) min: 1.49% max: 19.64% x̄: 13.95% x̃: 19.64%
95% mean confidence interval for ls value: -12.68 -4.32
95% mean confidence interval for ls %-change: -23.23% -4.67%
Ls are helped.

total quadwords in shared programs: 1476416 -> 1469824 (-0.45%)
quadwords in affected programs: 121208 -> 114616 (-5.44%)
helped: 820
HURT: 16
helped stats (abs) min: 8.0 max: 32.0 x̄: 8.28 x̃: 8
helped stats (rel) min: 1.39% max: 50.00% x̄: 11.00% x̃: 10.00%
HURT stats (abs)   min: 8.0 max: 32.0 x̄: 12.50 x̃: 8
HURT stats (rel)   min: 1.38% max: 10.00% x̄: 6.19% x̃: 7.14%
95% mean confidence interval for quadwords value: -8.14 -7.63
95% mean confidence interval for quadwords %-change: -11.20% -10.15%
Quadwords are helped.

total threads in shared programs: 53633 -> 53663 (0.06%)
threads in affected programs: 39 -> 69 (76.92%)
helped: 33
HURT: 3
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 0.64 1.02
95% mean confidence interval for threads %-change: 73.27% 101.73%
Threads are helped.

total spills in shared programs: 154 -> 103 (-33.12%)
spills in affected programs: 75 -> 24 (-68.00%)
helped: 6
HURT: 0

total fills in shared programs: 656 -> 656 (0.00%)
fills in affected programs: 148 -> 148 (0.00%)
helped: 2
HURT: 4

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig 72146051d5 pan/va: Try negating small constants when lowering
If a constant is used with a floating point instruction with a floating-point
negate modifier, we can use the modifier to negate constants in the table for
free. Each floating point in the table is positive, so this is required for
negative small constants.

total instructions in shared programs: 2728438 -> 2716912 (-0.42%)
instructions in affected programs: 1418220 -> 1406694 (-0.81%)
helped: 6053
HURT: 94
helped stats (abs) min: 1.0 max: 43.0 x̄: 1.94 x̃: 1
helped stats (rel) min: 0.06% max: 18.18% x̄: 1.34% x̃: 0.84%
HURT stats (abs)   min: 1.0 max: 5.0 x̄: 2.34 x̃: 2
HURT stats (rel)   min: 0.09% max: 21.43% x̄: 1.87% x̃: 0.91%
95% mean confidence interval for instructions value: -1.93 -1.82
95% mean confidence interval for instructions %-change: -1.34% -1.25%
Instructions are helped.

total cycles in shared programs: 142103 -> 141984.06 (-0.08%)
cycles in affected programs: 766.70 -> 647.77 (-15.51%)
helped: 97
HURT: 0
helped stats (abs) min: 0.015625 max: 40.0 x̄: 1.23 x̃: 0
helped stats (rel) min: 0.27% max: 41.24% x̄: 3.63% x̃: 2.08%
95% mean confidence interval for cycles value: -2.41 -0.04
95% mean confidence interval for cycles %-change: -4.68% -2.57%
Cycles are helped.

total cvt in shared programs: 13983.34 -> 13805.05 (-1.28%)
cvt in affected programs: 7952.45 -> 7774.16 (-2.24%)
helped: 6049
HURT: 98
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.03 x̃: 0
helped stats (rel) min: 0.25% max: 100.00% x̄: 4.74% x̃: 2.52%
HURT stats (abs)   min: 0.015625 max: 0.078125 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.17% max: 100.00% x̄: 5.48% x̃: 2.54%
95% mean confidence interval for cvt value: -0.03 -0.03
95% mean confidence interval for cvt %-change: -4.83% -4.32%
Cvt are helped.

total ls in shared programs: 129660 -> 129545 (-0.09%)
ls in affected programs: 601 -> 486 (-19.13%)
helped: 7
HURT: 0
helped stats (abs) min: 3.0 max: 40.0 x̄: 16.43 x̃: 8
helped stats (rel) min: 2.88% max: 41.24% x̄: 17.41% x̃: 12.50%
95% mean confidence interval for ls value: -31.42 -1.44
95% mean confidence interval for ls %-change: -29.25% -5.58%
Ls are helped.

total quadwords in shared programs: 1482728 -> 1476416 (-0.43%)
quadwords in affected programs: 131200 -> 124888 (-4.81%)
helped: 798
HURT: 15
helped stats (abs) min: 8.0 max: 24.0 x̄: 8.06 x̃: 8
helped stats (rel) min: 0.34% max: 50.00% x̄: 10.15% x̃: 6.67%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 1.49% max: 100.00% x̄: 11.25% x̃: 2.78%
95% mean confidence interval for quadwords value: -7.92 -7.60
95% mean confidence interval for quadwords %-change: -10.52% -8.99%
Quadwords are helped.

total threads in shared programs: 53585 -> 53633 (0.09%)
threads in affected programs: 51 -> 99 (94.12%)
helped: 49
HURT: 1
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 0.88 1.04
95% mean confidence interval for threads %-change: 90.97% 103.03%
Threads are helped.

total spills in shared programs: 125 -> 154 (23.20%)
spills in affected programs: 75 -> 104 (38.67%)
helped: 3
HURT: 4

total fills in shared programs: 800 -> 656 (-18.00%)
fills in affected programs: 476 -> 332 (-30.25%)
helped: 7
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig cecfa0c44a pan/va: Record which instructions are signed
We need to distinguish signed integer instructions from unsigned integer
instructions, to distinguish sign-extension and zero-extension of sources.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16862>
2022-06-06 18:10:23 +00:00
Alyssa Rosenzweig e57dfed419 pan/bi: Implement b2i with MUX
The result_type modifier propagation looks for MUX instructions, so using this
canonical b2i implementation allows the sequence b2i(cmp) to be fused.
It's also faster on its own: on Valhall, MUX may be implemented as CSEL on the
CVT unit, while AND may only be implemented on the SFU unit. So in case this
doesn't get fused, we expect 4x better throughput for b2i with this
implementation. Similarly, on Bifrost, MUX may be scheduled to either unit (as
CSEL on FMA or MUX on ADD), whereas AND may only be scheduled to FMA.

Results on Mali-G52:

total instructions in shared programs: 2419171 -> 2414814 (-0.18%)
instructions in affected programs: 272203 -> 267846 (-1.60%)
helped: 767
HURT: 0
helped stats (abs) min: 1.0 max: 138.0 x̄: 5.68 x̃: 2
helped stats (rel) min: 0.12% max: 15.57% x̄: 2.09% x̃: 0.68%
95% mean confidence interval for instructions value: -6.68 -4.68
95% mean confidence interval for instructions %-change: -2.37% -1.82%
Instructions are helped.

total tuples in shared programs: 1932822 -> 1929234 (-0.19%)
tuples in affected programs: 76485 -> 72897 (-4.69%)
helped: 380
HURT: 3
helped stats (abs) min: 1.0 max: 138.0 x̄: 9.46 x̃: 1
helped stats (rel) min: 0.14% max: 15.96% x̄: 3.81% x̃: 0.92%
HURT stats (abs)   min: 1.0 max: 6.0 x̄: 2.67 x̃: 1
HURT stats (rel)   min: 0.38% max: 8.57% x̄: 3.80% x̃: 2.44%
95% mean confidence interval for tuples value: -11.30 -7.44
95% mean confidence interval for tuples %-change: -4.27% -3.22%
Tuples are helped.

total clauses in shared programs: 356094 -> 355992 (-0.03%)
clauses in affected programs: 3264 -> 3162 (-3.12%)
helped: 80
HURT: 0
helped stats (abs) min: 1.0 max: 9.0 x̄: 1.27 x̃: 1
helped stats (rel) min: 0.81% max: 50.00% x̄: 4.83% x̃: 3.39%
95% mean confidence interval for clauses value: -1.49 -1.06
95% mean confidence interval for clauses %-change: -6.23% -3.43%
Clauses are helped.

total cycles in shared programs: 167337.10 -> 167329.19 (<.01%)
cycles in affected programs: 510.08 -> 502.17 (-1.55%)
helped: 80
HURT: 2
helped stats (abs) min: 0.041665999999999315 max: 0.7916659999999993 x̄: 0.10 x̃: 0
helped stats (rel) min: 0.51% max: 13.64% x̄: 2.12% x̃: 1.34%
HURT stats (abs)   min: 0.041665999999999315 max: 0.0416669999999999 x̄: 0.04 x̃: 0
HURT stats (rel)   min: 0.39% max: 2.78% x̄: 1.58% x̃: 1.58%
95% mean confidence interval for cycles value: -0.12 -0.07
95% mean confidence interval for cycles %-change: -2.59% -1.48%
Cycles are helped.

total arith in shared programs: 73819.54 -> 73669.25 (-0.20%)
arith in affected programs: 2840.54 -> 2690.25 (-5.29%)
helped: 383
HURT: 3
helped stats (abs) min: 0.041665999999999315 max: 5.75 x̄: 0.39 x̃: 0
helped stats (rel) min: 0.33% max: 18.81% x̄: 4.39% x̃: 0.98%
HURT stats (abs)   min: 0.041665999999999315 max: 0.25 x̄: 0.11 x̃: 0
HURT stats (rel)   min: 0.39% max: 8.96% x̄: 4.04% x̃: 2.78%
95% mean confidence interval for arith value: -0.47 -0.31
95% mean confidence interval for arith %-change: -4.93% -3.71%
Arith are helped.

total quadwords in shared programs: 1679798 -> 1676259 (-0.21%)
quadwords in affected programs: 72826 -> 69287 (-4.86%)
helped: 381
HURT: 15
helped stats (abs) min: 1.0 max: 142.0 x̄: 9.35 x̃: 1
helped stats (rel) min: 0.25% max: 18.87% x̄: 4.33% x̃: 1.13%
HURT stats (abs)   min: 1.0 max: 6.0 x̄: 1.47 x̃: 1
HURT stats (rel)   min: 0.30% max: 6.25% x̄: 0.77% x̃: 0.35%
95% mean confidence interval for quadwords value: -10.76 -7.11
95% mean confidence interval for quadwords %-change: -4.71% -3.56%
Quadwords are helped.

Results on Mali-G57:

total instructions in shared programs: 2704193 -> 2699317 (-0.18%)
instructions in affected programs: 293366 -> 288490 (-1.66%)
helped: 758
HURT: 5
helped stats (abs) min: 1.0 max: 151.0 x̄: 6.45 x̃: 2
helped stats (rel) min: 0.11% max: 22.22% x̄: 2.05% x̃: 0.64%
HURT stats (abs)   min: 1.0 max: 7.0 x̄: 2.20 x̃: 1
HURT stats (rel)   min: 0.22% max: 1.69% x̄: 0.87% x̃: 1.08%
95% mean confidence interval for instructions value: -7.42 -5.36
95% mean confidence interval for instructions %-change: -2.27% -1.79%
Instructions are helped.

total cycles in shared programs: 141711.73 -> 141711.84 (<.01%)
cycles in affected programs: 214.36 -> 214.47 (0.05%)
helped: 4
HURT: 42
helped stats (abs) min: 0.015625 max: 0.359375 x̄: 0.20 x̃: 0
helped stats (rel) min: 1.85% max: 12.78% x̄: 9.12% x̃: 10.93%
HURT stats (abs)   min: 0.015625 max: 0.09375 x̄: 0.02 x̃: 0
HURT stats (rel)   min: 0.17% max: 17.65% x̄: 0.84% x̃: 0.34%
95% mean confidence interval for cycles value: -0.02 0.03
95% mean confidence interval for cycles %-change: -1.23% 1.17%
Inconclusive result (value mean confidence interval includes 0).

total cvt in shared programs: 14479.14 -> 14474.19 (-0.03%)
cvt in affected programs: 2877.05 -> 2872.09 (-0.17%)
helped: 508
HURT: 209
helped stats (abs) min: 0.015625 max: 0.453125 x̄: 0.02 x̃: 0
helped stats (rel) min: 0.25% max: 16.67% x̄: 1.23% x̃: 0.37%
HURT stats (abs)   min: 0.015625 max: 0.296875 x̄: 0.03 x̃: 0
HURT stats (rel)   min: 0.15% max: 18.18% x̄: 1.70% x̃: 0.34%
95% mean confidence interval for cvt value: -0.01 -0.00
95% mean confidence interval for cvt %-change: -0.57% -0.18%
Cvt are helped.

total sfu in shared programs: 7875.69 -> 7590.75 (-3.62%)
sfu in affected programs: 1567.38 -> 1282.44 (-18.18%)
helped: 906
HURT: 0
helped stats (abs) min: 0.0625 max: 8.625 x̄: 0.31 x̃: 0
helped stats (rel) min: 2.38% max: 100.00% x̄: 16.80% x̃: 5.63%
95% mean confidence interval for sfu value: -0.37 -0.26
95% mean confidence interval for sfu %-change: -18.43% -15.17%
Sfu are helped.

total quadwords in shared programs: 1468152 -> 1465800 (-0.16%)
quadwords in affected programs: 37104 -> 34752 (-6.34%)
helped: 161
HURT: 2
helped stats (abs) min: 8.0 max: 80.0 x̄: 14.71 x̃: 8
helped stats (rel) min: 1.67% max: 20.00% x̄: 8.05% x̃: 7.69%
HURT stats (abs)   min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 3.57% max: 3.85% x̄: 3.71% x̃: 3.71%
95% mean confidence interval for quadwords value: -16.29 -12.57
95% mean confidence interval for quadwords %-change: -8.58% -7.22%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig 8f3b62f87e pan/va: Add MUX lowering tests
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig 677a66b3eb pan/va: Lower MUX to CSEL where possible
CSEL executes on the conversion unit (CVT), while MUX executes on the special
function unit (SFU). Throughput on CVT is 4x higher than SFU, so this is
(almost) always an optimization.

The "real" MUX is still used for unusual cases, like 8-bit and bitselect.

Note that it's easier for us to use MUX everywhere for the IR. This is an easy
fixup to get better codegen on Valhall without touching the core Bifrost code.

shader-db is a bit of a toss up: register pressure and instruction count are
hurt in some cases due to restrictions on FAU access. In particular, a shader
that muxes between two uniforms needs an extra move due to extra constant
(zero). However, in terms of throughput this is still a win: 2 CVT instructions
(MOV + CSEL) have 2x throughput to 1 SFU instruction (MUX). The MOV has
opportunities for CSE, but that can hurt pressure in turn. Overall, cycles are
helped substantially.

total instructions in shared programs: 2728438 -> 2731597 (0.12%)
instructions in affected programs: 414391 -> 417550 (0.76%)
helped: 87
HURT: 1063
helped stats (abs) min: 1.0 max: 6.0 x̄: 5.17 x̃: 6
helped stats (rel) min: 0.19% max: 15.79% x̄: 4.12% x̃: 4.11%
HURT stats (abs)   min: 1.0 max: 56.0 x̄: 3.40 x̃: 2
HURT stats (rel)   min: 0.11% max: 23.43% x̄: 1.15% x̃: 0.63%
95% mean confidence interval for instructions value: 2.47 3.03
95% mean confidence interval for instructions %-change: 0.61% 0.90%
Instructions are HURT.

total cycles in shared programs: 142103 -> 142015.75 (-0.06%)
cycles in affected programs: 1263.45 -> 1176.20 (-6.91%)
helped: 281
HURT: 176
helped stats (abs) min: 0.015625 max: 2.234375 x̄: 0.50 x̃: 0
helped stats (rel) min: 0.71% max: 54.17% x̄: 16.93% x̃: 15.31%
HURT stats (abs)   min: 0.015625 max: 30.0 x̄: 0.30 x̃: 0
HURT stats (rel)   min: 0.84% max: 120.00% x̄: 7.16% x̃: 5.00%
95% mean confidence interval for cycles value: -0.33 -0.05
95% mean confidence interval for cycles %-change: -9.08% -6.22%
Cycles are helped.

total cvt in shared programs: 13983.34 -> 14891.70 (6.50%)
cvt in affected programs: 7498.36 -> 8406.72 (12.11%)
helped: 71
HURT: 4711
helped stats (abs) min: 0.0625 max: 0.0625 x̄: 0.06 x̃: 0
helped stats (rel) min: 5.41% max: 40.00% x̄: 10.23% x̃: 9.30%
HURT stats (abs)   min: 0.015625 max: 2.640625 x̄: 0.19 x̃: 0
HURT stats (rel)   min: 0.18% max: 141.18% x̄: 16.21% x̃: 9.52%
95% mean confidence interval for cvt value: 0.18 0.20
95% mean confidence interval for cvt %-change: 15.21% 16.42%
Cvt are HURT.

total sfu in shared programs: 11320.44 -> 7882.56 (-30.37%)
sfu in affected programs: 7618.50 -> 4180.62 (-45.13%)
helped: 4782
HURT: 0
helped stats (abs) min: 0.0625 max: 10.5625 x̄: 0.72 x̃: 0
helped stats (rel) min: 1.34% max: 100.00% x̄: 41.91% x̃: 37.50%
95% mean confidence interval for sfu value: -0.75 -0.68
95% mean confidence interval for sfu %-change: -42.68% -41.14%
Sfu are helped.

total ls in shared programs: 129660 -> 129690 (0.02%)
ls in affected programs: 25 -> 55 (120.00%)
helped: 0
HURT: 1

total quadwords in shared programs: 1482728 -> 1484128 (0.09%)
quadwords in affected programs: 58624 -> 60024 (2.39%)
helped: 24
HURT: 195
helped stats (abs) min: 8.0 max: 8.0 x̄: 8.00 x̃: 8
helped stats (rel) min: 3.70% max: 20.00% x̄: 10.34% x̃: 10.00%
HURT stats (abs)   min: 8.0 max: 24.0 x̄: 8.16 x̃: 8
HURT stats (rel)   min: 1.41% max: 50.00% x̄: 4.84% x̃: 2.56%
95% mean confidence interval for quadwords value: 5.70 7.09
95% mean confidence interval for quadwords %-change: 2.22% 4.14%
Quadwords are HURT.

total spills in shared programs: 125 -> 127 (1.60%)
spills in affected programs: 0 -> 2
helped: 0
HURT: 1

total fills in shared programs: 800 -> 828 (3.50%)
fills in affected programs: 0 -> 28
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig 3741606b25 pan/va: Implement more lanes
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig 1768afa5b9 pan/bi: Extract MUX to CSEL optimization
It's portable, and useful to both Bifrost and Valhall, in the clause scheduler
and in an instruction selection respectively. Move it from the Bifrost clause
scheduler to common code so we can share the benefits.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16857>
2022-06-06 16:08:25 +00:00
Alyssa Rosenzweig e1fb182d90 pan/va: Do not insert NOPs into empty shaders
It's unnecessary and breaks the empty shader optimizations. Noticed while
inspecting a trace from dEQP-GLES3.functional.color_clear.masked_scissored_rgb,
which does not produce any varyings other than gl_Position in its vertex shader
and hence should omit the varying shader.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16868>
2022-06-06 14:28:59 +00:00
Alyssa Rosenzweig 3b3cd59fb8 panfrost: Launch transform feedback shaders
We now have infrastructure in place to generate variants of vertex shaders
specialized for transform feedback. All that's left is launching these
compute-like kernels before the IDVS job, implementing both the
transform feedback and the regular rasterization pipeline. This implements
transform feedback on Valhall, passing the relevant GLES3.1 tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00
Alyssa Rosenzweig 4e341e70d8 pan/bi: Handle transform feedback intrinsics
Translate the intrinsics we introduced to lower away transform feedback into
Panfrost system values which the GL driver can handle.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
2022-06-04 14:35:56 +00:00