The 16-bit field can be decomposed to two independent 8-bit fields, each
representing a single (additional) argument to the load/store op,
generally used for encoding registers. Addressable registers here are
substantially limited compared to the main register in a load/store op.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes a hang (and abort) on empty shaders, which you shouldn't have
anyway but better safe than sorry. DCE going on the fritz is no reason
to freeze the system.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to decode asymmetric varyings correctly, which occurs
with e.g. gl_FrontFacing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Corresponds to the normalized coordinates? flag on images in OpenCL and
evidently also shows up in GL, so let's wire it in.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's just a bit field containing some flags; there's no need for all the
macro magic.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We still don't know what it is, but from a newer trace we now know it's
half the size we thought it was.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We had them backwards in both the command stream and the Midgard stack.
In OpenGL ES 2.0, they're always the same, but in Vulkan/later-GL/CL
they diverge so we can fix this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Images are implemented (in part) as special attributes, so include
support for decoding this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This were seriously messed up beyond all recognition. How we're passing
shaders.random.* is a mystery.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conceptually, r28-r29 (as used for reading) and r28-r29 (as used for
writing) aren't registers at all, merely push/pull arrangements. So you
can't feed a texture result back into itself without explicitly moving
in the middle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will enable us to fuse inverts in various ways. Marginal hurt:
total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than putting registers after SSA in the MIR indexing, put them
side-by-side, shifted 1, using the bottom bit as the SSA/reg select.
This will allow us to generate SSA temps in the compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than bailing if we see something that's not SSA, do out the
analysis to check if we can pipeline and do so if we can.
total registers in shared programs: 392 -> 391 (-0.26%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than always emitting an extra move for fragments, check the
actual criteria and emit accordingly. (This was lost during the RA
improvements at the end of May).
total bundles in shared programs: 2210 -> 2176 (-1.54%)
bundles in affected programs: 501 -> 467 (-6.79%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -16.06% -10.21%
Bundles are helped.
total quadwords in shared programs: 3639 -> 3605 (-0.93%)
quadwords in affected programs: 795 -> 761 (-4.28%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -14.31% -8.13%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Think of this pass as register coalescing part 2. After RA runs, but
before scheduling, we scan for code of the form:
mov rN, rN
and delete the move, since it's totally redundant. This pass helps
already, but it'd of course be much more effective paired with
register coalescing to encourage moves in general to end up in this
form. Nevertheless, even by itself:
total instructions in shared programs: 3665 -> 3613 (-1.42%)
instructions in affected programs: 2046 -> 1994 (-2.54%)
helped: 52
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -10.26% -5.79%
Instructions are helped.
total bundles in shared programs: 2256 -> 2213 (-1.91%)
bundles in affected programs: 1154 -> 1111 (-3.73%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -11.60% -6.60%
Bundles are helped.
total quadwords in shared programs: 3689 -> 3642 (-1.27%)
quadwords in affected programs: 2025 -> 1978 (-2.32%)
helped: 47
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -10.30% -5.42%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The source and destination were incorrectly flipped in the move, but
some details of our internal regalloc made this function anyway. Now
that we're changing the regalloc, we need to fix this to avoid
regressing blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a special case of DCE designed to run after the out-of-ssa pass
to cleanup special register lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We mixed up component_lo and full, which made it appear that we had
less freedom in RA than we actually do. Fix this to fix some
disassemblies as well as prepare for RA with the bias field.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Following the RA work, we apply the same technique to eliminate the move
to r27 when loading cubemaps.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This reverts commit 4508f43eed, which
broke a bunch of dEQP tests (e.g. in
dEQP-GLES2.functional.draw.draw_arrays.*)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We add a new opt pass fusing perspective projection with varyings. Minor
win..? We don't combine non-varying projections, since if we're too
agressive, the extra load/store traffic will hurt us so it's not really
a win in practice.
total instructions in shared programs: 3915 -> 3913 (-0.05%)
instructions in affected programs: 76 -> 74 (-2.63%)
helped: 1
HURT: 0
total bundles in shared programs: 2520 -> 2519 (-0.04%)
bundles in affected programs: 46 -> 45 (-2.17%)
helped: 1
HURT: 0
total quadwords in shared programs: 4027 -> 4025 (-0.05%)
quadwords in affected programs: 80 -> 78 (-2.50%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't use it yet, since it's actually a shader-db regression. This is
primarily helpful as an intermediate step for attaching projection to
varyings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While load/store ops like st_vary can take an argument in either
r26/r27, ops like those for perspective projection must specifically
take their argument in r27.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The load/store pipes can't take a uniform register in, so an explicit
move is necessary here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Given the constraints on special registers, we add a helper for lowering
these by inserting moves (copies) where needed to satsify the ISA
constraints.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We generalize the constant emission helper used in fragment writeout as
we'll also need it for vertex outputs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This ensures the rules for accessing special register classes are
satisfied. This is asserted as a prepass should have lowered offending
uses to something satisfying these rules. Special register classes are
*not* work registers and cannot be used for RMW operations; they are
essentially 1-way pipes straight into/from fixed-function logic in the
shader cores.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We reuse the same register spilling mechanism as for work->memory to
spill special->work registers, e.g. to allow writing out more than 2
vec4 varyings (without better scheduling anyway).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This does not yet support special->work spilling, nor does it support
multiclass breakup. These corner cases will be handled in succeeding
commits.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Checks for x/xy/xyz/xyzw style swizzles (slightly more general but you
get the idea).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This should save a lot of per-compile time by using the RA the way it's
actually supposed to be used.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Memory corruption (for both legitimate and illegitimate reasons) causes
this to hang pantrace.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This was disabled to permit regression-free RA work. Now that the spill
code is in place, we can reenable, with some caveats about efficacy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Pipe through the number of bytes of spilled memory used from the
compiler into the main driver, where it will be used to allocate the
Thread Local Storage buffer.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We just use the pointers of the midgard_block*, which is crude, but it
gets the point across and will help debug successor related issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we run RA in a loop, before each iteration after a failed
allocation we choose a spill node and spill it to Thread Local Storage
using st_int4/ld_int4 instructions (for spills and fills respectively).
This allows us to compile complex shaders that normally would not fit
within the 16 work register limits, although it comes at a fairly steep
performance penalty.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we write to an index before reading it, the old copy we're checking
liveness for isn't live in this block, even if it does get read later.
Fixes abnormally high register pressure in shaders with loops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Midgard bundles contain a tag, as well as a copy of the tag of the next
bundle to facilitate prefetch. Do some simple static analysis to detect
certain tag errors (particularly on shaders without branching).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than rewriting an index away across the whole block, we expose
finer (per-instruction) granularity for rewrites.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These are used to load/store from Thread Local Storage, which is memory
allocated per-thread (corresponding to ctx->scratchpad in the command
stream) and used for register spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It was a crazy idea that didn't pan out. We're better served by a good
copyprop pass. It's also unused now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than creating either a load or a uniform register read with a
fixed beginning offset, we always create a load and then promote to a
uniform register later. This will allow us to promote in a register
pressure aware manner.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to insert instructions as a result of register
allocation, permitting spilling to be implemented. As a side effect,
with the assert commented out this would fix a bunch of glamor crashes
(due to RA failures) so MATE becomes useable.
Ideally we'll have scheduling or RA actually sorted out before the
branch point but if not this gives us a one-line out to get X working...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If tiler_heap_end == tiler_heap_start, ensure it's printed the same
rather than one erroring out as hex.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It is legal to load a shader from a NULL address, particularly when the
TILER job is used strictly for effects on the Z/S buffer with 0x0 color
mask. Don't crash the decoder in this case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
When debugging, we're given the fault_pointer unresolved, so it is
helpful to have more context in the decode.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Midgard supports two modes of operation, 32-bit mode and 64-bit mode.
The GPU is natively 64-bit, but job descriptors can be submitted in
32-bit mode. Among other changes, 32-bit mode shortens pointer sizes to
use 32-bit pointers rather than the full 64-bit range.
The blob decides which mode to use based on the CPU bitness, so an armhf
system uses 32-bit descriptors and an aarch64 system uses 64-bit
descriptors. For a while, we mimicked this, bu inevitably this caused
the 32-bit support to lag behind as our reference platform is 64-bit.
To combat the code staleness, we traced an older GPU paired with a 64-bit
CPU (the Midgard T720 on-board the sunxi H64). From there, we could tell
which fields were really about hardware and which fields were simply
reflections of the descriptor bitness.
From there, we decided to remove support for 32-bit descriptors
entirely, using 64-bit descriptors unconditionally. There is minimal
performance penalty for this in practice, and it allows us to unify
these disparate code paths. This fixes:
- T860 + armhf
- T820 + armhf
- T760 + aarch64
And will help bringup of 1st/2nd generation Midgard regardless of CPU.
[Work done by Tomeu. Commit message written by Alyssa.]
v2: Add comments preserving information about the old behaviour for
future reference. Fix a compiler warning. (Alyssa)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't even support replay anymore; this is just wasting characters
and adding clutter.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows dumping memory properties directly without dereferencing an
address, allowing us to fix more -Waddress-of-packed-member warnings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It could be midgard_outmod_float or midgard_outmod_int; don't assume
it's one or the other. Fixes -Wenum-conversion warnings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I'm not sure why I thoughtt here was an off-by-one, other than maybe bad
data collection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This reverts commit 812ce2ce9e.
We massively regress with the reverted patch. So in the meantime, take
it out.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
It's not clear the hardware really has a maximum which confuses dEQP;
clamp to whatever we report as our maximum.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than using a magic lookup table with no explanations, let's add
liberal comments to the code to explain what this tiling scheme is and
how to encode/decode it efficiently.
It's not so mysterious after all -- just reordering bits with some XORs
thrown in.
v2: Correct copyright identifier. Fix spelling error. Switch space_4 to
a LUT. Fix comment typo. Use LUT instead of space_x tricks. Fallback on
generic rather than split up unaligned writes.
v3: Correct stride order (fixes crash loading). Correct coordinate
system mishap.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
This will allow both drivers to share this code. Both drivers
build-tested with meson. Android build not tested.
v2: Change naming from tiling->shared, in case Lima and Panfrost can
share more in the future. Fix Android build system.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-and-tested-by: Qiang Yu <yuq825@gmail.com>