This does not affect shaders in any way. Rather, it makes the shader-db
instruction count recorded in the compiler accurate with the in-order
scheduler, matching up with what we calculate from pandecode.
Though shaders are the same, instruction counts cannot be compared
across this commit for this reason.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Instructions attached to blocks are never explicitly freed. Let's
use ralloc() to attach those objects to the compiler context so that
they are automatically freed when the ctx object is freed.
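A sketch of the idea, assuming the allocations previously went through calloc():
    /* Before: attached to a block, but never explicitly freed */
    midgard_instruction *ins = calloc(1, sizeof(*ins));
    /* After: parented to the compiler context, so ralloc frees the
     * whole tree automatically when ctx itself is freed */
    midgard_instruction *ins = rzalloc(ctx, midgard_instruction);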
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Add an assert() in schedule_bundle() to make sure all instruction
pointers in bundle.instructions[] are valid.
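Roughly the following, with instruction_count as the assumed field naming the number of valid entries:
    for (unsigned i = 0; i < bundle.instruction_count; ++i)
            assert(bundle.instructions[i] != NULL);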
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Right now we're leaking all block and instruction objects allocated by
the compiler. Let's clean things up before leaving
midgard_compile_shader_nir().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
When moving constants, if switching to a floating-point representation
doesn't break anything, we'd rather have an fmov than an imov,
permitting inlining the constant in many circumstances.
total quadwords in shared programs: 3408 -> 3366 (-1.23%)
quadwords in affected programs: 1188 -> 1146 (-3.54%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 9.65% x̃: 11.11%
95% mean confidence interval for quadwords value: -1.07 -0.98
95% mean confidence interval for quadwords %-change: -11.38% -7.93%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Storing constants as float doesn't make sense when we have integer
instructions; better to store them as integers natively and coerce
to/from float, rather than the opposite.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We check for texture ops which calculate derivatives (either explicitly
via dFd* or implicitly) and mark the shader as requiring helper
invocations.
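A sketch of the classification over NIR texture opcodes (the exact opcode list is an assumption; tex, txb, and lod sample with an implicit LOD and therefore need neighbouring helper invocations):
    static bool
    mir_op_computes_derivatives(nir_texop op)
    {
            switch (op) {
            case nir_texop_tex:
            case nir_texop_txb:
            case nir_texop_lod:
                    return true;
            default:
                    return false;
            }
    }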
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
For shaders using exclusively direct attribute/varyings, we can work
this out statically. For shaders with indirect access, we just set an
upper bound of 16 (the max attributes/varyings we support) and the
actual count will be reported regardless.
We proceed similarly for textures/samplers, as well as for UBOs. While
UBOs can be *indexed* indirectly, the *UBO itself* -- which is what we
count in the shader descriptor (rather than the UBO descriptors) -- is
statically determinable.
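For instance, the attribute count might be computed along these lines (a hypothetical helper, not the actual code):
    static unsigned
    attribute_count(bool has_indirect, unsigned max_direct_index)
    {
            /* Indirect access defeats static analysis: report the cap */
            if (has_indirect)
                    return 16;
            /* Otherwise, one past the highest index referenced */
            return max_direct_index + 1;
    }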
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This one is a little tricky, but the idea is that:
  - r16-r23 are always uniforms
  - r8-r15 are sometimes work, sometimes uniforms...
      ...but as work, they are always written before use
      ...and as uniforms, they are never written before use
So we use that heuristic to determine the count to feed the machine.
We'll record work register use in the next commit.
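In pseudo-C, the recording amounts to scanning for writes (field and iterator names are assumptions; any of r0-r15 written anywhere is a work register under this heuristic):
    unsigned work_count = 0;
    mir_foreach_instr_global(ctx, ins) {
            if (ins->dest < 16)
                    work_count = MAX2(work_count, ins->dest + 1);
    }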
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Panfrost is the only user of the macro; we are better off expanding than
having random stuff in nir.h.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
A pair of special flags can turn the texture/sampler handle fields into
register selects. This means code like:
    texture(uTextures[hr28.w], ...)
can be compiled to something like:
    texture ..., fsampler[hr28.w], texture[hr28.w]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This data structure is shared in other parts of the texture word, so
let's streamline printing.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows nodes to be unsigned and prevents a class of weird
signedness bugs identified by Coverity.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than using a regalloc based on live intervals, computed hastily
with repeated invocations of a forward-analysis pass, we switch to
computing liveness information on a per-block basis.
Within a given basic block, we compute liveness backwards with a
linear-time algorithm; for common shaders, this may help RA terminate
quicker.
Across blocks, we use a work list (really a work set) and check if we're
making progress. This isn't terribly efficient, but it gets the job
done. Point is, we get the live_in/live_out for each block.
From there, it's simple to rerun the linear-time update algorithm to
compute the interference graph.
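The cross-block part is the standard backward dataflow fixed point; a sketch with hypothetical liveness_* helpers over per-block bitsets:
    /* live_out(B) = union of live_in over successors of B;
     * live_in(B)  = use(B) | (live_out(B) & ~def(B)), derived by the
     * linear backward walk. Iterate until nothing changes. */
    bool progress = true;
    while (progress) {
            progress = false;
            mir_foreach_block(ctx, block) {
                    liveness_clear(block->live_out);
                    mir_foreach_successor(block, succ)
                            liveness_union(block->live_out, succ->live_in);
                    /* Recompute live_in bottom-up; true on change */
                    progress |= liveness_block_update(block);
            }
    }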
The benefit of this technique is the ability to ignore "gaps" in
liveness across intermediate blocks that are never executed. On simple
shaders like the loops in glmark, this results in a minor reduction in
register pressure. The motivation was a complex shader in Krita that
failed register allocation due to an unfortunate interaction between
texture pipeline registers and control flow. This shader now compiles
successfully.
total instructions in shared programs: 3439 -> 3438 (-0.03%)
instructions in affected programs: 22 -> 21 (-4.55%)
helped: 1
HURT: 0
total bundles in shared programs: 2077 -> 2076 (-0.05%)
bundles in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0
total quadwords in shared programs: 3457 -> 3456 (-0.03%)
quadwords in affected programs: 20 -> 19 (-5.00%)
helped: 1
HURT: 0
total registers in shared programs: 341 -> 338 (-0.88%)
registers in affected programs: 9 -> 6 (-33.33%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If there's a nontrivial swizzle fed into an extra (shortened) argument,
we bail on copyprop. No glmark changes (since it doesn't use fancy
texturing/loads).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's always been ambiguous which they are, but their primary register is
their output, not their input; therefore, they are loads.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Same issue with liveness analysis. If we store out a vec3, we should not
reference the .w component.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The texture coordinate for a 2D texture could be a vec2 or a vec3,
depending if it's an array texture or not. If it's vec2 (non-array
texture), we should not reference the z component; otherwise, liveness
analysis will get very confused when z is never written.
v2: Fix typo (Ilia).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we need to lower a move for a read from a vec2 texture coordinate, we
shouldn't write zw, even incidentally.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes shaders with control flow like:
    out = 0;
    if (A) {
            if (B)
                    out = texture(A, ...)
    } else {
            out = texture(B, ...)
    }
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The exit block has been 'dangling' in the successors graph, so let's
ensure it's linked in.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While we already compute the successors array, for backwards data flow
analysis, it is useful to walk the control flow graph backwards based on
predecessors, so let's compute that information as well.
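Since the successor edges already exist, predecessors are just those edges inverted; a sketch (the set-backed predecessors field is an assumption):
    mir_foreach_block(ctx, block) {
            for (unsigned i = 0; i < block->nr_successors; ++i)
                    _mesa_set_add(block->successors[i]->predecessors, block);
    }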
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We started honouring the normalized_coords flag in the texture
descriptor, but a bisection revealed that broke RECT textures -- since
we were *also* lowering them in the shader. So just remove the
shader-based lowering, use native RECT textures, and enjoy the nominal
reduction in complexity and performance boost.
Fixes: 3e47a1181b ("panfrost: Add MALI_SAMP_NORM_COORDS flag")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We only know how to promote aligned accesses, although theoretically we
should be able to promote unaligned to swizzles in the future. Check
this.
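The check itself is tiny; a sketch assuming 16-byte (vec4) uniform registers:
    /* Only offsets landing on a vec4 boundary map cleanly onto a
     * uniform register read; skip the rest until we learn swizzles */
    if (offset & 0xF)
            continue;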
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We'll want to be smarter about unaligned reads, so let's get this code
all in one place.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We'll need multiple branches for MRT, so we can't defer. Also, we need
to track dependencies to ensure r0 is set to the correct value for each
store_output.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Some memory corruption / etc. issues led to an accidental "fuzzing" of
the disassembler ;) This uncovered some issues leading to a disassembler
hang, so let's fix that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to have multiple spill moves, whereas otherwise for N
spill moves, the first N-1 would be clobbered. Issue found in Krita.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We wire through some shader-db-style stats on the current shader in the
disassembler so we can get a quick estimate of shader complexity from a
trace.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Rob Clark <robdclark@chromium.org>
We'll want to dump some stats after the shader, and I refuse to use one
teensy little goto.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a bit of a hack, but it'll hold us over until we have 64-bit
support wired through.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This used a delicate hack to try to find indirect inputs and skip them
as candidates for pairing. Let's use a better criterion -- no sources --
and pair based on that.
We could do better, but that would require more complex data flow
analysis than we're interested in doing here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We implement gl_WorkGroupID and gl_LocalInvocationID, which map to
ld_compute_id with special sources.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows liveness analysis within a loop to be more fine grained,
fixing RA failures with partial spilled movs within a loop, as well as
enabling a slight reduction of register pressure more generally:
total registers in shared programs: 350 -> 347 (-0.86%)
registers in affected programs: 12 -> 9 (-25.00%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Just laying the groundwork. Reads and writes should be supported (both
direct and indirect, either int or float, vec1/2/3/4), but no bounds
checking is done at the moment.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a corner case that happens a lot with SSBOs. Basically, if we
only read a few components of a uniform, we need to spill only those
few components; otherwise we try to spill what we already spilled and
RA hangs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't want to load a 128-bit sysval when 64-bits will do. Fixes RA
failures with SSBO indirect writes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Sometimes a sysval is used to facilitate an instruction but is not the
instruction itself.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
For each SSBO index we get from Gallium/NIR, we need two pieces of
information in the shader:
1. The address of the SSBO in GPU memory. Within the shader, we'll be
accessing it with raw memory load/store, so we need the actual address,
not just an index.
2. The size of the SSBO. This is not strictly necessary, but at some
point, we may like to do bounds checking on SSBO accesses.
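Conceptually, each SSBO then contributes a record like the following to the sysval buffer (the layout here is illustrative, not the actual descriptor):
    struct ssbo_sysval {
            uint64_t address; /* GPU address, for raw load/store */
            uint32_t size;    /* bytes, for eventual bounds checking */
            uint32_t padding; /* keep the record 16 bytes */
    };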
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Register allocation for varying stores is a bit different, since the
instructions ignore the writemask (varyings are normalized, packed, and
vectorized).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Symptom: the sky is black in SuperTuxKart (flashbacks to SMB/NES
emulation intensify).
Essentially, what happened is a fixed (special) move to r0 was
eliminated but scheduling did not factor this in, so
can_run_concurrent_ssa returned true even when there was a logical data
dependency that needed to be resolved.
Fixes: 20771ede1c ("pan/midgard: Add post-RA move elimination")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
No functional changes, just breaks out a megamonster function and fixes
the indentation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need three independent sources to support indirect SSBO writes (as
well as textures with both LOD/bias and offsets). Now is a good time to
make sources just an array so we don't have to rewrite a ton of code if
we ever needed a fourth source for some reason.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
As far as I know, there's no such thing as a load/store op that only
takes its argument in r27. We just need to set the appropriate arg_1
field in the RA to specify other registers if we want them.
To facilitate this, various RA-related changes are needed across the
compiler; this should also fix indirect offsets which were implicitly
interpreted as "r27-only" despite not even passing through RA yet. One
ripple effect change is switching the move insertion point and adjusting
the liveness analysis accordingly, so while this was intended as a
purely functional change, there are some shader-db changes:
total instructions in shared programs: 3511 -> 3498 (-0.37%)
instructions in affected programs: 563 -> 550 (-2.31%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.93% max: 5.00% x̄: 2.58% x̃: 2.33%
95% mean confidence interval for instructions value: -1.27 -0.90
95% mean confidence interval for instructions %-change: -3.23% -1.93%
Instructions are helped.
total bundles in shared programs: 2067 -> 2067 (0.00%)
bundles in affected programs: 398 -> 398 (0.00%)
helped: 7
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.54% max: 10.00% x̄: 5.04% x̃: 5.56%
HURT stats (abs) min: 1 max: 2 x̄: 1.75 x̃: 2
HURT stats (rel) min: 2.13% max: 4.26% x̄: 3.72% x̃: 4.26%
95% mean confidence interval for bundles value: -0.95 0.95
95% mean confidence interval for bundles %-change: -5.21% 1.50%
Inconclusive result (value mean confidence interval includes 0).
total quadwords in shared programs: 3464 -> 3454 (-0.29%)
quadwords in affected programs: 1199 -> 1189 (-0.83%)
helped: 18
HURT: 4
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.03% max: 5.26% x̄: 2.44% x̃: 1.79%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 2.56% max: 2.82% x̄: 2.63% x̃: 2.56%
95% mean confidence interval for quadwords value: -0.98 0.07
Inconclusive result (value mean confidence interval includes 0).
total registers in shared programs: 383 -> 373 (-2.61%)
registers in affected programs: 56 -> 46 (-17.86%)
helped: 12
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 9.09% max: 33.33% x̄: 29.58% x̃: 33.33%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 20.00% max: 50.00% x̄: 35.00% x̃: 35.00%
95% mean confidence interval for registers value: -1.13 -0.29
95% mean confidence interval for registers %-change: -35.07% -5.63%
Registers are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Load/store's main "argument 0" already has its swizzle handled
correctly (for stores, that is). But the tinier arguments, the compact
ones with a component select but not a full swizzle, those are not yet
handled. Let's do something about that!
Rather than an ersatz thing that sort of looks like successors but is in
fact just the source order traversal with some backward jumps hacked in
for loops... construct an actual flow graph so we can do analysis
sanely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The 16-bit field can be decomposed to two independent 8-bit fields, each
representing a single (additional) argument to the load/store op,
generally used for encoding registers. Addressable registers here are
substantially limited compared to the main register in a load/store op.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to flip ops to have a constant in the second place to
enable inlining of the constant.
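A sketch of the flip (the commutativity property lookup and the helpers are assumptions):
    /* Only the second source can inline a constant, so swap the
     * sources, swizzles, and modifiers when the op commutes */
    if (source_is_constant(ins, 0) && !source_is_constant(ins, 1) &&
        op_commutes(ins->op))
            mir_flip(ins);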
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes a hang (and abort) on empty shaders, which you shouldn't have
anyway but better safe than sorry. DCE going on the fritz is no reason
to freeze the system.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We had them backwards in both the command stream and the Midgard stack.
In OpenGL ES 2.0, they're always the same, but in Vulkan/later-GL/CL
they diverge so we can fix this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These were seriously messed up beyond all recognition. How we're passing
shaders.random.* is a mystery.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conceptually, r28-r29 (as used for reading) and r28-r29 (as used for
writing) aren't registers at all, merely push/pull arrangements. So you
can't feed a texture result back into itself without explicitly moving
in the middle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will enable us to fuse inverts in various ways. Marginal hurt:
total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than putting registers after SSA in the MIR indexing, put them
side-by-side, shifted 1, using the bottom bit as the SSA/reg select.
This will allow us to generate SSA temps in the compiler.
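Packed roughly like so (macro names are illustrative):
    /* Bottom bit selects register vs. SSA; the rest is the index */
    #define PAN_IS_REG      (1)
    #define MAKE_SSA(i)     ((i) << 1)
    #define MAKE_REG(i)     (((i) << 1) | PAN_IS_REG)
    #define INDEX_IS_REG(i) ((i) & PAN_IS_REG)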
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than bailing if we see something that's not SSA, do out the
analysis to check if we can pipeline and do so if we can.
total registers in shared programs: 392 -> 391 (-0.26%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than always emitting an extra move for fragments, check the
actual criteria and emit accordingly. (This was lost during the RA
improvements at the end of May).
total bundles in shared programs: 2210 -> 2176 (-1.54%)
bundles in affected programs: 501 -> 467 (-6.79%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -16.06% -10.21%
Bundles are helped.
total quadwords in shared programs: 3639 -> 3605 (-0.93%)
quadwords in affected programs: 795 -> 761 (-4.28%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -14.31% -8.13%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Think of this pass as register coalescing part 2. After RA runs, but
before scheduling, we scan for code of the form:
    mov rN, rN
and delete the move, since it's totally redundant. This pass helps
already, but it'd of course be much more effective paired with
register coalescing to encourage moves in general to end up in this
form.
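A hedged sketch of the sweep (the predicate and field names are assumptions):
    /* Post-RA: a move whose source and destination landed in the same
     * register, with an identity swizzle and no modifiers, is a no-op */
    mir_foreach_instr_global_safe(ctx, ins) {
            if (!is_register_move(ins))
                    continue;
            if (ins->dest == ins->src[0] &&
                mir_is_identity_swizzle(ins) && !has_modifiers(ins))
                    mir_remove_instruction(ins);
    }
Nevertheless, even by itself: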
total instructions in shared programs: 3665 -> 3613 (-1.42%)
instructions in affected programs: 2046 -> 1994 (-2.54%)
helped: 52
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -10.26% -5.79%
Instructions are helped.
total bundles in shared programs: 2256 -> 2213 (-1.91%)
bundles in affected programs: 1154 -> 1111 (-3.73%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -11.60% -6.60%
Bundles are helped.
total quadwords in shared programs: 3689 -> 3642 (-1.27%)
quadwords in affected programs: 2025 -> 1978 (-2.32%)
helped: 47
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -10.30% -5.42%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The source and destination were incorrectly flipped in the move, but
some details of our internal regalloc made this function anyway. Now
that we're changing the regalloc, we need to fix this to avoid
regressing blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a special case of DCE designed to run after the out-of-ssa pass
to cleanup special register lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We mixed up component_lo and full, which made it appear that we had
less freedom in RA than we actually do. Fix this to fix some
disassemblies as well as prepare for RA with the bias field.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Following the RA work, we apply the same technique to eliminate the move
to r27 when loading cubemaps.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This reverts commit 4508f43eed, which
broke a bunch of dEQP tests (e.g. in
dEQP-GLES2.functional.draw.draw_arrays.*)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We add a new opt pass fusing perspective projection with varyings. Minor
win? We don't combine non-varying projections, since if we're too
aggressive, the extra load/store traffic will hurt us, so it's not
really a win in practice.
total instructions in shared programs: 3915 -> 3913 (-0.05%)
instructions in affected programs: 76 -> 74 (-2.63%)
helped: 1
HURT: 0
total bundles in shared programs: 2520 -> 2519 (-0.04%)
bundles in affected programs: 46 -> 45 (-2.17%)
helped: 1
HURT: 0
total quadwords in shared programs: 4027 -> 4025 (-0.05%)
quadwords in affected programs: 80 -> 78 (-2.50%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't use it yet, since it's actually a shader-db regression. This is
primarily helpful as an intermediate step for attaching projection to
varyings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While load/store ops like st_vary can take an argument in either
r26/r27, ops like those for perspective projection must specifically
take their argument in r27.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The load/store pipes can't take a uniform register in, so an explicit
move is necessary here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Given the constraints on special registers, we add a helper for lowering
these by inserting moves (copies) where needed to satisfy the ISA
constraints.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We generalize the constant emission helper used in fragment writeout as
we'll also need it for vertex outputs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This ensures the rules for accessing special register classes are
satisfied. This is asserted as a prepass should have lowered offending
uses to something satisfying these rules. Special register classes are
*not* work registers and cannot be used for RMW operations; they are
essentially 1-way pipes straight into/from fixed-function logic in the
shader cores.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We reuse the same register spilling mechanism as for work->memory to
spill special->work registers, e.g. to allow writing out more than 2
vec4 varyings (without better scheduling anyway).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This does not yet support special->work spilling, nor does it support
multiclass breakup. These corner cases will be handled in succeeding
commits.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Checks for x/xy/xyz/xyzw style swizzles (slightly more general but you
get the idea).
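With 2 bits per swizzle component, the check is roughly (a sketch; the 2-bit packing is the key assumption):
    static bool
    swizzle_is_identity(unsigned swizzle, unsigned nr_components)
    {
            /* Identity means component c reads from lane c, for every
             * component actually used */
            for (unsigned c = 0; c < nr_components; ++c) {
                    if (((swizzle >> (2 * c)) & 3) != c)
                            return false;
            }
            return true;
    }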
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This should save a lot of per-compile time by using the RA the way it's
actually supposed to be used.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This was disabled to permit regression-free RA work. Now that the spill
code is in place, we can reenable, with some caveats about efficacy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Pipe through the number of bytes of spilled memory used from the
compiler into the main driver, where it will be used to allocate the
Thread Local Storage buffer.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We just use the pointers of the midgard_block*, which is crude, but it
gets the point across and will help debug successor related issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we run RA in a loop, before each iteration after a failed
allocation we choose a spill node and spill it to Thread Local Storage
using st_int4/ld_int4 instructions (for spills and fills respectively).
This allows us to compile complex shaders that normally would not fit
within the 16 work register limits, although it comes at a fairly steep
performance penalty.
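The outer loop is roughly the following (helper and field names hedged):
    struct ra_graph *g = NULL;
    bool spilled = true;
    int iterations = 1000; /* arbitrary bound so we always terminate */
    while (spilled && iterations--) {
            g = allocate_registers(ctx, &spilled);
            /* On failure, choose a node, rewrite its uses through TLS
             * with st_int4/ld_int4, and try again */
            if (spilled)
                    mir_spill_register(ctx, g, &ctx->tls_size);
    }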
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we write to an index before reading it, the old copy we're checking
liveness for isn't live in this block, even if it does get read later.
Fixes abnormally high register pressure in shaders with loops.
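In code, the fix is the early-out on writes during the forward scan (a sketch; mir_has_arg-style helpers and field names assumed):
    mir_foreach_instr_in_block_from(block, q, ins) {
            if (mir_has_arg(q, src))
                    return true;   /* read before any write: still live */
            if (q->dest == src)
                    return false;  /* overwritten first: old copy dead */
    }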
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Midgard bundles contain a tag, as well as a copy of the tag of the next
bundle to facilitate prefetch. Do some simple static analysis to detect
certain tag errors (particularly on shaders without branching).
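For straight-line code, the invariant is simply that each bundle's recorded next-tag matches the tag of the bundle that follows; a sketch (the next_tag field name is an assumption):
    midgard_bundle *prev = NULL;
    util_dynarray_foreach(&block->bundles, midgard_bundle, bundle) {
            if (prev)
                    assert(prev->next_tag == bundle->tag);
            prev = bundle;
    }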
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than rewriting an index away across the whole block, we expose
finer (per-instruction) granularity for rewrites.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These are used to load/store from Thread Local Storage, which is memory
allocated per-thread (corresponding to ctx->scratchpad in the command
stream) and used for register spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It was a crazy idea that didn't pan out. We're better served by a good
copyprop pass. It's also unused now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than creating either a load or a uniform register read with a
fixed beginning offset, we always create a load and then promote to a
uniform register later. This will allow us to promote in a register
pressure aware manner.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to insert instructions as a result of register
allocation, permitting spilling to be implemented. As a side effect,
with the assert commented out this would fix a bunch of glamor crashes
(due to RA failures), so MATE becomes usable.
Ideally we'll have scheduling or RA actually sorted out before the
branch point but if not this gives us a one-line out to get X working...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It could be midgard_outmod_float or midgard_outmod_int; don't assume
it's one or the other. Fixes -Wenum-conversion warnings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This reverts commit 812ce2ce9e.
We massively regress with the reverted patch. So in the meantime, take
it out.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
It's not clear the hardware really has a maximum, which confuses dEQP;
clamp to whatever we report as our maximum.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>