In schedule live value tracking, differentiate between half vs full
precision. Half-precision live values are less costly than full
precision.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
So far we only handle full regs of arrays during pre-allocation.
This patch is to handle half regs of arrays and also consider the size
of half regs when finding out conflicts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
After RA, we can schedule to increase parallelism (reduce nop's) without
worrying about increasing register pressure. This pass lets us cut down
the instruction count ~10%, and prioritize bary.f, kill, etc, which
would tend to increase register pressure if we tried to do that before
RA.
It should be more useful if RA round-robin'd register choices.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Originally these were nested functions, which worked nicely, giving us
the function of a local macro that was actual 'c' syntax (ie. not token
pasted macro). But these were converted to macros because clang doesn't
let us have nice gcc extensions.
Extract these back out into functions, before adding more things and
making the macros even more cumbersome.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This scenario can come up with block-sched and nop-sched moved to after
RA. So lets fix it first to keep things bisectable.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Setting up transitive conflicts between a full register and its two
half registers (eg r0.x and hr0.x and hr0.y) will make the half
registers conflict. They don't actually conflict and this prevents us
from using both at the same time.
Add and use a new ra helper that sets up transitive conflicts between
a register and its subregisters, except it carefully avoids the
subregister conflict.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec0.
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can at least get rid of the if-not-NULL check in a bunch of places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fold it in to writes_gpr() (since a register that does not reference any
registers by definition does not write a register). This lets us avoid
having to handle this case in a few other places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
missing out the part to fix up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.
Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump
when lowering the precision of the variables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
On a6xx, half-regs conflict with full-regs. But we were only setting up
conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4),
resulting in higher half-reg classes getting assigned to regs that
overwrite full-regs.
Noticed while trying to enable indirect-sampler (sam.s2en) which uses an
hvec2 argument to pass the sampler/tex index.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We probably need to rethink how we detect which instruction first
defines higher register classes. But for now, this at least fixes
the symptom.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was a hold-over from the early TGSI days, and mostly not needed
with NIR. This avoids burning an entire 4 consecutive scalar regs
for vec3 outputs, for example. Which fixes a few places that we were
doing worse that we should on register usage.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Detect when a component of an (for example) texture fetch is unused and
propagate the updated wrmask back to the parent instruction.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver. The parts that are gallium
specific have been refactored out and remain in the gallium driver.
Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.
NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build. Waiting
patiently for the day that we can remove *everything* from the autotools
build.
Signed-off-by: Rob Clark <robdclark@gmail.com>