The goal here is to have an argument for the depth write opcode so that I
can do computed depth. In the process, this makes the calculations that
will be emitted more obvious in the QIR.
We potentially need to be careful that use of a value stored in r4 isn't
copy-propagated (or something) across another r4 write. That doesn't
appear to happen currently, and this makes the dataflow more obvious. It
also opens up not unpacking the r4 value, which will be useful for depth
textures.
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.
The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to
start running. glsl-fs-discard-02 has a problem where the first tile is
not getting stored on the first render.
Passes blendminmax and blendsquare. glean's more serious blendFunc fails
in simulation due to binner memory overflow (I really need to work around
that), and fbo-blending-formats fails due to Mesa refusing one of the
getter requests, even before it could fail due to the driver not actually
supporting different formats yet.
This doesn't load/store the Z contents across submits yet. It also
disables early Z, since it's going to require tracking of Z functions
across multiple state updates to track the early Z direction and whether
it can be used.
v2: Move the key setup to before the search for the key.
Only rgba8888 works, and only a single texture unit, and it's only under
simulation because I haven't built the kernel interface yet.
v2: Rebase on helpers.
v3: Fold in the don't-break-the-arm-build fix.
We put in a bunch of extra MOVs for program outputs, and this can clean
those up. We should do uniforms, too, though.
v2: Fix missing flagging of progress when we actually optimize. Caught by
Aaron Watry.
This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
This took a couple of tries, and this is the squash of those attempts.
v2: Fix register file conflicts on the args in the
destination-is-accumulator case.
v3: Rebase on helper change and qir_inst4 change.
This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.
Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.
v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).