Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before. This commit is just to make QIR
code failing more intelligible when register allocation fails.
With control flow, we can't be sure that we'll see the uses of a variable
before its def as we walk backwards. Given that NIR is eliminating our
long chains of dead code, a simple solution for now seems fine.
This slightly changes the order of some optimizations, and so an opt_vpm
happens before opt_dce, causing 3 dead MOVs to be turned into dead FMAXes
in Minecraft:
instructions in affected programs: 52 -> 54 (3.85%)
The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.
No change on shader-db.
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects. Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
(originally part of previous patch, split out to separate patch by Rob)
v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:
total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.