Commit Graph

86 Commits

Author SHA1 Message Date
Eric Anholt f029932cac vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR.
This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.

No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
2016-04-08 18:41:46 -07:00
Eric Anholt 44d7b8ad12 vc4: Add a helper function for the construction of qregs.
The separate declaration of the struct is not helping clarity, and I was
going to be writing a whole lot more of these in the upcoming patches.
2016-04-08 18:41:45 -07:00
Eric Anholt 483c172989 vc4: Drop the multi_instruction distinction for QIR instructions.
It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.

No change on shader-db.
2016-04-08 18:41:45 -07:00
Eric Anholt 2b9f0dffe0 vc4: Move discard handling to the condition flag.
Now that the field exists in the instruction, we can make discards less
special.  As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.

This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup.  Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.

total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs:     1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs:     8200 -> 8211 (0.13%)
2016-03-16 11:28:47 -07:00
Eric Anholt 7c9fc43915 vc4: Don't make a temporary for setting flags.
The register allocator doesn't really do anything about the temp, so it
doesn't seem like it should matter.  However, the scheduler would think
that a new def is being created.

This doesn't change anything yet, but it avoids a bunch of regressions in
the next commit.
2016-03-16 11:28:34 -07:00
Eric Anholt b4f45f319c vc4: Add a safety check for setting flags.
If a pack was on the src reg, should it be a float, int, or mul unpack?
Just complain, instead.
2016-03-16 11:28:34 -07:00
Varad Gautam e103b52aec vc4: Coalesce instructions using VPM reads into the VPM read.
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.

shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs:     5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs:     5345 -> 4993 (-6.59%)

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
2016-03-15 13:09:24 -07:00
Eric Anholt 655fa0f465 vc4: Don't treat conditional MOVs as raw MOV.
The two consumers want to know that the destination will be exactly the
source, which is not true if we might not set the destination.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Eric Anholt 71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
Eric Anholt 0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Eric Anholt 2591beef89 vc4: Fix handling of src packs on in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.
2015-12-11 12:21:22 -08:00
Eric Anholt a97b40dca4 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.
2015-12-08 09:49:54 -08:00
Eric Anholt 74c4b3b80c vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
2015-12-04 09:23:55 -08:00
Eric Anholt a4bf28178f vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 17:45:23 -08:00
Eric Anholt 01ca4f207e vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Another step in reducing the special-casing of instructions.
2015-10-26 16:48:34 -07:00
Eric Anholt 99a9a5a345 vc4: Switch the unpack ops to being unpack flags on a mov.
This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
2015-10-26 16:48:34 -07:00
Eric Anholt 652a864b25 vc4: Fix up the test for whether the unpack can be from r4.
We can do 16a/16b from float as well.  No difference on shader-db.
2015-10-26 16:48:34 -07:00
Eric Anholt 3d7a088608 vc4: Don't try to follow MOVs across a pack. 2015-10-26 16:48:34 -07:00
Eric Anholt 0ccacfa017 vc4: If a QIR source has an unpack set, print it.
Not used yet, but will be.
2015-10-26 16:48:34 -07:00
Eric Anholt f09ed63f43 vc4: Fix the test for skipping raw MOVs.
I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c.  No change to shader-db.
2015-10-24 17:55:22 -07:00
Eric Anholt 8e701fda49 vc4: Add QIR/QPU support for the 8-bit vector instructions. 2015-10-23 18:11:21 +01:00
Eric Anholt fb064901e9 vc4: Use Rob's NIR-based user clip lowering. 2015-10-23 14:30:15 +01:00
Eric Anholt cfa980f493 vc4: convert from tgsi semantic/index to varying-slot
(originally part of previous patch, split out to separate patch by Rob)

v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:07:08 -04:00
Boyan Ding 48de40ce9c vc4: Initialize pack field of qreg to 0 in qir_get_temp
This avoids generation of undefined packing in qir and qpu instructions,
fixing a lot of rendering errors.

Fixes 8b36d107fd (vc4: Pack the unorm-packing bits into a src MUL
instruction when possible.)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 12:16:07 -07:00
Eric Anholt 89b1b33f44 vc4: Fold the 16-bit integer pack into the instructions generating it.
total instructions in shared programs: 97580 -> 96798 (-0.80%)
instructions in affected programs:     52826 -> 52044 (-1.48%)
2015-08-21 13:29:26 -07:00
Eric Anholt 7e0b868cf3 vc4: Reuse QPU dumping for packing bits in QIR. 2015-08-21 13:29:26 -07:00
Eric Anholt 4ae137534a vc4: Make _dest variants of qir ALU helpers to provide an explicit dest. 2015-08-21 13:29:26 -07:00
Eric Anholt 8b36d107fd vc4: Pack the unorm-packing bits into a src MUL instruction when possible.
Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.

total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs:     4149 -> 3987 (-3.90%)
2015-08-20 23:43:04 -07:00
Eric Anholt 572a48366d vc4: Add a QIR helper for whether the op is a MUL type. 2015-08-20 23:42:59 -07:00
Eric Anholt 98728ce071 vc4: Switch QPU_PACK_SCALED to be two non-SSA instructions.
total instructions in shared programs: 98159 -> 98136 (-0.02%)
instructions in affected programs:     12279 -> 12256 (-0.19%)
2015-08-20 23:42:45 -07:00
Eric Anholt 69ef08d303 vc4: Make the pack-to-unorm instructions be non-SSA.
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.

total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs:     2134 -> 2123 (-0.52%)
2015-08-20 23:42:17 -07:00
Eric Anholt 0bba4fa070 vc4: Allow QIR registers to be non-SSA.
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations.  This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers.  Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
2015-08-20 23:40:22 -07:00
Eric Anholt cc8fb29046 vc4: Make r4-writes implicitly move to a temp, and allocate temps to r4.
Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4.  Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.

shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs:     42383 -> 41614 (-1.81%)
2015-08-04 17:19:01 -07:00
Eric Anholt 78c773bb36 vc4: Convert from simple_list.h to list.h
list.h is a nicer and more familiar set of list functions/macros.
2015-05-29 22:09:53 -07:00
Eric Anholt 73e2d4837d vc4: Convert to consuming NIR.
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.

total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs:     62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs:     15494 -> 15240 (-1.64%)

v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
    I'm not committing)
2015-04-01 10:57:01 -07:00
Eric Anholt 8c5dcdbccb vc4: Add a constant folding pass.
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).

I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.

total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs:     346 -> 344 (-0.58%)
2015-03-30 12:57:45 -07:00
Eric Anholt 85316d059c vc4: Keep an array of pointers to instructions defining the temps around.
The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.
2015-02-19 23:35:17 -08:00
Eric Anholt 877b48a531 vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.
I may want them in optimization passes, and they're not really particular
to the program translation stage.
2015-02-19 23:35:17 -08:00
Eric Anholt 14dc281c13 vc4: Enforce one-uniform-per-instruction after optimization.
This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.

total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs:     1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs:     6027 -> 5848 (-2.97%)

I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
2015-02-19 23:35:17 -08:00
Eric Anholt 3f1e1287fd vc4: Make SF be a flag on the QIR instructions.
Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value.  Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:

total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs:     3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs:     12595 -> 12497 (-0.78%)
2015-02-12 16:33:16 -08:00
Eric Anholt 12ebd7e20e vc4: Dump the VPM read index in QIR disasm.
Since the VPM reads have to be in order, it's useful to see their indices
in the dump.
2015-02-01 12:53:08 -08:00
Eric Anholt d70eb38517 gallium: Replace u_simple_list.h with util/simple_list.h
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 16:33:34 -08:00
Eric Anholt 772c47aefe vc4: Move the tests for src needing to be an A register to vc4_qir.c.
I want it from another location.
2015-01-15 22:19:25 +13:00
Eric Anholt c772c92153 vc4: Split two notions of instructions having side effects.
Some ops can't be DCEd, while some of the ops that are just important due
to the args they have can be.
2015-01-10 15:24:46 +13:00
Eric Anholt a58ae83882 vc4: Redo VPM reads as a read file.
This will let us do copy propagation of the VPM reads.
2015-01-10 14:35:24 +13:00
Eric Anholt 72cb6619cb vc4: Restructure color packing as a series of channel replacements.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4.  But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:

total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs:     4381 -> 4314 (-1.53%)
2015-01-10 13:54:12 +13:00
Eric Anholt e06b0778f5 vc4: Coalesce MOVs into VPM with the instructions generating the values.
total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs:     18156 -> 17964 (-1.06%)
2014-12-18 15:00:56 -08:00
Eric Anholt a871eff16c vc4: Redefine VPM writes as a (destination) QIR register file.
This will let me coalesce the VPM writes into the instructions generating
the values.
2014-12-17 22:35:08 -08:00
Eric Anholt e473fbe469 vc4: Add support for turning constant uniforms into small immediates.
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts.  However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs:     10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs:     25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z.  In this patch is I just use raddr conflict resolution,
which is more expensive.  I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
2014-12-17 19:35:13 -08:00
Eric Anholt ff266483fb vc4: Move follow_movs() to common QIR code.
I want this from other passes.
2014-12-17 19:05:52 -08:00