Commit Graph

109 Commits

Author SHA1 Message Date
Eric Anholt 51087327f2 vc4: Replace the qinst src[] with a fixed-size array.
This may have made a tiny bit of sense when we had one 4-arg inst per
shader, but if we only ever put 2 things in, having a pointer to 2 things
almost every instruction is pointless indirection.
2016-11-29 08:38:59 -08:00
Eric Anholt a220f1b5a9 vc4: Remove qir_inst4().
This was used originally for unorm4x8 packs, but we now represent those as
a series of packed movs.
2016-11-29 08:38:59 -08:00
Eric Anholt 4f527f1260 vc4: Add a thread switch QIR instruction.
This will eventually be generated at the QIR level, so that
vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result
correctly.  It will also be important so that register allocation set the
register classes appropriately for values that are live across the switch.
2016-11-12 18:46:35 -08:00
Eric Anholt 695a2e2ffa vc4: Print a reg pressure estimate in our reg allocation failure dump. 2016-11-09 15:33:56 -08:00
Eric Anholt 8ce6526178 vc4: Add support for MUL output rotation.
Extracted from a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt 074f1f3c0c vc4: Add support for the 2-bit LOAD_IMM variants.
Extracted and fixed up from a patch by jonasarrow on github.  This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
2016-08-25 17:24:11 -07:00
Eric Anholt 31da39ddc9 vc4: Add a QIR value for the QPU element register.
This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.
2016-08-25 17:24:11 -07:00
Eric Anholt 9194473dd2 vc4: Emit resets of the uniform stream at the starts of blocks.
If a block might be entered from multiple locations, then the uniform
stream will (probably) be at different points, and we need to make sure
that it's pointing where we expect it to be.  The kernel also enforces
that any block reading a uniform resets uniforms, to prevent reading
outside of the uniform stream by using looping.
2016-07-13 23:54:15 -07:00
Eric Anholt a59da513d3 vc4: Move the QPU instructions to schedule into each block.
We'll want to schedule them individually, to handle delay slots.
2016-07-13 23:54:15 -07:00
Eric Anholt 05bcd9dd96 vc4: Define a QIR branch instruction
This uses the branch condition code in inst->cond to jump to either
successor[0] (condition matches) or successor[0] (condition doesn't
match).
2016-07-12 17:42:40 -07:00
Eric Anholt 6d34345001 vc4: Print live variable start/ends during QIR dumping.
This only happens when live variables are set up, which is not in the
normal dump, but is set up when we've failed to register allocate.
2016-07-12 17:42:37 -07:00
Eric Anholt 89918c1e74 vc4: Implement live intervals using a CFG.
Right now our CFG is always a trivial single basic block, but that will
change when enable loops.
2016-07-12 17:41:59 -07:00
Eric Anholt 6c1f834a23 vc4: Create a basic block structure and move the instructions into it.
The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
2016-07-12 15:47:26 -07:00
Eric Anholt ac772b24a1 vc4: Regularize instruction emit macros
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did.  This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
2016-07-04 16:33:22 -07:00
Eric Anholt 8f2af4763a vc4: Optimize out redundant SF updates.
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.

total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs:     1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs:     8568 -> 8523 (-0.53%)
2016-07-04 16:33:22 -07:00
Eric Anholt 200b4e4bd5 vc4: Move SF removal to a separate peephole pass.
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling.  We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.

No change on shader-db.
2016-07-04 16:33:22 -07:00
Eric Anholt 2a8973fb78 vc4: Mark texturing setup instructions as having side effects.
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm.  Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
2016-07-04 16:33:22 -07:00
Eric Anholt a1f698881e vc4: Add support for loading immediate values in QIR.
This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
2016-05-06 10:25:55 -07:00
Eric Anholt 8e2d0843c0 vc4: Add a small QIR validate pass.
This has caught a couple of bugs during loop development so far, and I
should probably have written it long ago.
2016-05-06 10:25:55 -07:00
Eric Anholt daaa9d579d vc4: Fix the src count on exp2/log2.
Found by the upcoming QIR validate pass.
2016-05-06 10:25:55 -07:00
Eric Anholt d36b28402f vc4: Reuse QPU disasm's cond flags in QIR.
In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
2016-05-06 10:25:55 -07:00
Eric Anholt 84322b2f31 vc4: Remove the CSE pass.
It's not doing anything according to shader-db now that we're using NIR.
It would have had to be reworked significantly anyway, to handle control
flow.
2016-05-02 11:06:29 -07:00
Eric Anholt 30b818d5eb vc4: Move FRAG_X/Y/REV_FLAG to a QFILE like VPM or TLB color writes.
This gives us one less set of special instruction generation cases, and
instead just the case for returning the correct register to read.
2016-04-08 18:41:46 -07:00
Eric Anholt f029932cac vc4: Allow TLB Z/color/stencil writes from any ALU operation in QIR.
This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.

No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
2016-04-08 18:41:46 -07:00
Eric Anholt 44d7b8ad12 vc4: Add a helper function for the construction of qregs.
The separate declaration of the struct is not helping clarity, and I was
going to be writing a whole lot more of these in the upcoming patches.
2016-04-08 18:41:45 -07:00
Eric Anholt 483c172989 vc4: Drop the multi_instruction distinction for QIR instructions.
It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.

No change on shader-db.
2016-04-08 18:41:45 -07:00
Eric Anholt 2b9f0dffe0 vc4: Move discard handling to the condition flag.
Now that the field exists in the instruction, we can make discards less
special.  As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.

This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup.  Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.

total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs:     1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs:     8200 -> 8211 (0.13%)
2016-03-16 11:28:47 -07:00
Eric Anholt 7c9fc43915 vc4: Don't make a temporary for setting flags.
The register allocator doesn't really do anything about the temp, so it
doesn't seem like it should matter.  However, the scheduler would think
that a new def is being created.

This doesn't change anything yet, but it avoids a bunch of regressions in
the next commit.
2016-03-16 11:28:34 -07:00
Eric Anholt b4f45f319c vc4: Add a safety check for setting flags.
If a pack was on the src reg, should it be a float, int, or mul unpack?
Just complain, instead.
2016-03-16 11:28:34 -07:00
Varad Gautam e103b52aec vc4: Coalesce instructions using VPM reads into the VPM read.
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.

shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs:     5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs:     5345 -> 4993 (-6.59%)

Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
2016-03-15 13:09:24 -07:00
Eric Anholt 655fa0f465 vc4: Don't treat conditional MOVs as raw MOV.
The two consumers want to know that the destination will be exactly the
source, which is not true if we might not set the destination.

Signed-off-by: Eric Anholt <eric@anholt.net>
2016-02-15 17:13:52 -08:00
Eric Anholt 71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
Eric Anholt 0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Eric Anholt 2591beef89 vc4: Fix handling of src packs on in qir_follow_movs().
The caller isn't going to expect it from a return, so it would probably
get misinterpreted.  If the caller had an unpack in its reg, that's fine,
but don't lose track of it.
2015-12-11 12:21:22 -08:00
Eric Anholt a97b40dca4 vc4: Add support for multisample framebuffer operations.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVAGE.

I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.
2015-12-08 09:49:54 -08:00
Eric Anholt 74c4b3b80c vc4: Add support for storing sample mask.
From the API perspective, writing 1 bits can't turn on pixels that were
off, so we AND it with the sample mask from the payload.
2015-12-04 09:23:55 -08:00
Eric Anholt a4bf28178f vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 17:45:23 -08:00
Eric Anholt 01ca4f207e vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Another step in reducing the special-casing of instructions.
2015-10-26 16:48:34 -07:00
Eric Anholt 99a9a5a345 vc4: Switch the unpack ops to being unpack flags on a mov.
This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
2015-10-26 16:48:34 -07:00
Eric Anholt 652a864b25 vc4: Fix up the test for whether the unpack can be from r4.
We can do 16a/16b from float as well.  No difference on shader-db.
2015-10-26 16:48:34 -07:00
Eric Anholt 3d7a088608 vc4: Don't try to follow MOVs across a pack. 2015-10-26 16:48:34 -07:00
Eric Anholt 0ccacfa017 vc4: If a QIR source has an unpack set, print it.
Not used yet, but will be.
2015-10-26 16:48:34 -07:00
Eric Anholt f09ed63f43 vc4: Fix the test for skipping raw MOVs.
I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c.  No change to shader-db.
2015-10-24 17:55:22 -07:00
Eric Anholt 8e701fda49 vc4: Add QIR/QPU support for the 8-bit vector instructions. 2015-10-23 18:11:21 +01:00
Eric Anholt fb064901e9 vc4: Use Rob's NIR-based user clip lowering. 2015-10-23 14:30:15 +01:00
Eric Anholt cfa980f493 vc4: convert from tgsi semantic/index to varying-slot
(originally part of previous patch, split out to separate patch by Rob)

v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-09-16 15:07:08 -04:00
Boyan Ding 48de40ce9c vc4: Initialize pack field of qreg to 0 in qir_get_temp
This avoids generation of undefined packing in qir and qpu instructions,
fixing a lot of rendering errors.

Fixes 8b36d107fd (vc4: Pack the unorm-packing bits into a src MUL
instruction when possible.)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-09-04 12:16:07 -07:00
Eric Anholt 89b1b33f44 vc4: Fold the 16-bit integer pack into the instructions generating it.
total instructions in shared programs: 97580 -> 96798 (-0.80%)
instructions in affected programs:     52826 -> 52044 (-1.48%)
2015-08-21 13:29:26 -07:00
Eric Anholt 7e0b868cf3 vc4: Reuse QPU dumping for packing bits in QIR. 2015-08-21 13:29:26 -07:00
Eric Anholt 4ae137534a vc4: Make _dest variants of qir ALU helpers to provide an explicit dest. 2015-08-21 13:29:26 -07:00