KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Anholt	f09ed63f43	vc4: Fix the test for skipping raw MOVs. I don't know what the previous test was trying to do, but it dates back to the first add of vc4_qpu_emit.c. No change to shader-db.	2015-10-24 17:55:22 -07:00
Eric Anholt	8e701fda49	vc4: Add QIR/QPU support for the 8-bit vector instructions.	2015-10-23 18:11:21 +01:00
Eric Anholt	fb064901e9	vc4: Use Rob's NIR-based user clip lowering.	2015-10-23 14:30:15 +01:00
Eric Anholt	cfa980f493	vc4: convert from tgsi semantic/index to varying-slot (originally part of previous patch, split out to separate patch by Rob) v2: squash in some fixes from Eric v3: Another fix from Eric for point coords. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-09-16 15:07:08 -04:00
Boyan Ding	48de40ce9c	vc4: Initialize pack field of qreg to 0 in qir_get_temp. This avoids generation of undefined packing in qir and qpu instructions, fixing a lot of rendering errors. Fixes `8b36d107fd` (vc4: Pack the unorm-packing bits into a src MUL instruction when possible.) Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-09-04 12:16:07 -07:00
Eric Anholt	89b1b33f44	vc4: Fold the 16-bit integer pack into the instructions generating it. total instructions in shared programs: 97580 -> 96798 (-0.80%) instructions in affected programs: 52826 -> 52044 (-1.48%)	2015-08-21 13:29:26 -07:00
Eric Anholt	7e0b868cf3	vc4: Reuse QPU dumping for packing bits in QIR.	2015-08-21 13:29:26 -07:00
Eric Anholt	4ae137534a	vc4: Make _dest variants of qir ALU helpers to provide an explicit dest.	2015-08-21 13:29:26 -07:00
Eric Anholt	8b36d107fd	vc4: Pack the unorm-packing bits into a src MUL instruction when possible. Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's only used by the unorm packing and just stuff the pack bits into it. total instructions in shared programs: 98136 -> 97974 (-0.17%) instructions in affected programs: 4149 -> 3987 (-3.90%)	2015-08-20 23:43:04 -07:00
Eric Anholt	572a48366d	vc4: Add a QIR helper for whether the op is a MUL type.	2015-08-20 23:42:59 -07:00
Eric Anholt	98728ce071	vc4: Switch QPU_PACK_SCALED to be two non-SSA instructions. total instructions in shared programs: 98159 -> 98136 (-0.02%) instructions in affected programs: 12279 -> 12256 (-0.19%)	2015-08-20 23:42:45 -07:00
Eric Anholt	69ef08d303	vc4: Make the pack-to-unorm instructions be non-SSA. This helps ensure that the register allocator doesn't force the later pack operations to insert extra MOVs. total instructions in shared programs: 98170 -> 98159 (-0.01%) instructions in affected programs: 2134 -> 2123 (-0.52%)	2015-08-20 23:42:17 -07:00
Eric Anholt	0bba4fa070	vc4: Allow QIR registers to be non-SSA. Now that we have NIR, most of the optimization we still need to do is peepholes on instruction selection rather than general dataflow operations. This means we want to be able to have QIR be a lot closer to the actual QPU instructions, just with virtual registers. Allowing multiple instructions writing the same register opens up a lot of possibilities.	2015-08-20 23:40:22 -07:00
Eric Anholt	cc8fb29046	vc4: Make r4-writes implicitly move to a temp, and allocate temps to r4. Previously, SFU values always moved to a temporary, and TLB color reads and texture reads always lived in r4. Instead, we can have these results just be normal temporaries, and the register allocator can leave the values in r4 when they don't interfere with anything else using r4. shader-db results: total instructions in shared programs: 100809 -> 100040 (-0.76%) instructions in affected programs: 42383 -> 41614 (-1.81%)	2015-08-04 17:19:01 -07:00
Eric Anholt	78c773bb36	vc4: Convert from simple_list.h to list.h. list.h is a nicer and more familiar set of list functions/macros.	2015-05-29 22:09:53 -07:00
Eric Anholt	73e2d4837d	vc4: Convert to consuming NIR. NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in master. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)	2015-04-01 10:57:01 -07:00
Eric Anholt	8c5dcdbccb	vc4: Add a constant folding pass. This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)	2015-03-30 12:57:45 -07:00
Eric Anholt	85316d059c	vc4: Keep an array of pointers to instructions defining the temps around. The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.	2015-02-19 23:35:17 -08:00
Eric Anholt	877b48a531	vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h. I may want them in optimization passes, and they're not really particular to the program translation stage.	2015-02-19 23:35:17 -08:00
Eric Anholt	14dc281c13	vc4: Enforce one-uniform-per-instruction after optimization. This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first. total uniforms in shared programs: 13457 -> 13433 (-0.18%) uniforms in affected programs: 1524 -> 1500 (-1.57%) total instructions in shared programs: 40198 -> 40019 (-0.45%) instructions in affected programs: 6027 -> 5848 (-2.97%) I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.	2015-02-19 23:35:17 -08:00
Eric Anholt	3f1e1287fd	vc4: Make SF be a flag on the QIR instructions. Right now the places that used to emit a mov.sf just put the SF on the previous instruction when it generated the source of the SF value. Even without optimization to push the sf up further (and thus potentially kill more MOVs), this gets us: total uniforms in shared programs: 13455 -> 13457 (0.01%) uniforms in affected programs: 3 -> 5 (66.67%) total instructions in shared programs: 40296 -> 40198 (-0.24%) instructions in affected programs: 12595 -> 12497 (-0.78%)	2015-02-12 16:33:16 -08:00
Eric Anholt	12ebd7e20e	vc4: Dump the VPM read index in QIR disasm. Since the VPM reads have to be in order, it's useful to see their indices in the dump.	2015-02-01 12:53:08 -08:00
Eric Anholt	d70eb38517	gallium: Replace u_simple_list.h with util/simple_list.h. The code was exactly the same, except util/ has c++ guards and a struct simple_node declaration. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-01-28 16:33:34 -08:00
Eric Anholt	772c47aefe	vc4: Move the tests for src needing to be an A register to vc4_qir.c. I want it from another location.	2015-01-15 22:19:25 +13:00
Eric Anholt	c772c92153	vc4: Split two notions of instructions having side effects. Some ops can't be DCEd, while some of the ops that are just important due to the args they have can be.	2015-01-10 15:24:46 +13:00
Eric Anholt	a58ae83882	vc4: Redo VPM reads as a read file. This will let us do copy propagation of the VPM reads.	2015-01-10 14:35:24 +13:00
Eric Anholt	72cb6619cb	vc4: Restructure color packing as a series of channel replacements. I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments: total instructions in shared programs: 41027 -> 40960 (-0.16%) instructions in affected programs: 4381 -> 4314 (-1.53%)	2015-01-10 13:54:12 +13:00
Eric Anholt	e06b0778f5	vc4: Coalesce MOVs into VPM with the instructions generating the values. total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)	2014-12-18 15:00:56 -08:00
Eric Anholt	a871eff16c	vc4: Redefine VPM writes as a (destination) QIR register file. This will let me coalesce the VPM writes into the instructions generating the values.	2014-12-17 22:35:08 -08:00
Eric Anholt	e473fbe469	vc4: Add support for turning constant uniforms into small immediates. Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.	2014-12-17 19:35:13 -08:00
Eric Anholt	ff266483fb	vc4: Move follow_movs() to common QIR code. I want this from other passes.	2014-12-17 19:05:52 -08:00
Eric Anholt	48a2154520	vc4: Add support for 16-bit signed/unsigned norm/scaled vertex attrs.	2014-12-15 14:33:01 -08:00
Eric Anholt	2142fd1f6f	vc4: Add support for 8-bit unnormalized vertex attrs.	2014-12-15 14:33:00 -08:00
Eric Anholt	8e678de761	vc4: Rename UNPACK_8* to UNPACK_8*_F. There is an equivalent unpack function without conversion to float if you use an integer operation instead.	2014-12-15 14:28:23 -08:00
Eric Anholt	dadc32ac80	vc4: Allow dead code elimination of color reads. This might happen if the blending functions are set up to not actually use the destination color/alpha, for example.	2014-12-05 10:43:14 -08:00
Eric Anholt	f87c700895	vc4: Add support for ARL and indirect register access on TGSI_FILE_CONSTANT. Fixes 14 ARB_vp tests (which had no lowering done), and should improve performance of indirect uniform array access in GLSL.	2014-10-28 17:16:05 -07:00
Eric Anholt	52824811b9	vc4: Allow dead code elimination of unused varyings. total instructions in shared programs: 39022 -> 37341 (-4.31%) instructions in affected programs: 26979 -> 25298 (-6.23%) total uniforms in shared programs: 11242 -> 10523 (-6.40%) uniforms in affected programs: 5836 -> 5117 (-12.32%)	2014-10-24 18:04:26 +01:00
Eric Anholt	201d4c0b2a	vc4: Add support for user clip plane and gl_ClipVertex. Fixes about 15 piglit tests about interpolation and clipping.	2014-10-15 18:11:46 +01:00
Eric Anholt	d7a0502a54	vc4: Add support for the FACE semantic. Fixes glsl-fs-frontfacing.	2014-10-01 17:03:35 -07:00
Eric Anholt	64122b16ce	vc4: Dump constant uniform values in VC4_DEBUG=qir. Definitely helps when trying to understand and optimize a program.	2014-09-29 11:33:34 -07:00
Eric Anholt	2e48b286bf	vc4: Add support for 8-bit unorm/snorm vertex inputs.	2014-09-23 13:40:10 -07:00
Eric Anholt	dcd03e7476	vc4: Use the same method as for FRAG_Z to handle fragcoord W. I need to get the non-reciprocal version of W for interpolation, anyway.	2014-09-19 11:09:04 -07:00
Eric Anholt	19589147ef	vc4: Add support for stencil operations. While depth test state is passed through the fragment shader as sideband data, the stencil test state has to be set by the fragment shader itself. Many tests are still failing, but this gets most of hiz/ passing.	2014-09-18 17:46:43 -07:00
Eric Anholt	44b8eb743d	vc4: Allow dead code elimination of instructions that read uniforms.	2014-09-17 14:21:24 -07:00
Eric Anholt	2264925f85	vc4: Add support for computed depth writes. Fixes piglit glsl-1.10-fragdepth and early-z.	2014-09-16 13:03:41 -07:00
Eric Anholt	aae4223fbd	vc4: Restructure depth input/output in fragment shaders. The goal here is to have an argument for the depth write opcode so that I can do computed depth. In the process, this makes the calculations that will be emitted more obvious in the QIR.	2014-09-16 13:03:32 -07:00
Eric Anholt	2147dd9681	vc4: Fix memory leaks of struct qinst.	2014-09-15 13:12:27 -07:00
Eric Anholt	f78ee1b280	vc4: Fix memory leaks of some vc4_compile contents.	2014-09-15 13:12:27 -07:00
Eric Anholt	d952a98c53	vc4: Expose r4 to register allocation. We potentially need to be careful that use of a value stored in r4 isn't copy-propagated (or something) across another r4 write. That doesn't appear to happen currently, and this makes the dataflow more obvious. It also opens up not unpacking the r4 value, which will be useful for depth textures.	2014-09-09 20:38:39 -07:00
Eric Anholt	4bca922878	vc4: Merge qcompile and tgsi_to_qir. The split between these two didn't make much sense. I'm going to want the chance to look at uniform contents in optimization passes, and the QPU emit I think is going to end up rewriting the uniforms stream.	2014-09-04 17:00:54 -07:00

67 Commits