KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Rob Clark	a5ac36a75f	freedreno/a3xx/compiler: const file relative addressing Teach new compiler scheduling and register assignment how to deal with relative addressing. This gets us what we need to avoid falling back to old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp file relative addressing, although that is going to also need some cleverness in register assignment to keep arrays grouped together. NOTE: doing address calculation in full precision and then narrowing to s16 in the mov to addr reg seems to sometimes cause lockups (and sometimes work?!). It seems more reliable to do the address calculation in s16, like the blob does. Which means teaching RA how to deal with mixed half and full precision allocation. Fortunately that didn't turn out to be too hard, so that is a nice bonus which we could probably take better advantage of elsewhere. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-07-23 09:03:10 -04:00
Rob Clark	ee839cc6ef	freedreno/a3xx: deal with optimized tex instructions Keep track of whether we actually have any sam instructions in the resulting shader, rather than using TGSI SAMP declarations. If the sam instruction is optimized out, because the result is not used, we don't want to emit texture state, etc. In fact emitting sampler state and/or setting PIXLODENABLE bit when there are no texture fetches seems to cause lockup. In theory this should never happen for a "normal" shader, unless the state tracker is wonky. But it is a very real possibility for binning pass shaders. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-04-08 16:06:49 -04:00
Rob Clark	83808a90be	freedreno/a3xx/compiler: avoid negative register ids In some cases, we need a register to be assigned up to three components before the base. Since we can't have negative register #'s, just shift everything up. May increase register usage for trivial shaders, but I don't think we are shader limited in those cases. A proper solution is going to require a better register assignment algorithm (which is on the TODO list), this is just a hack to get us by until then. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-03-30 09:53:32 -04:00
Rob Clark	ae5efaf285	freedreno/a3xx: little extra debug Catch things which should not happen in debug builds. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-03-30 09:40:00 -04:00
Rob Clark	664045752f	freedreno/a3xx: add support for frag coord/face Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also, face support is needed to emulate two sided color. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-03-02 11:26:35 -05:00
Rob Clark	76924e3b51	freedreno/a3xx: fix for unused inputs An unused input might not have a register assigned. We don't want bogus regid to result in impossibly high max_reg.. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-03-02 11:26:35 -05:00
Rob Clark	3f7239ca0e	freedreno/a3xx/compiler: half-precision output Using generic shaders caused a measurable fps drop, which was isolated to use of full precision (vs half precision) output. This is an attempt to regain that lost performance by using half precision solid/blit shaders (when the output format is not float32). Note: for the built-in shaders, I would not expect them to be register starved. And in fact it is the solid frag shader that seems to have the biggest impact. So I suspect you get double the pixel pipe units (or half the cycles) when the output is half precision. So there may be some gain to using half precision output for application shaders as well, even though the rest of register usage is still full precision. But for half precision to work for more complex shaders, we need to deal with some constraints, like cat2 needing same precision for it's two src registers. So for now it is not enabled by default except for the built-in shaders. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-23 14:58:24 -05:00
Rob Clark	9bbfae6265	freedreno/a3xx/compiler: collapse nop's with repeat Easier than making more extensive use of rpt, and the more compact shaders seem to bring some bit of performance boost. (Perhaps repeat flag benefits are more than just instruction cache, possibly it saves on instruction decode as well?) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-23 14:58:23 -05:00
Rob Clark	5993723471	freedreno/a3xx/compiler: scheduling/legalize fixes It seems the write-after-read hazard that applies to texture fetch instructions, also applies to sfu instructions. Also, cat5/cat6 instructions do not have a (ss) bit, so in these cases we need to insert a dummy nop instruction with (ss) bit set. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-19 12:01:26 -05:00
Rob Clark	d73b2c0517	freedreno/a3xx/compiler: use (ss) for WAR hazards Seems texture sample instructions don't immediately consume there src(s). In fact, some shaders from blob compiler seem to indiciate that it does not even count the texture sample instructions when calculating number of delay slots to fill for non-sample instructions. (Although so far it seems inconclusive as to whether this is required.) In particular, when a src register of a previous texture sample instruction is clobbered, the (ss) bit is needed to synchronize with the tex pipeline to ensure it has picked up the previous values before they are overwritten. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-16 08:17:23 -05:00
Rob Clark	e8cca57a3f	freedreno/a3xx/compiler: fix RA typo Was supposed to be a '+', otherwise we end up with a negative offset and choosing registers below the assigned range. This seems to fix the scheduling mystery "solved" by adding in extra delay slots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-16 08:17:23 -05:00
Rob Clark	579473f8f8	freedreno/a3xx/compiler: handle kill properly (new compiler) Since 'kill' does not produce a result, the new compiler was happily optimizing them out. We need to instead track 'kill's similar to outputs. But since there is no non-predicated kill instruction, (and for flattend if/else we do want them to be predicated), we need to track the topmost branch condition on the stack and use that as src arg to the kill. For a kill at the topmost level, we have to generate an immediate 1.0 to feed into the cmps.f for setting the predicate register. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-16 08:17:23 -05:00
Rob Clark	554f1ac00c	freedreno/a3xx/compiler: new compiler The new compiler generates a dependency graph of instructions, including a few meta-instructions to handle PHI and preserve some extra information needed for register assignment, etc. The depth pass assigned a weight/depth to each node (based on sum of instruction cycles of a given node and all it's dependent nodes), which is used to schedule instructions. The scheduling takes into account the minimum number of cycles/slots between dependent instructions, etc. Which was something that could not be handled properly with the original compiler (which was more of a naive TGSI translator than an actual compiler). The register assignment is currently split out as a standalone pass. I expect that it will be replaced at some point, once I figure out what to do about relative addressing (which is currently the only thing that should cause fallback to old compiler). There are a couple new debug options for FD_MESA_DEBUG env var: optmsgs - enable debug prints in optimizer optdump - dump instruction graph in .dot format, for example: http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot At this point, thanks to proper handling of instruction scheduling, the new compiler fixes a lot of things that were broken before, and does not appear to break anything that was working before[1]. So even though it is not finished, it seems useful to merge it in it's current state. [1] Not merged in this commit, because I'm not sure if it really belongs in mesa tree, but the following commit implements a simple shader emulator, which I've used to compare the output of the new compiler to the original compiler (ie. run it on all the TGSI shaders dumped out via ST_DEBUG=tgsi with various games/apps): `163b6306b1` Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-02-03 18:26:53 -05:00

13 Commits