KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Rob Clark	6514b4e3fd	freedreno/ir3: print array live ranges This is also useful to see if optmsgs are enabled. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-03-31 15:09:42 -04:00
Rob Clark	242a8a1957	freedreno/ir3: remove ir3 phi instruction Now that we convert phi webs to ssa, we can drop all this. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	1b658533e1	freedreno/ir3: extend liverange of arrays Use livein state of other blocks to extend liverange of arrays when they are still needed by successor blocks. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	2bc3fb6992	freedreno/ir3: a couple more array fixes (Plus a couple TODOs) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-02-10 14:54:58 -05:00
Rob Clark	15fe9b2347	freedreno/ir3: add 'high' register class For compute shaders, we need to be able to allocate some "high" registers (r48.x to r55.w). (Possibly these are global to all threads in a warp?) Add a new register class to handle this. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-04-17 14:00:05 -04:00
Rob Clark	1f04d4bf59	freedreno/ir3: fix # of registers The instruction encoding allows for more registers, but at least on a3xx/a4xx they don't actually exist. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-04 11:25:55 -04:00
Rob Clark	610837fb98	freedreno/ir3: fix small RA bug Normally the offset in the group would be the same, but not always. For example, in a sam(w) which only writes the 4th component. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-04-24 13:40:57 -04:00
Rob Clark	f8feb97ba5	freedreno/ir3: fix silly brain-fart in RA We want to consider all the vars, not 1/32nd of them, when extending live-ranges. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-04-04 20:18:18 -04:00
Rob Clark	d47fb856af	freedreno/ir3: add dumping for use/def/live-in/live-out Turned out to be useful to debug an issue in RA. Let's keep it. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-04-04 20:18:18 -04:00
Rob Clark	19739e4fb9	freedreno/ir3: remove ir3_instruction::category Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-04-04 20:18:18 -04:00
Bernhard Rosenkränzer	e86ba7844f	freedreno/ir3: Get rid of nested functions This allows building Freedreno with clang Signed-off-by: Bernhard Rosenkränzer <bero@linaro.org> Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-02-10 11:26:48 -05:00
Rob Clark	2a6ec1e061	freedreno/ir3: better array register allocation Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:23:52 -05:00
Rob Clark	fad158a0e0	freedreno/ir3: array rework Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:21:08 -05:00
Rob Clark	fc0d2f7e02	freedreno/ir3: bit of ra refactor Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:47 -05:00
Rob Clark	d430f443de	freedreno/ir3: cosmetic de-indent Collapse two nested if's into one to reduce indent level. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-16 14:18:33 -05:00
Rob Clark	8e52344dc1	freedreno/ir3: rename ir3_block::bd We'll need to add similar for ir3_instruction, but following the pattern to use 'id' seems confusing. Let's just go w/ generic 'data' as the name. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-12-04 10:27:09 -05:00
Rob Clark	2181f2cd58	freedreno/ir3: use instr flag to mark unused instructions Rather than magic depth value, which won't be available in later stages. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-11-26 12:35:10 -05:00
Jason Ekstrand	f01bdb0484	util/ra: Make allocating conflict lists optional Since i965 is now using make_reg_conflicts_transitive and doesn't need q-value computations, they are disabled on i965. They are enabled everywhere else so that they get the old behavior. This reduces the time spent in eglInitialize() on BDW by around 10-15%. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-08-18 17:48:53 -07:00
Rob Clark	a1a6f00782	freedreno/ir3/ra: fix failed assert for a0/p0 The address and predicate register are special, they don't get assigned in RA. So do a better job of ignoring them rather than hitting later asserts. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-07-10 11:57:31 -04:00
Rob Clark	00b6b41482	freedreno/ir3: cache defining instruction It is silly to traverse back to find first instruction that writes part of a larger "virtual" register many times per instruction (plus per use as a src to later instructions). Cache this information so we only figure it out once. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-30 12:13:44 -04:00
Rob Clark	906da49527	freedreno/ir3: fix RA issue with fanin The fanin source could be grouped, for example with shaders like: VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], GENERIC[9] DCL SAMP[0] DCL SVIEW[0], 2D, FLOAT DCL TEMP[0], LOCAL 0: MOV TEMP[0].xy, IN[1].xyyy 1: MOV TEMP[0].w, IN[1].wwww 2: TXF TEMP[0], TEMP[0], SAMP[0], 2D 3: MOV OUT[1], TEMP[0] 4: MOV OUT[0], IN[0] 5: END The second arg to the isaml is IN[1].w, so we need to look at the fanin source to get the correct offset. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-30 12:13:44 -04:00
Rob Clark	1370fde8af	freedreno/ir3: fix crash in RA Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-30 12:13:44 -04:00
Rob Clark	bb2c4b68f7	freedreno/ir3: fixes for indirect writes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-30 12:13:44 -04:00
Rob Clark	66a93a0ff9	freedreno/ir3: pass sz to split_dest() For query_levels, we generate a getinfo with writemask of (z), which RA will consider as size==3. But we were still generating four fanouts. Which meant that RA would see it as two different register classes, depending on the path to definer. Ie. on the getinfo instruction itself it would see size==3, but when chasing back through the fanouts it would see size==4. Easiest way to solve that is to just generate the chain of neighboring fanouts to have the correct size in the first place. Note: we may eventually want split_dest() to take start/end or wrmask instead, since really we only need size==1. But RA is not clever enough for that, query_levels is not that common, and the other two registers that get allocated are never used so those register slots can be immediately re-used. So bunch of work for probably no real gain. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 08:01:12 -04:00
Rob Clark	457f7c2a2a	freedreno/ir3: block reshuffling and loops! This shuffles things around to allow the shader to have multiple basic blocks. We drop the entire CFG structure from nir and just preserve the blocks. At scheduling we know whether to schedule conditional branches or unconditional jumps at the end of the block based on the # of block successors. (Dropping jumps to the following instruction, etc.) One slight complication is that variables (load_var/store_var, ie. arrays) are not in SSA form, so we have to figure out where to put the phi's ourself. For this, we use the predecessor set information from nir_block. (We could perhaps use NIR's dominance frontier information to help with this?) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 07:54:38 -04:00
Rob Clark	c8fb5f8a01	freedreno/ir3: move inputs/outputs to shader These belong in the shader, rather than the block. Mostly a lot of churn and nothing too interesting. But splitting this out from the rest of ir3_block reshuffling to cut down the noise in the later patch. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 07:54:04 -04:00
Rob Clark	d52fb2f5ad	freedreno/ir3/ra: use register_allocate Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 07:53:58 -04:00
Rob Clark	adf1659ff5	freedreno/ir3: use standard list implementation Use standard list_head double-linked list and related iterators, helpers, etc, rather than weird combo of instruction array and next pointers depending on stage. Now block has an instrs_list. In certain stages where we want to remove and re-add to the blocks list we just use list_replace() to copy the list to a new list_head. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 07:53:09 -04:00
Rob Clark	67d994c676	freedreno/ir3: drop dot graph dumping At least for now.. right now the instruction and instruction list printing should suffice, and the re-working of ir3_block would require a lot of changes in that code. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-06-21 07:52:58 -04:00
Rob Clark	060d349920	freedreno/ir3: relative dst To simplify RA, assign arrays that are written to first. Since enough dependency information is in the graph to preserve order of reads and writes of array, so all SSA names for the array collapse into one, just assign the entire thing by array-id. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	17754b70d7	freedreno/ir3: drop deref nodes The meta-deref instruction doesn't really do what we need for relative destination. Instead, since each instruction can reference at most a single address value, track the dependency on the address register via instr->address. This lets us express the dependency regardless of whether it is used for dst and/or src. The foreach_ssa_src{_n} iterator macros now also iterates the address register so, at least in SSA form, the address register behaves as an additional virtual src to the instruction. Which is pretty much what we want, as far as scheduling/etc. TODO: For now, the foreach_src{_n} iterators are unchanged. We could wrap the address in an ir3_register and make the foreach_src_{_n} iterators behave the same way. But that seems unnecessary at this point, since we mainly care about the address dependency when in SSA form. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	f8f7548f46	freedreno/ir3: helpful iterator macros I remembered that we are using c99.. which makes some sugary iterator macros easier. So introduce iterator macros to iterate all src registers and all SSA src instructions. The _n variants also return the src #, since there are a handful of places that need this. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	e7026ac486	freedreno/ir3: fix pos_regid > max_reg We can't (or don't know how to) turn this off. But it can end up being stored to a higher reg # than what the shader uses, leading to corruption. Also we currently aren't clever enough to turn off frag_coord/frag_face if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org this a bit so both vp and fp reg footprint fixup are called by a common fxn used also by ir3_cmdline. Also add a few more output lines for ir3_cmdline to make it easier to see what is going on. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	1e5c207dba	freedreno/ir3: start on indirect gpr reads Handle TEMP[ADDR[]] src registers by generating a fanin to group array elements, similarly to how texture fetch instructions work. NOTE: For all the scalar instructions generated for a single tgsi vector operation which uses an array src (or possibly even uses the same array as multiple srcs), re-use the same fanin node. Since a vector operation operates on all components at the same time, it should never see more than one version of the same array. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	9a9f2a893b	freedreno/ir3: simplify RA Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	212b909643	freedreno/ir3: runtime enable RA debug for DEBUG builds Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	f332cf92b6	freedreno/ir3: split out legalize pass Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-23 19:53:01 -05:00
Rob Clark	4097ef6ee8	freedreno/ir3: ra debug Some compile time RA debug Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-23 19:53:01 -05:00
Rob Clark	af4d088395	freedreno/ir3: fix lockups with lame FRAG shaders Shaders like: FRAG PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1 DCL IN[0], GENERIC[0], PERSPECTIVE DCL OUT[0], COLOR DCL SAMP[0] DCL TEMP[0], LOCAL IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000} 0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D 1: MOV OUT[0], IMM[0].xyxx 2: END cause unhappyness. They have an IN[], but once this is compiled the useless TEX instruction goes away. Leaving a varying that is never fetched, which makes the hw unhappy. In the process fix a signed vs unsigned compare. If the vertex shader has max_reg=-1, MAX2() vs an unsigned would not give the desired result. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-10-03 14:19:52 -04:00
Rob Clark	a2c22d80d4	freedreno/ir3: fix potential segfault in RA Triggered by shaders like: FRAG PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1 DCL OUT[0], COLOR DCL CONST[0] DCL TEMP[0..2], LOCAL 0: IF CONST[0].xxxx :0 1: MOV TEMP[0], TEMP[1] 2: ELSE :0 3: MOV TEMP[0], TEMP[2] 4: ENDIF 5: MOV OUT[0], TEMP[0] 6: END not really a sane shader, although driver segfaulting is probably not the appropriate response. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-09-09 19:42:18 -04:00
Rob Clark	db193e5ad0	freedreno/ir3: split out shader compiler from a3xx Move the bits we want to share between generations from fd3_program to ir3_shader. So overall structure is: fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3 |- ... \- ir3_shader_variant -> ir3 So the ir3_shader becomes the topmost generation neutral object, which manages the set of variants each of which generates, compiles, and assembles it's own ir. There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/, etc. Keep the split between the gallium level stateobj and the shader helper object because it might be a good idea to pre-compute some generation specific register values (ie. anything that is independent of linking). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-07-25 13:29:28 -04:00

41 Commits