41 Commits

Author SHA1 Message Date
Rob Clark 6514b4e3fd freedreno/ir3: print array live ranges
This is also useful to see if optmsgs are enabled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-03-31 15:09:42 -04:00
Rob Clark 242a8a1957 freedreno/ir3: remove ir3 phi instruction
Now that we convert phi webs to ssa, we can drop all this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark 1b658533e1 freedreno/ir3: extend liverange of arrays
Use livein state of other blocks to extend liverange of arrays when they
are still needed by successor blocks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
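Note on 1b658533e1: a minimal sketch of the idea, assuming the usual backward dataflow equations (live-out is the union of the successors' live-in, live-in is use plus whatever passes through the block). The `block_live` struct, its one-bit-per-variable encoding, and both function names are hypothetical and are not taken from ir3_ra.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCC 2

/* Hypothetical per-block liveness state; one bit per variable (max 32). */
struct block_live {
    uint32_t use;       /* read before any write in this block */
    uint32_t def;       /* written in this block */
    uint32_t livein;
    uint32_t liveout;
    int succ[MAX_SUCC]; /* successor block indices, -1 if unused */
};

/* Iterate to a fixed point:
 *   liveout = union of livein over all successors
 *   livein  = use | (liveout & ~def)
 */
static void compute_liveness(struct block_live *blocks, int nblocks)
{
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = nblocks - 1; i >= 0; i--) {
            struct block_live *b = &blocks[i];
            uint32_t out = 0;

            for (int s = 0; s < MAX_SUCC; s++)
                if (b->succ[s] >= 0)
                    out |= blocks[b->succ[s]].livein;

            uint32_t in = b->use | (out & ~b->def);
            if (in != b->livein || out != b->liveout) {
                b->livein = in;
                b->liveout = out;
                progress = true;
            }
        }
    }
}

/* A value must stay allocated past the end of a block if some successor
 * still needs it, i.e. if it is in the block's liveout set. */
static bool live_past_block(const struct block_live *b, unsigned var)
{
    assert(var < 32);
    return (b->liveout >> var) & 1u;
}
```

With three blocks in a straight line, where the first defines a variable and only the last reads it, the variable is reported live past the end of the first two blocks, which is the case where an array's range has to be extended.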
Rob Clark 2bc3fb6992 freedreno/ir3: a couple more array fixes
(Plus a couple TODOs)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-10 14:54:58 -05:00
Rob Clark 15fe9b2347 freedreno/ir3: add 'high' register class
For compute shaders, we need to be able to allocate some "high"
registers (r48.x to r55.w).  (Possibly these are global to all threads
in a warp?)  Add a new register class to handle this.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-17 14:00:05 -04:00
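Note on 15fe9b2347: a small sketch of what the range r48.x to r55.w covers, assuming scalar registers are numbered with four components (x, y, z, w) per full register. The helper names and the two `HIGH_*` constants are hypothetical; only the range itself comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define HIGH_FIRST_REG 48 /* r48, from the commit message */
#define HIGH_LAST_REG  55 /* r55, from the commit message */

/* Scalar register id, assuming four components per full register. */
static unsigned regid(unsigned num, unsigned comp)
{
    assert(comp < 4);
    return (num << 2) | comp;
}

/* True if the scalar id falls inside r48.x .. r55.w. */
static bool is_high_reg(unsigned id)
{
    return id >= regid(HIGH_FIRST_REG, 0) && id <= regid(HIGH_LAST_REG, 3);
}

/* Number of scalar slots available to the "high" class. */
static unsigned num_high_scalar_regs(void)
{
    return regid(HIGH_LAST_REG, 3) - regid(HIGH_FIRST_REG, 0) + 1;
}
```

Under that numbering the class spans 8 full registers, or 32 scalar slots.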
Rob Clark 1f04d4bf59 freedreno/ir3: fix # of registers
The instruction encoding allows for more registers, but at least on
a3xx/a4xx they don't actually exist.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-04 11:25:55 -04:00
Rob Clark 610837fb98 freedreno/ir3: fix small RA bug
Normally the offset in the group would be the same, but not always.  For
example, in a sam(w) which only writes the 4th component.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-24 13:40:57 -04:00
Rob Clark f8feb97ba5 freedreno/ir3: fix silly brain-fart in RA
We want to consider all the vars, not 1/32nd of them, when extending
live-ranges.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
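Note on f8feb97ba5: the commit message does not show the faulty code, so the following is only one plausible shape of a "1/32nd" bug, assuming a bitset stored in 32-bit words where the word count was used as the loop bound instead of the bit count. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 32u
#define WORDS_FOR(nbits) (((nbits) + WORD_BITS - 1) / WORD_BITS)

static void bit_set(uint32_t *set, unsigned i)
{
    set[i / WORD_BITS] |= UINT32_C(1) << (i % WORD_BITS);
}

static bool bit_test(const uint32_t *set, unsigned i)
{
    return (set[i / WORD_BITS] >> (i % WORD_BITS)) & 1u;
}

/* Buggy: the bound is the number of words, so only the first
 * num_vars/32 (rounded up) variables are ever examined. */
static unsigned count_live_buggy(const uint32_t *set, unsigned num_vars)
{
    unsigned n = 0;
    for (unsigned i = 0; i < WORDS_FOR(num_vars); i++)
        n += bit_test(set, i);
    return n;
}

/* Fixed: the bound is the number of variables (bits). */
static unsigned count_live(const uint32_t *set, unsigned num_vars)
{
    assert(num_vars > 0);
    unsigned n = 0;
    for (unsigned i = 0; i < num_vars; i++)
        n += bit_test(set, i);
    return n;
}
```

With 64 variables and bits 0, 5, and 40 set, the buggy loop only looks at variables 0 and 1, while the fixed loop sees all three.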
Rob Clark d47fb856af freedreno/ir3: add dumping for use/def/live-in/live-out
Turned out to be useful to debug an issue in RA.  Let's keep it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark 19739e4fb9 freedreno/ir3: remove ir3_instruction::category
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Bernhard Rosenkränzer e86ba7844f freedreno/ir3: Get rid of nested functions
This allows building Freedreno with clang

Signed-off-by: Bernhard Rosenkränzer <bero@linaro.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-02-10 11:26:48 -05:00
Rob Clark 2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
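Note on 2a6ec1e061: a simplified sketch of why non-conflicting arrays can share registers. It assumes each array has a single live interval and places arrays greedily at the lowest base that does not collide with any earlier array that is live at the same time. This is not the allocator's actual method; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one array: live interval plus size in registers. */
struct array_range {
    unsigned start_ip, end_ip; /* first and last instruction using the array */
    unsigned length;           /* number of scalar registers needed */
};

/* Two arrays conflict when their live intervals intersect. */
static bool arrays_conflict(const struct array_range *a,
                            const struct array_range *b)
{
    return a->start_ip <= b->end_ip && b->start_ip <= a->end_ip;
}

/* Assign a base register to each array; returns total registers used. */
static unsigned assign_bases(const struct array_range *arrs, unsigned *base,
                             unsigned n)
{
    unsigned total = 0;

    for (unsigned i = 0; i < n; i++) {
        unsigned b = 0;
        bool moved = true;

        assert(arrs[i].length > 0);
        /* Bump the candidate base past every conflicting array whose
         * register range it overlaps, until nothing collides. */
        while (moved) {
            moved = false;
            for (unsigned k = 0; k < i; k++) {
                if (arrays_conflict(&arrs[i], &arrs[k]) &&
                    b < base[k] + arrs[k].length &&
                    base[k] < b + arrs[i].length) {
                    b = base[k] + arrs[k].length;
                    moved = true;
                }
            }
        }
        base[i] = b;
        if (b + arrs[i].length > total)
            total = b + arrs[i].length;
    }
    return total;
}
```

In a case with three arrays where the first and third are never live at the same time, the third reuses the first array's registers instead of extending the footprint.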
Rob Clark fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00
Rob Clark d430f443de freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:33 -05:00
Rob Clark 8e52344dc1 freedreno/ir3: rename ir3_block::bd
We'll need to add similar for ir3_instruction, but following the pattern
to use 'id' seems confusing.  Let's just go w/ generic 'data' as the
name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-12-04 10:27:09 -05:00
Rob Clark 2181f2cd58 freedreno/ir3: use instr flag to mark unused instructions
Rather than magic depth value, which won't be available in later stages.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-11-26 12:35:10 -05:00
Jason Ekstrand f01bdb0484 util/ra: Make allocating conflict lists optional
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965.  They are enabled
everywhere else so that they get the old behavior.  This reduces the time
spent in eglInitialize() on BDW by around 10-15%.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-18 17:48:53 -07:00
Rob Clark a1a6f00782 freedreno/ir3/ra: fix failed assert for a0/p0
The address and predicate register are special, they don't get assigned
in RA.  So do a better job of ignoring them rather than hitting later
asserts.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-07-10 11:57:31 -04:00
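Note on a1a6f00782: a sketch of filtering out the address and predicate registers before allocation. The register numbers 61 (a0) and 62 (p0) and the four-components-per-register id encoding are assumptions made for illustration, as are the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REG_A0 61 /* assumed number of the address register */
#define REG_P0 62 /* assumed number of the predicate register */

/* True for scalar ids that belong to a0 or p0 and so bypass RA. */
static bool is_special_reg(unsigned id)
{
    unsigned num = id >> 2; /* drop the component bits */
    return num == REG_A0 || num == REG_P0;
}

/* Count the ids that actually need a register from the allocator. */
static unsigned count_allocatable(const unsigned *ids, size_t n)
{
    unsigned count = 0;

    assert(ids != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!is_special_reg(ids[i]))
            count++;
    return count;
}
```

Skipping these ids up front avoids handing them to later stages that assert on every id being an ordinary general-purpose register.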
Rob Clark 00b6b41482 freedreno/ir3: cache defining instruction
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions).  Cache this information so we only
figure it out once.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-30 12:13:44 -04:00
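Note on 00b6b41482: a minimal memoization sketch, assuming the definer is found by walking a chain of "left neighbor" links back to the first instruction of the group. The struct layout, the `walk_steps` counter, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction with a link to its left neighbor in the group. */
struct instr {
    struct instr *left;        /* NULL for the first instruction in the group */
    struct instr *cached_defn; /* filled in on first lookup */
};

static unsigned walk_steps; /* counts traversal steps, for illustration only */

/* Return the instruction that defines the whole "virtual" register,
 * walking back at most once per instruction. */
static struct instr *get_definer(struct instr *instr)
{
    assert(instr != NULL);
    if (instr->cached_defn)
        return instr->cached_defn;

    struct instr *d = instr;
    while (d->left) {
        d = d->left;
        walk_steps++;
    }
    instr->cached_defn = d;
    return d;
}
```

A second lookup on the same instruction returns the cached pointer without walking the chain again.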
Rob Clark 906da49527 freedreno/ir3: fix RA issue with fanin
The fanin source could be grouped, for example with shaders like:

    VERT
    DCL IN[0]
    DCL IN[1]
    DCL OUT[0], POSITION
    DCL OUT[1], GENERIC[9]
    DCL SAMP[0]
    DCL SVIEW[0], 2D, FLOAT
    DCL TEMP[0], LOCAL
      0: MOV TEMP[0].xy, IN[1].xyyy
      1: MOV TEMP[0].w, IN[1].wwww
      2: TXF TEMP[0], TEMP[0], SAMP[0], 2D
      3: MOV OUT[1], TEMP[0]
      4: MOV OUT[0], IN[0]
      5: END

The second arg to the isaml is IN[1].w, so we need to look at the fanin
source to get the correct offset.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-30 12:13:44 -04:00
Rob Clark 1370fde8af freedreno/ir3: fix crash in RA
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-30 12:13:44 -04:00
Rob Clark bb2c4b68f7 freedreno/ir3: fixes for indirect writes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-30 12:13:44 -04:00
Rob Clark 66a93a0ff9 freedreno/ir3: pass sz to split_dest()
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3.  But we were still generating four fanouts.
Which meant that RA would see it as two different register classes,
depending on the path to definer.  Ie. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.

Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.

Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1.  But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used.  So bunch of work for probably no real gain.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 08:01:12 -04:00
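Note on 66a93a0ff9: the size mismatch follows from how a writemask maps to a size. A sketch, assuming size is the position of the highest set component bit plus one (x is bit 0, w is bit 3); the function name is hypothetical.

```c
#include <assert.h>

/* Size implied by a component writemask: highest written component + 1. */
static unsigned size_from_wrmask(unsigned wrmask)
{
    unsigned size = 0;

    assert(wrmask <= 0xf); /* at most x, y, z, w */
    while (wrmask) {
        size++;
        wrmask >>= 1;
    }
    return size;
}
```

A writemask of (z) is 0x4 and yields size 3, so generating four fanouts for it makes the two paths to the definer disagree. By the same rule, a (w)-only writemask yields size 4.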
Rob Clark 457f7c2a2a freedreno/ir3: block reshuffling and loops!
This shuffles things around to allow the shader to have multiple basic
blocks.  We drop the entire CFG structure from nir and just preserve the
blocks.  At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors.  (Dropping jumps to the following instruction, etc.)

One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phi's ourselves.  For this, we use the predecessor set information from
nir_block.  (We could perhaps use NIR's dominance frontier information
to help with this?)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:38 -04:00
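Note on 457f7c2a2a: a sketch of choosing the end-of-block instruction from the successor count. It assumes blocks are laid out in index order, that `succ[0]` is the target taken when the condition is true, and that a jump to the immediately following block can be dropped. The enum, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum block_end {
    END_OF_SHADER,   /* no successors */
    FALLTHROUGH,     /* one successor, and it is the next block */
    JUMP,            /* one successor somewhere else */
    BRANCH,          /* two successors, else-path falls through */
    BRANCH_AND_JUMP, /* two successors, else-path needs its own jump */
};

/* Hypothetical block: position in layout order plus up to two successors. */
struct block {
    int index;
    int succ[2]; /* block indices, -1 if unused */
};

static enum block_end pick_terminator(const struct block *b)
{
    assert(b != NULL);
    int next = b->index + 1;

    if (b->succ[0] < 0)
        return END_OF_SHADER;
    if (b->succ[1] < 0)
        return b->succ[0] == next ? FALLTHROUGH : JUMP;
    /* Conditional branch to succ[0]; the other path only needs an
     * explicit jump when it is not the following block. */
    return b->succ[1] == next ? BRANCH : BRANCH_AND_JUMP;
}
```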
Rob Clark c8fb5f8a01 freedreno/ir3: move inputs/outputs to shader
These belong in the shader, rather than the block.  Mostly a lot of
churn and nothing too interesting.  But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:54:04 -04:00
Rob Clark d52fb2f5ad freedreno/ir3/ra: use register_allocate
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:58 -04:00
Rob Clark adf1659ff5 freedreno/ir3: use standard list implementation
Use standard list_head double-linked list and related iterators,
helpers, etc, rather than weird combo of instruction array and next
pointers depending on stage.  Now block has an instrs_list.  In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:53:09 -04:00
Rob Clark 67d994c676 freedreno/ir3: drop dot graph dumping
At least for now.. right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-06-21 07:52:58 -04:00
Rob Clark 060d349920 freedreno/ir3: relative dst
To simplify RA, assign arrays that are written to first.  The graph
already carries enough dependency information to preserve the order of
reads and writes of an array, so all SSA names for the array collapse
into one and the entire thing is assigned by array-id.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 17754b70d7 freedreno/ir3: drop deref nodes
The meta-deref instruction doesn't really do what we need for relative
destination.  Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address.  This lets us express the dependency regardless of
whether it is used for dst and/or src.

The foreach_ssa_src{_n} iterator macros now also iterate the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction.  Which is pretty much what
we want, as far as scheduling/etc.

TODO:
For now, the foreach_src{_n} iterators are unchanged.  We could wrap
the address in an ir3_register and make the foreach_src{_n} iterators
behave the same way.  But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark f8f7548f46 freedreno/ir3: helpful iterator macros
I remembered that we are using c99.. which makes some sugary iterator
macros easier.  So introduce iterator macros to iterate all src
registers and all SSA src instructions.  The _n variants also return
the src #, since there are a handful of places that need this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
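Note on f8f7548f46: a sketch of the kind of macro that C99 loop-scoped declarations make possible. It assumes an instruction stores its destination in `regs[0]` and its sources after it; the struct layout and the macro body are simplified and hypothetical, not copied from ir3.h.

```c
#include <assert.h>
#include <stddef.h>

struct reg {
    int num;
};

/* Hypothetical instruction: regs[0] is the dst, regs[1..] are the srcs. */
struct instr {
    unsigned regs_count;
    struct reg *regs[5];
};

/* Iterate all non-NULL src registers; 'n' is the src number and is
 * declared by the macro itself, which needs C99. */
#define foreach_src_n(srcreg, n, instr)                        \
    for (unsigned n = 0; n + 1 < (instr)->regs_count; n++)     \
        if (((srcreg) = (instr)->regs[n + 1]) != NULL)

/* Sum of all src register numbers. */
static int sum_src_nums(struct instr *instr)
{
    struct reg *src;
    int sum = 0;

    assert(instr != NULL);
    foreach_src_n(src, n, instr)
        sum += src->num;
    return sum;
}

/* Src number of the first src with the given register number, or -1. */
static int find_src(struct instr *instr, int num)
{
    struct reg *src;

    assert(instr != NULL);
    foreach_src_n(src, n, instr)
        if (src->num == num)
            return (int)n;
    return -1;
}
```

Callers that only need the registers can ignore `n`; the handful that need the src number get it without keeping a separate counter.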
Rob Clark e7026ac486 freedreno/ir3: fix pos_regid > max_reg
We can't (or don't know how to) turn this off.  But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.

Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg.  Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline.  Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark 1e5c207dba freedreno/ir3: start on indirect gpr reads
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.

NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node.  Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark 9a9f2a893b freedreno/ir3: simplify RA
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers.  This lets us simplify RA by
working in terms of neighbor groups.

NOTE: has the slight problem that it can't optimize out mov's for things
like:

  MOV OUT[n], IN[m]

To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them).  This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark 212b909643 freedreno/ir3: runtime enable RA debug for DEBUG builds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-01-07 19:37:28 -05:00
Rob Clark f332cf92b6 freedreno/ir3: split out legalize pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-23 19:53:01 -05:00
Rob Clark 4097ef6ee8 freedreno/ir3: ra debug
Some compile time RA debug

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-23 19:53:01 -05:00
Rob Clark af4d088395 freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:

  FRAG
  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
  DCL IN[0], GENERIC[0], PERSPECTIVE
  DCL OUT[0], COLOR
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
  IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
    0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
    1: MOV OUT[0], IMM[0].xyxx
    2: END

cause unhappiness.  They have an IN[], but once this is compiled the
useless TEX instruction goes away.  Leaving a varying that is never
fetched, which makes the hw unhappy.

In the process fix a signed vs unsigned compare.  If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 14:19:52 -04:00
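Note on af4d088395: a sketch of the signed vs unsigned pitfall. When one operand of a comparison is `int` and the other `unsigned`, C converts the `int` to `unsigned` first, so -1 becomes `UINT_MAX`. The `MAX2` definition below is the conventional one and is assumed here; the two wrapper functions are hypothetical, and the explicit cast in the first one spells out the conversion the compiler would otherwise do implicitly.

```c
#include <assert.h>
#include <limits.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* What a mixed int/unsigned MAX2 effectively computes: the int operand
 * is converted to unsigned before the comparison. */
static unsigned max2_mixed(int max_reg, unsigned other)
{
    unsigned converted = (unsigned)max_reg; /* -1 becomes UINT_MAX */
    return MAX2(converted, other);
}

/* Compare in the signed domain instead, so -1 stays the smallest value. */
static int max2_signed(int max_reg, unsigned other)
{
    assert(other <= (unsigned)INT_MAX);
    int other_signed = (int)other;
    return MAX2(max_reg, other_signed);
}
```

With `max_reg == -1` and the other operand equal to 3, the mixed version returns `UINT_MAX` rather than 3, while the signed version returns 3.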
Rob Clark a2c22d80d4 freedreno/ir3: fix potential segfault in RA
Triggered by shaders like:

  FRAG
  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
  DCL OUT[0], COLOR
  DCL CONST[0]
  DCL TEMP[0..2], LOCAL
    0: IF CONST[0].xxxx :0
    1:   MOV TEMP[0], TEMP[1]
    2: ELSE :0
    3:   MOV TEMP[0], TEMP[2]
    4: ENDIF
    5: MOV OUT[0], TEMP[0]
    6: END

not really a sane shader, although driver segfaulting is probably
not the appropriate response.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-09-09 19:42:18 -04:00
Rob Clark db193e5ad0 freedreno/ir3: split out shader compiler from a3xx
Move the bits we want to share between generations from fd3_program to
ir3_shader.  So overall structure is:

  fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
                                    |- ...
                                    \- ir3_shader_variant -> ir3

So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles its own ir.

There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.

Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-07-25 13:29:28 -04:00
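Note on db193e5ad0: a sketch of the shader-owns-variants relationship, assuming variants are kept in a singly linked list and looked up by an opaque key, with a new variant created on a miss. The structs, the `unsigned` key, and the function names are hypothetical stand-ins, not the real ir3_shader definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant: one compiled form of the shader for a given key. */
struct variant {
    unsigned key;
    struct variant *next;
};

/* Hypothetical generation-neutral shader object owning its variants. */
struct shader {
    struct variant *variants;
    unsigned compile_count; /* how many variants were created */
};

/* Return the variant for 'key', creating it on first use.
 * Returns NULL only if allocation fails. */
static struct variant *shader_get_variant(struct shader *so, unsigned key)
{
    assert(so != NULL);
    for (struct variant *v = so->variants; v; v = v->next)
        if (v->key == key)
            return v;

    struct variant *v = calloc(1, sizeof(*v));
    if (!v)
        return NULL;
    v->key = key;
    v->next = so->variants;
    so->variants = v;
    so->compile_count++; /* the real code would generate/compile/assemble here */
    return v;
}

/* Free every variant owned by the shader. */
static void shader_destroy(struct shader *so)
{
    assert(so != NULL);
    while (so->variants) {
        struct variant *next = so->variants->next;
        free(so->variants);
        so->variants = next;
    }
}
```

Requesting the same key twice returns the same variant, so the per-generation state object only triggers a compile the first time a given key is seen.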
Renamed from src/gallium/drivers/freedreno/a3xx/ir3_ra.c