Commit Graph

12 Commits

Author SHA1 Message Date
Eric Engestrom 2c67457e5e util/list: rename LIST_ENTRY() to list_entry()
This follows the Linux kernel convention, and avoids collision with
macOS header macro.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6751
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6840
Cc: mesa-stable
Signed-off-by: Eric Engestrom <eric@igalia.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17772>
2022-07-28 10:10:44 +00:00
Vasily Khoruzhick 3b15fb3575 lima/ppir: implement gl_FragDepth support
Mali4x0 supports writing depth and stencil from fragment shader
and we've been using it quite a while for depth/stencil buffer reload.

The missing part was specifying output register for depth/stencil.
To figure it out, I changed reload shader to use register $4 as output
and poked RSW bits (or rather consecutive 4 bit groups) until tests
that rely on reload started to pass again.

It turns out that register number for gl_FragDepth/gl_FragStencil is in
rsw->depth_test and register number for gl_FragColor is in
rsw->multi_sample and it's repeated 4 times for some reason (likely for
MSAA?)

With this knowledge we now can modify ppir compiler to support multiple
store_output intrinsics.

To do that just add destination SSA for store_output to the registers
list for regalloc and mark them explicitly as output. Since it's never
read in shader we have to take care about it in liveness analysis -
basically just mark it alive from the time when it's written to the end
of the block. If it's live only in the last instruction, mark it as
live_internal, so regalloc doesn't clobber it.

Then just let regalloc do its job, and then copy register number to the
shader state and program it in RSW.

The tricky part is gl_FragStencil, since it resides in the same register
as gl_FragDepth and with the current design of the compiler it's hard to
merge them. However gl_FragStencil doesn't seem to be part of GL2
or GLES2, so we can just leave it not implemented.

Also we need to take care of stop bit for instructions - now we can't
just set it in every instruction that stores output, since there may be
several outputs. So if there's any store_output instructions in the
block just mark that block has a stop, and set stop bit in the last
instruction in the block. The only exception is discard - we always need
to set stop bit in discard instruction.

Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
2021-11-24 02:26:08 +00:00
Erico Nunes ab71c0ba44 lima/ppir: rework liveness data structures to bitset
The liveness code in ppir can be a bottleneck in complicated shaders
with multiple spills, and the use of the mesa set data structure is one
of the main reasons it is expensive to run.
With some changes, it can be adapted to using bitsets which makes it run
substantially faster.
ppir liveness can't run with a regular bitset for registers since we
need to track inviditual component masks for non-ssa registers, but we
can switch to using a separate packed bit array just for the masks,
rather than a full blown hash set. This also makes operations such as
liveness propagation much more straightforward.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
2021-03-25 14:17:26 +00:00
Erico Nunes 2b83d99538 lima/ppir: remove use of live_out
The live_out set was serving mostly as a temporary storage of data
propagated from the next instructions.
If we do an early copy to preserve the instruction live_in set and
propagate the liveness information directly to the live_in set, we can
skip having a live_out set entirely.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
2021-03-25 14:17:26 +00:00
Erico Nunes 55b1dbb1c1 lima/ppir: remove liveness info from blocks
liveness information was stored in blocks so that the algorithm doesn't
have to find a valid instruction to inherit liveness information from.
It turns out this part of the code can be a performance bottleneck in
applications loading with complicated shaders, so skip the liveness
information in blocks to reduce the number of times we have to copy
liveness information.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
2021-03-25 14:17:26 +00:00
Vasily Khoruzhick 951788b560 lima/ppir: don't use list_length() in loop in regalloc and liveness analysis
list_length() complexity is O(n), so it's better to store number of regs
separately and use it instead of list_length().

Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9570>
2021-03-14 16:39:26 +00:00
Erico Nunes 8c47640731 lima/ppir: rework store output
In many cases, it is possible to avoid creating a mov for the store
output node.
Additionally, nodes other than alu, such as load varying, can be valid
store output nodes too.

This is another small optimization, but helps a vast majority of
programs by 1 instruction.
Shaders with discard easily become complicated to handle properly.
Some example issues: ppir has to rely on instruction ordering; or a
node with ssa output could be required both before a discard_if (as a
condition) and after it (as the instruction with the 'stop' bit set).
So don't try to handle them here.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4632>
2020-05-09 14:40:34 +02:00
Erico Nunes cef1c73634 lima/ppir: introduce liveness internal live set
The current solution for handling registers that live and die within a
single instruction does not handle all cases. In particular, these
intra-instruction use register also conflict with registers that are
part of the live_in set.
Unfortunately, adding them to the live_in set is not an easy solution as
that would cause them to be propagated upwards. So, add a separate set
to handle these registers in the particular instructions, without
propagating them.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
2020-05-09 11:30:07 +00:00
Erico Nunes ae0b8ba5d5 lima/ppir: fix src read mask swizzling
The src mask can't be calculated from the dest write_mask.
Instead, it must be calculated from the swizzled operators of the src.
Otherwise, liveness calculation may report incorrect live components for
non-ssa registers.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
2020-01-25 14:48:55 +01:00
Erico Nunes d6b1917c01 lima/ppir: handle write to dead registers in ppir
nir can output writes to dead registers when expanding vec4 operations
to non-ssa registers. In that case, some components of the vec4 may be
assigned but never read. These are also not currently removed by a nir
dead code elimination pass as they are not ssa.
In order to prevent regalloc from allocating a live register for this
operation, an interference must be assigned to it during liveness
analysis.

This workaround may be removed in the future if the assignments to dead
components can be removed earlier in ppir or nir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
2020-01-25 14:48:02 +01:00
Erico Nunes 9bf210ba98 lima/ppir: implement full liveness analysis for regalloc
The existing liveness analysis in ppir still ultimately relies on a
single continuous live_in and live_out range per register and was
observed to be the bottleneck for register allocation on complicated
examples with several control flow blocks.
The use of live_in and live_out ranges was fine before ppir got control
flow, but now it ends up creating unnecessary interferences as live_in
and live_out ranges may span across entire blocks after blocks get
placed sequentially.

This new liveness analysis implementation generates a set of live
variables at each program point; before and after each instruction and
beginning and end of each block.
This is a global analysis and propagates the sets of live registers
across blocks independently of their sequence.
The resulting sets optimally represent all variables that cannot share a
register at each program point, so can be directly translated as
interferences to the register allocator.

Special care has to be taken with non-ssa registers. In order to
properly define their live range, their alive components also need to be
tracked. Therefore ppir can't use simple bitsets to keep track of live
registers.

The algorithm uses an auxiliary set data structure to keep track of the
live registers. The initial implementation used only trivial arrays,
however regalloc execution time was then prohibitive (>1minute on
Cortex-A53) on extreme benchmarks with hundreds of instructions,
hundreds of registers and several spilling iterations, mostly due to the
n^2 complexity to generate the interferences from the live sets. Since
the live registers set are only a very sparse subset of all registers at
each instruction, iterating only over this subset allows it to run very
fast again (a couple of seconds for the same benchmark).

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
2020-01-15 22:55:31 +00:00
Vasily Khoruzhick 1cdf585613 lima/ppir: add better liveness analysis
Add better liveness analysis that was modelled after one in vc4.
It uses live ranges and is aware of multiple blocks which is prerequisite
for adding CF support

Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-08-24 08:17:31 -07:00