Commit Graph

49 Commits

Author SHA1 Message Date
Kostiantyn Lazukin e9cc1633a2 util/ra: Fix numeric overflow during bitset allocation
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Kostiantyn Lazukin <kostiantyn.lazukin@globallogic.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5752
Fixes: d4a4cd20d5 ("util/ra: use adjacency matrix for undirected graph")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14224>
2021-12-19 13:10:26 -08:00
Kostiantyn Lazukin d4a4cd20d5 util/ra: use adjacency matrix for undirected graph
Since this graph is actually not oriented, its adjacency matrix can be
represented using less than half bits required by full adjacency matrix.
It reduces memory consumption and number of cache misses. It also simplifies
logic of growing this matrix - no need to touch adjacency bits for previously
allocated number of nodes.

Move adjacency bits from nodes to graph to reduce the number of allocations.

No changes to shader-db.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kostiantyn Lazukin <kostiantyn.lazukin@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14189>
2021-12-14 09:19:01 +00:00
Dave Airlie d051854cca treewide: drop mtypes/macros includes from main
These aren't required in lots of places, so remove them.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14127>
2021-12-08 22:14:45 +00:00
Caio Oliveira 6eb86efe91 util/ra: Fix deserialization of register sets
Set ra_class::regset and ra_class::index  when deserializing.

Fixes: 95d41a3525 ("ra: Use struct ra_class in the public API.")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13728>
2021-11-10 22:57:57 +00:00
Emma Anholt d6d7421e98 util/ra: Use the conflicting neighbor to skip unavailable registers.
Now that we have an idea of how many regs the conflicting allocation uses,
we can just skip to the next one and save repeated tests to find the same
conflicting neighbor again.

shadowrun-returns shader-db time on skl -1.62821% +/- 1.58079% (n=679),
now there's no statistically significant change from the start of the series
(n=420)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt 2d7bcdaf6b ra: Add fast-path support for register classes of contiguous regs.
In the fully general case of register classes, to expose an allocation
class of unaligned 2-contiguous-regs allocations, for example, you'd have
your base individual regs (128 on intel), and another set of 127 regs that
each conflicted with the corresponding pair of the base regs.  Single-reg
nodes would allocate in the 128, and double-reg nodes would allocate in
the 127 and the user would remap from the 127 down to the base regs with
some irritating table.

If you need many different contiguous allocation sizes (16 is a pretty
common number across drivers), your number of regs explodes, wasting
memory and making the q computation expensive at startup.

If all the user has is contiguous-reg classes, we can easily compute the q
value up front (as found in the intel driver and nouveau, for example),
and we only have to change a couple of places in the conflict-checking
logic so the contiguous-reg classes can use the base registers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt 95d41a3525 ra: Use struct ra_class in the public API.
All these unsigned ints are awful to keep track of.  Use pointers so we
get some type checking.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt 7e0127fb3e ra: Document that class index is allocated in order, use that in r300.
etnaviv also relies on this being the case, just drop the remapping.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt 3072318ab8 ra: Add a unit test.
This is mostly checking that we agree with a bit of the table from the
paper.  It proved quite useful as I was refactoring.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Jason Ekstrand 492d664be0 util/ra: Add [de]serialization support
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5019>
2020-05-13 23:36:44 +00:00
Eric Anholt 05e6f763e7 util/ra: Improve ra_set_finalize() performance.
BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set
of registers) much faster than just iterating over individual bits.

Improves freedreno startup time (as measured by shader-db ./run
shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/-
1.99006% (n=151)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
2020-04-29 19:46:08 +00:00
Eric Anholt 53ac2dabec util/ra: Use util_dynarray for handling the conflict lists.
Again, shortens the code significantly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
2020-04-29 19:46:08 +00:00
Eric Anholt 57088854e6 util/ra: Use util_dynarray for the adjacency list.
This make the code significantly more readable, I think (along with
shorter).  Also, using util_dynarray_delete_unordered() saves us a move of
the rest of the list when removing adjacency on a node.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
2020-04-29 19:46:08 +00:00
Eric Anholt a1de267a21 util/ra: Sanity check that we're adding a valid reg to a class.
BITSET_SET might not segfault on you right away if you're just slightly
off, and an assert is nicer anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
2020-04-29 19:46:08 +00:00
Eric Anholt 5bcaf30aba util/ra: Sanity check that the driver selected a valid reg.
freedreno was returning -1 when it didn't pick a reg from the given bitset
due to an off-by-a-small-number error.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
2020-04-29 19:46:08 +00:00
Dylan Baker 8e3696137f remove final imports.h and imports.c bits
This moves the fi_types to a new mesa_private.h and removes the
imports.c file. The vast majority of this patch is just removing
pound includes of imports.h and fixing up the recursive includes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:04 -07:00
Marek Olšák 76f79db3f5 util: stop including files from mesa/main
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324>
2020-03-27 21:00:09 +00:00
Marek Olšák e5339fe4a4 Move compiler.h and imports.h/c from src/mesa/main into src/util
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324>
2020-03-27 21:00:09 +00:00
Rob Clark dd2e050a84 util/ra: move NO_REG to header
In the select_reg callback, I want to be able to determine if a given
node is already assigned, and if so what physical register has been
assigned.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
2020-03-10 16:01:39 +00:00
Rob Clark 36aed70b59 util/ra: spiff out select_reg_callback
Add a parameter so the callback can know which node it is selecting a
register for.  And remove the graph parameter, as it is unused by
existing users, and somewhat unnecessary (ie. the callback data could
be used instead).

And add a comment so $future_me remembers how this works.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
2020-03-10 16:01:39 +00:00
Matt Turner 4413537c80 util: Remove tmp argument from BITSET_FOREACH_SET macro
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499>
2020-01-23 01:52:43 +00:00
Kristian H. Kristensen f9d35ea55b ir3: Set up full/half register conflicts correctly
Setting up transitive conflicts between a full register and its two
half registers (eg r0.x and hr0.x and hr0.y) will make the half
registers conflict.  They don't actually conflict and this prevents us
from using both at the same time.

Add and use a new ra helper that sets up transitive conflicts between
a register and its subregisters, except it carefully avoids the
subregister conflict.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2020-01-09 16:03:25 -08:00
Alyssa Rosenzweig b2a3ca6bd5 util/ra: Add a getter for a node class
Complements the existing getters and the setter for node class. To be
used in the Panfrost RA refactor.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-25 06:14:12 -07:00
Jason Ekstrand 5911abd76f util/ra: Assert nodes are in-bounds in add_node_interference
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand e291cd8a7e util/ra: Don't destroy the graph in ra_allocate()
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch.  It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand 9040215f5d util/ra: Add a helper for resetting a node's interference
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand 698bb9b984 util/ra: Add helpers for adding nodes to an interference graph
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand 41b310e219 util/ra: Improve the performance of ra_simplify
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation.  This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast.  Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.

In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa.  In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand e1511f1d4c util/ra: Only update q_total if the reg is not assigned
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned.  This has no known perf benefit
but it will reduce churn in a future commit.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand 9d6d1f47e7 util/ra: Only update best_optimistic_node if !progress
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand de56d3a2d1 util/ra: Make in_stack a bitset in the graph
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Jason Ekstrand 7720ad65ae util/ra: Get rid of tabs
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-14 12:30:22 -05:00
Marek Olšák 43d66c8c2d mesa: include mtypes.h less
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h

v2: fix radv build

Reviewed-by: Brian Paul <brianp@vmware.com>
2018-04-12 19:31:30 -04:00
Eric Engestrom 70c6f656f9 util/ra: fix memory leak
CID: 1415909
Fixes: 7a34a0e890 "ra: Add a callback for selecting a register
                             from what's available."
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-31 12:55:19 -07:00
Eric Anholt 7a34a0e890 ra: Add a callback for selecting a register from what's available.
VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to
use low-numbered registers (more parallelism on Intel, fewer delay slots
on vc4), but in order to give instruction scheduling the most freedom to
avoid delays we want to round-robin between registers of the same cost.
Our two heuristics so far have chosen one end or the other of that
tradeoff.

The callback, instead, hands the driver the set of registers that are
available, and the driver gets to make its own choice.  This will be used
in vc4 to round-robin between registers of the same cost, and might be
used in the future for improving bank selection.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-25 14:44:52 -07:00
Eric Anholt 3dae034423 ra: Don't put a node in its own adjacency set.
All the paths looping over adjacency had guards against considering
themselves (the non-obvious one was ra_any_neighbors_conflict(), which has
in_stack set).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-25 14:44:52 -07:00
Eric Anholt 30146f29a7 ra: Pull the body of a loop out to a helper function.
I was going to indent this code another level, and decided it would be
easier to read as a helper.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-07-25 14:44:52 -07:00
Roland Scheidegger 2b40a144b7 util/ra: (trivial) fix c99 loop variable initialization
Fails with old msvc otherwise.
2015-08-19 04:17:49 +02:00
Jason Ekstrand f01bdb0484 util/ra: Make allocating conflict lists optional
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965.  They are enabled
everywhere else so that they get the old behavior.  This reduces the time
spent in eglInitialize() on BDW by around 10-15%.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-18 17:48:53 -07:00
Jason Ekstrand 9b49284c22 util/ra: Add a function for making all conflicts on a register transitive
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-08-18 17:48:45 -07:00
Jason Ekstrand bdcc8f3230 ra: Delete the conflict lists in ra_set_finalize
They are never used after the set is finalized so there's no reason to keep
them around.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-10 11:58:58 -07:00
Jason Ekstrand 7539ac7fe2 ra: Refactor ra_set_finalize
All this commit does is change an early return to an if with an else
clause.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-08-10 11:58:45 -07:00
Matt Turner b568a5f6a8 util: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-07-29 09:34:52 -07:00
Francisco Jerez f80af89d48 ra: Disable round-robin strategy for optimistically colorable nodes.
The round-robin allocation strategy is expected to decrease the amount
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around.  On
the other hand it has the disadvantage of increasing fragmentation and
decreasing the number of equally-colored nearby nodes, what increases
the likelihood of failure in presence of optimistically colorable
nodes.

This patch disables the round-robin strategy for optimistically
colorable nodes.  These typically arise in situations of high register
pressure or for registers with large live intervals, in both cases the
task of the instruction scheduler shouldn't be constrained excessively
by the dense packing of those nodes, and a spill (or on Intel hardware
a fall-back to SIMD8 mode) is invariably worse than a slightly less
optimal scheduling.

Shader-db results on the i965 driver:

total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
instructions in affected programs:     1121 -> 1071 (-4.46%)
helped:                                1
HURT:                                  0
GAINED:                                49
LOST:                                  5

v2: Re-enable round-robin already for the lowest one of the nodes
    pushed optimistically onto the sack (Connor).
v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor).

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-02-23 20:55:40 +02:00
Eric Anholt b53d035825 util: Move Mesa's bitset.h to util/.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-02-20 11:36:34 -08:00
Jan Vesely bc18b48924 util: Silence signed-unsigned comparison warnings
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-17 17:15:36 +00:00
Matt Turner 2e007fd621 ra: Don't use regs as the ralloc context.
The i965 backends pass something out of 'screen', which is allocated
per-process, making using this as a ralloc context not thread-safe.

All callers ra_alloc_interference_graph() already ralloc_free() its
return value.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-01 11:32:54 -08:00
Jason Ekstrand f84adb8481 util: Use reg_belongs_to_class instead of BITSET_TEST
This shouldn't be a functional change since reg_belongs_to_class is just a
wrapper around BITSET_TEST.  It just makes the code a little easier to
read.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-10-24 16:23:08 -07:00
Eric Anholt 517e01b5c3 mesa: Move register_allocate.c to util.
The r300 gallium driver is using it outside of the Mesa tree, and I wanted
to do so for vc4 as well.  Rather than make the multiple-definitions
problem even more complicated, just move it to more-shared code.

v2: Don't forget to delete the symlink in r300 (review by Matt).
    Delete more r300-helper references (review by Emil)
    Don't prefix util/ header inclusion with "util/" (review by Emil)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
2014-09-23 13:40:10 -07:00
Renamed from src/mesa/program/register_allocate.c (Browse further)