KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Kostiantyn Lazukin	e9cc1633a2	util/ra: Fix numeric overflow during bitset allocation Reviewed-by: Emma Anholt <emma@anholt.net> Signed-off-by: Kostiantyn Lazukin <kostiantyn.lazukin@globallogic.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5752 Fixes: `d4a4cd20d5` ("util/ra: use adjacency matrix for undirected graph") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14224>	2021-12-19 13:10:26 -08:00
Kostiantyn Lazukin	d4a4cd20d5	util/ra: use adjacency matrix for undirected graph Since this graph is actually not oriented, its adjacency matrix can be represented using less than half bits required by full adjacency matrix. It reduces memory consumption and number of cache misses. It also simplifies logic of growing this matrix - no need to touch adjacency bits for previously allocated number of nodes. Move adjacency bits from nodes to graph to reduce the number of allocations. No changes to shader-db. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Kostiantyn Lazukin <kostiantyn.lazukin@globallogic.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14189>	2021-12-14 09:19:01 +00:00
Dave Airlie	d051854cca	treewide: drop mtypes/macros includes from main These aren't required in lots of places, so remove them. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14127>	2021-12-08 22:14:45 +00:00
Caio Oliveira	6eb86efe91	util/ra: Fix deserialization of register sets Set ra_class::regset and ra_class::index when deserializing. Fixes: `95d41a3525` ("ra: Use struct ra_class in the public API.") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13728>	2021-11-10 22:57:57 +00:00
Emma Anholt	d6d7421e98	util/ra: Use the conflicting neighbor to skip unavailable registers. Now that we have an idea of how many regs the conflicting allocation uses, we can just skip to the next one and save repeated tests to find the same conflicting neighbor again. shadowrun-returns shader-db time on skl -1.62821% +/- 1.58079% (n=679), now there's no statistically significant change from the start of the series (n=420) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	2d7bcdaf6b	ra: Add fast-path support for register classes of contiguous regs. In the fully general case of register classes, to expose an allocation class of unaligned 2-contiguous-regs allocations, for example, you'd have your base individual regs (128 on intel), and another set of 127 regs that each conflicted with the corresponding pair of the base regs. Single-reg nodes would allocate in the 128, and double-reg nodes would allocate in the 127 and the user would remap from the 127 down to the base regs with some irritating table. If you need many different contiguous allocation sizes (16 is a pretty common number across drivers), your number of regs explodes, wasting memory and making the q computation expensive at startup. If all the user has is contiguous-reg classes, we can easily compute the q value up front (as found in the intel driver and nouveau, for example), and we only have to change a couple of places in the conflict-checking logic so the contiguous-reg classes can use the base registers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	95d41a3525	ra: Use struct ra_class in the public API. All these unsigned ints are awful to keep track of. Use pointers so we get some type checking. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	7e0127fb3e	ra: Document that class index is allocated in order, use that in r300. etnaviv also relies on this being the case, just drop the remapping. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Eric Anholt	3072318ab8	ra: Add a unit test. This is mostly checking that we agree with a bit of the table from the paper. It proved quite useful as I was refactoring. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Jason Ekstrand	492d664be0	util/ra: Add [de]serialization support Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5019>	2020-05-13 23:36:44 +00:00
Eric Anholt	05e6f763e7	util/ra: Improve ra_set_finalize() performance. BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set of registers) much faster than just iterating over individual bits. Improves freedreno startup time (as measured by shader-db ./run shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/- 1.99006% (n=151) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>	2020-04-29 19:46:08 +00:00
Eric Anholt	53ac2dabec	util/ra: Use util_dynarray for handling the conflict lists. Again, shortens the code significantly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>	2020-04-29 19:46:08 +00:00
Eric Anholt	57088854e6	util/ra: Use util_dynarray for the adjacency list. This make the code significantly more readable, I think (along with shorter). Also, using util_dynarray_delete_unordered() saves us a move of the rest of the list when removing adjacency on a node. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>	2020-04-29 19:46:08 +00:00
Eric Anholt	a1de267a21	util/ra: Sanity check that we're adding a valid reg to a class. BITSET_SET might not segfault on you right away if you're just slightly off, and an assert is nicer anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>	2020-04-29 19:46:08 +00:00
Eric Anholt	5bcaf30aba	util/ra: Sanity check that the driver selected a valid reg. freedreno was returning -1 when it didn't pick a reg from the given bitset due to an off-by-a-small-number error. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>	2020-04-29 19:46:08 +00:00
Dylan Baker	8e3696137f	remove final imports.h and imports.c bits This moves the fi_types to a new mesa_private.h and removes the imports.c file. The vast majority of this patch is just removing pound includes of imports.h and fixing up the recursive includes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>	2020-04-21 11:09:04 -07:00
Marek Olšák	76f79db3f5	util: stop including files from mesa/main Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324>	2020-03-27 21:00:09 +00:00
Marek Olšák	e5339fe4a4	Move compiler.h and imports.h/c from src/mesa/main into src/util Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324>	2020-03-27 21:00:09 +00:00
Rob Clark	dd2e050a84	util/ra: move NO_REG to header In the select_reg callback, I want to be able to determine if a given node is already assigned, and if so what physical register has been assigned. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>	2020-03-10 16:01:39 +00:00
Rob Clark	36aed70b59	util/ra: spiff out select_reg_callback Add a parameter so the callback can know which node it is selecting a register for. And remove the graph parameter, as it is unused by existing users, and somewhat unnecessary (ie. the callback data could be used instead). And add a comment so $future_me remembers how this works. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>	2020-03-10 16:01:39 +00:00
Matt Turner	4413537c80	util: Remove tmp argument from BITSET_FOREACH_SET macro Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499>	2020-01-23 01:52:43 +00:00
Kristian H. Kristensen	f9d35ea55b	ir3: Set up full/half register conflicts correctly Setting up transitive conflicts between a full register and its two half registers (eg r0.x and hr0.x and hr0.y) will make the half registers conflict. They don't actually conflict and this prevents us from using both at the same time. Add and use a new ra helper that sets up transitive conflicts between a register and its subregisters, except it carefully avoids the subregister conflict. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@chromium.org>	2020-01-09 16:03:25 -08:00
Alyssa Rosenzweig	b2a3ca6bd5	util/ra: Add a getter for a node class Complements the existing getters and the setter for node class. To be used in the Panfrost RA refactor. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-25 06:14:12 -07:00
Jason Ekstrand	5911abd76f	util/ra: Assert nodes are in-bounds in add_node_interference Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e291cd8a7e	util/ra: Don't destroy the graph in ra_allocate() We want to be able to call ra_allocate() and, when it fails, mutate the graph and try again rather than re-building the graph from scratch. This commit moves all the scratch bits except the final register allocation (which is really an out value not scratch) into sub-structs named "tmp" to make it clear which things are scratch. It also adds bits to the ra_select() initialization loop to initialize things (since we can't trust rzalloc anymore) and copy q_test and forced_reg over. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9040215f5d	util/ra: Add a helper for resetting a node's interference Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	698bb9b984	util/ra: Add helpers for adding nodes to an interference graph Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	41b310e219	util/ra: Improve the performance of ra_simplify The most expensive part of register allocation is the ra_simplify step which is a fixed-point algorithm with a worst-case complexity of O(n^2) which adds the registers to a stack which we then use later to do the actual allocation. This commit uses bit sets and changes the core loop of ra_simplify to first walk 32-node chunks and then walk each chunk. This lets us skip whole 32-node chunks in one go based on bit operations and compute the minimum q value potentially 32x as fast. Of course, the algorithm still has the same fundamental O(n^2) run-time but the constant is now much lower. In the nasty Aztec Ruins compute shader, this shaves a full four seconds off the 30s compile time for a release build of mesa. In a debug build (needed for accurate stack traces), perf says that ra_select takes 20% of runtime before this patch and only 5-6% of runtime after this patch. It also makes shader-db runs faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e1511f1d4c	util/ra: Only update q_total if the reg is not assigned We only use q_total if the reg is not assigned so there's no point in updating it if the reg is not assigned. This has no known perf benefit but it will reduce churn in a future commit. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9d6d1f47e7	util/ra: Only update best_optimistic_node if !progress This shaves about half a second off the 30 second compile time of one of the compute shaders in Aztec ruins. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	de56d3a2d1	util/ra: Make in_stack a bitset in the graph Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	7720ad65ae	util/ra: Get rid of tabs Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Marek Olšák	43d66c8c2d	mesa: include mtypes.h less - remove mtypes.h from most header files - add main/menums.h for often used definitions - remove main/core.h v2: fix radv build Reviewed-by: Brian Paul <brianp@vmware.com>	2018-04-12 19:31:30 -04:00
Eric Engestrom	70c6f656f9	util/ra: fix memory leak CID: 1415909 Fixes: `7a34a0e890` "ra: Add a callback for selecting a register from what's available." Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-31 12:55:19 -07:00
Eric Anholt	7a34a0e890	ra: Add a callback for selecting a register from what's available. VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to use low-numbered registers (more parallelism on Intel, fewer delay slots on vc4), but in order to give instruction scheduling the most freedom to avoid delays we want to round-robin between registers of the same cost. Our two heuristics so far have chosen one end or the other of that tradeoff. The callback, instead, hands the driver the set of registers that are available, and the driver gets to make its own choice. This will be used in vc4 to round-robin between registers of the same cost, and might be used in the future for improving bank selection. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Eric Anholt	3dae034423	ra: Don't put a node in its own adjacency set. All the paths looping over adjacency had guards against considering themselves (the non-obvious one was ra_any_neighbors_conflict(), which has in_stack set). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Eric Anholt	30146f29a7	ra: Pull the body of a loop out to a helper function. I was going to indent this code another level, and decided it would be easier to read as a helper. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Roland Scheidegger	2b40a144b7	util/ra: (trivial) fix c99 loop variable initialization Fails with old msvc otherwise.	2015-08-19 04:17:49 +02:00
Jason Ekstrand	f01bdb0484	util/ra: Make allocating conflict lists optional Since i965 is now using make_reg_conflicts_transitive and doesn't need q-value computations, they are disabled on i965. They are enabled everywhere else so that they get the old behavior. This reduces the time spent in eglInitialize() on BDW by around 10-15%. Reviewed-by: Eric Anholt <eric@anholt.net>	2015-08-18 17:48:53 -07:00
Jason Ekstrand	9b49284c22	util/ra: Add a function for making all conflicts on a register transitive Reviewed-by: Eric Anholt <eric@anholt.net>	2015-08-18 17:48:45 -07:00
Jason Ekstrand	bdcc8f3230	ra: Delete the conflict lists in ra_set_finalize They are never used after the set is finalized so there's no reason to keep them around. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-08-10 11:58:58 -07:00
Jason Ekstrand	7539ac7fe2	ra: Refactor ra_set_finalize All this commit does is change an early return to an if with an else clause. Reviewed-by: Matt Turner <mattst88@gmail.com>	2015-08-10 11:58:45 -07:00
Matt Turner	b568a5f6a8	util: Avoid double promotion. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2015-07-29 09:34:52 -07:00
Francisco Jerez	f80af89d48	ra: Disable round-robin strategy for optimistically colorable nodes. The round-robin allocation strategy is expected to decrease the amount of false dependencies created by the register allocator and give the post-RA scheduling pass more freedom to move instructions around. On the other hand it has the disadvantage of increasing fragmentation and decreasing the number of equally-colored nearby nodes, what increases the likelihood of failure in presence of optimistically colorable nodes. This patch disables the round-robin strategy for optimistically colorable nodes. These typically arise in situations of high register pressure or for registers with large live intervals, in both cases the task of the instruction scheduler shouldn't be constrained excessively by the dense packing of those nodes, and a spill (or on Intel hardware a fall-back to SIMD8 mode) is invariably worse than a slightly less optimal scheduling. Shader-db results on the i965 driver: total instructions in shared programs: 5488539 -> 5488489 (-0.00%) instructions in affected programs: 1121 -> 1071 (-4.46%) helped: 1 HURT: 0 GAINED: 49 LOST: 5 v2: Re-enable round-robin already for the lowest one of the nodes pushed optimistically onto the sack (Connor). v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor). Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2015-02-23 20:55:40 +02:00
Eric Anholt	b53d035825	util: Move Mesa's bitset.h to util/. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-02-20 11:36:34 -08:00
Jan Vesely	bc18b48924	util: Silence signed-unsigned comparison warnings Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-12-17 17:15:36 +00:00
Matt Turner	2e007fd621	ra: Don't use regs as the ralloc context. The i965 backends pass something out of 'screen', which is allocated per-process, making using this as a ralloc context not thread-safe. All callers ra_alloc_interference_graph() already ralloc_free() its return value. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2014-12-01 11:32:54 -08:00
Jason Ekstrand	f84adb8481	util: Use reg_belongs_to_class instead of BITSET_TEST This shouldn't be a functional change since reg_belongs_to_class is just a wrapper around BITSET_TEST. It just makes the code a little easier to read. Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-10-24 16:23:08 -07:00
Eric Anholt	517e01b5c3	mesa: Move register_allocate.c to util. The r300 gallium driver is using it outside of the Mesa tree, and I wanted to do so for vc4 as well. Rather than make the multiple-definitions problem even more complicated, just move it to more-shared code. v2: Don't forget to delete the symlink in r300 (review by Matt). Delete more r300-helper references (review by Emil) Don't prefix util/ header inclusion with "util/" (review by Emil) Reviewed-by: Matt Turner <mattst88@gmail.com> (v1) Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)	2014-09-23 13:40:10 -07:00

49 Commits