KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Engestrom	2c67457e5e	util/list: rename LIST_ENTRY() to list_entry() This follows the Linux kernel convention, and avoids collision with macOS header macro. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6751 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6840 Cc: mesa-stable Signed-off-by: Eric Engestrom <eric@igalia.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Reviewed-by: Yonggang Luo <luoyonggang@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17772>	2022-07-28 10:10:44 +00:00
Connor Abbott	77a852c1ba	lima/gpir: Rewrite register allocation for value registers The usual linear-scan register allocation algorithm can't handle preallocated registers, since we might be forced to choose a color for a non-preallocated variable that overlaps with a pre-allocated variable. But in such cases we can simply split the live range of the offending variable when we reach the beginning of the pre-allocated variable's live range. This is still optimal in the sense that it always finds a coloring whenever one is possible, but we may not insert the smallest possible number of moves. However, since it's actually the scheduler which splits live ranges afterwards, we can simply fold in the move while keeping its fake dependencies, and then everything still works! In other words, inserting a live range split for a value register during register allocation is pretty much free. This means that we can split register allocation in two. First globally allocate the cross-block registers accessed through load_reg and store_reg instructions, which is still done via graph coloring, and then run a linear scan algorithm over each block, treating the load_reg and store_reg nodes as referring to pre-allocated registers. This makes the existing RA more complicated, but it has two benefits: first, using round-robin with the linear scan allocator results in much fewer fake dependencies, resulting in around 15 less instructions in the glmark2 jellyfish shader and fixing a regression in instruction count since branching support went in. Second, it will simplify handling spilling. With just graph coloring for everything, every time we spill a node, we have to create new value registers which become new nodes in the graph and re-run RA. This is worsened by the fact that when writing a value to a temporary, we need to have an extra register available to load the write address with a load_const node. With the new scheme, we can ignore this entirely in the first part and then in the second part we can just reserve an extra register in sections where we know we have to spill. So no re-running RA many times, and we can get a good result quickly. The current implementation does linear scan backwards, so that we can insert the fake dependencies while allocating and avoid creating any move nodes at all when we have to split a live range. However, it turns out that this makes handling schedule_first nodes a bit more complicated, so it's not clear if that was worth it. Note: The commit was originally authored by Connor Abbott <cwabbott@gmail.com> and was cherry-picked from <mesa/mesa!2315>. Rebasing was necessary due to changes to BITSET_FOREACH_SET, see `4413537c`. Because some deqp tests pass now, deqp-lima-fails.txt was also changed. The above changes are Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7682>	2021-09-01 08:30:57 +00:00
Vinson Lee	70652885e3	lima: Fix typos. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8658>	2021-01-31 11:27:40 -08:00
Eric Anholt	f6456d74ed	lima: Fix unused var/function warnings in release build from assertions. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6462>	2020-08-28 22:45:08 +00:00
Timothy Arceri	7f106a2b5d	util: rename list_empty() to list_is_empty() This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-10-28 11:24:38 +00:00
Connor Abbott	fed5b605f0	lima/gpir: Fix 64-bit shift in scheduler spilling There are 64 physical registers so the shift must be 64 bits. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-24 08:44:54 +02:00
Connor Abbott	96c31d9a55	lima/gpir: Fix postlog2 fixup handling We guarantee that a complex1 op is always used by postlog2 directly by rewriting the postlog2 op to be a move when there would be a move inserted between them. But we weren't doing this in all circumstances where there might be a move. Move the logic to place_move() so that it always happens. Fixes a few log tests that happened to start failing due to changes in the register allocator leading to a different scheduling order. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-24 08:43:06 +02:00
Connor Abbott	1cd1cce035	lima/gpir: Use registers for values live in multiple blocks This commit adds the framework for cross-basic-block register allocation. Like ARM's compiler, we assume that the value registers aren't usable across branches, which means we have to use physical registers to store any value that crosses a basic block. There are three parts to this: 1. When translating from NIR, we rely on the NIR out-of-ssa pass to coalesce values into registers. We insert store_reg instructions for values used in more than one basic block, and load_reg instructions for values not defined in the same basic block (or defined after their use, for loops). So by the time we've translated out of NIR we've already split things into values (which are only used in the same basic block) and registers (which are only used in different basic blocks than where they're defined). 2. We allocate the registers at the same time that we allocate the values, before the final scheduler. Unlike the values, where the assigned color is fake, we assign the actual physical index & component to physregs at this stage. load_reg and store_reg are treated as moves in the allocator and when creating write-after-read dependencies. 3. Finally, in the main scheduler we have to avoid overwriting existing live physregs when spilling. First, we have to tell the scheduler which physical registers are live at the end of each block, to avoid overwriting those. If a register is only live at the beginning, we can reuse it for spilling after the last original use in the final program happens, i.e. before any original use is scheduled, but we have to be careful to add the proper dependencies so that the spill write is scheduled before the original reads. To handle this we repurpose reg_link for uses to be used by the scheduler. A few register-related things copied over from NIR or from other drivers can be dropped. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-24 08:37:37 +02:00
Connor Abbott	2df2e081fd	lima/gpir: Only try to place actual children When picking a node to be scheduled, we try to schedule its children as well. But we shouldn't try to schedule nodes which only have a fake dependency on the original node, since this isn't the point of scheduling children at the same time and can break some expectations of the rest of the code. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-24 08:35:26 +02:00
Connor Abbott	c64f30546d	lima/gpir: Disallow moves for schedule_first nodes The entire point of schedule_first is that the node has to be scheduled as soon as possible without any moves because it doesn't produce a proper floating-point value, or its value changes depending on where you read it. We were still introducing a move for preexp2 in some cases though, even if it got scheduled as soon as possible, which broke some exp() tests. Fix that. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-09 17:42:19 +07:00
Connor Abbott	8c7ad22adb	lima/gpir: Fix fake dep handling for schedule_first nodes The whole point of schedule_first nodes is that they need to be scheduled as soon as possible, so if a schedule_first node is the successor in a fake dependency that prevents it from being scheduled after its parent, that can cause problems. We need to add these fake dependencies to the parent as well, and we need to guarantee that the pre-RA scheduler puts schedule_first nodes right before their parents in order to prevent this from adding cycles to the dependency graph. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-09 17:42:00 +07:00
Connor Abbott	2955875381	lima/gpir: Fix schedule_first insertion logic The idea was to make sure schedule_first nodes were always first in the ready list. I made sure they were inserted first, but not that other nodes wouldn't later be scheduled ahead of them. Fixes spec@glsl-1.10@execution@built-in-functions@vs-exp-float and probably others. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-09 17:41:35 +07:00
Connor Abbott	63acdb5ce6	lima/gpir: Ignore unscheduled successors in can_use_complex() The point of the function is to avoid creating a complex move which is used by certain slots in the next instruction, but unscheduled successors will never be in the next instruction. Found while debugging a crash that the previous commit fixed. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-09 17:40:58 +07:00
Eric Engestrom	abc226cf41	tree-wide: replace MAYBE_UNUSED with ASSERTED Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Connor Abbott	11a49f289d	lima/gp: Support exp2 and log2 log2 is tricky because there cannot be a move between complex1 and postlog2. We can't guarantee that scheduling complex1 will succeed when we schedule postlog2, so we try to schedule complex1 and if it fails we back out by rewriting the postlog2 as a move and introducing a new postlog2 so that we can try again later. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-30 23:01:15 +02:00
Connor Abbott	c2f48d8f32	lima/gpir: Always schedule complex2 and _impl right after complex1 See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory details of why this is needed. For _impl this is easy, since it never increases register pressure and it goes in the complex slot hence it never counts against max nodes. It's a bit more challenging for complex2, since it does count against max nodes, so we need to change the reservation logic to reserve an extra slot for complex2 when scheduling complex1. This second part isn't strictly necessary yet, but it will be for exp2. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-30 23:00:41 +02:00
Connor Abbott	6fc7384fd4	lima/gpir/sched: Handle more special ops in can_use_complex() We were missing handling for a few other ops that rearrange their sources somehow in codegen, namely complex2 and select. This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3 and possibly other random regressions from the new scheduler which were supposed to be fixed in the commit right after. Fixes: `54434fe670` ("lima/gpir: Rework the scheduler") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-28 23:38:31 +02:00
Connor Abbott	d26d8c5617	lima/gpir/sched: Don't try to spill when something else has succeeded In try_node(), we assume that the node we pick can still be scheduled successfully after speculatively trying all the other nodes. Normally we always undo every node after speculating it, so that when we finally schedule best_node the scheduler state is exactly the same and it succeeds. However, we also try to spill nodes, which can change the state and in a corner case that can make scheduling best_node fail. In particular, the following sequence of events happened with piglit shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a register store node S, which is a use of N, was created and then later the other uses of N were scheduled, so that S is now ready and N is partially ready. First we try to schedule S and succeed, then we try to schedule another node M, which fails, so we try to spill the remaining uses of N. This succeeds, but scheduling M still fails so that best_node is still S. However since one of the uses of N is one cycle ago, and therefore we inserted a read dependent on S one cycle ago when spilling N, S can no longer be scheduled as read-after-write latency is three cycles. While we could ad-hoc try to catch cases like this, or (the best option but very complicated) treat the spill as speculative and roll it back if we decide not to schedule the node, a simpler solution is to just give up on spilling if we've already successfully speculatively scheduled another node. We'd give up a few cases where we discover that by spilling even harder we could schedule a more desirable node, but that seems like it would be pretty rare in practice. With this we guarantee that nothing has been touched after best_node was successfully scheduled. We also cut down on pointless spilling, since if we already scheduled a node it's unlikely that spilling harder will let us schedule an even better node, and hence any spilling at this point is probably useless. While we're here, clean up the code around spilling by flattening the two if's and getting rid of the second unnecessary check for INT_MIN. Fixes: `54434fe670` ("lima/gpir: Rework the scheduler") Acked-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-28 23:38:31 +02:00
Connor Abbott	b178fdf486	lima/gp: Fix problem with complex moves When writing the scheduler, we forgot that you can't read the complex unit in certain sources because it gets overwritten to 0 or 1. Fixing this turned out to be possible without giving up and reducing GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't expect. There can be at most 4 next-max nodes that can't have moves scheduled in the complex slot, so it actually isn't a problem for getting the number of next-max nodes at 5 or lower. However, it is a problem for stores. If a given node is a next-max node whose move cannot go in the complex slot and is used by a store that we decide to schedule, we have to reserve one of the non-complex slots for a move instead of all the slots, or we can wind up in a situation where only the complex slot is free and we fail the move. This means that we have to add another term to the reservation logic, for stores whose children cannot be in the complex slot. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	54434fe670	lima/gpir: Rework the scheduler Now, we do scheduling at the same time as value register allocation. The ready list now acts similarly to the array of registers in value_regalloc, keeping us from running out of slots. Before this, the value register allocator wasn't aware of the scheduling constraints of the actual machine, which meant that it sometimes chose the wrong false dependencies to insert. Now, we assign value registers at the same time as we actually schedule instructions, making its choices reflect reality much better. It was also conservative in some cases where the new scheme doesn't have to be. For example, in something like: 1 = ld_att 2 = ld_uni 3 = add 1, 2 It's possible that one of 1 and 2 can't be scheduled in the same instruction as 3, meaning that a move needs to be inserted, so the value register allocator needs to assume that this sequence requires two registers. But when actually scheduling, we could discover that 1, 2, and 3 can all be scheduled together, so that they only require one register. The new scheduler speculatively inserts the instruction under consideration, as well as all of its child load instructions, and then counts the number of live value registers after all is said and done. This lets us be more aggressive with scheduling when we're close to the limit. With the new scheduler, the kmscube vertex shader is now scheduled in 40 instructions, versus 66 before. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Qiang Yu	8d91cd64aa	lima/gpir: fix compile fail when two slot node Come from glmark2-es2 jellyfish test. Fixes: `92d7ca4b1c` "gallium: add lima driver" Signed-off-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-04-14 12:10:23 +08:00
Qiang Yu	92d7ca4b1c	gallium: add lima driver v2: - use renamed util_dynarray_grow_cap - use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags - remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage - compute min/max index in driver v3: - fix plbu framebuffer state calculation - fix color_16pc assemble - use nir_lower_all_source_mods for lowering neg/abs/sat - use float array for static GPU data - add disassemble comment for static shader code - use drm_find_modifier v4: - use lima_nir_lower_uniform_to_scalar v5: - remove nir_opt_global_to_local when rebase Cc: Rob Clark <robdclark@gmail.com> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Koen Kooi <koen@dominion.thruhere.net> Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: marmeladema <xademax@gmail.com> Signed-off-by: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Rohan Garg <rohan@garg.io> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00

22 Commits
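
Commit 2c67457e5e renames `LIST_ENTRY()` to `list_entry()`. The likely source of the macOS collision is the BSD-derived `<sys/queue.h>`, which defines its own `LIST_ENTRY(type)` macro. In the Linux kernel convention, `list_entry()` recovers the containing object from a pointer to the link embedded in it, in the same way `container_of()` does. The following is a minimal standalone sketch of that idea. It is not Mesa's `util/list.h`. The struct names and the helpers `list_init` and `list_addtail` are hypothetical. The `list_is_empty()` predicate follows the naming from commit 7f106a2b5d.

```c
#include <assert.h>
#include <stddef.h>

/* Intrusive doubly linked list node; embedded inside the objects it links.
 * Hypothetical sketch, not Mesa's actual header. */
struct list_head {
   struct list_head *prev, *next;
};

/* Recover the containing object from a pointer to its embedded link,
 * in the style of the Linux kernel's list_entry()/container_of(). */
#define list_entry(item, type, member) \
   ((type *)((char *)(item) - offsetof(type, member)))

/* Hypothetical object type carrying an embedded link. */
struct instr {
   int index;
   struct list_head link;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
   head->prev = head->next = head;
}

/* Insert item just before head, i.e. at the tail of the list. */
static void list_addtail(struct list_head *item, struct list_head *head)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

/* Boolean test, named so it cannot be mistaken for "empty the list". */
static int list_is_empty(const struct list_head *head)
{
   return head->next == head;
}
```

After `list_addtail(&a.link, &head)` on an empty list, `list_entry(head.next, struct instr, link)` yields `&a`.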
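
Commit 77a852c1ba describes a per-block linear scan with these properties:

- Pre-allocated registers (those referenced by load_reg/store_reg) are honored.
- When a pre-allocated value starts, the live range of whatever value currently sits in its register is split.
- Free registers are chosen round-robin.

The following is a simplified sketch of that idea. It is not the gpir code, which according to the commit scans backwards and folds the split moves into scheduler fake dependencies. It scans forward over one block. `NUM_REGS`, `struct value`, `pick_free()` and `linear_scan()` are all hypothetical. The sketch assumes that pre-allocated ranges on the same register never overlap.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 2 /* tiny hypothetical register file, easy to trace */

struct value {
   int start, end; /* live range [start, end) in instruction order */
   int prealloc;   /* fixed register, or -1 if the allocator may choose */
   int reg;        /* output: register holding the value at its end */
   int splits;     /* output: live-range splits (moves) inserted for it */
};

/* Round-robin search for a free register, starting after the last pick. */
static int pick_free(const int occupant[NUM_REGS], int *next)
{
   for (int i = 0; i < NUM_REGS; i++) {
      int r = (*next + i) % NUM_REGS;
      if (occupant[r] < 0) {
         *next = (r + 1) % NUM_REGS;
         return r;
      }
   }
   return -1;
}

/* Returns false when register pressure exceeds NUM_REGS (spill needed). */
static bool linear_scan(struct value *vals, int n, int num_positions)
{
   int occupant[NUM_REGS]; /* index of the value in each register, or -1 */
   int next = 0;
   for (int r = 0; r < NUM_REGS; r++)
      occupant[r] = -1;

   for (int p = 0; p < num_positions; p++) {
      /* 1. Free registers whose values are dead at p. */
      for (int r = 0; r < NUM_REGS; r++)
         if (occupant[r] >= 0 && vals[occupant[r]].end <= p)
            occupant[r] = -1;

      /* 2. Pre-allocated values starting at p claim their register; a
       *    non-pre-allocated value already there is split into a free one. */
      for (int i = 0; i < n; i++) {
         if (vals[i].start != p || vals[i].prealloc < 0)
            continue;
         int r = vals[i].prealloc;
         int victim = occupant[r];
         occupant[r] = i; /* reserve r so the victim cannot pick it again */
         if (victim >= 0) {
            int nr = pick_free(occupant, &next);
            if (nr < 0)
               return false;
            occupant[nr] = victim;
            vals[victim].reg = nr;
            vals[victim].splits++;
         }
         vals[i].reg = r;
      }

      /* 3. Remaining values starting at p take any free register. */
      for (int i = 0; i < n; i++) {
         if (vals[i].start != p || vals[i].prealloc >= 0)
            continue;
         int r = pick_free(occupant, &next);
         if (r < 0)
            return false;
         occupant[r] = i;
         vals[i].reg = r;
      }
   }
   return true;
}
```

The sketch succeeds whenever at most `NUM_REGS` values are live at every position. At each step at least one register is then free, because the value being placed is counted in the pressure but does not occupy a register yet. This mirrors the commit's point that splitting keeps the allocator able to find a coloring whenever one exists, even if it inserts more moves than strictly necessary.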
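
Commit fed5b605f0 notes that there are 64 physical registers, so the shift in scheduler spilling must be 64 bits wide. The following sketch illustrates the underlying C pitfall. The helpers track registers in a `uint64_t` bitmask and are hypothetical, not the gpir scheduler's code. A plain `1` is an `int`, which is 32 bits on mainstream platforms, and shifting it by its width or more is undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: one bit per physical register, 64 registers total.
 * UINT64_C(1) makes the shifted operand 64 bits wide, so registers 32..63
 * are addressable; `1 << reg` would be undefined for reg >= 32 when int
 * is 32 bits. */
static uint64_t reg_mask_set(uint64_t mask, unsigned reg)
{
   assert(reg < 64);
   return mask | (UINT64_C(1) << reg);
}

static bool reg_mask_test(uint64_t mask, unsigned reg)
{
   assert(reg < 64);
   return (mask >> reg) & 1;
}
```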