From 558f6329676f53b7869367ff296a4f8153647031 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pavel=20Ondra=C4=8Dka?=
Date: Thu, 3 Mar 2022 12:59:00 +0100
Subject: [PATCH] r300: schedule TEX instructions before OUT instructions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

NIR-to-TGSI produces partial output writes, contrary to the old paths
that always wrote the full outputs. Therefore, if there is now a partial
output write ready to be scheduled and nothing else besides a tex is
ready, we would schedule the output write first. This was not a problem
before, as usually at least some component of the full output write
depended on the tex result. Scheduling the output write first is not
optimal from the performance point of view and resulted in a ~20%
slowdown in the Unigine demos. The docs say:

    The first OUTPUT instruction will reserve space in the output
    register fifo. This space is limited, therefore issuing an OUTPUT
    earlier than necessary may cause threads to stall earlier than
    necessary. You should not set an ALU instruction as type OUTPUT
    unless it is actually writing to an output register, or it is the
    last instruction of the program.

Fix it by explicitly preferring a TEX before OUT, which restores the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.

This is also a win from the instruction count point of view, as we are
usually able to schedule the partial output writes in a single pair at
the end.

total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka
Reviewed-by: Emma Anholt
Part-of:
---
 src/gallium/drivers/r300/compiler/radeon_pair_schedule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r300/compiler/radeon_pair_schedule.c b/src/gallium/drivers/r300/compiler/radeon_pair_schedule.c
index 63bfd599185..e30d6bec014 100644
--- a/src/gallium/drivers/r300/compiler/radeon_pair_schedule.c
+++ b/src/gallium/drivers/r300/compiler/radeon_pair_schedule.c
@@ -1111,7 +1111,8 @@ static void emit_instruction(
 	update_max_score(s, &s->ReadyAlpha, &max_score, &max_inst, &max_list);
 	if (tex_count >= s->max_tex_group || max_score == -1
-		|| (s->TEXCount > 0 && tex_count == s->TEXCount)) {
+		|| (s->TEXCount > 0 && tex_count == s->TEXCount)
+		|| (tex_count > 0 && max_score < NO_OUTPUT_SCORE)) {
 		emit_all_tex(s, before);
 	} else {