r300: schedule TEX instructions before OUT instructions

NIR-to-TGSI produces partial output writes, unlike the old paths, which
always wrote the full outputs. As a result, if a partial output write is
ready to be scheduled and nothing else besides a TEX is ready, we
schedule the output write first. This was not a problem before, because
usually at least one component of the full output write depended on the
TEX result.

This is not optimal from a performance point of view and resulted in a
~20% slowdown in the Unigine demos. The docs say:

The first OUTPUT instruction will reserve space in the output register
fifo. This space is limited, therefore issuing an OUTPUT earlier than
necessary may cause threads to stall earlier than necessary. You
should not set an ALU instruction as type OUTPUT unless it is actually
writing to an output register, or it is the last instruction of
the program.

Fix it by explicitly preferring a TEX before an OUT, which restores the
performance: 9.66 -> 12.12 fps (as compared to 11.83 with the old
glsl-to-TGSI path) in Unigine Sanctuary. No change in Lightsmark or
GLmark.

This is also a win in terms of instruction count, as we are usually
able to schedule the partial output writes in a single pair at the end.

total instructions in shared programs: 106009 -> 105891 (-0.11%)
instructions in affected programs: 10153 -> 10035 (-1.16%)
helped: 118
HURT: 0

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5840
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15165>
Pavel Ondračka 2022-03-03 12:59:00 +01:00 committed by Marge Bot
parent aff1a85c09
commit 558f632967
1 changed file with 2 additions and 1 deletion

@@ -1111,7 +1111,8 @@ static void emit_instruction(
 	update_max_score(s, &s->ReadyAlpha, &max_score, &max_inst, &max_list);
 	if (tex_count >= s->max_tex_group || max_score == -1
-		|| (s->TEXCount > 0 && tex_count == s->TEXCount)) {
+		|| (s->TEXCount > 0 && tex_count == s->TEXCount)
+		|| (tex_count > 0 && max_score < NO_OUTPUT_SCORE)) {
 		emit_all_tex(s, before);
 	} else {
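
The condition in the hunk above can be read in isolation as a small predicate. The following is only a sketch of that reading, not the driver code. The function name `prefer_tex_group`, its parameter names, and the value given to `NO_OUTPUT_SCORE` are assumptions made for illustration. It also assumes, as the constant's name suggests, that a best ALU score below `NO_OUTPUT_SCORE` means the best ready ALU candidate writes an output register.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration only; the real constant is defined
 * in the r300 compiler sources. */
#define NO_OUTPUT_SCORE (1 << 24)

/*
 * Hypothetical standalone version of the check in emit_instruction().
 *
 * tex_count     - number of TEX instructions currently ready
 * max_tex_group - largest TEX group the scheduler will emit at once
 * total_tex     - TEX instructions still left in the program
 * max_score     - best score among ready ALU instructions, -1 if none
 *
 * Returns true when the ready TEX instructions should be emitted now.
 */
static bool prefer_tex_group(int tex_count, int max_tex_group,
                             int total_tex, int max_score)
{
    assert(tex_count >= 0 && total_tex >= 0);

    return tex_count >= max_tex_group                 /* group is full */
        || max_score == -1                            /* no ALU ready */
        || (total_tex > 0 && tex_count == total_tex)  /* all TEX ready */
        /* New in this commit: a TEX is ready and the best ALU
         * candidate is assumed to be an output write. */
        || (tex_count > 0 && max_score < NO_OUTPUT_SCORE);
}
```

With these assumptions, a single ready TEX wins over a ready output write, while an ALU instruction that does not write an output is still scheduled ahead of a partially filled TEX group.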