pan/mdg: Limit work registers for large workgroups

When more than 8 registers are used, Midgard can only fit 64 threads in a
thread group. For barriers to work properly, a threadgroup must fit an entire
work group. The GL driver configures the hardware to have threadgroups the size
of work groups. That means if more than 64 threads are used in a workgroup, and
more than 8 registers are used, the hardware will fault spawning threads.

To workaround this hardware limitation, we need to limit the number of work
registers used depending on the size of the workgroup. Typically, the work group
size is known at compile-time so that determination can usually be made without
variants. To avoid variants, we make a pessimistic estimate in the case when
it's not known at compile-time.

shader-db shows 6 shaders affected. I expect that all of these would fault with
DATA_INVALID_FAULT if they tried to execute before this patch, due to the
oversize local size, and faulting is even slower than spilling ;-)

Fixes dEQP-GLES31.functional.synchronization.* on Mali-T860.

instructions HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 121 -> 157 (29.75%)
instructions HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 121 -> 157 (29.75%)
instructions HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 141 -> 184 (30.50%)
instructions HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 141 -> 184 (30.50%)
instructions HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 513 -> 933 (81.87%)
instructions HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 505 -> 1002 (98.42%)

bundles HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 73 -> 116 (58.90%)
bundles HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 73 -> 116 (58.90%)
bundles HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 61 -> 97 (59.02%)
bundles HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 61 -> 97 (59.02%)
bundles HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 281 -> 701 (149.47%)
bundles HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 278 -> 775 (178.78%)

registers helped:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 11 -> 8 (-27.27%)
registers helped:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 11 -> 8 (-27.27%)
registers helped:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 14 -> 8 (-42.86%)
registers helped:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 14 -> 8 (-42.86%)
registers helped:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 16 -> 8 (-50.00%)
registers helped:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 16 -> 8 (-50.00%)

threads helped:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)
threads helped:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 1 -> 2 (100.00%)

spills HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 0 -> 5
spills HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 0 -> 5
spills HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 0 -> 8
spills HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 0 -> 8
spills HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 0 -> 112
spills HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 0 -> 146

fills HURT:   shaders/android/gfxbench/carchase/6.shader_test MESA_SHADER_COMPUTE: 0 -> 26
fills HURT:   shaders/android/gfxbench/carchase/386.shader_test MESA_SHADER_COMPUTE: 0 -> 26
fills HURT:   shaders/android/gfxbench/carchase/374.shader_test MESA_SHADER_COMPUTE: 0 -> 33
fills HURT:   shaders/android/gfxbench/carchase/4-1.shader_test MESA_SHADER_COMPUTE: 0 -> 33
fills HURT:   shaders/android/com.miHoYo.GenshinImpact/18.shader_test MESA_SHADER_COMPUTE: 0 -> 209
fills HURT:   shaders/android/com.miHoYo.GenshinImpact/16.shader_test MESA_SHADER_COMPUTE: 0 -> 234

total instructions in shared programs: 1521691 -> 1522766 (0.07%)
instructions in affected programs: 1542 -> 2617 (69.71%)
helped: 0
HURT: 6
HURT stats (abs)   min: 36.0 max: 497.0 x̄: 179.17 x̃: 43
HURT stats (rel)   min: 29.75% max: 98.42% x̄: 50.13% x̃: 30.50%
95% mean confidence interval for instructions value: -49.36 407.69
95% mean confidence interval for instructions %-change: 17.14% 83.12%
Inconclusive result (value mean confidence interval includes 0).

total bundles in shared programs: 649296 -> 650371 (0.17%)
bundles in affected programs: 827 -> 1902 (129.99%)
helped: 0
HURT: 6
HURT stats (abs)   min: 36.0 max: 497.0 x̄: 179.17 x̃: 43
HURT stats (rel)   min: 58.90% max: 178.78% x̄: 94.01% x̃: 59.02%
95% mean confidence interval for bundles value: -49.36 407.69
95% mean confidence interval for bundles %-change: 36.20% 151.83%
Inconclusive result (value mean confidence interval includes 0).

total registers in shared programs: 90681 -> 90647 (-0.04%)
registers in affected programs: 82 -> 48 (-41.46%)
helped: 6
HURT: 0
helped stats (abs) min: 3.0 max: 8.0 x̄: 5.67 x̃: 6
helped stats (rel) min: 27.27% max: 50.00% x̄: 40.04% x̃: 42.86%
95% mean confidence interval for registers value: -8.03 -3.30
95% mean confidence interval for registers %-change: -50.95% -29.13%
Registers are helped.

total threads in shared programs: 55717 -> 55723 (0.01%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 6
HURT: 0
helped stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.00 1.00
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are helped.

total spills in shared programs: 1108 -> 1392 (25.63%)
spills in affected programs: 0 -> 284
helped: 0
HURT: 6

total fills in shared programs: 4721 -> 5282 (11.88%)
fills in affected programs: 0 -> 561
helped: 0
HURT: 6

Cc: mesa-stable
Closes: #7228
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19092>
This commit is contained in:
Alyssa Rosenzweig 2022-10-15 19:01:47 -04:00 committed by Marge Bot
parent 9b19104a30
commit 2c446b6636
2 changed files with 64 additions and 13 deletions

View File

@ -45,14 +45,6 @@ dEQP-GLES31.functional.separate_shader.random.68,Fail
dEQP-GLES31.functional.separate_shader.random.79,Fail
dEQP-GLES31.functional.separate_shader.random.80,Fail
dEQP-GLES31.functional.separate_shader.random.89,Fail
dEQP-GLES31.functional.synchronization.in_invocation.image_alias_overwrite,Crash
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_alias_overwrite,Crash
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_alias_write,Crash
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_overwrite,Crash
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_write_read,Crash
dEQP-GLES31.functional.synchronization.in_invocation.image_overwrite,Crash
dEQP-GLES31.functional.synchronization.inter_invocation.image_alias_overwrite,Crash
dEQP-GLES31.functional.synchronization.inter_invocation.image_atomic_alias_overwrite,Crash
dEQP-GLES31.functional.texture.gather.basic.cube.depth32f.no_corners.size_pot.compare_greater.clamp_to_edge_repeat,Fail
dEQP-GLES31.functional.texture.gather.basic.cube.depth32f.no_corners.size_pot.compare_greater.mirrored_repeat_clamp_to_edge,Fail
dEQP-GLES31.functional.texture.gather.basic.cube.depth32f.no_corners.size_pot.compare_greater.repeat_mirrored_repeat,Fail

View File

@ -442,17 +442,76 @@ mir_is_64(midgard_instruction *ins)
return false;
}
/*
* Determine if a shader needs a contiguous workgroup. This impacts register
* allocation. TODO: Optimize if barriers and local memory are unused.
*/
static bool
needs_contiguous_workgroup(compiler_context *ctx)
{
return gl_shader_stage_uses_workgroup(ctx->stage);
}
/*
* Determine an upper-bound on the number of threads in a workgroup. The GL
* driver reports 128 for the maximum number of threads (the minimum-maximum in
* OpenGL ES 3.1), so we pessimistically assume 128 threads for variable
* workgroups.
*/
static unsigned
max_threads_per_workgroup(compiler_context *ctx)
{
if (ctx->nir->info.workgroup_size_variable) {
return 128;
} else {
return ctx->nir->info.workgroup_size[0] *
ctx->nir->info.workgroup_size[1] *
ctx->nir->info.workgroup_size[2];
}
}
/*
* Calculate the maximum number of work registers available to the shader.
* Architecturally, Midgard shaders may address up to 16 work registers, but
* various features impose other limits:
*
* 1. Blend shaders are limited to 8 registers by ABI.
* 2. If there are more than 8 register-mapped uniforms, then additional
* register-mapped uniforms use space that otherwise would be used for work
* registers.
* 3. If more than 4 registers are used, at most 128 threads may be spawned. If
* more than 8 registers are used, at most 64 threads may be spawned. These
* limits are architecturally visible in compute kernels that require an
* entire workgroup to be spawned at once (for barriers or local memory to
* work properly).
*/
static unsigned
max_work_registers(compiler_context *ctx)
{
if (ctx->inputs->is_blend)
return 8;
unsigned rmu_vec4 = ctx->info->push.count / 4;
unsigned max_work_registers = (rmu_vec4 >= 8) ? (24 - rmu_vec4) : 16;
if (needs_contiguous_workgroup(ctx)) {
unsigned threads = max_threads_per_workgroup(ctx);
assert(threads <= 128 && "maximum threads in ABI exceeded");
if (threads > 64)
max_work_registers = MIN2(max_work_registers, 8);
}
return max_work_registers;
}
/* This routine performs the actual register allocation. It should be succeeded
* by install_registers */
static struct lcra_state *
allocate_registers(compiler_context *ctx, bool *spilled)
{
/* The number of vec4 work registers available depends on the number of
* register-mapped uniforms and the shader stage. By ABI we limit blend
* shaders to 8 registers, should be lower XXX */
int rmu = ctx->info->push.count / 4;
int work_count = ctx->inputs->is_blend ? 8 : 16 - MAX2(rmu - 8, 0);
int work_count = max_work_registers(ctx);
/* No register allocation to do with no SSA */