llvmpipe/cs: change submission pattern for threadpool

Recent ncnn benchmarks showed a slowdown, and the thread pool rework
seemed the likely cause.

The batching into threads for the main workload is fine, but the
remainder doesn't get spread out and can bottleneck in one thread.

Switch to a model where the initial work is batched, but the remainder
is handed out one iteration at a time.

This brings the ncnn benchmarks back in line with previous results.

Fixes: 69109e0b19 ("llvmpipe/cs: rework thread pool for avoid mtx locking")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13210>
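
The submission pattern described above can be sketched as a small single-threaded model. This is only an illustration, not the driver code: the `sketch_task` struct and the `sketch_task_*` helpers are hypothetical names, the field names are borrowed from the diff, and the setup (even split across threads, leftover kept as the remainder) is an assumption about how the task is queued. The real worker presumably runs the claim step under the pool's lock, which the sketch leaves out.

```c
#include <assert.h>

/* Hypothetical stand-in for the task state; field names follow the diff. */
struct sketch_task {
   unsigned iter_total;       /* total number of iterations to run */
   unsigned iter_per_thread;  /* size of each batched claim */
   unsigned iter_remainder;   /* leftover iterations not yet claimed */
   unsigned iter_start;       /* first iteration that is still unclaimed */
};

/* Assumed setup: split evenly across threads, keep the leftover aside. */
static void
sketch_task_init(struct sketch_task *task, unsigned iter_total,
                 unsigned num_threads)
{
   assert(num_threads > 0);
   task->iter_total = iter_total;
   task->iter_per_thread = iter_total / num_threads;
   task->iter_remainder = iter_total % num_threads;
   task->iter_start = 0;
}

/* Claim the next chunk of work. Returns the chunk size, or 0 when all
 * iterations have been handed out. Mirrors the patched condition: once
 * only the remainder is left, hand it out one iteration per claim. */
static unsigned
sketch_task_claim(struct sketch_task *task, unsigned *first_iter)
{
   unsigned iter_per_thread = task->iter_per_thread;

   if (task->iter_start == task->iter_total)
      return 0;

   if (task->iter_remainder &&
       task->iter_start + task->iter_remainder == task->iter_total) {
      task->iter_remainder--;
      iter_per_thread = 1;
   }

   *first_iter = task->iter_start;
   task->iter_start += iter_per_thread;
   return iter_per_thread;
}

/* Drain the task; returns the number of claims made and stores the size
 * of the final claim in *last_claim (left untouched if nothing ran). */
static unsigned
sketch_task_drain(struct sketch_task *task, unsigned *last_claim)
{
   unsigned first_iter, size, claims = 0;

   while ((size = sketch_task_claim(task, &first_iter)) != 0) {
      *last_claim = size;
      claims++;
   }
   return claims;
}
```

Under these assumptions, 35 iterations on 8 threads give eight batched claims of 4 followed by three single-iteration claims, so the tail can be picked up by three different workers. With the pre-patch logic the same tail would have been one claim of 3 handled by a single thread.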
2021-10-06 10:31:26 +10:00 · 2021-10-06 10:31:26 +10:00 · 9392bd89e9
parent 3924df9fe7
commit 9392bd89e9
1 changed file with 4 additions and 2 deletions
--- a/src/gallium/drivers/llvmpipe/lp_cs_tpool.c
+++ b/src/gallium/drivers/llvmpipe/lp_cs_tpool.c
@@ -59,8 +59,10 @@ lp_cs_tpool_worker(void *data)
      iter_per_thread = task->iter_per_thread;

      if (task->iter_remainder &&
-          task->iter_start + task->iter_remainder == task->iter_total)
-         iter_per_thread = task->iter_remainder;
+          task->iter_start + task->iter_remainder == task->iter_total) {
+         task->iter_remainder--;
+         iter_per_thread = 1;
+      }

      task->iter_start += iter_per_thread;