ci/turnip: Increase the hangcheck timer to 2 seconds.

We get a lot of useful coverage from running graphicsfuzz with spilling
enabled, but it's also pretty slow and can cause intermittent hangcheck
failures. I thought I'd categorized them when merging !14839 (device loss
on reset), but apparently I missed some. Since a reset now means device
loss, a single flake is more likely to turn the rest of the caselist into
flakes and take out the whole test run.

This is a little unfortunate, in that our test environment now differs
from a stock system you would run deqp on to submit conformance, but I
think the reduced test maintenance work is worth it compared to needing to
fix things up later.

We have some other tests besides turnip that can trigger hangchecks and
might also benefit from this increase (some disabled traces, for example).
However, freedreno GL has a 5-second timeout waiting for idle when mapping,
so a couple of 2-second timeouts in a row can result in spurious failures
in other tests!

Fixes: #6163
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15435>
Author: Emma Anholt, 2022-03-16 15:50:00 -07:00 (committed by Marge Bot)
Parent: 0cbe4dd4c4
Commit: f831ba238f
4 changed files with 14 additions and 0 deletions

@@ -49,6 +49,7 @@ for var in \
     FDO_UPSTREAM_REPO \
     FD_MESA_DEBUG \
     FLAKES_CHANNEL \
+    FREEDRENO_HANGCHECK_MS \
     GALLIUM_DRIVER \
     GALLIVM_PERF \
     GPU_VERSION \
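This hunk only adds `FREEDRENO_HANGCHECK_MS` to the list of forwarded variables; the loop body that consumes the list is outside the hunk. As a minimal sketch of what forwarding such a whitelist can look like, assuming the loop writes one sourceable `export` line for each listed variable that is set, here is a POSIX `sh` version. `emit_env` is a hypothetical name, not the script's own.

```shell
#!/bin/sh
# Hypothetical helper (not the real generate-env.sh loop body): for each
# variable name given, print "export NAME='value'" if the variable is set,
# so the output can later be sourced by a test stage.
# Names are assumed to come from a fixed, trusted list, since they go through eval.
emit_env() {
  for var in "$@"; do
    # ${NAME+set} expands to "set" only when NAME is set (even if empty).
    eval "is_set=\${$var+set}"
    [ "$is_set" = set ] || continue
    eval "val=\$$var"
    # Single-quote the value, turning each ' into '\'' so it survives sourcing.
    # (Command substitution drops trailing newlines from the value.)
    escaped=$(printf '%s' "$val" | sed "s/'/'\\\\''/g")
    printf "export %s='%s'\n" "$var" "$escaped"
  done
}
```

For example, with `FREEDRENO_HANGCHECK_MS=2000` in the job environment, `emit_env FREEDRENO_HANGCHECK_MS` prints `export FREEDRENO_HANGCHECK_MS='2000'`. An unset variable produces no line at all, so the `-n` check later in boot skips the hangcheck write for jobs that don't set it.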

@@ -9,6 +9,7 @@ cd /
 mount -t proc none /proc
 mount -t sysfs none /sys
+mount -t debugfs none /sys/kernel/debug
 mount -t devtmpfs none /dev || echo possibly already mounted
 mkdir -p /dev/pts
 mount -t devpts devpts /dev/pts
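The new debugfs mount is what makes `/sys/kernel/debug/dri/128/hangcheck_period_ms` reachable for the write added later in boot. The `devtmpfs` line tolerates an existing mount with `|| echo possibly already mounted`. A minimal sketch of an explicit check instead, assuming the standard `/proc/mounts` layout (device, mount point, type, options, dump, pass), could look like this; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a filesystem of type $2 is mounted at $1,
# according to a table in /proc/mounts format (default: /proc/mounts).
is_mounted() {
  mnt=$1
  want_fs=$2
  table=${3:-/proc/mounts}
  # Fields: device mountpoint fstype options dump pass
  while read -r _dev point fs _rest; do
    if [ "$point" = "$mnt" ] && [ "$fs" = "$want_fs" ]; then
      return 0
    fi
  done < "$table"
  return 1
}

# Hypothetical wrapper: mount debugfs only if it isn't already there (needs root).
ensure_debugfs() {
  is_mounted /sys/kernel/debug debugfs ||
    mount -t debugfs none /sys/kernel/debug
}
```

The optional third argument to `is_mounted` only exists so the parsing can be exercised against a sample table rather than the live `/proc/mounts`.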

@@ -38,6 +38,12 @@ if [ "$HWCI_FREQ_MAX" = "true" ]; then
   test -z "$GPU_AUTOSUSPEND" || echo -1 > $GPU_AUTOSUSPEND || true
 fi
 
+# Increase freedreno hangcheck timer because it's right at the edge of the
+# spilling tests timing out (and some traces, too)
+if [ -n "$FREEDRENO_HANGCHECK_MS" ]; then
+    echo $FREEDRENO_HANGCHECK_MS | tee -a /sys/kernel/debug/dri/128/hangcheck_period_ms
+fi
+
 # Start a little daemon to capture the first devcoredump we encounter. (They
 # expire after 5 minutes, so we poll for them).
 ./capture-devcoredump.sh &
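The write above trusts both `$FREEDRENO_HANGCHECK_MS` and the hard-coded DRM minor 128. A more defensive sketch, using a hypothetical `set_hangcheck_ms` helper, validates the value and checks that the node exists, which implies both that debugfs is mounted and that the GPU really is minor 128. It then reads the value back, assuming the debugfs node returns what was written.

```shell
#!/bin/sh
# Hypothetical helper: write a hangcheck period in milliseconds to the debugfs
# node and verify it. The default path mirrors the hunk above (DRM minor 128).
set_hangcheck_ms() {
  ms=$1
  node=${2:-/sys/kernel/debug/dri/128/hangcheck_period_ms}

  # Accept only positive integers without leading zeros (rejects "", "0", "abc").
  case $ms in
    '' | 0* | *[!0-9]*)
      echo "set_hangcheck_ms: '$ms' is not a positive integer" >&2
      return 1 ;;
  esac
  # A missing node usually means debugfs isn't mounted or the minor is wrong.
  if [ ! -w "$node" ]; then
    echo "set_hangcheck_ms: $node is missing or not writable" >&2
    return 1
  fi

  echo "$ms" > "$node" || return 1
  # Read the value back so an ignored write fails loudly instead of silently.
  if [ "$(cat "$node")" != "$ms" ]; then
    echo "set_hangcheck_ms: $node did not accept $ms" >&2
    return 1
  fi
}
```

In the boot script this would replace the `echo | tee` line inside the existing `-n` check, as `set_hangcheck_ms "$FREEDRENO_HANGCHECK_MS"`. The optional second argument only exists so the helper can be exercised against a scratch file.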

@@ -22,6 +22,9 @@
   variables:
     DEQP_VER: vk
     VK_DRIVER: freedreno
+    # Increase the hangcheck timer for our spilling tests which bump up against
+    # the .5s default.
+    FREEDRENO_HANGCHECK_MS: 2000
 
 .freedreno-test-traces:
   extends:
@@ -150,6 +153,9 @@ a618_vk:
     BOOT_METHOD: depthcharge
     KERNEL_IMAGE_TYPE: ""
     RUNNER_TAG: mesa-ci-x86-64-lava-sc7180-trogdor-lazor-limozeen
+    # Increase the hangcheck timer for our spilling tests which bump up against
+    # the .5s default.
+    FREEDRENO_HANGCHECK_MS: 2000
 
 a618_vk_full:
   extends: