Commit Graph

23 Commits

Author SHA1 Message Date
Emma Anholt 5f09b1ebe9 ci/bare-metal: Add test phase timeouts to all boards.
This should help with "marge got stuck for an hour and all I got was this
failed job with no results/" when a system intermittently wedges.

This replaces the BM_POE_TIMEOUT ("did we get something on serial in the
last 3 minutes?") that rpi had, in favor of checking that the whole test
job gets through in 20 minutes.

Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
2022-06-21 21:38:25 +00:00
Emma Anholt cd3d9a7a92 ci/bare-metal: Add handling of netboot firmwares for servo boards.
My local trogdor has a netboot firmware and I want to be able to use it to
test the timeout code I'm working on.

Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
2022-06-21 21:38:25 +00:00
Emma Anholt 3f8114d1e0 ci/bare-metal: Get rid of servo's serial feed threads.
If the SerialBuffers can just feed the same line queue, then we don't need
the extra threads reading line queues into a new merged line queue.

Less python threading code is always better.  Plus, now we can pass args
to SerialBuffer.lines() for timeout/phase.

Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
2022-06-21 21:38:25 +00:00
Emma Anholt 1e15ec1949 ci/bare-metal: Apply autopep8 to our python scripts.
My editor likes to pep8 as I edit, and I'm tired of carefully not
committing those changes.

Acked-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17096>
2022-06-21 21:38:25 +00:00
Emma Anholt d633eace3f ci/freedreno: Try to detect a wedged MMU that's happened recently.
Possibly since the VK-GL-CTS 1.3.1.0 uprev.  It doesn't seem to recover,
like it says.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14945>
2022-02-10 01:13:31 +00:00
Emma Anholt 8f5a0bd9b4 ci/bare-metal: Close serial and join serial threads before exit.
This should fix the intermittent (~1/week) cheza failure where python
complains that a thread tried to do stdio while the main thread has
exited.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13462>
2021-11-10 20:36:57 +00:00
Emma Anholt b86da01c54 ci/freedreno: Restart the run if cheza spontenously reboots.
Occasionally (once every couple weeks?) a cheza reboots mid run, around a
GPU fault.  Detect that and do an internal retry instead of failing out
the job.

Closes: #5388
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13181>
2021-10-04 22:15:27 +00:00
Emma Anholt 306a039472 ci/baremetal: Retry if our network device spontaneously fails.
Seen in https://gitlab.freedesktop.org/mesa/mesa/-/jobs/13824132.  It's
unlikely that graphics would kill the network, so just assume it's not our
fault and keep going.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12939>
2021-09-20 19:55:55 +00:00
Daniel Stone 5f32d2a438 ci: Consistent pass/fail result output
One less point of differentiation.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
2021-06-15 14:02:44 +02:00
Eric Anholt 1af7be02d7 ci/bare-metal: Move the db820c lockup detect to the right boot script.
Fixes: 2407952ec9 ("ci/bare-metal: Restart a run on intermittent kernel lockups.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9715>
2021-03-19 22:07:57 +00:00
Eric Anholt 2407952ec9 ci/bare-metal: Restart a run on intermittent kernel lockups.
Since enabling SMP on db820c and cranking up how many tests we run, we've
been seeing lockups like this a couple of times a week.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9655>
2021-03-17 17:13:22 +00:00
Juan A. Suarez Romero e45d372968 ci/baremetal: highlight message errors
Highlight in red errors from the baremetal run, so user is more aware of
what happened.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9335>
2021-03-01 18:22:24 +00:00
Eric Anholt ce1bb26b06 ci/freedreno: Detect cheza HFI errors and restart the run.
These are intermittent (~1/day), seem to be around GPU faults (so
hopefully will go away once we clean up piglit's fault errors), and are
probably also related to our vintage firmware.  Until we can get new
hardware in the farm, just restart the flaked job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8722>
2021-01-26 19:17:27 +00:00
Eric Anholt 5fca7cd8b8 ci/freedreno: Detect the cheza power management bus error and restart.
This is an issue on the cheza platform, the theory is due to some old
firmware bug that will be fixed in future platforms.  Given that cheza was
a target that didn't get released and we expect future platforms to be
fixed, just detect the issue and restart.

I've noticed this error in my CI monitoring less than once a week.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7993>
2020-12-08 23:31:17 +00:00
Eric Anholt ff6741728d ci/bare-metal: Apply autopep8 to the bare-metal scripts.
Let's follow proper python formatting (easy now that vscode does it for
me)

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
2020-11-06 19:48:39 +00:00
Eric Anholt e3c7748b2e ci/bare-metal: Move the "POWER_GOOD not seen in time" check to the right time.
The poweron failure happens before we get to the bootloader
("load_archive: loading locale_en.bin") not after we're trying to boot the
kernel and we're waiting for the deqp run to complete.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6970>
2020-10-02 02:41:37 +00:00
Eric Anholt 0453a46f66 ci/bare-metal: Fix capturing of serial output as job artifacts.
I tried to put them in the wrong directory -- everything needs to go in
results/, which we want clean and ready before we start our job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
2020-09-03 23:22:43 +00:00
Eric Anholt 24f5f11719 ci/bare-metal: Log why our run restarts when it does.
It would be confusing to see a job quietly restart itself in the middle.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
2020-09-03 23:22:43 +00:00
Eric Anholt ff42b7e804 ci/bare-metal: Fix detection of "POWER_GOOD not seen in time" fails
We were only reading from the CPU serial, not EC, so we'd never notice
these sources of job timeouts.  I couldn't find a cleaner solution, so I
spawned two threads to do the blocking reads from our serial line fifos
and merge them together in a single queue to read.

Closes: #3470
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
2020-09-03 23:22:41 +00:00
Eric Anholt b7787ce18d ci/bare-metal: Use re.search() instead re.match() for our line matching.
match() looks for the start of the line to match our regex, while search
just looks for the regex anywhere in the line.  I messed this up when
converting our greps in shell to python, which was part of breaking the
POWER_GOOD flake detection.  Most of our matches worked, but let's
consistently use this one so we don't mess this up in the future.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
2020-09-03 23:22:40 +00:00
Eric Anholt 2da1178bf3 ci/bare-metal: Try rebooting chezas again if they get stuck during tftp.
Occasionally something goes weird in the network and a group of chezas
will produce streams of these errors during the tftp process, eventually
timing out after 60 minutes in the job.  By the time we notice, the next
jobs seem to go through fine, so watch for them and try rebooting the
cheza to see if that gets our jobs to pass again.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
2020-08-21 20:10:18 +00:00
Eric Anholt c27075e9e1 ci/bare-metal: Retry booting chezas instead of failing when !POWER_GOOD
If we get this error, we can just try rebooting again and see if it comes
up then.  The POWER_GOOD failures are clustered in time, but it's better
to retry a few times in a row in one job (which has its own 60min timeout)
than to spuriously fail someone's pipeline.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
2020-08-21 20:10:18 +00:00
Eric Anholt c63648121e ci/bare-metal: Convert the main cros-servo boot code to python
Switching this part to python makes the code clearer and cleans up our
logs as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
2020-08-21 20:10:18 +00:00