Getting piglit to fit onto our test devices was proving difficult, and we
need the ability to handle flakes, so switch to the Rust piglit runner
that @pepp wrote as part of the deqp-runner repo. It gives us flake
detection, sharding across boards, fractional runs, and almost halves the
runtime.
It doesn't handle piglit subtests yet, but if you can't run piglit's
Python on your devices because it's too bloated and unstable, this is a
way forward.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9468>
So far, testing VC4 and V3D/V3DV has required the CI runners to have
access to a Raspberry Pi 3/4 kernel and the corresponding modules and
bootloader files. Using a different kernel therefore means touching the
runners to provide them.
This commit adds the option to define a URL pointing to a (compressed)
tarball containing such files, without having to touch the runners. The
URL is provided through the `BM_BOOTFS` job variable.
The tarball must contain two directories in the root: a `/boot`
directory (containing the kernel, DTBs and bootloader files), and a
`/lib/modules` (or `/usr/lib/modules`) directory with the kernel modules.
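A minimal Python sketch of validating that layout, assuming the tarball has already been downloaded from `BM_BOOTFS`. The helper names are hypothetical and this is not the runner's actual code; it only checks for the two required root directories.

```python
import tarfile


def _normalize(name: str) -> str:
    """Turn './boot/x' or '/boot/x' into 'boot/x'."""
    while name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def _has_dir(names, prefix: str) -> bool:
    """True if `prefix` itself or anything below it is in the archive."""
    return any(n == prefix or n.startswith(prefix + "/") for n in names)


def check_bootfs_layout(fileobj) -> bool:
    """Return True if the tarball has boot/ plus lib/modules/ or
    usr/lib/modules/ at its root. Compression is auto-detected ("r:*")."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        names = [_normalize(n) for n in tar.getnames()]
    has_boot = _has_dir(names, "boot")
    has_modules = (_has_dir(names, "lib/modules")
                   or _has_dir(names, "usr/lib/modules"))
    return has_boot and has_modules
```

Usage would be along the lines of `with open("bootfs.tar.gz", "rb") as f: check_bootfs_layout(f)`, where the file name is illustrative.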
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9527>
So far we were retrying the test (by rebooting the device) if we did
not detect the boot sequence.
But we found a couple of times that the serial log can also be "lost"
during the testing process. In all those cases, a manual retry of the job
was enough to complete the test.
Thus, automatically retry once in this case as well.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9335>
Highlight errors from the baremetal run in red, so the user is more aware
of what happened.
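GitLab's job log renders ANSI color escapes, so highlighting comes down to wrapping the message in a color code and a reset. A minimal sketch; the helper name and the exact color (bold red) are assumptions, not necessarily what the runner uses.

```python
RED = "\x1b[1;31m"   # ANSI escape: bold red foreground
RESET = "\x1b[0m"    # ANSI escape: reset all attributes


def highlight_error(msg: str) -> str:
    """Wrap msg in ANSI color codes, which the GitLab job log renders."""
    return f"{RED}{msg}{RESET}"


print(highlight_error("Device did not boot, giving up"))
```

Ending with the reset keeps the color from bleeding into whatever the job prints next.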
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9335>
This allows splitting a piglit job into several parallel jobs, to speed up
execution.
Due to piglit restrictions, this only works for single profiles; otherwise
an error will be shown in the runner.
Also, a new GitLab job variable `PIGLIT_TESTS` is introduced, containing
the tests to exclude/include with `-x` or `-n`. The rest of the piglit
options (like `--timeout n`) go in `PIGLIT_OPTIONS`.
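A hypothetical Python sketch of how the two variables and the single-profile restriction could combine into a `piglit run [options] profile... results_dir` command line. The real runner is a shell script; the function name and the `parallel_jobs` parameter are illustrative only.

```python
import shlex


def build_piglit_command(profiles, results_dir, options="", tests="",
                         parallel_jobs=1):
    """Assemble a `piglit run` argv (hypothetical helper).

    options: contents of PIGLIT_OPTIONS, e.g. "--timeout 300"
    tests:   contents of PIGLIT_TESTS, e.g. "-x glx"
    """
    if parallel_jobs > 1 and len(profiles) != 1:
        # Splitting only works for a single profile, so bail out early.
        raise ValueError("parallel piglit jobs require exactly one profile")
    return (["piglit", "run"] + shlex.split(options) + shlex.split(tests)
            + list(profiles) + [results_dir])
```

Using `shlex.split()` keeps quoted arguments inside either variable intact instead of splitting on every space.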
v2 (Andres):
- The replay profile is supported in parallel jobs.
- Bail out immediately if parallel jobs are attempted with multiple
profiles.
- Use testlist only when doing parallel jobs.
- Do not drop passing tests when filtering executed tests.
- Get rid of PIGLIT_FRACTION.
v4:
- Drop unrelated change (Andres).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9022>
We have to disable the GLSL unit tests because with ASan they run way too
much code under QEMU and time out. Those unit tests have coverage on
x86, anyway.
I also included a Vulkan run, which is disabled by default due to timeouts
that I still need to sort out. It should be a useful tool for turnip
devs, though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9070>
We are now using GitLab Pages.
v2:
- Define a helper variable for the artifacts base URL (Juan).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9092>
Add OpenGL and Vulkan testing for V3D and V3DV respectively.
Also add a couple of manual piglit jobs for V3D.
v2:
- Replace custom mustpass with running fraction of tests (Eric)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8745>
When loading the Vulkan ICD file, the CPU machine identifier is used to
pick the correct one, in case multiple versions are installed.
This is fine if the machine where Mesa has been built and the machine
where the tests run are exactly the same. But this is not always the
case. For example, for the armhf architecture, the machine where Mesa is
built is identified as `arm7hlf`, but the Raspberry Pi 4 is identified
as `armv7l`, so loading the ICD file fails even though both are
fully compatible.
This allows defining the architecture explicitly instead.
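A minimal sketch of the lookup side of the problem, assuming ICD files named `<driver>_icd.<machine>.json` (as in `broadcom_icd.armv7l.json`). The function and its `arch` override are hypothetical; they only illustrate why a default based on the running machine misses a file named after the build machine.

```python
import platform
from typing import Optional


def icd_filename(driver: str, arch: Optional[str] = None) -> str:
    """Name of the ICD JSON to load, e.g. 'broadcom_icd.armv7l.json'.

    By default the running system's machine identifier is used (what
    `uname -m` reports); passing `arch` overrides it, so a file named
    after a different but compatible machine can still be found.
    """
    machine = arch if arch is not None else platform.machine()
    return f"{driver}_icd.{machine}.json"
```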
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8745>
3 commits in 0.5.0:
- 20-40s savings on many of our CI runs by dropping the clever test size
scaling code.
- Even bigger savings (especially on deqp-vk runs) by increasing the maximum
test group size (~1/4 of runtime was spawning deqp on cheza; that cost
is cut by ~75%).
- No more need to manually set MESA_DEBUG=silent.
2 commits in 0.5.1:
- Fixed automatic thread pool sizing to keep all CPUs busy (thanks for
catching that, Bas!).
- Automatically size down test groups on short test lists and many CPUs,
so that the list is split evenly between CPUs (such as on freedreno
-options jobs).
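The group-size trade-off in these two changes can be sketched as below. The formula and the cap of 500 are assumptions for illustration, not deqp-runner's actual code: big groups amortize the deqp spawn cost, but on a short list they must shrink so every CPU gets a share.

```python
import math


def choose_group_size(num_tests: int, num_cpus: int,
                      max_group_size: int = 500) -> int:
    """Tests per deqp invocation (hypothetical sizing rule).

    Large groups amortize the cost of spawning deqp, but on a short list
    with many CPUs they would leave some CPUs idle, so shrink the group
    until every CPU gets a roughly equal share.
    """
    even_share = math.ceil(num_tests / num_cpus)
    return max(1, min(max_group_size, even_share))


print(choose_group_size(100_000, 8))  # 500: capped by max_group_size
print(choose_group_size(100, 8))      # 13: 100 tests spread over 8 CPUs
```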
Acked-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8787>
We were using surfaceless, which missed out on some useful coverage we'd
like to have in the GLX/EGL piglit tests, but more importantly prevented
many traces from running.
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8727>
The GitLab artifacts handling has been slow in the past as we hit
gitlab.fdo from multiple runners, and it costs fd.o egress bandwidth. Use
the local HTTP cache against the packet.net MinIO to cut that download
cost.
Closes: #3249
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8727>
These are intermittent (~1/day), seem to be around GPU faults (so
hopefully will go away once we clean up piglit's fault errors), and are
probably also related to our vintage firmware. Until we can get new
hardware in the farm, just restart the flaked job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8722>
The script that monitors activity on the serial line assumes that
something went wrong if it does not detect activity within 60 seconds,
rebooting the device and retrying the test.
While this timeout is enough for most cases, sometimes it is not. For
instance, when executing the piglit testsuite it takes quite some time to
generate the results after the tests are done.
This allows setting a custom timeout (`BM_POE_TIMEOUT`) in the jobs that
need it.
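A Python sketch of the monitoring side, assuming serial lines arrive on a `queue.Queue` and that `BM_POE_TIMEOUT` holds seconds like the previous hard-coded 60. The helper names are hypothetical.

```python
import os
import queue
from typing import Mapping, Optional

DEFAULT_SERIAL_TIMEOUT = 60  # seconds, the previous hard-coded value


def serial_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Inactivity timeout, overridable per job via BM_POE_TIMEOUT."""
    env = os.environ if env is None else env
    value = env.get("BM_POE_TIMEOUT")
    return float(value) if value else DEFAULT_SERIAL_TIMEOUT


def saw_activity(lines: "queue.Queue[str]", timeout: float) -> bool:
    """Wait up to `timeout` seconds for the next serial line."""
    try:
        lines.get(timeout=timeout)
        return True
    except queue.Empty:
        return False
```

In the monitoring loop, a `False` from `saw_activity(q, serial_timeout())` would be the point where the device is power-cycled and the test retried.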
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8702>
Replace the expect-based script that turns the Raspberry Pi devices on and
off with a Python-based script.
v2:
- Fix small nitpicks (Juan)
- Limit line length (Andres)
v3:
- Bump image tags (Eric, Andres)
v4:
- Bump image tags (Eric)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8362>
These bring a whole lot of new coverage to these drivers, since dEQP is
bad at desktop GL feature coverage around early GL 3.x. piglit also gets
at a lot of MSAA, fast clearing, and texture layout issues that dEQP
doesn't do much with.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
I've set it up in the gitlab-runner config on all the freedreno boards.
This means that for piglit, where run.sh always chooses either this
variable or, otherwise, 4 threads, we'll have the right number of
parallel tasks.
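The fallback rule fits in a few lines. In this hypothetical sketch, `configured` stands for the value of the runner-config variable described above (its name isn't given here, so the function takes the value directly).

```python
from typing import Optional


def piglit_concurrency(configured: Optional[str], fallback: int = 4) -> int:
    """Parallel piglit tasks: the runner-provided value if set, else 4."""
    if configured and configured.strip():
        return int(configured)
    return fallback
```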
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
ARM64 had it for traces only; upgrade it to a full build so we can test
a630. We also add it for armhf, as we'll want it on both rpi and etnaviv.
Bumped the LAVA tag as well, since the script changed a bit and it does
impact the final image (even if we aren't pulling in full piglit there
yet). Note I also had to drop the "v" flag when tarring their rootfs, as
the verbosity on baremetal was exceeding the job log size.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
v2:
- Squashed the commit to remove tracie jobs (Eric).
v3:
- Rename *-piglit-traces jobs with *-traces (Eric).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6388>
When introducing/removing these files, it's easy to forget to update the
yml to point to them. Instead of requiring the separate update, just have
the runner script pick the right one from a single per-GPU variable.
As a result, we now pick up the new deqp-lvp-skips.txt that was added but
not connected. This also required moving some bypass flakes from the
shared a630 flakes list to a separate list, which is a feature, because
now we'd notice the introduction of flakes to the gmem path.
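The per-GPU lookup can be sketched as deriving every expectation file name from one value and using whichever files exist. The `{suite}-{gpu}-{kind}.txt` pattern follows `deqp-lvp-skips.txt`; the function itself is a hypothetical illustration.

```python
from pathlib import Path
from typing import Dict


def expectation_files(ci_dir, gpu_version: str,
                      suite: str = "deqp") -> Dict[str, Path]:
    """Return the existing {suite}-{gpu}-{kind}.txt files, keyed by kind.

    Adding a new file is enough for it to be picked up; no yml change.
    """
    found = {}
    for kind in ("fails", "skips", "flakes"):
        path = Path(ci_dir) / f"{suite}-{gpu_version}-{kind}.txt"
        if path.exists():
            found[kind] = path
    return found
```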
Fixes: ab79e6b8e3 ("ci: skip failing test on lavapipe")
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8147>
This is an issue on the cheza platform; the theory is that it's due to an
old firmware bug that will be fixed in future platforms. Given that cheza was
a target that didn't get released and we expect future platforms to be
fixed, just detect the issue and restart.
I've noticed this error in my CI monitoring less than once a week.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7993>
This tests the OpenGL ES 2.0 CTS suite with the VC4 driver on baremetal
Raspberry Pi 3 devices.
The devices are connected to a switch that supports Power over Ethernet
(PoE), so they can be started/stopped through the switch. They are also
connected through serial-to-USB cables to a host that runs the GitLab
runner, which monitors them to know when the testing finishes.
The Raspberry Pis use network boot, with NFS and TFTP. For the root
filesystem, they use the one created in the armhf container. The
kernel and modules are handled externally: currently we use the same
kernel/modules that come with Raspberry Pi OS. In the future we could
build them in the same armhf container.
At the moment we only test the armhf architecture, as this is the default
one suggested by the Raspberry Pi Foundation. In the future we could also
add testing for the arm64 architecture.
Finally, for the very rare occasions where the Raspberry Pi 3 device has
booted but no data is received, the test is retried a second time,
powering the device off and on in the process.
v2:
- Remove the commit that captures devcoredump (Eric)
- Squash remaining commits in one (Andres)
v3:
- Add missing boot timeout check (Juan)
v4:
- Use locks when running the PoE on/off script (Eric)
- Use a timeout for serial read (Eric)
v5:
- Rename stage to "raspberrypi" (Eric)
- Bump up arm64_test tag (Eric)
v6:
- Make serial buffer timeout optional (Juan)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7628>
To increase our VK coverage on a630, we want to have two jobs in parallel,
but we still can't hit full coverage, so we need the fractional setting to
be separate from GitLab CI's flags for setting up parallel jobs.
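Keeping the two knobs separate might look like the sketch below: the fraction thins the list first, then GitLab's `parallel:` sharding (the predefined `CI_NODE_INDEX`, 1-based, and `CI_NODE_TOTAL` variables) splits what remains. The function is a hypothetical illustration, not the runner's code.

```python
def select_tests(tests, fraction=1, node_index=1, node_total=1):
    """Pick this job's share of the test list.

    fraction:   run only every Nth test (how much coverage we can afford).
    node_index: CI_NODE_INDEX, 1-based, set by GitLab for parallel jobs.
    node_total: CI_NODE_TOTAL, how many parallel jobs share the work.
    """
    subset = tests[::fraction]
    return subset[node_index - 1::node_total]
```

With `fraction=2` and two parallel jobs, the jobs together still run exactly half the list; changing the job count doesn't change the coverage.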
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
This saves the minute and a half boot time on each of these minute-or-less
test jobs. The whole job was 3.5 minutes in my last run.
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
I found the C++ runner hard to develop on, and we had stability issues and
outstanding feature needs that made me want something I felt good about
hacking on. Thus, a Rewrite It In Rust of the deqp runner.
The new runner includes:
- Skip lists don't reshuffle the test list.
- Known-flake handling without resorting to skip lists (fixing our main CI
reliability issue on a3xx right now).
- Per-thread Vulkan shader caches should speed up VK CI runtime.
- Tracking of crashes separate from fails (so we can see progress on that
front).
- Logging of deqp stderr spam (particularly assertion failures!) in the CI
log.
- Integrated QPA filtering so we don't have bash perf issues for it.
- Logging of what caselist to go look at for a given error report (in red,
so it's easier to find in your CI log).
- The code is 1/3 unit tests, and easy to extend for more coverage.
- Non-LAVA CI runs create a failures.csv in artifacts that you can check
in as your deqp-*-fails.txt file.
- Test runtime is included in results.csv so you can debug how to speed up
your CI job.
- Pretty summary at the end of the run of slow/flaky/failed tests.
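As an example of using the runtimes in results.csv to speed up a job, a hypothetical helper that lists the slowest tests. It assumes each row is `test,status,duration_in_seconds`; check the actual file before relying on that column order.

```python
import csv
import io
from typing import List, Tuple


def slowest_tests(results_csv: str, n: int = 5) -> List[Tuple[str, str, float]]:
    """Return the n slowest (test, status, seconds) rows.

    Assumes each row is `test,status,duration_in_seconds`.
    """
    rows = []
    for row in csv.reader(io.StringIO(results_csv)):
        if not row:
            continue  # skip blank lines
        name, status, duration = row
        rows.append((name, status, float(duration)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:n]
```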
Since this is a new runner with a different RNG, the test groups are
shuffled one more time. This seems to result in some panfrost T720
stability issues (See its new deqp-panfrost-t720-flakes.txt), and one new
flake in freedreno a630.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
We don't want the timestamp and other context on the next line to inherit
colors set by the serial command (visible with the new dEQP runner).
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
The poweron failure happens before we get to the bootloader
("load_archive: loading locale_en.bin"), not after we've started booting
the kernel and are waiting for the deqp run to complete.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6970>
It's useful for kernel devs to be able to throw all of our testing
infrastructure at a risky kernel change, but it's expensive (time and
bandwidth) to roll new containers every time you rev your kernel. Make
it so you can just point the env vars to your personal build you've
uploaded.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6592>
Modeled after what I did for cros_servo_run.py, this gives us easy
support for restarting the a530 test run when we detect a spontaneous
reboot. I had to touch up serial_buffer.py to handle buffering in from a
file instead of a serial device, to support the upcoming etnaviv CI
(tested by running it against a serial log from db410c and seeing it step
to calling "fastboot").
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
GitLab CI doesn't include timestamps in its logs by default, but they're
really useful for finding delays in our CI, so stuff one into the lines
coming in from serial as they are output to the GitLab log. The artifacts
file is still the raw serial output.
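A minimal sketch of the per-line decoration, with a hypothetical helper name. It also appends an ANSI reset so colors set by the serial output don't leak onto the next line's timestamp. The undecorated line would still go unchanged to the artifacts file.

```python
from datetime import datetime
from typing import Optional

RESET = "\x1b[0m"  # ANSI reset, so colors from serial output don't leak


def timestamp_line(line: str, now: Optional[datetime] = None) -> str:
    """Prefix a serial line with a wall-clock timestamp for the job log."""
    now = now or datetime.now()
    return f"{now:%H:%M:%S.%f} {line.rstrip()}{RESET}"


print(timestamp_line("Starting kernel ...", datetime(2020, 1, 1, 12, 34, 56, 789000)))
# prints "12:34:56.789000 Starting kernel ..." followed by the reset code
```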
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
We were only reading from the CPU serial, not the EC, so we'd never notice
these sources of job timeouts. I couldn't find a cleaner solution, so I
spawned two threads to do the blocking reads from our serial line FIFOs
and merge them together into a single queue to read.
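The two-reader pattern, sketched in Python: one thread per blocking source, all feeding one `queue.Queue`, with a sentinel per source so the consumer knows when everything is done. The function names and the `cpu:`/`ec:` prefixes are illustrative assumptions, not the actual serial_buffer.py code.

```python
import queue
import threading


def _pump(path, prefix, out):
    """Blocking-read one source line by line into the shared queue."""
    with open(path, "r", errors="replace") as f:
        for line in f:
            out.put(prefix + line.rstrip("\n"))
    out.put(None)  # sentinel: this source is exhausted


def merged_lines(paths):
    """Yield lines from several sources as they arrive, tagged by source.

    paths: mapping of source name -> file/FIFO path.
    """
    q = queue.Queue()
    for name, path in paths.items():
        threading.Thread(target=_pump, args=(path, f"{name}: ", q),
                         daemon=True).start()
    remaining = len(paths)
    while remaining:
        item = q.get()
        if item is None:
            remaining -= 1
        else:
            yield item
```

Usage would be something like `for line in merged_lines({"cpu": cpu_fifo, "ec": ec_fifo}): ...`, where both paths are placeholders.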
Closes: #3470
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
re.match() only matches our regex at the start of the line, while
re.search() looks for it anywhere in the line. I messed this up when
converting our greps in shell to Python, which was part of breaking the
POWER_GOOD flake detection. Most of our matches worked, but let's
consistently use search() so we don't mess this up in the future.
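The difference in two lines. The log line below is made up for illustration; the real POWER_GOOD message may look different, but any prefix (such as a kernel timestamp) is enough to defeat `match()`.

```python
import re

# Hypothetical serial line; the real message text may differ.
line = "[   3.141] board: POWER_GOOD not seen in time"

print(re.match(r"POWER_GOOD", line))               # None: match() only tries position 0
print(re.search(r"POWER_GOOD", line) is not None)  # True: search() scans the line, like grep
```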
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
Occasionally something goes weird in the network and a group of chezas
will produce streams of these errors during the tftp process, eventually
timing out after 60 minutes in the job. By the time we notice, the next
jobs seem to go through fine, so watch for them and try rebooting the
cheza to see if that gets our jobs to pass again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
If we get this error, we can just try rebooting again and see if it comes
up then. The POWER_GOOD failures are clustered in time, but it's better
to retry a few times in a row in one job (which has its own 60min timeout)
than to spuriously fail someone's pipeline.
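A sketch of retrying within one job, assuming the run is wrapped in a callable that raises a dedicated exception when the POWER_GOOD error is detected. The exception, the function names and the default of 3 attempts are all hypothetical; the job's own 60-minute timeout still bounds the total.

```python
from typing import Callable


class PowerGoodError(Exception):
    """Raised (hypothetically) when the POWER_GOOD failure is detected."""


def run_with_retries(attempt: Callable[[], int], max_attempts: int = 3) -> int:
    """Reboot and retry on POWER_GOOD failures instead of failing the job."""
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except PowerGoodError:
            if n == max_attempts:
                raise  # out of retries: let the job fail
            print(f"POWER_GOOD failure, rebooting (attempt {n}/{max_attempts})")
    raise AssertionError("unreachable")
```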
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
This one uses Python threads to move some of our logic from shell
pipelines to Python, and opens the door to doing better serial output
tracking in the future (the SerialBuffer.lines() method).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
So far, we've been putting our known flakes that intermittently fail CI
into the skips list. This has two downsides:
1) You don't know when the flakes stop happening and when to delist them
from skips, unless you go do a bunch of manual runs with the skips list
cleared.
2) If the flake was because the previous test left some broken state in
the HW, you may just move your intermittent failure to a new test.
With this new path, you can list your flakes in the flakes file to keep
them from erroring out people's pipelines. They still get run and
reported as is.
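The effect of a flakes list can be sketched as a classification step applied after each test runs. These rules and result names are hypothetical, and exact test names are assumed (real lists may use patterns): a listed flake still runs and is reported, but its failure no longer fails the pipeline.

```python
from typing import Set

# Results that should make the CI job fail.
FAILING = {"Fail", "UnexpectedPass"}


def classify(test: str, status: str, expected_fails: Set[str],
             flakes: Set[str]) -> str:
    """Map a raw Pass/Fail result onto what CI should do with it."""
    if status == "Pass":
        return "UnexpectedPass" if test in expected_fails else "Pass"
    if test in flakes:
        return "Flake"  # reported, but doesn't break the pipeline
    if test in expected_fails:
        return "ExpectedFail"
    return "Fail"
```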
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6392>
Eric Anholt identified the issue when merging one of my MRs: the
variable contained words in backticks (`), which caused them to be
executed by the bare metal runner's shell.
Quote the printed value using bash's shell expansion feature to make
sure anything in the future will be properly quoted.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6389>
The previous code treated unset variables the same as set-but-empty ones;
sometimes setting a variable to something empty is meaningful, so let's
pass them through properly.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6389>
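The two commits above do this in bash. The same two rules can be sketched in Python with `shlex.quote()`: skip variables that are unset, keep ones that are set but empty, and quote every value so backticks are never executed. The function name is hypothetical.

```python
import shlex
from typing import Iterable, List, Mapping


def export_lines(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Emit `export` lines for a shell script, safely quoted."""
    lines = []
    for name in names:
        if name not in env:
            continue  # unset: export nothing
        # Set (even to ""): pass it through, quoted against expansion.
        lines.append(f"export {name}={shlex.quote(env[name])}")
    return lines
```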