We were using surfaceless, which misses out on some useful coverage we'd
like to have in the GLX/EGL piglit tests, but more importantly prevented
many traces from running.
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8727>
The gitlab artifacts handling has been slow in the past as we hit
gitlab.fdo from multiple runners, and it costs fd.o egress bandwidth. Use
the local http cache against the packet.net minio to cut that download
cost.
Closes: #3249
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8727>
These are intermittent (~1/day), seem to be around GPU faults (so
hopefully will go away once we clean up piglit's fault errors), and are
probably also related to our vintage firmware. Until we can get new
hardware in the farm, just restart the flaked job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8722>
The script that monitors activity on the serial line assumes that
something went wrong if it does not detect activity for 60 seconds, and
reboots the device and retries the test.
While this timeout is enough for most cases, in some cases it is not.
For instance, when executing the piglit testsuite it takes quite some
time to generate the results after the test is done.
This allows setting a custom timeout (`BM_POE_TIMEOUT`) in the jobs that
need it.
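As a rough illustration (not the actual monitoring script), the timeout
lookup could look like the sketch below. The helper names
`get_serial_timeout` and `is_stalled` are hypothetical, and treating the
value as whole seconds is an assumption; only `BM_POE_TIMEOUT` and the
60 second default come from this change.

```python
import os

DEFAULT_TIMEOUT = 60  # seconds of serial silence tolerated by default


def get_serial_timeout(env=None):
    """Return the serial inactivity timeout, honouring BM_POE_TIMEOUT if set."""
    env = os.environ if env is None else env
    value = env.get("BM_POE_TIMEOUT")
    # Assumed parsing: fall back to the default when unset or empty.
    return int(value) if value else DEFAULT_TIMEOUT


def is_stalled(last_activity, now, timeout):
    """True when no serial activity was seen for more than `timeout` seconds."""
    return (now - last_activity) > timeout
```

A piglit job would then export a larger `BM_POE_TIMEOUT`, while other
jobs keep the default.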
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8702>
Replace the expect-based script that turns the Raspberry Pi devices
on/off with a python-based one.
v2:
- Fix small nitpicks (Juan)
- Limit line length (Andres)
v3:
- Bump image tags (Eric, Andres)
v4:
- Bump image tags (Eric)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8362>
These bring a whole lot of new coverage to these drivers, since dEQP is
bad at desktop GL feature coverage around early GL 3.x. piglit also gets
at a lot of MSAA, fast clearing, and texture layout issues that dEQP
doesn't do much with.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
I've set it up in the gitlab-runner config on all the freedreno boards.
This means that for piglit, where run.sh always uses either this
variable or 4 threads otherwise, we'll have the right number of parallel
tasks.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
ARM64 had it for traces only, upgrade it to a full build so we can test
a630. We also add it for armhf, as we'll want it on both rpi and etnaviv.
Bumped the LAVA tag as well, since the script changes a bit and it does
impact the final image (even if we aren't pulling in full piglit there
yet). Note I also had to drop the "v" on the tarring of their rootfs, as
the verbosity on baremetal was exceeding job log size.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7370>
v2:
- Squashed the commit to remove tracie jobs (Eric).
v3:
- Rename *-piglit-traces jobs with *-traces (Eric).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6388>
When introducing/removing these files, it's easy to forget to update the
yml to point to them. Instead of requiring the separate update, just have
the runner script pick the right one from a single per-gpu variable.
As a result, we now pick up the new deqp-lvp-skips.txt that was added but
not connected. This also required moving some bypass flakes from the
shared a630 flakes list to a separate list, which is a feature because now
we'd notice the introduction of flakes to the gmem path.
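A minimal sketch of the lookup idea, assuming the files follow the
deqp-<gpu>-<kind>.txt naming seen in this series. The function name, the
`exists` parameter and the directory argument are hypothetical, and the
real runner script is shell rather than python:

```python
import os


def expectation_files(directory, gpu_version, exists=os.path.exists):
    """Return the existing deqp-<gpu>-{fails,flakes,skips}.txt paths by kind."""
    found = {}
    for kind in ("fails", "flakes", "skips"):
        path = f"{directory}/deqp-{gpu_version}-{kind}.txt"
        # A missing file simply means there are no entries of that kind.
        if exists(path):
            found[kind] = path
    return found
```

With this shape, adding or removing one of the files needs no yml
change, since only the per-gpu value is configured.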
Fixes: ab79e6b8e3 ("ci: skip failing test on lavapipe")
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8147>
This is an issue on the cheza platform. The theory is that it is due to an
old firmware bug that will be fixed in future platforms. Given that cheza was
a target that didn't get released and we expect future platforms to be
fixed, just detect the issue and restart.
I've noticed this error in my CI monitoring less than once a week.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7993>
This runs the OpenGL ES 2.0 CTS with the VC4 driver, on baremetal
Raspberry Pi 3 devices.
The devices are connected to a switch that supports Power over Ethernet
(PoE), so the devices can be started/stopped through the switch, and
also to a host that runs the GitLab runner through serial-to-USB cables,
to monitor the devices to know when the testing finishes.
The Raspberries use a network boot, using NFS and TFTP. For the root
filesystem, they use the one created in the armhf container. For the
kernel/modules case, this is handled externally. Currently it is using
the same kernel/modules that come with the Raspberry Pi OS. In future we
could build them in the same armhf container.
At this moment we only test armhf architecture, as this is the default
one suggested by the Raspberry Pi Foundation. In future we could also
add testing for arm64 architecture.
Finally, for the very rare occasions where the Raspberry Pi 3 device is
booted but no data is received, it retries the testing for a second
time, powering off and on the device in the process.
v2:
- Remove commit that exists capture devcoredump (Eric)
- Squash remaining commits in one (Andres)
v3:
- Add missing boot timeout check (Juan)
v4:
- Use locks when running the PoE on/off script (Eric)
- Use a timeout for serial read (Eric)
v5:
- Rename stage to "raspberrypi" (Eric)
- Bump up arm64_test tag (Eric)
v6:
- Make serial buffer timeout optional (Juan)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7628>
To increase our VK coverage on a630, we want to have two jobs in parallel,
but we still can't hit full coverage so we need the fractional setting to
be separate from gitlab CI's flags for setting up parallel jobs.
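To show why the two settings are independent, here is a small sketch.
The function and its parameters are hypothetical and the real runner may
sample differently; the 1-based index mirrors GitLab CI's CI_NODE_INDEX
and CI_NODE_TOTAL, which are set for jobs using `parallel:`.

```python
def select_tests(tests, fraction=1, node_index=1, node_total=1):
    """Pick this job's share of `tests`.

    `fraction` keeps every Nth test (reducing total coverage), while
    `node_index`/`node_total` split what remains across parallel jobs.
    """
    # Step 1: fractional sampling, independent of how many jobs run.
    sampled = tests[::fraction]
    # Step 2: shard the sample across jobs (node_index is 1-based).
    return sampled[node_index - 1::node_total]
```

Two parallel jobs with a fraction of 2 together cover half of the list,
each taking a disjoint quarter.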
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
This saves the minute and a half boot time on each of these minute-or-less
test jobs. The whole job was 3.5 minutes in my last run.
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
I found the C++ runner hard to develop on, and we had stability issues and
outstanding feature needs that made me want something I felt good about
hacking on. Thus, Rewrite It In Rust of the deqp runner.
The new runner includes:
- Skip lists don't reshuffle the test list.
- Known-flake handling without resorting to skip lists (fixing our main CI
reliability issue on a3xx right now).
- Per-thread Vulkan shader caches should speed up VK CI runtime.
- Tracking of crashes separate from fails (so we can see progress on that
front).
- Logging of deqp stderr spam (particularly assertion failures!) in the CI
log.
- Integrated QPA filtering so we don't have bash perf issues for it.
- Logging of what caselist to go look at for a given error report (in red,
so it's easier to find in your CI log).
- The code is 1/3 unit tests, and easy to extend for more coverage.
- Non-LAVA CI runs create a failures.csv in artifacts that you can check
in as your deqp-*-fails.txt file.
- Test runtime is included in results.csv so you can debug how to speed up
your CI job.
- Pretty summary at the end of the run of slow/flaky/failed tests.
Since this is a new runner with a different RNG, the test groups are
shuffled one more time. This seems to result in some panfrost T720
stability issues (See its new deqp-panfrost-t720-flakes.txt), and one new
flake in freedreno a630.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
We don't want the next line of our timestamp and other context to inherit
colors set by the serial command (visible with the new dEQP runner)
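One way to do this, sketched with a hypothetical helper name, is to
append the ANSI "reset all attributes" sequence to each serial line
before printing the next one:

```python
ANSI_RESET = "\033[0m"  # SGR 0: reset all colour/attribute state


def terminate_line(line):
    """Strip the trailing newline and append a reset so colours don't leak."""
    return line.rstrip("\r\n") + ANSI_RESET
```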
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
The poweron failure happens before we get to the bootloader
("load_archive: loading locale_en.bin"), not after we're trying to boot
the kernel and waiting for the deqp run to complete.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6970>
It's useful for kernel dev to be able to throw all of our testing
infrastructure at a risky kernel change, but it's expensive (time and
bandwidth) to roll new containers every time you rev your kernel. Make
it so you can just point the env vars to your personal build you've
uploaded.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6592>
Modeling after what I did for cros_servo_run.py, this gives us easy
support for restarting the test run on a530 when we detect a spontaneous
reboot. I had to touch up serial_buffer.py to handle buffering in from a
file instead of a serial device, to support the upcoming etnaviv CI
(tested by running it against a serial log from db410c and seeing it step
to calling "fastboot")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
gitlab CI doesn't include timestamps in its logs by default, but it's
really useful for finding delays in our CI so stuff one in on the lines
coming in from serial and being output to the gitlab log. The artifacts
file is still the raw serial output.
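A minimal sketch of the prefixing, assuming an HH:MM:SS format (the
format and the helper name are illustrative, not necessarily what the
script prints):

```python
from datetime import datetime


def stamp(line, now=None):
    """Prefix a serial line with a wall-clock timestamp for the CI log."""
    now = datetime.now() if now is None else now
    ts = now.strftime("%H:%M:%S")
    return f"{ts} {line}"
```

Only the copy sent to the gitlab log goes through `stamp()`; the line
written to the artifacts file stays unmodified.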
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
We were only reading from the CPU serial, not EC, so we'd never notice
these sources of job timeouts. I couldn't find a cleaner solution, so I
spawned two threads to do the blocking reads from our serial line fifos
and merge them together in a single queue to read.
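The two-threads-one-queue pattern looks roughly like the sketch below.
The names are hypothetical and in-memory streams stand in for the serial
fifos:

```python
import io
import queue
import threading


def reader(name, stream, out):
    """Blocking-read lines from one stream and push them onto the shared queue."""
    for line in stream:
        out.put((name, line.rstrip("\n")))
    out.put((name, None))  # sentinel: this source hit EOF


def merged_lines(sources):
    """Yield (source_name, line) from several streams via one queue."""
    out = queue.Queue()
    for name, stream in sources.items():
        threading.Thread(
            target=reader, args=(name, stream, out), daemon=True
        ).start()
    remaining = len(sources)
    while remaining:
        name, line = out.get()
        if line is None:
            remaining -= 1
        else:
            yield name, line


# Usage with in-memory stand-ins for the CPU and EC serial fifos:
demo = sorted(
    merged_lines({"cpu": io.StringIO("a\nb\n"), "ec": io.StringIO("x\n")})
)
```

The consumer only ever blocks on the queue, so output from either source
is noticed as soon as it arrives.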
Closes: #3470
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
match() only matches our regex at the start of the line, while search()
looks for the regex anywhere in the line. I messed this up when
converting our greps in shell to python, which was part of breaking the
POWER_GOOD flake detection. Most of our matches worked, but let's
consistently use search() so we don't mess this up in the future.
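The difference is easy to reproduce with a made-up log line (the line
content here is hypothetical):

```python
import re

line = "[  12.345] POWER_GOOD not seen"  # hypothetical log line

# match() only succeeds if the pattern matches at the start of the string.
at_start = re.match("POWER_GOOD", line)
# search() scans the whole string for the first occurrence.
anywhere = re.search("POWER_GOOD", line)
```

Here `at_start` is None while `anywhere` is a match object, which is the
grep-like behaviour we want.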
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
Occasionally something goes weird in the network and a group of chezas
will produce streams of these errors during the tftp process, eventually
timing out after 60 minutes in the job. By the time we notice, the next
jobs seem to go through fine, so watch for them and try rebooting the
cheza to see if that gets our jobs to pass again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
If we get this error, we can just try rebooting again and see if it comes
up then. The POWER_GOOD failures are clustered in time, but it's better
to retry a few times in a row in one job (which has its own 60min timeout)
than to spuriously fail someone's pipeline.
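The retry logic amounts to something like this sketch. The function
name and the limit of 3 attempts are assumptions for illustration:

```python
def run_with_retries(attempt, max_attempts=3):
    """Call `attempt()` until it returns True, at most `max_attempts` times."""
    for tries in range(1, max_attempts + 1):
        if attempt():
            return tries  # number of attempts it took
    return None  # every attempt failed; let the job fail
```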
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
This one uses python threads to move some of our logic from shell
pipelines to python, and opens the door to doing better serial output
tracking in the future (the SerialBuffer.lines() method)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
So far, we've been putting our known flakes that intermittently fail CI
into the skips list. This has two downsides:
1) You don't know when the flakes stop happening and when to delist them
from skips, unless you go do a bunch of manual runs with the skips list
cleared.
2) If the flake was because the previous test left some broken state in
the HW, you may just move your intermittent to a new test.
With this new path, you can list your flakes in the flakes file to keep
them from erroring out people's pipelines. They still get run and
reported as is.
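Conceptually the verdict becomes something like the sketch below. The
names are hypothetical, and exact-name matching is an assumption (the
flakes file may well support patterns):

```python
def pipeline_failures(results, flakes):
    """Return the failing tests that should fail the pipeline.

    `results` maps test name -> status. Every test is still run and its
    status is left untouched; known flakes are only excluded from the
    verdict.
    """
    return sorted(
        name
        for name, status in results.items()
        if status != "Pass" and name not in flakes
    )
```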
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6392>
Eric Anholt identified the issue when merging one of my MRs: the
variable contained words in '`' backticks, which caused them to be
executed by the bare metal runner's shell.
Quote the value printed using bash's shell expansion feature to make
sure anything in the future will be properly quoted.
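The same idea, illustrated with Python's shlex.quote() rather than the
bash expansion this change uses (the variable name and value below are
made up):

```python
import shlex

value = "fix `rm -rf` handling"  # hypothetical content with backticks
# shlex.quote() wraps the value in single quotes, so the shell treats the
# backticks as literal characters instead of command substitution.
line = f"export MESSAGE={shlex.quote(value)}"
```

Without the quoting, a shell evaluating the line would try to run the
text between the backticks.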
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6389>
The previous code considered unset variables the same as set-but-empty;
sometimes setting a variable as something empty is meaningful, so let's
pass them through properly.
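In python terms the distinction is membership versus truthiness. This is
a sketch with a hypothetical helper, not the shell code being changed:

```python
def passthrough(env, names):
    """Copy the listed variables, keeping set-but-empty ones, skipping unset."""
    # `name in env` distinguishes unset from empty; a truthiness test on
    # the value would wrongly drop variables that are set to "".
    return {name: env[name] for name in names if name in env}
```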
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6389>
As drivers have been tested with more and more traces, the yml file is
becoming a bit unwieldy. As more drivers are going to be tested with
traces, and more traces will be used, split them in per-driver files so
the size stays manageable.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6185>
Baremetal jobs filter the variables they get from .gitlab-ci.yml, and
TRACIE_UPLOAD_TO_MINIO and others weren't being let through.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Fixes: d4ca45eca2 ("ci: Upload traces' reference and actual images to MinIO")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6171>
Connor recently ran into an issue where the chezas were hanging where his
GPUs weren't, and was blocked on getting some feedback on what was
happening. A devcoredump will help non-cheza-having devs debug this (and
hopefully other intermittent fails).
Closes: #3187
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6036>
When the lava files were moved out of the container, this stopped
working which caused the traces job for Freedreno to not run any traces
at all.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: dcd171f5e9 ("gitlab-ci: More stable URL for kernel and ramdisk artifacts, for LAVA")
Acked-by: Andres Gomez <agomez@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6021>
We don't want these files shared between builds (it'll get blown away by
the next rsync), and NFS will just increase our latency for hitting the
cache.
Drops a630 gles31 run from 11-17 minutes to 5.5. Maximum cache size on a
run I've seen is 153M, which it seems we can easily spare.
Fixes: f97acb4bb4 ("freedreno/ir3: disk-cache support")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5998>
I tried not to edit too much meaning in the process, but I did shuffle
some stuff around to work as structured documentation.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5510>
This job runs in about one minute on the current set of traces, and has
successfully revealed some bugs in our current rendering. Takes about 7
minutes currently.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5433>
Builds the renderdoc and apitrace programs so we can replay GL traces on
DUTs.
[Separated out from 5472's commit that also enabled the jobs in LAVA,
dropped unnecessary python packages from arm_build, fixed up arm64_test
build, traces-db in baremetal, new commit message by anholt]
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5433>