The stages and mapping of jobs to them are somewhat arbitrary; the goal
is to avoid having to scroll through large numbers of jobs.
v2: (Pierre-Eric Pelloux-Prayer)
* Use even more stages for test jobs
* Give somewhat meaningful names to stages
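For illustration, a minimal sketch of the kind of layout this gives us;
the stage and job names here are hypothetical, not the exact ones used:

    stages:
      - container
      - build
      - amd
      - intel
      - arm
      - misc

    # grouping test jobs into named stages keeps related jobs together
    # in the pipeline view instead of one huge scrolling column
    freedreno-a630-gles2:
      stage: arm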
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3995>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3995>
Now that we have 7 (soon 8) boards available, there's capacity to be
testing GLES 3.0. However, due to what look like buffer overflows in the
driver, we end up with flaky test results: 1 of 60 jobs spuriously failed,
and another 6 of 60 reported flakes. At 6 jobs per pipeline, that is far
too high a failure rate to enable for non-freedreno developers. Leave
the job present but disabled so that we can do manual test runs for
regressions.
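One way to keep a job defined but not running by default is a hidden job
(a sketch; the job names and mechanism are assumptions, not necessarily
what this change does):

    # the leading dot hides the job from the pipeline; drop the dot to do
    # a manual regression run against a driver fix
    .arm64_a530_gles3:
      extends: arm64_a530_gles2   # hypothetical GLES2 base job
      variables:
        DEQP_VER: gles3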
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
This should get us better stability of the db410c boards by having a
smaller per-board software stack, with no disks involved (just initramfs).
Additionally, the new cluster is 7 (soon 8) db410cs, while currently the
docker cluster only has 1/4 of its db410cs still running.
Unfortunately, we have to prepare the fastboot boot image during the ARM
drivers build stage, because LAVA relies on publicly available URLs for
the images to load into the bootloaders of the boards, and the only thing
we have for that is gitlab's artifacts.
Note that this testing relies on the boards being freshly flashed with the
linaro v136 firmware to pick up the initramfs size fixes and to stop the
boot at fastboot.
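Roughly, the build job publishes the boot image as a job artifact, since
GitLab artifact URLs are the public URLs LAVA can load from (a sketch;
names and paths are assumptions):

    meson-arm64:
      stage: build
      script:
        - .gitlab-ci/lava_arm.sh   # also builds the fastboot boot image
      artifacts:
        paths:
          - artifacts/             # LAVA fetches the image from here over HTTP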
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
We can now drop the GLES version overrides, since we have a DEBUG flag
that enables everything expected from a GLES 3.0 implementation.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
We are able to run only 1/5th of the tests in around the same time that
dEQP-GLES2 takes, so do that for now while more DUTs are installed.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3876>
Some dEQP tests have started passing and it's taking a while to update
the expectations and skips list.
Disable for now so CI doesn't fail and stuff can be merged.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2935>
Amend the fails and skips lists based on lists from Andreas Baierl,
shard the mali400 job across two devices since it takes close to 10 min,
and rename the jobs to lima-mali400-test and lima-mali450-test.
Also, don't set MESA_GLES_VERSION_OVERRIDE=3.0 for lima, since we don't
support GLES 3.0, and lower DEQP_PARALLEL to 3 for jobs on H3.
Keep the mali400 jobs disabled for now, since they take too much time to
complete and we also get some inexplicable failures in
dEQP-GLES2.functional.default_vertex_attrib.*
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
deqp-runner.sh uses it to determine whether we split the job across
multiple devices and, if we do, what the node index is.
With this change we can now set 'parallel: N' in the job description if
we want to split the job.
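For example (a sketch, reusing the lima job name from the previous
change):

    lima-mali400-test:
      parallel: 2   # GitLab runs two copies, setting CI_NODE_INDEX/CI_NODE_TOTAL
      script:
        - .gitlab-ci/deqp-runner.sh   # picks its shard from those variables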
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Use the normal build job to also prepare the artifacts for LAVA jobs.
For that, the build container needs to also build the test suites,
kernel, ramdisk, etc.
Then the build job will place the just-built Mesa in the ramdisk and the
test job can generate a LAVA job and point to those artifacts.
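The resulting shape is roughly this (a sketch; job and helper names are
assumptions):

    meson-arm64:
      stage: build
      script:
        - .gitlab-ci/meson-build.sh           # build Mesa
        - .gitlab-ci/prepare-lava-ramdisk.sh  # embed the fresh Mesa build
      artifacts:
        paths:
          - artifacts/                        # kernel, ramdisk, test suites

    panfrost-t760-test:
      stage: test
      script:
        - .gitlab-ci/generate-lava-job.sh     # points LAVA at those artifacts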
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3295>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3295>
Take one step towards sharing code between the LAVA and non-LAVA jobs,
with the goals of reducing maintenance burden and use of computational
resources.
The env var DEQP_NO_SAVE_RESULTS allows us to skip the processing of the
XML result files, which can take a long time and is not useful in the
LAVA case, as we are not uploading artifacts anywhere at the moment.
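A minimal sketch of the wiring, assuming a shared hidden job for the
LAVA tests:

    .lava-test:
      variables:
        DEQP_NO_SAVE_RESULTS: "1"   # skip XML processing; nothing is uploaded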
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I feel really bad about this, but this one test is flaking. I don't want
to do a mass revert (and bisection is extremely difficult with
nondeterministic Heisenbugs), but it's Friday night and master needs to
pass. This commit should be reverted ASAP (once the flake is solved).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Having different policies could have some weird results, e.g. changes
only touching documentation (where the intention is not to run the
pipeline by default) would still create a pipeline with the LAVA jobs
running by default.
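For example, a single shared policy that every job extends (a sketch
using the pre-'rules:' syntax; the path list is illustrative):

    .ci-run-policy:
      only:
        changes:
          - src/**/*
          - include/**/*
          - .gitlab-ci.yml

    # both LAVA and non-LAVA jobs extend this, so a docs-only change
    # skips all of them consistently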
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Now that the Mali T720 GPU is supported at the same level as the T760,
test it on PINE64 H64 boards.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Two benefits:
* Most docker image related environment variables can now be defined in
  the jobs where they're used instead of globally. The DEBIAN_TAG values
  are propagated to other jobs via YAML anchors.
* Images on https://gitlab.freedesktop.org/mesa/mesa/container_registry
  are now organized in separate repositories, with a suffix matching the
  name of the job which makes sure the image is there.
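Roughly, the anchor mechanism looks like this (tag value and job names
are illustrative):

    x86_build:
      variables:
        DEBIAN_TAG: &x86_build_tag "2020-03-04"   # anchor the tag value

    meson-x86:
      variables:
        DEBIAN_TAG: *x86_build_tag                # alias it in consumers
      image: $CI_REGISTRY_IMAGE/debian/x86_build:$DEBIAN_TAG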
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Cleans up .gitlab-ci/ a little, and allows using a single DEBIAN_EXEC
line for all container jobs.
v2:
* Use lava_arm.sh instead of arm_lava.sh for consistency with v2 of the
previous change
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This makes it easier to tell which job is which in a pipeline.
v2:
* Use lava_arm{64,hf} instead of arm{64,hf}_lava to keep these jobs
together in pipeline overviews
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Otherwise there can be weird breakage.
(Removing the include from .gitlab-ci/lava-gitlab-ci.yml doesn't seem
possible unfortunately:
https://gitlab.freedesktop.org/daenzer/mesa/pipelines/79458)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
One job for the quick_gl profile, one for the glslparser & quick_shader
profiles (doing these together takes hardly any more time than
quick_shader alone); a sketch of the job shape follows the version notes
below.
v2:
* Don't break lava tests
v3:
* Remove piglit test artifact paths
* Exclude some quick_shader tests again:
- Test whose result flips between pass/fail/skip
- *@vs_in tests, as not the same one of these gets picked every time
v4:
* Do not list passing tests in .gitlab-ci/piglit/*.txt (Eric Anholt)
* Include the test number summary in .gitlab-ci/piglit/*.txt
* Completely disable generating any vs_in tests in the piglit build.
* Remove some more unneeded files from the piglit build tree.
* Exclude quick_gl arb_gpu_shader5 tests; they were all skipped anyway,
as llvmpipe doesn't support this extension yet, but occasionally they
would spuriously fail instead.
v5:
* Set LD_LIBRARY_PATH, so we actually test the Mesa build from the
pipeline...
* Verify that wflinfo reports the expected Mesa version
* Pass -noreset to Xvfb
v6:
* Don't use autoscale runners, run piglit with -j4 (Eric Anholt)
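The sketch mentioned above; job names and the profile variable are
assumptions:

    piglit-quick_gl:
      extends: .piglit-test
      variables:
        PIGLIT_PROFILES: quick_gl

    piglit-glslparser+quick_shader:
      extends: .piglit-test
      variables:
        PIGLIT_PROFILES: "glslparser quick_shader"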
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit c9df92bf79.
It turns out that gitlab-runner uses kubernetes all wrong, spawning Pods
and sshing into them to run the script instead of Jobs containing the
script to run. This means that when anything goes wrong with the pod
(autoscale, preemption, VM maintenance, cluster reconfiguration), the job
fails and only sometimes gets handled as a runner system failure. Even
worse, due to bugs in either the runner or k8s itself, some classes of
timeout-related failure end up not being reported as failures, and the job
will incorrectly report success!
Disable using the "autoscale" cluster until we can do something else
(docker-machine instead of k8s, or the custom third-party k8s-native
runner).
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Run only jobs needed for testing on LAVA devices if a branch starts with
lava-ci-.
This allows developers to have faster test cycles, as these pipelines
take only a bit over 8 minutes. It also has the advantage of conserving
resources.
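One way to express this (a sketch; the real change may structure the
rules differently):

    # non-LAVA jobs opt out of these branches...
    meson-x86:
      except:
        - /^lava-ci-/

    # ...while the LAVA jobs keep their normal rules and still run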
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The GKE pool we're using is 1-3 32-core VMs, preemptible (to keep
costs down), with 8 jobs concurrent per system. We have plenty of
memory (4G/core), so we run make -j8 to try to keep the cores busy even
when one job is in a single-threaded step (docker image download, git
clone, artifacts processing, etc.). When all jobs are generating work
for all the cores, they'll be scheduled fairly.
The nodes in the pool have 300GB boot disks (over-provisioned in space
to provide enough iops and throughput) mounted to /ccache, and
CACHE_DIR set pointing to them. This means that once a new
autoscaled-up node has run some jobs, it should have a hot ccache from
then on (instead of having to rely on the docker container cache
having our ccache lying around and not getting wiped out by some
other fd.o job). Local SSDs would provide higher performance, but
unfortunately are not supported with the cluster autoscaler.
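Seen from a job, the effect is roughly the following (a sketch; the
actual CACHE_DIR wiring lives in the runner configuration):

    .build:
      variables:
        CCACHE_DIR: /ccache/mesa   # hypothetical path on the persistent mount
      script:
        - make -j8   # with 8 concurrent jobs per 32-core VM, cores stay busy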
For now, the softpipe/llvmpipe test runs are still on the shared
runners, until I can get them ported onto Bas's runner so they can be
parallelized in a single job.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
The runner that submits jobs there is down and will take some time to
get fixed. Disable them for now to keep the CI green.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Update to 5.4-rc4 so we can test Panfrost on devices with Mali T720 and
T820.
A bug was found that prevented things working at all on RK3288 devices,
so we carry a patch for now in my personal fork.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
It's been throwing the following error today:
"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"
Reviewed-by: Daniel Stone <daniels@collabora.com>
While at it, rename to singular "container" for consistency.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Run dEQP on boards with Mali 400 and 450 in Baylibre's lab.
There's lots of skipped tests because of crashes and undetermined
behavior. May be a good idea to run the tests with valgrind and fix any
issues found.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
As the non-LAVA runner script does, have per-GPU-version files listing
the tests that are to be skipped due to being very slow, unstable, etc.
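A sketch of the shape; the variable and file naming are assumptions:

    panfrost-t760-test:
      variables:
        DEQP_SKIPS: deqp-panfrost-t760-skips.txt   # very slow/unstable tests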
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Without this, the test jobs could spuriously run after the container
job failed or was cancelled, even if the build job didn't run at all.
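A minimal sketch of the fix, with 'needs:' making the dependency
explicit (job names assumed):

    arm64_a306_gles2:
      stage: test
      needs:
        - meson-arm64   # the test job only starts once this build succeeded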
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
In the test stage, we could use either of the two container images, as
we aren't going to do anything architecture-dependent when submitting
the jobs to LAVA.
But if we are in a pipeline in which the images need to be rebuilt and
one finishes much earlier than the other, it could happen that the test
job that executes first fails to find the container image.
To avoid that, have each job in the test stage use the image that has
already been implicitly built, by depending on the build job for the
given arch.
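Sketched with assumed job and image names:

    lava-armhf-test:
      image: $CI_REGISTRY_IMAGE/debian/armhf_test:$DEBIAN_TAG
      needs:
        - meson-armhf   # the dependency guarantees this image already exists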
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
In preparation for testing drivers other than Panfrost in LAVA labs.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>