Commit Graph

32 Commits

Author SHA1 Message Date
Guilherme Gallo 201b0b6d29 ci/lava: Retry when data fetching log RPC call is corrupted
Rarely the jobs.logs RPC call can return corrupted data, such as
mal-formed YAML data. As this is expected and very rare to occur, let's
retry this RPC call several times to give it a chance to fix itself.

Retrying would not swallow the log lines since we keep track of how many
log lines each job has.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 4ffd21ca70 ci/lava: Improve exception handling
Move exceptions to its own file.
Create MesaCITimeoutError and MesaCIRetryError with specific exception
data for better exception classification.
Avoid the use of `fatal_err` in favor of raising a proper exception.
Make _call_proxy exception handling exhaustive, add missing
ResponseError treatment.

Also, detect JobError during job result parsing. So when a LAVA timeout error
happens, it is probably cause by some boot/network issues with a
specific device, we can retry the same job in other device with the same
device_type.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 5fc333d0b6 ci/lava: Cancel the job if the script is interrupted
During development, we may want to test lava_job_submitter.py locally,
sometimes one can find what one is been looking for before the LAV job
is done. Let's respond to SIGINT signal and cancel the LAVA job, as we
can't follow what is happening anymore via script.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo c64e3d92df ci/lava: Reduce LAVA boot phase timeout to 3 minutes
A normal boot on LAVA device would take less than 1 minute. Sometimes
the depthcharge freezes and LAVA waits until the current timeout (25
minutes) to kick in and fail the job.
With this lower timeout value, the job could fail faster and eventually
LAVA will be able to retry it.

Furthermore, set a default timeout for all actions to fix an undesired
behavior with some LAVA job subactions, such as depthcharge-action.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 9666ab7172 ci/lava: Let LAVA job submitter run without JWT file
Make jwt-file argument optional, this means that this submitter could
deal with more generic use cases, such as running it locally, since the
MINIO JWT is a sensitive information, which is not really required to
test if the submitter is working well.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 18d80f25ee ci/lava: Parse all test cases from 0_mesa suite
LAVA can filter which test suite to show the results from, let's list
all testcases possible in the mesa test suite, to be able to divide more
complex jobs into test_cases.
Another advantage is that the test case can vary its name.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 09236d9607 ci/lava: Use lava-test-case to run custom scripts in LAVA
The exit code is automatically parsed to fail/pass the job, so this
commit removes the `hwci.*pass|fail` regex and printings.

Add mesa-job-name parameter to get the CI_JOB_NAME easily to serve as
test name.

Besides, there is the treatment for the mesa job naeme, as LAVA does not
like whitespace character inside the test case/suite name, since it
interprets it as a LAVA signal parameter and it can make the job fail
when the script looks for the results from the LAVA RPC.

And the slash character seems to break gitlab log sectioning, so
removing every character after the first whitespace.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 33a1c51e3e ci/lava: Always validate the lava job
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 805de830c9 ci/lava: Set lava-signal to kmsg
By default, LAVA emit signals as specific lines of message to the
console serial for subsequent parsing. However, this is prone to be
interleaved with the dmesg messages, particularly with debug messages
that can happen just after the process exit, such as the ones set by
CONFIG_STACK_DEBUG_USAGE Kconfig.

Setting the `lava-signal` to `kmsg` will make those special messages to
be dumped to /dev/kmsg, where the each line is printed atomically

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 75410c3d76 ci/lava: Fix LAVA job validation
When jobs.validate returns something, it means that there is a dict
filled with the error message, so we were running the job with some
validation errors for a quite while

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 43d8ed840e ci/lava: Filter log lines from LAVA return
Start to differentiate between the different types of LAVA log message;
the only visible change right now is that we make warnings and errors
bold and red to match bare-metal, but it comes in useful later as we
will use the results markers to watch us step through the different
stages of job execution.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 84a5ea4228 ci/lava: Encapsulate job data in a class
Less free-form passing stuff around, and also makes it easier to
implement log-based following in future.

The new class has:
- job log polling: This allows us to get rid of some more function-local
  state; the job now contains where we are, and the timeout etc is
  localised within the thing polling it.
- has-started detection into job class
- heartbeat logic to update the job instance state with the start time
  when the submitter begins to track the logs from the LAVA device

Besides:

- Split LAVA jobs and Mesa CI policy
- Update unit tests with LAVAJob class

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Daniel Stone b3ba448ba5 ci/lava: Sleep before, not after, API calls
We rate-limit LAVA API calls as they are standard polling calls rather
than blocking for changes. However when we sleep after making the calls
rather than before, we can block when we want to exit - e.g. after
getting the final logs, we will still sleep even though we can drop out.

Fix this by moving the calls to before the API calls, rather than after.
This means that the first calls (when we're waiting to be scheduled, or
haven't got our first log lines yet), will be delayed compared to
previously, but that's not going to slow us down as even in the best
case we won't be executing in a device within the first 15 seconds.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Tomeu Vizoso b46000f076 ci: Allow specifying a different kernel in LAVA jobs
To make it possible to use a kernel different from that built along with
the rootfs.

This can make it more convenient for other projects to reuse these
scripts.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15891>
2022-04-13 07:34:36 +00:00
Tomeu Vizoso 30be6788cc ci: Disable Link Power Management with RTL8153
There are reliability problems with the RTL8153 ethernet driver under
certain network loads, related to incompatibility of the device with
Link Power Management.

Add usbcore.quirks=0bda:8153:k to the kernel command line to enable the
USB_QUIRK_NO_LPM option.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15791>
2022-04-08 14:07:31 +00:00
Emma Anholt ad0fcaf1ee ci/lava: Simplify passthrough of the request to upload results/ to minio.
We already have a way to pass env vars around, just use that instead of
packing/unpacking it on the kernel command line.

Cleans up HW runner job log output some more.

Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15332>
2022-04-05 21:37:46 +00:00
Guilherme Gallo addac10443 ci: Make LAVA jobs fail CI job when retry is exhausted
When the lava_job_submitter.py retry loop finishes normally (without
falling through break-loop) it means that the submitter has exceeded the
retry count limit. However, when it happens the script
finishes normally. This patch adds a treatment to this case, warning the
user what happened and forcing the job to fail.

Moreover, this commit will make retry configurations configurable by
CI job, as it can take the default value from the following variables:

- LAVA_DEVICE_HANGING_TIMEOUT_SEC
- LAVA_WAIT_FOR_DEVICE_POLLING_TIME_SEC
- LAVA_LOG_POLLING_TIME_SEC
- LAVA_NUMBER_OF_RETRIES_TIMEOUT_DETECTION

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
2022-02-16 23:32:39 +00:00
Guilherme Gallo dabc068e6c ci: Use ci-fairy minio login via token file
For every CI job, put JWT content into a file and unset CI_JOB_JWT
environment var
=======

* virgl jobs:
	- Share JWT token file to crosvm instance
	- Keep using `export -p` due to high complexity in the scripts
	  of these jobs. At least, the CI_JOB_JWT will not be leaked,
	  since it is being unset at the `before_script` phase of each
	  Mesa CI job.

* iris jobs: Update lava_job_submitter to take token file as argument
	- generate-env with CI_JOB_JWT_TOKEN_FILE
	- create token file during baremetal init stage

* baremetal jobs: Copy token file to bare-metal NFS

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14004>
2021-12-02 18:01:29 +00:00
Tomeu Vizoso 83a0bb007f ci: Let manual LAVA jobs have a longer timeout than others
So far only LAVA jobs make use of it, but I guess baremetal could be
extended to have these timeouts as well.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13441>
2021-11-02 10:51:54 +00:00
Guilherme Gallo 7244aa1980 gitlab-ci: refactor timeout constants and tweak timeout values
* Refactor timeouts and retry attempts constants to variables in the top
  of the python script.

* Increase LAVA job timeout value from 1 minute to 5 minutes, since the
  timeout detection is just a heuristic based on the log silence in LAVA
  devices. If we keep 1 minute timeout, maybe we could cancel jobs that
  have tasks which may take too long to respond. Also, one minute
  timeout is prone to misdetect scenarios when some network errors or
  slowness may happen.

* Increase polling rate to check if the job has started from 1 check
  every 30 seconds to 1 check every 10 seconds. Since it was taking 30
  seconds in the worst case to start to get the log output from a LAVA
  job. It is important to note that some LAVA jobs take less than 2
  minutes to finish, so a 10 second wait would be more suitable in those
  cases.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12870>
2021-09-15 15:12:52 +00:00
Guilherme Gallo 460ce7e75d gitlab-ci: Implement a simple timeout detection for LAVA jobs
* Retry twice if the job does not generates logs for 5 minutes.
* Only active the timeout detection when the job starts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12808>
2021-09-10 13:01:16 -03:00
Guilherme Gallo 2587fa1570 gitlab-ci: Add sleep for every `scheduler.jobs.logs` call
Add a time.sleep call between proxy.scheduler.jobs.logs calls, since
they do not block. This will relieve some request pressure on LAVA
dispatchers.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12797>
2021-09-10 12:34:26 +00:00
Erico Nunes 574bff9087 ci: enable CI for lima again
Enable CI for lima again on meson-gxl-s805x-libretech-ac boards
with Mali-450.
These boards are managed by a LAVA instance and so follow the LAVA CI
workflow in Mesa.
The goal is to have coverage for deqp-gles2, as lima is a GLES2-only
driver.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11789>
2021-08-17 11:22:59 +00:00
Daniel Stone a439db5844 ci/lava: Generate YAML from Python, not Jinja
This makes it much easier to use the common init scripts ... which we
also do here.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
2021-06-15 14:02:44 +02:00
Daniel Stone 759dcb482d ci/lava: Pass MinIO path on the command line
This brings us much closer with what bare-metal does, and also allows us
to upload job data to a local instance rather than the primary fd.o one.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>
2021-06-15 14:02:44 +02:00
Daniel Stone bbf5f412ab ci/lava: Disable stdout/stderr buffering
Frequency of writes is unlikely to be a performance bottleneck, and
given the number of steps in between execution and the user, less
buffering is gooder.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone 82631c7182 ci/lava: Add explicit fatal-error handler
Truth is relative in 2021, and Python's duck-typing means truthiness
isn't what you think it is. Use an explicit fatal-error handler to make
sure we crash out hard on failure, rather than hoping sys.exit() behaves
like you think it does, because it doesn't.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone 4082fe7ce2 ci/lava: Remove unused arguments
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone d0e5203855 ci/lava: Use per-job rootfs overlay for environment
Trying to get arbitrary strings suitably quoted for shell, embedded in a
YAML file, processed by Python templating, is like seven bad ideas all
embedded into one big can of bees.

Reuse the same script we use for bare-metal to generate the environment,
tar that up into a per-job overlay which is added to the
inter-pipeline-reusable rootfs built by the container jobs and the
intra-pipeline-reusable overlay built by the build jobs.

@anholt wrote a chunk of this - replacing the $ENV_VARS GitLab CI
variable with a Python loop across the POSIX job environment - in
!11192, but this still had YAML quoting nightmares, and was more
needless duplication between LAVA and bare-metal.

The diff is large and annoying, but is mostly a sed job to get
ENV_VARS="FOO=bar BAZ=quux" into FOO: bar\nBAZ: quux.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone 0fd2320c94 ci/lava: Clean up variable naming, document them
Our variable names haven't aged very well. Rename them to make them more
clear and straightforward, especially when we bring in a third rootfs
element to download.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone f3d69923a1 ci/lava: Pass JWT separately from environment variables
As the JWT is sensitive, we don't want to record or leak it anywhere.
Doing this lets us run --dump-yaml in normal execution so we can
artifact the result, as well as bringing us into line with bare-metal.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Daniel Stone 5793cefff8 ci/lava: Move LAVA files to lava/
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11309>
2021-06-11 12:13:00 +00:00
Renamed from .gitlab-ci/lava_job_submitter.py (Browse further)