Commit Graph

22 Commits

Author SHA1 Message Date
Guilherme Gallo 4783e55039 ci/lava: Add `slow` pytest marker
Mark test_full_yaml_log with this new marker to be easily run by the
developers.
Make `debian-testing` skip this test with `not slow` marker hint.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
2022-07-08 12:26:05 +00:00
Guilherme Gallo 2c51b7a9c9 ci/lava: Detect R8152 issues preemptively and retry
Implement a log-based retry hint for R8152 issue described in #6681,
which is based on detecting these two consecutive lines:

```
r8152 <USB> eth0: Tx status -71
nfs: server <IP> not responding, still trying
```

Where <IP> and <USB> could be any IP and USB addresses, respectfully.

This commit is a temporary fix since it requires a section-aware log
follower, implemented in !16323. When the cited MR is merged, one will
make a proper fix on top of that.

Closes: #6681

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
2022-07-08 12:26:05 +00:00
Guilherme Gallo 45a4b01427 ci/lava: Split lava_log into modules
This script is getting too big, it been hard to extend it.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
2022-07-08 12:26:05 +00:00
Guilherme Gallo 20827dfa9b ci/lava: Update license header
Use SPDX to indicate the license.
Update authors of lava_job_submitter.py

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 6ba2b33a8c ci/lava: Flexibilize section marker regexes
In some jobs, such as
https://gitlab.freedesktop.org/gallo/mesa/-/jobs/24904100, the kmsg is
interleaved with stderr/stdout in serial console, making it difficult to
confidently find the log messages to detect when the DUT is booting,
when the DUT is running etc.

Luckily, LAVA sends redundant messages about their signals. We can use
them to mitigate the chance of missing an interleaved message by being
more open to different messages, using the regex on both `debug` and
`target` LAVA log levels.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 24f368d652 ci/lava: Stop printing after the result line
There are some leftovers in the jobs logs after the result log line.
Only print until the init-stage2.sh output, to raise the chance to check
for the test script results at the first glance in the Gitlab logs.

Extra changes:
- Add `hung` status for jobs considered hanging in the Gitlab
- print them after the retry loop

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 29af421272 ci/lava: Don't print LAVA debug messages
Remove debug messages from the output in order to unclutter the log a
little more for the developers.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 466917ea4c ci/lava: Add an integration test for LAVA jobs
test_full_yaml_log is a test that will look for a LAVA log YAML file at
`/tmp/log.yaml` and consume it as it was a realtime CI job.
It is useful for debugging issues related with LAVA.

Let's keep it skipped by default, to avoid introducing entire logs into
the codebase.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo aa26a6ab72 ci/lava: Follow job execution via LogFollower
Now LogFollower is used to deal with the LAVA logs.

Moreover, this commit adds timeouts per Gitlab section, if a section
takes longer than expected, cancel the job and retry again.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 2569d7d7df ci/lava: Create LogFollower and move logging methods
- Create LogFollower to capture LAVA log and process it adding some
- GitlabSection and color treatment to it
- Break logs further, make new gitlab sections between testcases
- Implement LogFollower as ContextManager to deal with incomplete LAVA
  jobs.
- Use template method to simplify gitlab log sections management
- Fix sections timestamps

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo c86ba3640f ci/lava: Create Gitlab log sections handler
Gitlab has support for collapsible sections, so it would be good to
create collapsed log sections for the LAVA setup logs. This way, the
Mesa developers to see only the execution of the scripts, instead of
LAVA messages clutter.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16323>
2022-07-07 00:28:53 +00:00
Guilherme Gallo 3b8d10d270 ci/lava: Improve result parsing regex
LAVA job logs have an ongoing problem of message interleaving with kmsg.
So any kernel dumps and LAVA signals (which are being printed in kmsg)
will have a chance to clutter the pattern matching for `hwci: mesa:
(pass|fail)` line.

v2:
- Add an 1 second sleep before exiting the test script, to give enough
  time to print the result message without conflicting with LAVA ENDTC
  signal from kmsg

Closes: #6714

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17175>
2022-06-28 22:51:45 +00:00
Guilherme Gallo cee1c4fc7f ci/lava: Filter out undesired messages
Some LAVA jobs emit lots of messages "Listened to connection for
namespace 'common' for up to 1s" in a row at the end of the logs, making
difficult to see the result of the test script.

This commit removes those lines until a proper solution is deployed on
the LAVA side.

Closes: #6116

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17151>
2022-06-22 01:48:16 +00:00
Guilherme Gallo 75973e3a1c ci/lava: Add support for more complex color codes
Currently, the LAVA job submitter is employing a temporary solution for
the bash escape code mangling in the LAVA jobs. Until the issue is not
fixed on the LAVA side, the submitter will replace the wrong characters
with the fixed ones.

This commit improves the regex pattern to comprehend the scenarios of
color codes with font formatting and background color information, such
as: `echo -e "\e[1;41;39mRed background with white bold text color\e[0m"`

Fixes: #5503

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17046>
2022-06-15 19:10:09 +00:00
Guilherme Gallo ee2278de65 ci/lava: Fix Gitlab Section markers
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the gitlab section markers from deqp, for example,
to not work, inputting noise into the log.

This commit makes the simplest fix which is to replace the mangled
characters to the fixed ones.

This approach is error-prone, since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA team itself.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
2022-05-23 16:51:47 +00:00
Guilherme Gallo e00281f6da ci/lava: Fix colored LAVA outputs
LAVA is mangling the escape codes from ANSI during log fetching from the
target device, making the colored lines from deqp, for example, to not
work, inputting noise into the log.

This commit makes the most straightforward fix which is to replace the
mangled characters to the fixed ones.

This approach is error-prone since it may unwittingly replace a genuine
log that resembles the mangled escape code. But this solution should
suffice until we get a proper fix from LAVA developers itself.

Fixes: #5503

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16520>
2022-05-23 16:51:47 +00:00
Guilherme Gallo 0ff3517fb7 ci/lava: Make job submitter parse the job result
Currently, the LAVA job submitter fetches the job results from the LAVA
XMLRPC call, but that is not necessary, as the job result is easily
found in the logs. E.g. the bare-metal and poe jobs uses that log to set
the final job status of their runs.

Another reason for the change is that the LAVA signals are not reliable
in some devices with one serial port, causing some troubles in a618
recently. So, if one signal fails to be sent/received, the job will
ultimately fail even when the hwci script has been successful.

Fixes: #6435

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16425>
2022-05-13 02:17:32 +00:00
Guilherme Gallo 201b0b6d29 ci/lava: Retry when data fetching log RPC call is corrupted
Rarely the jobs.logs RPC call can return corrupted data, such as
mal-formed YAML data. As this is expected and very rare to occur, let's
retry this RPC call several times to give it a chance to fix itself.

Retrying would not swallow the log lines since we keep track of how many
log lines each job has.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 4ffd21ca70 ci/lava: Improve exception handling
Move exceptions to its own file.
Create MesaCITimeoutError and MesaCIRetryError with specific exception
data for better exception classification.
Avoid the use of `fatal_err` in favor of raising a proper exception.
Make _call_proxy exception handling exhaustive, add missing
ResponseError treatment.

Also, detect JobError during job result parsing. So when a LAVA timeout error
happens, it is probably cause by some boot/network issues with a
specific device, we can retry the same job in other device with the same
device_type.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 18d80f25ee ci/lava: Parse all test cases from 0_mesa suite
LAVA can filter which test suite to show the results from, let's list
all testcases possible in the mesa test suite, to be able to divide more
complex jobs into test_cases.
Another advantage is that the test case can vary its name.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 84a5ea4228 ci/lava: Encapsulate job data in a class
Less free-form passing stuff around, and also makes it easier to
implement log-based following in future.

The new class has:
- job log polling: This allows us to get rid of some more function-local
  state; the job now contains where we are, and the timeout etc is
  localised within the thing polling it.
- has-started detection into job class
- heartbeat logic to update the job instance state with the start time
  when the submitter begins to track the logs from the LAVA device

Besides:

- Split LAVA jobs and Mesa CI policy
- Update unit tests with LAVAJob class

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
2022-04-28 06:33:46 +00:00
Guilherme Gallo 794009c9ee ci: Add unit tests for lava_job_submitter
These tests will explore some scenarios involving LAVA delays to submit
the job to the device, some device delays outputting data to LAVA
logs, and sensitive data protection.

For example, the subtests from test_retriable_follow_job, "timed out
more times than retry attempts" and "very long silence" caught a bug
where a job retried until the limited attempts and the CI job still
succeeded. https://gitlab.freedesktop.org/mesa/mesa/-/jobs/18325174

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14876>
2022-02-16 23:32:39 +00:00