Implement a log-based retry hint for R8152 issue described in #6681,
which is based on detecting these two consecutive lines:
```
r8152 <USB> eth0: Tx status -71
nfs: server <IP> not responding, still trying
```
Where <IP> and <USB> could be any IP and USB addresses, respectfully.
This commit is a temporary fix since it requires a section-aware log
follower, implemented in !16323. When the cited MR is merged, one will
make a proper fix on top of that.
Closes: #6681
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>
Rarely the jobs.logs RPC call can return corrupted data, such as
mal-formed YAML data. As this is expected and very rare to occur, let's
retry this RPC call several times to give it a chance to fix itself.
Retrying would not swallow the log lines since we keep track of how many
log lines each job has.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
Move exceptions to its own file.
Create MesaCITimeoutError and MesaCIRetryError with specific exception
data for better exception classification.
Avoid the use of `fatal_err` in favor of raising a proper exception.
Make _call_proxy exception handling exhaustive, add missing
ResponseError treatment.
Also, detect JobError during job result parsing. So when a LAVA timeout error
happens, it is probably cause by some boot/network issues with a
specific device, we can retry the same job in other device with the same
device_type.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>