mesa/.gitlab-ci/deqp-runner.sh

#!/bin/bash

set -ex

DEQP_OPTIONS=(--deqp-surface-width=256 --deqp-surface-height=256)
DEQP_OPTIONS+=(--deqp-surface-type=pbuffer)
DEQP_OPTIONS+=(--deqp-gl-config-name=rgba8888d24s8ms0)
DEQP_OPTIONS+=(--deqp-visibility=hidden)

# It would be nice to be able to enable the watchdog, so that hangs in a test
# don't need to wait the full hour for the run to time out.  However, some
# shaders end up taking long enough to compile
# (dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 for example)
# that they'll sporadically trigger the watchdog.
#DEQP_OPTIONS+=(--deqp-watchdog=enable)

if [ -z "$DEQP_VER" ]; then
   echo 'DEQP_VER must be set to something like "gles2" or "gles31" for the test run'
   exit 1
fi

if [ -z "$DEQP_SKIPS" ]; then
   echo 'DEQP_SKIPS must be set to something like "deqp-default-skips.txt"'
   exit 1
fi

ARTIFACTS=$(pwd)/artifacts

# Set up the driver environment.
export LD_LIBRARY_PATH=$(pwd)/install/lib/
export EGL_PLATFORM=surfaceless

# The runner was not looking for libkms in /usr/local/lib, for a reason I
# never figured out, so add that directory to the library path explicitly.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

RESULTS=$(pwd)/results
mkdir -p "$RESULTS"

# Generate test case list file
cp /deqp/mustpass/$DEQP_VER-master.txt /tmp/case-list.txt

# If the job is parallel, take the corresponding fraction of the caselist.
# Note: FIRST~STEP is a GNU sed extension that matches line FIRST and every
# STEPth line after it (lines are numbered from 1).
if [ -n "$CI_NODE_INDEX" ]; then
   sed -ni "${CI_NODE_INDEX}~${CI_NODE_TOTAL}p" /tmp/case-list.txt
fi

if [ ! -s /tmp/case-list.txt ]; then
    echo "Caselist generation failed"
    exit 1
fi

if [ -n "$DEQP_EXPECTED_FAILS" ]; then
    XFAIL="--xfail-list $ARTIFACTS/$DEQP_EXPECTED_FAILS"
fi

set +e

vulkan-cts-runner \
    --deqp /deqp/modules/$DEQP_VER/deqp-$DEQP_VER \
    --output $RESULTS/cts-runner-results.txt \
    --caselist /tmp/case-list.txt \
    --exclude-list $ARTIFACTS/$DEQP_SKIPS \
    $XFAIL \
    --job ${DEQP_PARALLEL:-1} \
    -- \
    "${DEQP_OPTIONS[@]}"
DEQP_EXITCODE=$?

if [ $DEQP_EXITCODE -ne 0 ]; then
    echo "Some unexpected results found (see cts-runner-results.txt in artifacts for full results):"
    grep -v ",Pass" "$RESULTS/cts-runner-results.txt" | \
        grep -v ",Skip" | \
        grep -v ",ExpectedFail" | \
        head -n 50
    exit $DEQP_EXITCODE
fi
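
The caselist sharding step in the script can be tried in isolation. The sketch below assumes GNU sed (the `FIRST~STEP` address is not POSIX) and uses a made-up seven-entry caselist and a three-way split. The helper name `shard_caselist` is hypothetical and does not exist in the script.

```shell
#!/bin/bash
# Sketch of how the GNU sed address FIRST~STEP splits a caselist between
# parallel CI jobs. Test names and the 3-way split are invented for the demo.

# shard_caselist INDEX TOTAL FILE (hypothetical helper)
# Print line INDEX of FILE and every TOTAL-th line after it. INDEX is
# 1-based, matching how GitLab numbers CI_NODE_INDEX for parallel jobs.
shard_caselist() {
    sed -n "${1}~${2}p" "$3"
}

caselist=$(mktemp)
printf 'dEQP-demo.case.%d\n' 1 2 3 4 5 6 7 > "$caselist"

CI_NODE_TOTAL=3
for CI_NODE_INDEX in 1 2 3; do
    echo "node $CI_NODE_INDEX of $CI_NODE_TOTAL:"
    shard_caselist "$CI_NODE_INDEX" "$CI_NODE_TOTAL" "$caselist" | sed 's/^/    /'
done

rm -f "$caselist"
```

Every line lands in exactly one shard, and shard sizes differ by at most one line. Because the split is by line position, each job gets an interleaved slice of the list rather than a contiguous block of it.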
gitlab-ci: Run the GLES2 CTS on llvmpipe.
This is the start of doing CTS tests on merges to Mesa master. We use the surfaceless platform so that we don't need to bother bringing up weston or X11. The surface size is kept low to reduce runtime, but this comes at the cost of many rendering tests skipping due to too-small render targets (as we see the impact of Mesa on the shared runner pool, we can reevaluate this and what set of CTS tests we want to run). We split the job up across 4 runners (each at 4 llvmpipe threads), so that the job can load-balance across our shared runners and finish sooner (since dEQP is very single-thread-performance bound).
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-06-29 00:35:32 +01:00

gitlab-ci: Log the driver version that got tested.
Sometimes you just want confirmation that dEQP really picked up the driver you thought we built. This is not as good as one might like, because git isn't present in the cross-build image.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-26 20:57:16 +01:00

gitlab-ci: Disable dEQP's watchdog timer.
A handful of tests on freedreno have been close to the watchdog timeout, and now sporadically fail since range analysis has slowed down the compiler for them.
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-03 23:52:33 +01:00

gitlab-ci: Make the test job fail when bugs are unexpectedly fixed.
If people fix bugs without updating the expected-fails list, then we end up with a lack of coverage of those failures in the future. Also, some day down the line another developer ends up trying to figure out if the bug was actually fixed or their environment is just failing to reproduce it.
Suggested-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-09-12 20:34:50 +01:00

ci: Use cts_runner for our dEQP runs.
This runner is a little project by Bas, written in C++, that spawns threads that then loop grabbing chunks of the (randomly shuffled but consistently so) test list and hand it to a dEQP instance. As the remaining list gets shorter, so do the chunks, so hopefully the threads all complete effectively at once. It also handles restarting after crashes automatically. I've extended the runner a bit to do what I was doing in the bash scripts before, like the skip list and expected failures handling. This project should also be a good baseline for extending to handle retesting of intermittent failures.
By switching to it, we can have the swrast tests just take up one job slot on the shared runners and keep their allotment of CPUs busy, instead of taking up job slots with single-threaded dEQP jobs. It will also let us (eventually, once I reprovision) switch the freedreno runners over to threading within the job instead of running concurrent jobs, so that memory scribbles in one pipeline don't affect unrelated pipelines, and I can experiment with their parallelism (particularly on a306 where we are frequently backed up) without trashing other people's jobs.
What we lose in this process is per-test output in the log (not a big loss, I think, since we summarize fails at the end and reducing log length keeps chrome from choking on our logs so badly). We also drop the renderer sanity checking, since it's not saving qpa files for us to go poke through. Given that all the drivers involved have fail lists, if we got the wrong renderer somehow, we'd get a job failure anyway.
v2: Rebase on dropping of the autoscale cluster and the arm64 build/test split. Use a script to deduplicate the cts-runner build.
v3: Rebase on the amd64 build/test container split.
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2)
2019-11-04 18:54:41 +00:00
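
The commit message for "ci: Use cts_runner for our dEQP runs" says that the chunks handed to each dEQP instance shrink as the remaining test list gets shorter, so that the threads finish at about the same time. It does not give the sizing rule. The sketch below illustrates the idea with an assumed policy (each chunk is the remaining count divided by twice the worker count, with a minimum of one test). The function names and the formula are hypothetical and are not taken from the real runner.

```shell
#!/bin/bash
# Sketch of shrinking chunk sizes. The sizing rule is an assumption made for
# illustration only; it is not the rule used by the real C++ runner.

# next_chunk_size REMAINING WORKERS (hypothetical helper)
# Assumed policy: hand out 1/(2*WORKERS) of what is left, at least 1 test.
next_chunk_size() {
    local remaining=$1 workers=$2
    local size=$(( remaining / (2 * workers) ))
    if (( size < 1 )); then
        size=1
    fi
    echo "$size"
}

# plan_chunks TOTAL WORKERS (hypothetical helper)
# Print, one per line, the chunk sizes handed out until no tests remain.
plan_chunks() {
    local remaining=$1 workers=$2 size
    while (( remaining > 0 )); do
        size=$(next_chunk_size "$remaining" "$workers")
        echo "$size"
        remaining=$(( remaining - size ))
    done
}

# Example: 100 tests shared by 4 workers.
plan_chunks 100 4 | tr '\n' ' '
echo
```

Under this assumed rule, early chunks are large, which keeps per-chunk startup overhead low, and late chunks are single tests, so no worker is left holding a long chunk while the others sit idle.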