mesa/.gitlab-ci/download-git-cache.sh

#!/usr/bin/env bash

set +e
set -o xtrace

# if we run this script outside of gitlab-ci for testing, ensure
# we got meaningful variables
CI_PROJECT_DIR=${CI_PROJECT_DIR:-$(mktemp -d)/$CI_PROJECT_NAME}

if [[ -e $CI_PROJECT_DIR/.git ]]
then
    echo "Repository already present, skip cache download"
    exit
fi

TMP_DIR=$(mktemp -d)

echo "$(date +"%F %T") Downloading archived master..."
if ! /usr/bin/wget \
	      -O "$TMP_DIR/$CI_PROJECT_NAME.tar.gz" \
              "https://${S3_HOST}/git-cache/${FDO_UPSTREAM_REPO}/$CI_PROJECT_NAME.tar.gz";
then
    echo "Repository cache not available"
    exit
fi

set -e

rm -rf "$CI_PROJECT_DIR"
echo "$(date +"%F %T") Extracting tarball into '$CI_PROJECT_DIR'..."
mkdir -p "$CI_PROJECT_DIR"
tar xzf "$TMP_DIR/$CI_PROJECT_NAME.tar.gz" -C "$CI_PROJECT_DIR"
rm -rf "$TMP_DIR"
chmod a+w "$CI_PROJECT_DIR"

echo "$(date +"%F %T") Git cache download done"
ci: enable shellcheck on whole .gitlab-ci Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21977> 2023-05-20 23:47:05 +01:00			`#!/usr/bin/env bash`
CI: reduce bandwidth for git pull Over the last 7 days, git pulls represented a total of 1.7 TB. On those 1.7 TB, we can see: - ~300 GB for the CI farm on hetzner - ~730 GB for the CI farm on packet.net - ~680 GB for the rest of the world We can not really change the rest of the world, but we can certainly reduce the egress costs towards our CI farms. Right now, the gitlab runners are not doing a good job at caching the git trees for the various jobs we make, and we end up with a lot of cache-misses. A typical pipeline ends up with a good 2.8GB of git pull data. (a compressed archive of the mesa folder accounts for 280MB) In this patch, we implemented what was suggested in https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576 - we host a brand new MinIO server on packet - jobs can upload files on 2 locations: git-cache/<namespace>/<project>/<branch-name>.tar.gz * artifacts/<namespace>/<project>/<pipeline-id>/ - the authorization is handled by gitlab with short tokens valid only for the time of the job is running - whenever a job runs, the runner are configured to execute (eval) $CI_PRE_CLONE_SCRIPT - this variable is set globally to download the current cache from the MinIO packet server, unpack it and replace the possibly out of date cache found on the runner - then git fetch is run by the runner, and only the delta between the upstream tree and the local tree gets pulled. We can rebuild the git cache in a schedule job (once a day seems sufficient), and then we can stop the cache miss entirely. First results showed that instead of pulling 280MB of data in my fork, I got a pull of only 250KB. That should help us. * arguably, there are other farms in the rest of the world, so hopefully we can change those too. Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428> 2020-06-11 16:16:28 +01:00
			`set +e`
			`set -o xtrace`

			`# if we run this script outside of gitlab-ci for testing, ensure`
			`# we got meaningful variables`
ci: Use CI_PROJECT_NAME instead of hardcoding 'mesa' This can make it more convenient for other projects to reuse these scripts. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15891> 2022-03-17 14:09:18 +00:00			`CI_PROJECT_DIR=${CI_PROJECT_DIR:-$(mktemp -d)/$CI_PROJECT_NAME}`
CI: reduce bandwidth for git pull Over the last 7 days, git pulls represented a total of 1.7 TB. On those 1.7 TB, we can see: - ~300 GB for the CI farm on hetzner - ~730 GB for the CI farm on packet.net - ~680 GB for the rest of the world We can not really change the rest of the world, but we can certainly reduce the egress costs towards our CI farms. Right now, the gitlab runners are not doing a good job at caching the git trees for the various jobs we make, and we end up with a lot of cache-misses. A typical pipeline ends up with a good 2.8GB of git pull data. (a compressed archive of the mesa folder accounts for 280MB) In this patch, we implemented what was suggested in https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576 - we host a brand new MinIO server on packet - jobs can upload files on 2 locations: git-cache/<namespace>/<project>/<branch-name>.tar.gz * artifacts/<namespace>/<project>/<pipeline-id>/ - the authorization is handled by gitlab with short tokens valid only for the time of the job is running - whenever a job runs, the runner are configured to execute (eval) $CI_PRE_CLONE_SCRIPT - this variable is set globally to download the current cache from the MinIO packet server, unpack it and replace the possibly out of date cache found on the runner - then git fetch is run by the runner, and only the delta between the upstream tree and the local tree gets pulled. We can rebuild the git cache in a schedule job (once a day seems sufficient), and then we can stop the cache miss entirely. First results showed that instead of pulling 280MB of data in my fork, I got a pull of only 250KB. That should help us. * arguably, there are other farms in the rest of the world, so hopefully we can change those too. Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428> 2020-06-11 16:16:28 +01:00
			`if [[ -e $CI_PROJECT_DIR/.git ]]`
			`then`
			`echo "Repository already present, skip cache download"`
			`exit`
			`fi`

			`TMP_DIR=$(mktemp -d)`

ci: include some timing information in the git cache download script In https://gitlab.freedesktop.org/zmike/mesa/-/jobs/46476087 it is unclear whether this script is where 30+ minutes were lost, or if this ran fine and the `git clone` afterwards is where the issue was. Printing these timestamps will help next time something like this happens. Signed-off-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24452> 2023-08-02 16:48:53 +01:00			`echo "$(date +"%F %T") Downloading archived master..."`
ci: enable shellcheck on whole .gitlab-ci Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21977> 2023-05-20 23:47:05 +01:00			`if ! /usr/bin/wget \`
ci: revert download of git cache to the wget At this point of CI there is not curl available. Fixes: 796686af1b37 ("ci: migrate from wget to curl") Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21414> 2023-02-20 01:32:27 +00:00			`-O "$TMP_DIR/$CI_PROJECT_NAME.tar.gz" \`
ci: rename MINIO_HOST variable to S3_HOST Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com> Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23527> 2023-06-08 16:38:22 +01:00			`"https://${S3_HOST}/git-cache/${FDO_UPSTREAM_REPO}/$CI_PROJECT_NAME.tar.gz";`
CI: reduce bandwidth for git pull Over the last 7 days, git pulls represented a total of 1.7 TB. On those 1.7 TB, we can see: - ~300 GB for the CI farm on hetzner - ~730 GB for the CI farm on packet.net - ~680 GB for the rest of the world We can not really change the rest of the world, but we can certainly reduce the egress costs towards our CI farms. Right now, the gitlab runners are not doing a good job at caching the git trees for the various jobs we make, and we end up with a lot of cache-misses. A typical pipeline ends up with a good 2.8GB of git pull data. (a compressed archive of the mesa folder accounts for 280MB) In this patch, we implemented what was suggested in https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576 - we host a brand new MinIO server on packet - jobs can upload files on 2 locations: git-cache/<namespace>/<project>/<branch-name>.tar.gz * artifacts/<namespace>/<project>/<pipeline-id>/ - the authorization is handled by gitlab with short tokens valid only for the time of the job is running - whenever a job runs, the runner are configured to execute (eval) $CI_PRE_CLONE_SCRIPT - this variable is set globally to download the current cache from the MinIO packet server, unpack it and replace the possibly out of date cache found on the runner - then git fetch is run by the runner, and only the delta between the upstream tree and the local tree gets pulled. We can rebuild the git cache in a schedule job (once a day seems sufficient), and then we can stop the cache miss entirely. First results showed that instead of pulling 280MB of data in my fork, I got a pull of only 250KB. That should help us. * arguably, there are other farms in the rest of the world, so hopefully we can change those too. Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428> 2020-06-11 16:16:28 +01:00			`then`
			`echo "Repository cache not available"`
			`exit`
			`fi`

			`set -e`

			`rm -rf "$CI_PROJECT_DIR"`
ci: include some timing information in the git cache download script In https://gitlab.freedesktop.org/zmike/mesa/-/jobs/46476087 it is unclear whether this script is where 30+ minutes were lost, or if this ran fine and the `git clone` afterwards is where the issue was. Printing these timestamps will help next time something like this happens. Signed-off-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24452> 2023-08-02 16:48:53 +01:00			`echo "$(date +"%F %T") Extracting tarball into '$CI_PROJECT_DIR'..."`
CI: reduce bandwidth for git pull Over the last 7 days, git pulls represented a total of 1.7 TB. On those 1.7 TB, we can see: - ~300 GB for the CI farm on hetzner - ~730 GB for the CI farm on packet.net - ~680 GB for the rest of the world We can not really change the rest of the world, but we can certainly reduce the egress costs towards our CI farms. Right now, the gitlab runners are not doing a good job at caching the git trees for the various jobs we make, and we end up with a lot of cache-misses. A typical pipeline ends up with a good 2.8GB of git pull data. (a compressed archive of the mesa folder accounts for 280MB) In this patch, we implemented what was suggested in https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576 - we host a brand new MinIO server on packet - jobs can upload files on 2 locations: git-cache/<namespace>/<project>/<branch-name>.tar.gz * artifacts/<namespace>/<project>/<pipeline-id>/ - the authorization is handled by gitlab with short tokens valid only for the time of the job is running - whenever a job runs, the runner are configured to execute (eval) $CI_PRE_CLONE_SCRIPT - this variable is set globally to download the current cache from the MinIO packet server, unpack it and replace the possibly out of date cache found on the runner - then git fetch is run by the runner, and only the delta between the upstream tree and the local tree gets pulled. We can rebuild the git cache in a schedule job (once a day seems sufficient), and then we can stop the cache miss entirely. First results showed that instead of pulling 280MB of data in my fork, I got a pull of only 250KB. That should help us. * arguably, there are other farms in the rest of the world, so hopefully we can change those too. Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428> 2020-06-11 16:16:28 +01:00			`mkdir -p "$CI_PROJECT_DIR"`
ci: Use CI_PROJECT_NAME instead of hardcoding 'mesa' This can make it more convenient for other projects to reuse these scripts. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15891> 2022-03-17 14:09:18 +00:00			`tar xzf "$TMP_DIR/$CI_PROJECT_NAME.tar.gz" -C "$CI_PROJECT_DIR"`
CI: reduce bandwidth for git pull Over the last 7 days, git pulls represented a total of 1.7 TB. On those 1.7 TB, we can see: - ~300 GB for the CI farm on hetzner - ~730 GB for the CI farm on packet.net - ~680 GB for the rest of the world We can not really change the rest of the world, but we can certainly reduce the egress costs towards our CI farms. Right now, the gitlab runners are not doing a good job at caching the git trees for the various jobs we make, and we end up with a lot of cache-misses. A typical pipeline ends up with a good 2.8GB of git pull data. (a compressed archive of the mesa folder accounts for 280MB) In this patch, we implemented what was suggested in https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576 - we host a brand new MinIO server on packet - jobs can upload files on 2 locations: git-cache/<namespace>/<project>/<branch-name>.tar.gz * artifacts/<namespace>/<project>/<pipeline-id>/ - the authorization is handled by gitlab with short tokens valid only for the time of the job is running - whenever a job runs, the runner are configured to execute (eval) $CI_PRE_CLONE_SCRIPT - this variable is set globally to download the current cache from the MinIO packet server, unpack it and replace the possibly out of date cache found on the runner - then git fetch is run by the runner, and only the delta between the upstream tree and the local tree gets pulled. We can rebuild the git cache in a schedule job (once a day seems sufficient), and then we can stop the cache miss entirely. First results showed that instead of pulling 280MB of data in my fork, I got a pull of only 250KB. That should help us. * arguably, there are other farms in the rest of the world, so hopefully we can change those too. Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428> 2020-06-11 16:16:28 +01:00			`rm -rf "$TMP_DIR"`
			`chmod a+w "$CI_PROJECT_DIR"`
ci: include some timing information in the git cache download script In https://gitlab.freedesktop.org/zmike/mesa/-/jobs/46476087 it is unclear whether this script is where 30+ minutes were lost, or if this ran fine and the `git clone` afterwards is where the issue was. Printing these timestamps will help next time something like this happens. Signed-off-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24452> 2023-08-02 16:48:53 +01:00
			`echo "$(date +"%F %T") Git cache download done"`