I have a gitlab job that downloads a bunch of dependencies and stuffs them in a cache (if necessary), then I have a bunch of jobs that use that cache. I notice at the end of the consuming jobs, they spend a bunch of time creating a new cache, even though they made no changes to it.
Is it possible to have them act only as consumers? Read-only?
cache:
paths:
- assets/
configure:
stage: .pre
script:
- conda env update --prefix ./assets/env/base -f ./environment.yml;
- source activate ./assets/env/base
- bash ./download.sh
parse1:
stage: build
script:
- source activate ./assets/env/base;
- ./build.sh -b test -s 2
artifacts:
paths:
- build
parse2:
stage: build
script:
- source activate ./assets/env/base;
- ./build.sh -b test -s 2
artifacts:
paths:
- build
Buried deeply in the very detailed .gitlab-ci.yml
documentation is a reference to a cache
setting called policy
. GitLab caches have the concept of push
(aka write
) and pull
(aka read
). By default it is set to pull-push
(read
at the beginning and write
at the end).
If you know the job does not alter the cached files, you can skip the upload step by setting
policy: pull
in the job specification. Typically, this would be twinned with an ordinary cache job at an earlier stage to ensure the cache is updated from time to time:
Which pretty much describes this situation: the job configure
updates the cache, and the parse
jobs do not alter the cache.
In the consuming jobs, add:
cache:
paths:
- assets/
policy: pull
For clarity, it probably wouldn't hurt to make that explicit in the global setting:
cache:
paths:
- assets/
policy: pull-push