[SOLVED] Is there a way to have GitLab Cache be consumed without being written to?

I have a GitLab job that downloads a bunch of dependencies and stores them in a cache (if necessary), followed by a bunch of jobs that use that cache. I notice that at the end of each consuming job, a lot of time is spent creating a new cache, even though the job made no changes to it.

Is it possible to have them act only as consumers? Read-only?

cache:
  paths:
    - assets/

# Creates or updates the conda environment and downloads dependencies into assets/
configure:
  stage: .pre
  script:
    - conda env update --prefix ./assets/env/base -f ./environment.yml
    - source activate ./assets/env/base
    - bash ./download.sh

# Only reads from assets/
parse1:
  stage: build
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build

# Only reads from assets/
parse2:
  stage: build
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build

Solution

Buried deep in the very detailed .gitlab-ci.yml documentation is a reference to a cache setting called policy. GitLab caches have the concept of push (write) and pull (read), and the default policy is pull-push: the cache is downloaded at the start of the job and uploaded again at the end.

From the .gitlab-ci.yml reference, under cache:policy:

"If you know the job does not alter the cached files, you can skip the upload step by setting policy: pull in the job specification. Typically, this would be twinned with an ordinary cache job at an earlier stage to ensure the cache is updated from time to time."

That pretty much describes this situation: the configure job updates the cache, and the parse jobs only read it.

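In the consuming jobs, add the following. A job-level cache: replaces the global one rather than merging with it, so paths has to be repeated: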

cache:
  paths:
    - assets/
  policy: pull
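With more than a couple of consuming jobs, repeating that block gets tedious. One option is a hidden template job that each consumer pulls in with extends. This is only a sketch: .cache-consumer is a made-up name, and the job bodies are copied from the question.

```yaml
# Hidden job (the leading dot means it never runs by itself); it only
# exists as a template for the jobs that read the cache.
.cache-consumer:
  cache:
    paths:
      - assets/
    policy: pull   # download before the script, skip the upload afterwards

parse1:
  extends: .cache-consumer   # copies the read-only cache: block into this job
  stage: build
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build

parse2:
  extends: .cache-consumer
  stage: build
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
```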

For clarity, it probably wouldn't hurt to make the default pull-push policy explicit in the global setting as well:

cache:
  paths:
    - assets/
  policy: pull-push
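Newer GitLab documentation marks a globally defined cache: as deprecated in favour of putting it under the default: keyword. Below is a sketch of the whole pipeline from the question written that way, assuming a GitLab version that supports default: and YAML merge keys (<<); the anchor name assets-cache is arbitrary.

```yaml
default:
  # Anchored so the consumer jobs can reuse the same paths.
  cache: &assets-cache
    paths:
      - assets/
    policy: pull-push   # configure inherits this: download, then upload

configure:
  stage: .pre
  script:
    - conda env update --prefix ./assets/env/base -f ./environment.yml
    - source activate ./assets/env/base
    - bash ./download.sh

parse1:
  stage: build
  cache:
    <<: *assets-cache   # same paths as the default cache...
    policy: pull        # ...but this key overrides the merged one: no upload
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build

parse2:
  stage: build
  cache:
    <<: *assets-cache
    policy: pull
  script:
    - source activate ./assets/env/base
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
```

Keys written explicitly next to a << merge take precedence over the merged ones, which is why policy: pull wins over the anchored pull-push.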