python, pip, gitlab-ci

How to cache Python dependencies in GitLab CI/CD without using venv?


I am trying to use the cache in my .gitlab-ci.yml file, but the pipeline time only increases (I tested by adding blank lines to trigger new pipelines). I want to cache the Python packages I install with pip. Here is the stage where I install and use these packages (the other stages use Docker):

image: python:3.8-slim-buster

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip

stages:
  - lint
  - test
  - build
  - deploy

test-job:
  stage: test
  before_script:
    - apt-get update
    - apt-get install -y --no-install-recommends gcc
    - apt-get install -y default-libmysqlclient-dev
    - pip3 install -r requirements.txt
  script:
    - pytest tests/test.py

With each subsequent pipeline, the total time just increases. I was following the steps from the GitLab documentation - https://docs.gitlab.com/ee/ci/caching/#cache-python-dependencies - although I am not using venv, since everything works without it. I am also not sure why the PIP_CACHE_DIR variable is needed when nothing else in the file references it, but I followed the documentation.

What is the correct way to cache Python dependencies? I would prefer not to use venv.


Solution

  • PIP_CACHE_DIR is an environment variable that pip itself reads: it sets the directory where pip stores its download cache. That is why it has an effect even though nothing else in the .gitlab-ci.yml references it (a quick way to verify this is sketched at the end of this answer).

    The second answer to this question explains it.

    There may be some disagreement on this, but in my experience, for something like pip packages or node modules it is often quicker to download them fresh in each pipeline than to restore them from a cache.

    When the packages are cached by GitLab using

    cache:
      paths:
        - .cache/pip
    

    the cache that GitLab creates gets zipped and stored somewhere (where exactly depends on the runner configuration). This means the cache has to be zipped and uploaded at the end of the job, then downloaded and unpacked again when the next pipeline runs. If that round trip takes longer than a fresh pip install, the cache is actively slowing down job execution, and it can make sense to simply remove it, or at least to scope it more tightly, as sketched below.
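
    To verify that pip really picks up PIP_CACHE_DIR, you can print the cache directory from inside a job. This is only a sketch (the job name is made up); `pip cache dir` requires pip 20.1 or newer:

    check-pip-cache:
      stage: test
      script:
        # pip reads PIP_CACHE_DIR from the environment (set in the global
        # `variables:` block), so this should print $CI_PROJECT_DIR/.cache/pip
        - pip cache dir
        - pip3 install -r requirements.txt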
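
    If you do keep the cache, tying its key to requirements.txt means GitLab only rebuilds and re-uploads the cache when the dependencies actually change, and a single job can opt out of the global cache with an empty cache definition. Both of these are sketches of standard .gitlab-ci.yml features, not something specific to this project:

    # rebuild the cache only when requirements.txt changes
    cache:
      key:
        files:
          - requirements.txt
      paths:
        - .cache/pip

    # or: opt a single job out of the global cache entirely
    test-job:
      stage: test
      cache: {}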