python dockerfile conflicting-libraries dbt-bigquery

Resolving conflicts in Python library dependency versions in the apache/airflow Docker image (due to the dbt-bigquery library)


#15 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

#15 google-cloud-aiplatform 1.16.1 requires google-cloud-bigquery<3.0.0dev,>=1.15.0, but you have google-cloud-bigquery 3.10.0 which is incompatible.

#15 google-ads 18.0.0 requires protobuf!=3.18.*,!=3.19.*,<=3.20.0,>=3.12.0, but you have protobuf 3.20.3 which is incompatible.

We get these errors in the docker-compose build logs when building our Apache Airflow image. According to an LLM:

It's worth noting that dbt-bigquery==1.5.0 is a new release from only a few weeks ago.

Here is our Dockerfile:

FROM --platform=linux/amd64 apache/airflow:2.5.3

# install mongodb-org-tools
USER root
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && apt-get install -y mongodb-org-tools
USER airflow

ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

And our requirements.txt:

gcsfs==0.6.1                        # Google Cloud Storage file system interface
ndjson==0.3.1                       # Newline delimited JSON parsing and serialization
pymongo==3.12.1                     # MongoDB driver for Python
dbt-bigquery==1.5.0                 # dbt adapter for Google BigQuery
numpy==1.21.1                       # Numerical computing in Python
pandas==1.3.1                       # Data manipulation and analysis library
billiard                            # Multiprocessing replacement, to avoid "daemonic processes are not allowed to have children" error using Pool

How can we resolve these dependency conflicts? How can we even tell which of the libraries in our requirements.txt pulls in which dependency? My assumption is that google-cloud-aiplatform and google-cloud-bigquery are both dependencies of dbt-bigquery; however, if they were dependencies of the same library, I wouldn't expect a dependency conflict.

Edit: some useful logs from the build:

Requirement already satisfied: protobuf>=3.18.3 in /home/airflow/.local/lib/python3.7/site-packages (from dbt-core~=1.5.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (3.20.0)

Collecting google-cloud-bigquery~=3.0
Downloading google_cloud_bigquery-3.10.0-py2.py3-none-any.whl (218 kB)

Requirement already satisfied: proto-plus<2.0.0dev,>=1.15.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.19.6)

Requirement already satisfied: grpcio<2.0dev,>=1.47.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.53.0)

Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.4.1)

Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.3.2)

Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /home/airflow/.local/lib/python3.7/site-packages (from google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (2.8.2)

Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.56.4)

Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /home/airflow/.local/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5)) (1.48.2)

Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /home/airflow/.local/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery~=3.0->dbt-bigquery==1.5.0->-r /requirements.txt (line 5))

google-cloud-aiplatform and google-ads do not appear a single time in the build logs other than in the error message.


Solution

  • The problem arises from conflicts between the Python packages that the OS package manager installs and the dependency graph of your project's packages.

    The short answer is to use the same strategy you would with any Python project: a virtual environment (venv).

    Solution

    Below is a complete working Dockerfile:

    FROM --platform=linux/amd64 apache/airflow:2.5.3-python3.9
    
    # install mongodb-org-tools
    ENV DEBIAN_FRONTEND=noninteractive
    USER root
    RUN apt-get update && apt-get install -y --no-install-recommends gnupg software-properties-common python3-venv && \
        curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
        add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
        apt-get update && apt-get install -y --no-install-recommends mongodb-org-tools
    
    COPY requirements.txt /usr/local/app/requirements.txt
    
    ENV VIRTUAL_ENV=/usr/local/venv
    RUN python3 -m venv $VIRTUAL_ENV
    ENV PATH="$VIRTUAL_ENV/bin:$PATH"
    
    RUN \
        pip install --upgrade --no-cache-dir --no-user pip && \
        pip install --no-cache-dir --no-user -r /usr/local/app/requirements.txt
        # run your app
    

    Note the setup and use of venv here. Just like outside a container, this separates your application's dependencies from the system-installed ones inside the container.
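
To confirm that the container's default interpreter really is the venv one (and so that pip installs land in the venv rather than next to the preinstalled packages), a small check like the following can be run inside the image. This is a minimal sketch, not part of the original answer; the function names `in_virtualenv` and `describe_environment` are made up for illustration.

```python
import sys
import sysconfig


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the venv directory, while
    sys.base_prefix still points at the interpreter it was created from.
    """
    return sys.prefix != sys.base_prefix


def describe_environment() -> dict:
    """Collect the paths that show where packages will be installed."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_virtualenv": in_virtualenv(),
        # Directory where pure-Python packages are installed for this interpreter.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the Dockerfile above, one would expect `prefix` to be the `VIRTUAL_ENV` directory and `in_virtualenv` to be true; if it is not, the `PATH` change did not take effect for the user running the command.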

    Notes

    Detail

    When running apt-get install you can see a number of packages are installed:

    #6 4.003 The following NEW packages will be installed:
    #6 4.003   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
    #6 4.003   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
    #6 4.003   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
    #6 4.003   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
    #6 4.003   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
    #6 4.003   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
    #6 4.003   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
    #6 4.003   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
    #6 4.003   python3-pycurl python3-software-properties shared-mime-info
    #6 4.003   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
    #6 4.003   unattended-upgrades xdg-user-dirs xz-utils
    ...
    #7 6.657 The following NEW packages will be installed:
    #7 6.657   dbus dmsetup gir1.2-glib-2.0 gir1.2-packagekitglib-1.0 iso-codes
    #7 6.657   libapparmor1 libappstream4 libargon2-1 libcap2 libcap2-bin libcryptsetup12
    #7 6.657   libcurl3-gnutls libdbus-1-3 libdevmapper1.02.1 libdw1 libelf1
    #7 6.657   libgirepository-1.0-1 libglib2.0-0 libglib2.0-bin libglib2.0-data
    #7 6.657   libgstreamer1.0-0 libip4tc2 libkmod2 libnss-systemd libpackagekit-glib2-18
    #7 6.657   libpam-cap libpam-systemd libpolkit-agent-1-0 libpolkit-gobject-1-0
    #7 6.657   libstemmer0d libunwind8 libyaml-0-2 packagekit packagekit-tools policykit-1
    #7 6.657   python-apt-common python3-apt python3-dbus python3-distro-info python3-gi
    #7 6.657   python3-pycurl python3-software-properties shared-mime-info
    #7 6.657   software-properties-common systemd systemd-sysv systemd-timesyncd ucf
    #7 6.657   unattended-upgrades xdg-user-dirs xz-utils
    #7 6.658 The following packages will be upgraded:
    #7 6.658   libsystemd0
    

    I didn't track down the exact problem package, but you can see that several python3-* packages are installed. One of these conflicts with the dependency graph of your application.
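
One way to track down where a conflicting requirement comes from is to ask the installed package metadata which distributions declare a dependency on a given package. The script below is a minimal sketch using only the standard library (`importlib.metadata`, so Python 3.8 or newer is assumed); the helper names `canonical`, `requirement_name` and `who_requires` are hypothetical, and the name parsing is a simplified regex rather than a full requirement parser.

```python
import re
from importlib import metadata


def canonical(name: str) -> str:
    """Normalize a project name (case-insensitive; '-', '_' and '.' are equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string such as
    'google-cloud-bigquery<3.0.0dev,>=1.15.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return canonical(match.group(1)) if match else ""


def who_requires(package: str) -> list:
    """Return sorted (dependent, version, requirement) tuples for every
    installed distribution that declares a requirement on `package`."""
    target = canonical(package)
    found = []
    for dist in metadata.distributions():
        dependent = dist.metadata.get("Name", "")
        if not dependent:
            continue  # skip distributions with broken metadata
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.append((dependent, str(dist.version), req))
    return sorted(found)


if __name__ == "__main__":
    # The two packages named in the error message from the question.
    for name in ("google-cloud-bigquery", "protobuf"):
        print(f"Installed packages that require {name}:")
        for dependent, version, req in who_requires(name):
            print(f"  {dependent} {version} -> {req}")
```

Running this with the interpreter whose packages you want to inspect lists every installed package that constrains `google-cloud-bigquery` or `protobuf`, including ones that were already present before `requirements.txt` was installed.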