I'm trying to build a Docker image, and the build fails with a compatibility error.
The Dockerfile below builds successfully, but when I add "tensorflow-gpu" the build fails with a requirements error. I'm not sure how to isolate the issue, so any guidance would be appreciated!
Dockerfile:
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.2-cudnn8-ubuntu20.04:20230530.v1
ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/tensorflow-2.7
# Create conda environment
RUN conda create -p $AZUREML_CONDA_ENVIRONMENT_PATH \
python=3.8 pip=20.2.4
# Prepend path to AzureML conda environment
ENV PATH $AZUREML_CONDA_ENVIRONMENT_PATH/bin:$PATH
# Install pip dependencies
RUN HOROVOD_WITH_TENSORFLOW=1 pip install 'matplotlib' \
'psutil' \
'tqdm' \
'pandas' \
'scipy' \
'numpy' \
'ipykernel' \
'azureml-core' \
'azureml-defaults' \
'azureml-mlflow' \
'azureml-telemetry' \
'tensorboard'
# This is needed for mpi to locate libpython
ENV LD_LIBRARY_PATH $AZUREML_CONDA_ENVIRONMENT_PATH/lib:$LD_LIBRARY_PATH
Error:
2024-03-29T03:05:42: ---> Running in bec9494787c1
2024-03-29T03:05:43: Collecting matplotlib~=3.5.0
2024-03-29T03:05:43: Downloading matplotlib-3.5.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
2024-03-29T03:05:44: Collecting psutil~=5.8.0
2024-03-29T03:05:44: Downloading psutil-5.8.0-cp38-cp38-manylinux2010_x86_64.whl (296 kB)
2024-03-29T03:05:44: Collecting tqdm~=4.62.0
2024-03-29T03:05:44: Downloading tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
2024-03-29T03:05:45: Collecting pandas~=1.3.0
2024-03-29T03:05:45: Downloading pandas-1.3.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)
2024-03-29T03:05:45: Collecting scipy~=1.7.0
2024-03-29T03:05:45: Downloading scipy-1.7.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (39.3 MB)
2024-03-29T03:05:47: Collecting numpy~=1.21.0
2024-03-29T03:05:47: Downloading numpy-1.21.6-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
2024-03-29T03:05:47: Collecting ipykernel~=6.0
2024-03-29T03:05:47: Downloading ipykernel-6.29.4-py3-none-any.whl (117 kB)
2024-03-29T03:05:47: Collecting azureml-core==1.51.0
2024-03-29T03:05:47: Downloading azureml_core-1.51.0-py3-none-any.whl (3.3 MB)
2024-03-29T03:05:47: Collecting azureml-defaults==1.51.0
2024-03-29T03:05:47: Downloading azureml_defaults-1.51.0-py3-none-any.whl (2.0 kB)
2024-03-29T03:05:47: Collecting azureml-mlflow==1.51.0
2024-03-29T03:05:47: Downloading azureml_mlflow-1.51.0-py3-none-any.whl (814 kB)
2024-03-29T03:05:47: Collecting azureml-telemetry==1.51.0
2024-03-29T03:05:47: Downloading azureml_telemetry-1.51.0-py3-none-any.whl (30 kB)
2024-03-29T03:05:47: ERROR: Could not find a version that satisfies the requirement tensorboard~=2.15.0 (from versions: 1.6.0rc0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.13.1, 1.14.0, 1.15.0, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.4.0, 2.4.1, 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.9.1, 2.10.0, 2.10.1, 2.11.0, 2.11.1, 2.11.2, 2.12.0, 2.12.1, 2.12.2, 2.12.3, 2.13.0, 2.14.0)
2024-03-29T03:05:47: ERROR: No matching distribution found for tensorboard~=2.15.0
2024-03-29T03:05:48: The command '/bin/sh -c HOROVOD_WITH_TENSORFLOW=1 pip install 'matplotlib~=3.5.0' 'psutil~=5.8.0' 'tqdm~=4.62.0' 'pandas~=1.3.0' 'scipy~=1.7.0' 'numpy~=1.21.0' 'ipykernel~=6.0' 'azureml-core==1.51.0' 'azureml-defaults==1.51.0' 'azureml-mlflow==1.51.0' 'azureml-telemetry==1.51.0' 'tensorboard~=2.15.0'' returned a non-zero code: 1
2024-03-29T03:05:48: CalledProcessError(1, ['docker', 'build', '-f', 'Dockerfile', '.', '-t', 'e91555eeb3224b08b13539c983a2c3f8.azurecr.io/azureml/azureml_50f64810c9ea1320c2b49770067c34d2', '-t', 'e91555eeb3224b08b13539c983a2c3f8.azurecr.io/azureml/azureml_50f64810c9ea1320c2b49770067c34d2:1'])
2024-03-29T03:05:48: Building docker image failed with exit code: 1
Using tensorboard==2.14.0 seems to work. Is there anything in tensorboard==2.15.0 specifically that you need, or will a slightly earlier minor version work for you?
🗎 Dockerfile
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.2-cudnn8-ubuntu20.04:20230530.v1
ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/tensorflow-2.7
ENV LD_LIBRARY_PATH $AZUREML_CONDA_ENVIRONMENT_PATH/lib:$LD_LIBRARY_PATH
RUN conda create -p $AZUREML_CONDA_ENVIRONMENT_PATH \
python=3.8 pip=20.2.4
ENV PATH $AZUREML_CONDA_ENVIRONMENT_PATH/bin:$PATH
COPY requirements.txt .
RUN HOROVOD_WITH_TENSORFLOW=1 pip install -r requirements.txt
🗎 requirements.txt
azureml-core==1.55.0.post2
azureml-defaults==1.55.0
azureml-mlflow==1.55.0
azureml-telemetry==1.55.0
ipykernel==6.29.4
matplotlib==3.7.5
numpy==1.24.4
pandas==2.0.3
psutil==5.9.8
scipy==1.10.1
tensorboard==2.14.0
tensorflow-gpu==2.11.0
tqdm==4.66.2
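To see why the original pin could not be satisfied, here is a minimal Python sketch of how a compatible-release specifier such as `~=2.15.0` is matched against the versions pip listed in the error message. The helper names (`parse`, `compatible_release`, `matches`) are hypothetical, the version list is a subset copied from the log in the question, and the logic is a simplification that only handles plain numeric versions; it is not pip's actual resolver.

```python
# Subset of the versions pip reported as available under Python 3.8
# (copied from the error message in the question).
AVAILABLE = ["2.12.0", "2.12.1", "2.12.2", "2.12.3", "2.13.0", "2.14.0"]


def parse(version):
    """Turn '2.14.0' into (2, 14, 0). Plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate, base):
    """Return True if `candidate` satisfies `~=base`.

    `~=X.Y.Z` is shorthand for `>= X.Y.Z` and `== X.Y.*`.
    """
    c, b = parse(candidate), parse(base)
    return c >= b and c[: len(b) - 1] == b[:-1]


def matches(base, available=AVAILABLE):
    """List the available versions that satisfy `~=base`."""
    return [v for v in available if compatible_release(v, base)]


print(matches("2.15.0"))  # [] (nothing in the list is 2.15.x)
print(matches("2.14.0"))  # ['2.14.0']
```

Because no 2.15.x release appears in the list pip offered, `~=2.15.0` matches nothing, which is the "No matching distribution found" error above.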
Alternatively, if you really want tensorboard==2.15.0, then you could upgrade to Python 3.9.
🗎 Dockerfile
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.2-cudnn8-ubuntu20.04:20230530.v1
ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/tensorflow-2.7
ENV LD_LIBRARY_PATH $AZUREML_CONDA_ENVIRONMENT_PATH/lib:$LD_LIBRARY_PATH
RUN conda create -p $AZUREML_CONDA_ENVIRONMENT_PATH \
python=3.9 pip=20.2.4
ENV PATH $AZUREML_CONDA_ENVIRONMENT_PATH/bin:$PATH
COPY requirements.txt .
RUN HOROVOD_WITH_TENSORFLOW=1 pip install -r requirements.txt
🗎 requirements.txt
azureml-core==1.55.0.post2
azureml-defaults==1.55.0
azureml-mlflow==1.55.0
azureml-telemetry==1.55.0
ipykernel==6.29.4
matplotlib==3.7.5
numpy==1.24.4
pandas==2.0.3
psutil==5.9.8
scipy==1.10.1
tensorboard==2.15.0
tensorflow-gpu==2.11.0
tqdm==4.66.2
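The reason the Python version matters is that pip skips releases whose Requires-Python metadata excludes the running interpreter. The sketch below illustrates that check under one loudly stated assumption: that tensorboard 2.15.0 declares `Requires-Python: >=3.9`. That is consistent with the log, where 2.14.0 is the newest version offered on Python 3.8, but it is worth confirming on PyPI. The function name `satisfies_min_python` is hypothetical and it only handles the simple `>=X.Y` form.

```python
import sys


def satisfies_min_python(requires_python, version=None):
    """Check an interpreter version against a '>=X.Y' Requires-Python string.

    Only the simple '>=X.Y' form is handled; anything else raises ValueError.
    """
    if version is None:
        version = sys.version_info[:2]
    if not requires_python.startswith(">="):
        raise ValueError("this sketch only handles the '>=X.Y' form")
    minimum = tuple(int(p) for p in requires_python[2:].strip().split("."))
    return tuple(version) >= minimum


# Assumption: tensorboard 2.15.0 declares Requires-Python ">=3.9".
REQUIRES_PYTHON = ">=3.9"

print(satisfies_min_python(REQUIRES_PYTHON, (3, 8)))  # False
print(satisfies_min_python(REQUIRES_PYTHON, (3, 9)))  # True
```

Under that assumption, the conda environment created with python=3.8 never sees 2.15.0 as a candidate, while the one created with python=3.9 does.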