I'm trying to run LlamaIndex with llama.cpp by following the installation docs, but inside a Docker container. I'm following this repo for the installation of llama_cpp_python==0.2.6.
Dockerfile:
# Use the official Python image for Python 3.11
FROM python:3.11
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# ARG FORCE_CMAKE=1
# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# Install project dependencies
RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt
# Command to run the server
CMD ["python", "./server.py"]
Run commands:
docker build -t llm_server ./llm
docker run -it -p 2023:2023 --gpus all llm_server
Problem: For some reason, the environment variables from the llama.cpp docs do not take effect as expected inside the Docker container.
Current behaviour: BLAS = 0 on LLM initialization (the model runs on the CPU)
Expected behaviour: BLAS = 1 (the model runs on the GPU)
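For context, the relevant knob on the Python side is n_gpu_layers: even with a cuBLAS build, nothing is offloaded unless it is set. A minimal sketch of how I check the flag (the model path is hypothetical, this is not my actual server.py):
# sketch: with a cuBLAS build the verbose init log should print "BLAS = 1"
from llama_cpp import Llama

llm = Llama(
    model_path="/app/models/model.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,   # offload all layers; 0 keeps everything on the CPU
    verbose=True,      # the init log includes the BLAS = 0/1 line
)
print(llm("Q: 2 + 2 = ", max_tokens=8)["choices"][0]["text"])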
nvidia-smi output inside container:
# nvidia-smi
Thu Nov 23 05:48:30 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01 Driver Version: 546.01 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti On | 00000000:01:00.0 On | N/A |
| N/A 48C P8 4W / 80W | 1257MiB / 6144MiB | 7% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 20 G /Xwayland N/A |
| 0 N/A N/A 20 G /Xwayland N/A |
| 0 N/A N/A 392 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+
What I've tried in the Dockerfile (ARG vs. ENV, shown commented out) and the install command from the llama-cpp-python docs:
# ARG FORCE_CMAKE=1
# ARG CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# ENV FORCE_CMAKE=1
# ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# Install project dependencies
RUN FORCE_CMAKE=1 CMAKE_ARGS="-DLLAMA_CUBLAS=on" python -m pip install -r requirements.txt
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
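One way to confirm whether the installed wheel was actually compiled against cuBLAS (this assumes the low-level binding exposes llama.cpp's llama_print_system_info, which it does in the versions I've seen):
# sketch: look for "BLAS = 1" in the compile-time system info
import llama_cpp

info = llama_cpp.llama_print_system_info()  # returns the system info line as bytes
print(info.decode("utf-8", errors="replace"))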
Update: this Dockerfile works, thanks to the person who answered.
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip
# Set environment variable
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
pip install -r requirements.txt --no-cache-dir
# Command to run the server
CMD ["python3", "./server.py"]
On Windows I use this image:
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
And this is how I set the necessary variables before the install:
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
RUN pip install llama-cpp-python
Works for me. Again, on Windows with Docker Desktop!
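A small fail-fast check that can go at the top of server.py (my suggestion, not part of the original answer): if the container was started without --gpus all, the NVIDIA driver library is not mounted and the load below fails immediately.
# sketch: verify the NVIDIA driver library is visible inside the container
import ctypes

try:
    ctypes.CDLL("libcuda.so.1")  # mounted by the NVIDIA runtime when --gpus all is set
    print("CUDA driver library found inside the container")
except OSError:
    raise SystemExit("libcuda.so.1 not found: was the container started with --gpus all?")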