I managed to run Ollama as a k8s STS. I am using it for a Python LangChain LLM/RAG application. However, the following Dockerfile ENTRYPOINT script, which tries to pull a list of models exported as the MODELS env var in the k8s STS manifest, runs into a problem. The Dockerfile has the following ENTRYPOINT and CMD:
ENTRYPOINT ["/usr/local/bin/run.sh"]
CMD ["bash"]
run.sh:
#!/bin/bash
set -x
ollama serve&
sleep 10
models="${MODELS//,/ }"
for i in "${models[@]}"; do \
echo model: $i \
ollama pull $i \
done
k8s logs:
+ models=llama3.2
/usr/local/bin/run.sh: line 10: syntax error: unexpected end of file
David Maze's solution:
lifecycle:
  postStart:
    exec:
      command:
        - bash
        - -c
        - |
          for i in $(seq 10); do
            ollama ps && break
            sleep 1
          done
          for model in ${MODELS//,/ }; do
            ollama pull "$model"
          done
ollama-0 1/2 CrashLoopBackOff 4 (3s ago) 115s
ollama-1 1/2 CrashLoopBackOff 4 (1s ago) 115s
Warning FailedPostStartHook 106s (x3 over 2m14s) kubelet PostStartHook failed
$ k logs -fp ollama-0
Defaulted container "ollama" out of: ollama, fluentd
Error: unknown command "ollama" for "ollama"
Updated Dockerfile:
ENTRYPOINT ["/bin/ollama"]
#CMD ["bash"]
CMD ["ollama", "serve"]
I need the customized Dockerfile so that I can install the NVIDIA Container Toolkit.
At a mechanical level, the backslashes inside the for loop are causing problems. They make the shell join the lines together, so you get a single command echo model: $i ollama pull $i done, but there's no standalone done command to terminate the loop.
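You can see the same parse failure with a minimal reproduction (my own sketch, not part of the original run.sh): the trailing backslashes glue the loop body and the final done onto one line, so bash reaches end-of-file while it is still waiting for done:
# Paste into a terminal; bash reports "syntax error: unexpected end of file"
bash -c 'for i in a b; do \
echo model: $i \
done'
That is the same error the Pod log shows above.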
The next problem you'll run into is that this entrypoint script is the only thing the container runs, and when this script exits, the container will exit as well. It doesn't matter that you've started the Ollama server in the background. If you want to run the container this way, you need to wait for the server to exit. That would look something like:
#!/bin/bash
ollama serve &
pid=$! # ADD: save the process ID of the server
sleep 10
models=(${MODELS//,/ })          # FIX: make this an array so each model is a separate word
for i in "${models[@]}"; do      # FIX: remove backslashes
  echo model: "$i"
  ollama pull "$i"
done
wait "$pid" # ADD: keep the script running as long as the server is too
However, this model of starting a background process and then waiting for it often isn't the best approach. If the Pod gets shut down, for example, the termination signal will go to the wrapper script and not the Ollama server, and you won't be able to have a clean shutdown.
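If you do keep the wrapper-script pattern, one way around that is to trap the signal yourself and forward it to the server. This is only a sketch of the idea; the trap/kill handling is my addition, not something the script above does:
#!/bin/bash
ollama serve &
pid=$!
# Forward SIGTERM/SIGINT from Kubernetes to the background server
trap 'kill -TERM "$pid"' TERM INT
sleep 10
for i in ${MODELS//,/ }; do
  ollama pull "$i"
done
wait "$pid"   # returns when the server exits, or after the trap fires on a signal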
In a Kubernetes context (you say you're running this in a StatefulSet) a PostStart hook fits here. This will let you run an unmodified image, but add your own script that runs at about the same time as the container startup. In a Kubernetes manifest this might look like:
spec:
  template:
    spec:
      containers:
        - name: ollama
          image: ollama/ollama # the unmodified upstream image
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    for i in $(seq 10); do
                      ollama ps && break
                      sleep 1
                    done
                    for model in llama3.2; do
                      ollama pull "$model"
                    done
This setup writes a shell script inline in the Kubernetes manifest and wraps it in /bin/sh -c so it can be run this way. It uses an "exec" mechanism, so the script runs as a secondary process in the same container. The first fragment waits up to 10 seconds for the server to be running, and the second is the loop that pulls the models.
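Once the Pod reports Ready, you can check that the pulls succeeded, for example (pod and container names taken from the question's output):
kubectl exec ollama-0 -c ollama -- ollama list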