I have a custom Triton Docker container that uses a Python backend. This container works perfectly locally.
Here is the container's Dockerfile (I have omitted irrelevant parts).
ARG TRITON_RELEASE_VERSION=22.12
FROM nvcr.io/nvidia/tritonserver:${TRITON_RELEASE_VERSION}-pyt-python-py3
LABEL owner='toing'
LABEL maintainer='toing@toing.com'
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
ARG TRITON_RELEASE_VERSION
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV GIT_TRITON_RELEASE_VERSION="r$TRITON_RELEASE_VERSION"
ENV TRITON_MODEL_DIRECTORY="/opt/ml/model"
SHELL ["/bin/bash", "-c"]
# nvidia updated their repository keys recently
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    # generic requirements
    gcc \
    libgl1-mesa-glx
RUN pip install --upgrade pip && \
    pip install --no-cache-dir setuptools \
    scikit-build \
    opencv-python-headless \
    cryptography
# run create model dir
RUN mkdir -p $TRITON_MODEL_DIRECTORY
# for mmcv installation
ENV FORCE_CUDA="1"
# set TORCH_CUDA_ARCH_LIST
ENV TORCH_CUDA_ARCH_LIST="7.5"
RUN pip install --no-cache-dir what-i-need --index-url
# install pytorch requirements from aws
RUN mkdir -p /app/snapshots && \
mkdir -p /keys
# Copy the requirements files
ADD requirements/build.txt /install/build.txt
# install specific packages
RUN pip install --no-cache-dir -r /install/build.txt
# number of workers per model
ENV SAGEMAKER_MODEL_SERVER_WORKERS=1
ENV SAGEMAKER_BIND_TO_PORT=8000
ENV SAGEMAKER_SAFE_PORT_RANGE=8000-8002
# HTTP Inference Service
EXPOSE 8000
# GRPC Inference Service
EXPOSE 8001
# Metrics Service
EXPOSE 8002
RUN echo -e "#!/bin/bash\n\
tritonserver --model-repository ${TRITON_MODEL_DIRECTORY}"\
>> /start.sh
RUN chmod +x /start.sh
# Set the working directory to /
WORKDIR /
ENTRYPOINT ["/start.sh"]
The problem is that when I launch it from a SageMaker multi-model endpoint (MME), the Triton server starts and runs, but apparently SageMaker fails to detect the running server, so the health checks fail and the endpoint creation fails.
Am I using the wrong port, or what should I do to avoid this error?
PS: I did see that the base NGC container used in this Dockerfile has an entrypoint at /opt/nvidia/nvidia_entrypoint.sh,
but that script seems to be just a wrapper around the original entrypoint.
The problem was that SageMaker requires Triton to listen on port 8080:
ENV SAGEMAKER_MULTI_MODEL=true
ENV SAGEMAKER_BIND_TO_PORT=8080
EXPOSE 8080
and that Triton needs to run in SageMaker mode (--allow-sagemaker=true). The command needed to run this was found at this link.
RUN echo -e "#!/bin/bash\n\
tritonserver --allow-sagemaker=true --allow-grpc=false --allow-http=false --allow-metrics=false --model-control-mode=explicit --model-repository ${TRITON_MODEL_DIRECTORY}"\
>> /start.sh
So I adapted this to my Dockerfile, and Triton was able to start up with SageMaker.
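To check the fix before creating an endpoint, it can help to mimic the health check locally. SageMaker probes a container with GET /ping on port 8080 and expects an HTTP 200. The following is a minimal sketch, not SageMaker's actual checker: the function name `wait_for_ping`, the default URL, and the timeout values are all assumptions for illustration, and it presumes the container was started locally with port 8080 published.

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(base_url="http://localhost:8080", timeout_s=60.0, interval_s=2.0):
    """Poll GET <base_url>/ping until it returns HTTP 200 or timeout_s elapses.

    Hypothetical helper: the name, defaults, and polling strategy are
    illustrative assumptions, not part of SageMaker or Triton.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/ping", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet, not listening on this port, or returned an error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_ping()` against the original image (serving on 8000 without SageMaker mode) should return False, while the adapted image should return True once Triton has finished starting.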
PS: When using a custom Python stub, there is an open issue where S3 removes the stub's execute permission. To avoid this, I had to put the Python stub directly at /opt/tritonserver/backends/python/triton_python_backend_stub
instead of in the model tar.gz file as recommended in the documentation. However, this does not work when different models use different stubs.