amazon-web-services, amazon-sagemaker, huggingface-transformers, mxnet

SageMaker Serverless Inference & custom container: model-archiver subprocess fails


I would like to host a model on SageMaker using the new Serverless Inference.

I wrote my own inference container and handler following several guides. These are the requirements:

```
mxnet
multi-model-server
sagemaker-inference
retrying
nltk
transformers==4.12.4
torch==1.10.0
```

On non-serverless endpoints, this container works perfectly well. However, with the serverless version I get the following error message when loading the model:

```
ERROR - /.sagemaker/mms/models/model already exists.
```

The error is thrown by the following subprocess:

```
['model-archiver', '--model-name', 'model', '--handler', '/home/model-server/handler_service.py:handle', '--model-path', '/opt/ml/model', '--export-path', '/.sagemaker/mms/models', '--archive-format', 'no-archive']
```

So it is something to do with model-archiver (which, as I understand it, is a process from the MMS / multi-model-server package).


Solution

  • The issue really was that hosting the model via the SageMaker Inference Toolkit and MMS always uses the multi-model scenario, which is not supported by Serverless Inference.

    I ended up writing my own Flask API instead, which is actually nearly as easy and more customizable. Ping me for details if you're interested.
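For anyone curious what the Flask replacement looks like: a SageMaker container only needs to answer `GET /ping` (health check) and `POST /invocations` (predictions) on port 8080, with the model artifacts unpacked under `/opt/ml/model`. Below is a minimal sketch of that contract; the model-loading and `predict` calls are hypothetical placeholders you would swap for your own transformers/mxnet code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the model once at import time. SageMaker unpacks model.tar.gz
# into /opt/ml/model inside the container.
# model = load_my_model("/opt/ml/model")  # hypothetical loader


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker expects HTTP 200 once the container is ready.
    return "", 200


@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json(force=True)
    # prediction = model.predict(payload["inputs"])  # hypothetical call
    prediction = {"echo": payload}  # placeholder so the sketch runs standalone
    return jsonify(prediction)


# In the container, serve on port 8080 (the port SageMaker routes traffic to),
# e.g. behind gunicorn, or for a quick test:
# app.run(host="0.0.0.0", port=8080)
```

In the Docker image you then just start this app (directly or via gunicorn) as the container entrypoint, bypassing MMS and the model-archiver step entirely.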