I would like to host a model on SageMaker using the new Serverless Inference.
I wrote my own inference container and handler following several guides (a minimal sketch of the serve entrypoint follows the list below). These are the requirements:
mxnet
multi-model-server
sagemaker-inference
retrying
nltk
transformers==4.12.4
torch==1.10.0
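For context, the container starts the model server through the SageMaker Inference Toolkit's standard entrypoint pattern. A minimal sketch of what that looks like; the handler path matches the one in the error below, everything else is an illustrative assumption about the container layout rather than my exact code:

```python
# Sketch of a serve entrypoint using the SageMaker Inference Toolkit.
# The handler path matches the one in the error below; the rest is
# an assumption about the container layout, not my exact code.
from sagemaker_inference import model_server

if __name__ == "__main__":
    # Starts multi-model-server (MMS) with the custom handler service.
    model_server.start_model_server(
        handler_service="/home/model-server/handler_service.py:handle"
    )
```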
On non-serverless endpoints, this container works perfectly well. However, with the serverless version I get the following error message when loading the model:
ERROR - /.sagemaker/mms/models/model already exists.
The error is thrown by the following subprocess:
['model-archiver', '--model-name', 'model', '--handler', '/home/model-server/handler_service.py:handle', '--model-path', '/opt/ml/model', '--export-path', '/.sagemaker/mms/models', '--archive-format', 'no-archive']
So the problem seems to involve model-archiver
(which, I guess, is a tool from the multi-model-server (MMS) package?).
The issue turned out to be with hosting the model via the SageMaker Inference Toolkit and MMS: the toolkit always sets up the multi-model scenario, which Serverless Inference does not support.
I ended up writing my own Flask API, which is actually nearly as easy and more customizable. Ping me for details if you're interested.
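For reference, SageMaker's contract for a custom serving container is small: respond to GET /ping for health checks and POST /invocations for inference, listening on port 8080. A minimal sketch of such a Flask app; the text-classification pipeline and the "inputs" payload key are illustrative assumptions, not part of the contract:

```python
# Minimal sketch of a custom serving container as a Flask app.
# Only /ping, /invocations, and port 8080 come from the SageMaker
# container contract; the pipeline and payload shape are assumptions.
import json

from flask import Flask, Response, request
from transformers import pipeline

app = Flask(__name__)

# SageMaker extracts the model artifacts to /opt/ml/model.
predictor = pipeline("text-classification", model="/opt/ml/model")

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: returning 200 tells SageMaker the container is ready.
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    # Inference requests arrive here as the raw request body.
    payload = json.loads(request.data)
    result = predictor(payload["inputs"])
    return Response(json.dumps(result), status=200, mimetype="application/json")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In production you would typically run this behind gunicorn rather than the Flask development server.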