machine-learning, deployment, data-science, serving

What is the difference between deploying and serving an ML model?


I recently developed an ML model for a classification problem and would now like to put it into production to classify actual production data. While exploring, I came across two approaches, deploying and serving an ML model. What is the basic difference between them?


Solution

  • Based on my own readings and understanding, here's the difference:

    1. Deploying = creating a server/API (e.g. a REST API) around your trained model so that it can predict on new, unlabelled data.

    2. Serving = running a server that is specialized for model predictions. The idea is that one serving instance can host multiple models and route different requests to them.

    Basically, if your use case requires running multiple ML models, you might want to look at a serving framework like TorchServe. But if it's just one model, for me, Flask is already good enough; see the sketches below.
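    To make the single-model Flask option concrete, here is a minimal sketch. The file name model.pkl and the JSON input format are assumptions for illustration, not something from the original answer; adjust the loading and preprocessing to match your own model.

    ```python
    # Minimal Flask deployment sketch: one model behind one REST endpoint.
    # Assumes a scikit-learn-style classifier pickled as model.pkl.
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load the trained model once at startup, not on every request.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects JSON like {"features": [[5.1, 3.5, 1.4, 0.2]]}
        payload = request.get_json()
        predictions = model.predict(payload["features"])
        return jsonify({"predictions": predictions.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)
    ```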
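    And to show the "one server, many models" idea behind serving: TorchServe exposes each registered model at its own inference endpoint (/predictions/<model_name>, default port 8080), so different requests can target different models on the same server. A client-side sketch follows; the model names classifier_a and classifier_b are placeholders.

    ```python
    # Client-side sketch against a running TorchServe instance.
    # Each registered model gets its own /predictions/<model_name>
    # endpoint on the same server; the names below are placeholders.
    import requests

    with open("sample_input.json", "rb") as f:
        data = f.read()

    # Same serving instance, different models: only the endpoint changes.
    resp_a = requests.post("http://localhost:8080/predictions/classifier_a", data=data)
    resp_b = requests.post("http://localhost:8080/predictions/classifier_b", data=data)

    print(resp_a.json(), resp_b.json())
    ```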

    References:

    PyTorch: Deploying with Flask

    TorchServe