Tags: azure, endpoint, azure-machine-learning-service

Azure Machine Learning - Online Endpoint Schedule/Cost management


We're experimenting with MLOps on Azure Machine Learning and want to manage an Online Endpoint for inference. However, we also want to save costs, and since the software only runs in a single location for now, we know for a fact that nobody will use it outside normal business hours.

The endpoint is deployed to a managed compute instance, which bills us hourly (not per request) for as long as the deployment (the endpoint) is live.

I haven't seen any option (in the UI or in the documentation) to delete a deployment automatically on a schedule. I can configure autoscaling, but I doubt that scaling the endpoint down to 0 instances at night and on weekends also releases the compute (my guess is it doesn't, and we'd still pay for it). I can delete the deployment by hand every night and redeploy it every morning, but I'd expect to be able to automate this, as doing it manually would become unmanageable as we add more endpoints.

Can I reduce the cost of an Azure Machine Learning Online Endpoint to 0 USD outside business hours, automatically and on a schedule? If so, how?


Solution

  • You can delete an endpoint using the Python Azure ML SDK (v1):

    from azureml.core import Workspace, Webservice
    
    # Load the workspace (e.g. from a downloaded config.json)
    ws = Workspace.from_config()
    
    # Attach to the existing service by name and delete it
    service = Webservice(workspace=ws, name='your-service-name')
    service.delete()
    

    Then, if you want to re-create the endpoint, you can redeploy the model:

    from azureml.core import Workspace, Environment
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice
    
    ws = Workspace.from_config()
    service_name = 'my-custom-env-service'
    
    # Look up the registered model and environment by name
    model = Model(ws, name='your-model-name')
    environment = Environment.get(ws, name='your-environment-name')
    
    inference_config = InferenceConfig(entry_script='score.py', environment=environment)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    
    service = Model.deploy(workspace=ws,
                           name=service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=aci_config,
                           overwrite=True)
    service.wait_for_deployment(show_output=True)
    

    There is currently no way to schedule or temporarily disable an online endpoint; the only option is to delete it and re-create it with the Azure ML SDK, as above. Alternatively, you could serve the model from an Azure Function app, in which case you only pay for the requests made.
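
    To automate the delete/re-create cycle, a scheduled job (e.g. a timer-triggered Azure Function or a cron task) can decide on each run whether the deployment should exist. Below is a minimal sketch of just the decision logic, assuming a Mon-Fri 08:00-18:00 business window; `endpoint_should_be_up` and the window bounds are hypothetical names for illustration, not part of any SDK:

```python
from datetime import datetime

# Assumed business-hours window (adjust to your schedule and time zone)
BUSINESS_START = 8   # 08:00
BUSINESS_END = 18    # 18:00


def endpoint_should_be_up(now: datetime) -> bool:
    """Return True only during weekday business hours."""
    is_weekday = now.weekday() < 5                   # Mon=0 .. Fri=4
    in_hours = BUSINESS_START <= now.hour < BUSINESS_END
    return is_weekday and in_hours


# The scheduled job would then combine this with the SDK snippets above,
# roughly (pseudocode, since deploy()/service_exists() are placeholders):
#
# if endpoint_should_be_up(datetime.now()) and not service_exists(ws, name):
#     deploy()                                       # the Model.deploy(...) snippet
# elif not endpoint_should_be_up(datetime.now()) and service_exists(ws, name):
#     Webservice(workspace=ws, name=name).delete()   # the delete snippet
```

    Running the job every 15-30 minutes is enough: the endpoint is torn down shortly after the window closes and rebuilt shortly after it opens.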