We're experimenting with MLOps on Azure Machine Learning and, as part of that, want to manage an Online Endpoint for inference. However, we also want to save costs, and since the software only runs in a single location for now, we know for a fact that people won't be using it outside normal business hours.
The endpoint is deployed to a managed compute instance, which is billed hourly (not per request) as long as the deployment (the endpoint) is live.
I haven't seen any option (neither in the UI nor in the documentation) to schedule the deletion of a deployment automatically. I can configure autoscaling, but I'm not sure whether scaling the endpoint down to 0 instances at night and on weekends also releases the compute (my guess is it doesn't, and we'd still be paying for it). I could delete the deployment by hand every night and redeploy it every morning, but I'd expect to be able to do this automatically, as the manual approach would become unmanageable as we add more endpoints.
Can I - and if yes, how - automatically reduce the cost of an Azure Machine Learning Online Endpoint to 0 USD outside business hours, based on a schedule?
You can delete an endpoint using the Python Azure ML SDK:
from azureml.core import Workspace, Webservice

# Connect to the workspace (reads config.json) and delete the service.
ws = Workspace.from_config()
service = Webservice(workspace=ws, name='your-service-name')
service.delete()
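Note that Webservice belongs to the v1 SDK and covers ACI/AKS-style deployments. For a managed Online Endpoint like the one in the question, the v2 SDK (azure-ai-ml) exposes equivalent delete calls, and as far as I know it's the deployment (not the endpoint itself) that carries the hourly compute charge, so deleting just the deployment should stop the billing while keeping the endpoint's URI and keys intact. A minimal sketch, assuming the azure-ai-ml package and placeholder resource/deployment names:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Deleting the deployment releases the managed compute behind it;
# the endpoint shell (URI, keys) can stay in place.
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name="your-endpoint-name"
).result()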
Then, if you want to recreate the service, you can redeploy the model:
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

service_name = 'my-custom-env-service'

# ws is the Workspace object from the snippet above; the registered
# model and environment to deploy (names here are examples).
model = Model(ws, name='your-model-name')
environment = Environment.get(ws, name='your-environment-name')

inference_config = InferenceConfig(entry_script='score.py', environment=environment)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
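One thing to keep in mind with this delete-and-redeploy approach: a recreated ACI service generally comes back with a new scoring URI, so clients should read it from service.scoring_uri after each morning deployment rather than hard-coding it.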
There is currently no built-in way to schedule or temporarily disable the endpoint. The only way is to delete and recreate it using the Azure ML SDK, as shown above. The other option would be to serve the model from an Azure Function app instead; that way you only pay for the requests made.
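To automate the nightly teardown and morning redeploy, the delete and deploy code can be wrapped in a small script driven by whatever scheduler you already have (cron, Azure Automation, a DevOps pipeline). A minimal sketch, using the v1 SDK as above; schedule_service.py is a hypothetical helper (not part of any SDK), and the service, model, and environment names are placeholders:

# schedule_service.py -- hypothetical wrapper, run from any scheduler, e.g.:
#   0 19 * * 1-5  python schedule_service.py down   # weekday evenings
#   0 7  * * 1-5  python schedule_service.py up     # weekday mornings
import sys
from azureml.core import Environment, Workspace, Webservice
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice
from azureml.exceptions import WebserviceException

SERVICE_NAME = 'my-custom-env-service'  # placeholder
ws = Workspace.from_config()

def down():
    """Delete the service so no compute is billed overnight."""
    try:
        Webservice(workspace=ws, name=SERVICE_NAME).delete()
    except WebserviceException:
        pass  # nothing deployed -- already torn down

def up():
    """Redeploy the registered model with the same configuration as above."""
    model = Model(ws, name='your-model-name')                # placeholder
    environment = Environment.get(ws, name='your-env-name')  # placeholder
    inference_config = InferenceConfig(entry_script='score.py',
                                       environment=environment)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    service = Model.deploy(ws, SERVICE_NAME, [model], inference_config,
                           aci_config, overwrite=True)
    service.wait_for_deployment(show_output=True)

if __name__ == '__main__':
    down() if sys.argv[1] == 'down' else up()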