azure-machine-learning-service

How can I specify the instance type when using a pipeline job in Azure ML?


I'm using Azure ML Python SDK v2.

If I create a command job, I can choose serverless compute and specify the instance type and instance count, as shown in the documentation.

But I need to create a pipeline job so that I can schedule it, and there doesn't seem to be any way to specify the instance type and instance count in the same documentation. The code below works, but it runs on the default instance type.

How can I specify the instance type and instance count for a pipeline job?

I looked at this example. Here's my code:

from azure.ai.ml import MLClient, Input, Output, command
from azure.ai.ml.constants import TimeZone
from azure.ai.ml.entities import ResourceConfiguration, JobResourceConfiguration, JobSchedule, RecurrenceTrigger, RecurrencePattern
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import load_component
from azure.ai.ml.dsl import pipeline


# Login details omitted for this example
ml_client = MLClient(credential, subscription_id, resource_group, workspace_name)


# This works, but it creates a command job, not a pipeline job
'''
job = command(
    code="./",  # local path where the code is stored
    command="python dummy.py",
    resources=JobResourceConfiguration(
        instance_type="Standard_F4s_v2", instance_count=1
    ),  # alternative instance types: Standard_NC4as_T4_v3, Standard_F4s_v2
    environment="azureml://registries/azureml/environments/minimal-py311-inference/versions/15",
    environment_variables={"PYTHONPATH": "./"},
    display_name="dummy-job",
    description="Some job",
)
'''

dummy_component = load_component(source="component.yml")

@pipeline()
def dummy_pipeline():
    dummy_component()
    return {}

pipeline_job = dummy_pipeline()
pipeline_job.settings.default_compute = "serverless"
# How can I set instance_type="Standard_F4s_v2" and instance_count=1 here?

pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="dummy_pipeline_experiment"
)

Here's the component.yml:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: dummy
display_name: Dummy code
version: 2
type: command
code: ./src
environment: azureml://registries/azureml/environments/mlflow-py312-inference/versions/2
command: >-
  python dummy.py

Solution

  • This part of the documentation describes how you can set the instance type and instance count for serverless compute.

    In your example, it would be:

    dummy_component = load_component(source="component.yml")

    @pipeline()
    def dummy_pipeline():
        component = dummy_component()
        # Set the serverless compute resources on the component instance
        component.resources = ResourceConfiguration(
            instance_type="Standard_F4s_v2", instance_count=1
        )
        return {}

    pipeline_job = dummy_pipeline()
    pipeline_job.settings.default_compute = "serverless"

    pipeline_job = ml_client.jobs.create_or_update(
        pipeline_job, experiment_name="dummy_pipeline_experiment"
    )
    

    Also note that pipeline components do not have runtime settings, so you can't hardcode the compute at the component level; it has to be set at the pipeline level at runtime. See more explanation here.
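
  • Since the reason for using a pipeline job is to schedule it, here is a minimal sketch of attaching a recurrence schedule to the pipeline job, using the JobSchedule, RecurrenceTrigger, RecurrencePattern and TimeZone classes already imported above. The schedule name and the daily 06:00 UTC trigger are placeholders:

    pipeline_job = dummy_pipeline()
    pipeline_job.settings.default_compute = "serverless"

    # Placeholder trigger: run the pipeline every day at 06:00 UTC
    recurrence_trigger = RecurrenceTrigger(
        frequency="day",
        interval=1,
        schedule=RecurrencePattern(hours=6, minutes=0),
        time_zone=TimeZone.UTC,
    )

    job_schedule = JobSchedule(
        name="dummy_pipeline_schedule",  # placeholder schedule name
        trigger=recurrence_trigger,
        create_job=pipeline_job,
    )

    # Schedules are created through ml_client.schedules (not ml_client.jobs);
    # the schedule submits the pipeline job on each trigger
    ml_client.schedules.begin_create_or_update(schedule=job_schedule).result()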