Tags: endpoint, azure-machine-learning-service, azureml-python-sdk, azure-ai, azuremlsdk

How to use predict_proba with an AzureML batch endpoint within the invoke method, whilst using a URI folder URL as the data?


I have an AutoML-generated binary classification model deployed to a batch endpoint. I can successfully invoke the model using the code below to output a file containing a binary prediction (1|0).

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Input being ADLS (a folder in Azure Data Lake Storage)
input = Input(
    type=AssetTypes.URI_FOLDER,
    path="https://mydatalake.blob.core.windows.net/my_container/folder_with_data"
)

# Invoke the endpoint
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    inputs={
        "heart_dataset": input,
    }
)

I am trying to incorporate the global parameter predict_proba so that the returned file includes the predicted probability of either or both classes, but so far I have been unsuccessful.
Many of the examples I am trying to follow (example) are similar, but their data is specified in the code (i.e. the input is a dataframe populated manually in the code rather than a URL to a folder or file), as in the sketch below.
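
A rough sketch of the pattern those examples use (the payload shape follows the AutoML scoring examples; the column names here are placeholders, not my actual data):

import pandas as pd

# Data is built as a dataframe in code rather than referenced by URL
df = pd.DataFrame({"age": [63], "chol": [233]})  # placeholder columns

# GlobalParameters sits alongside the data in the request payload
payload = {
    "Inputs": {"data": df.to_dict(orient="records")},
    "GlobalParameters": {"method": "predict_proba"},
}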

For example, the below returns a serialisation error:

# Input being ADLS
input = Input(
    type=AssetTypes.URI_FOLDER, 
    path="https://mydatalake.blob.core.windows.net/my_container/folder_with_data"
)

# Invoke endpoint
job = ml_client.batch_endpoints.invoke(
    endpoint_name="heart-test",
    inputs={
        "GlobalParameters": {
            "method": "predict_proba"
        },
        "inputs": {
            "heart_dataset": input
        }
    }
)

Error:

SerializationError: ("Attribute None in object dict cannot be serialized.\n{'method': 'predict_proba'}, AttributeError: 'dict' object has no attribute '_attribute_map'", AttributeError("'dict' object has no attribute '_attribute_map'"))

I have tried various placements of the GlobalParameters method predict_proba, asking LLMs and searching for solutions, and trying to adapt them to my use case.
This includes variations on the non-working code above (the serialisation error presumably occurs because invoke expects each value in inputs to be an Input object rather than a plain dict), as well as passing an output schema via the schema argument of the invoke method (this ran, but still returned the binary prediction).

For reference:

output_schema = {
    "output_data": {
        "definitions": {
            "scored_data": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        # Replace with actual class labels from your model
                        "Class_0": {"type": "number"},
                        "Class_1": {"type": "number"},
                        # Add additional classes as needed
                    }
                }
            }
        }
    }
}

# Invoke the endpoint with the output schema
job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint_name,
    inputs={"heart_dataset": input},
    schema=output_schema
)

As this is an AutoML-generated model deployed to a batch endpoint, the scoring script is generated automatically, so I am hoping to avoid editing it and instead have the predicted probabilities returned using only the input to the endpoint invocation.

Hopefully that is all the required info; happy to add more if needed. Thanks in advance.


Solution

  • For anyone finding this in the future, I ended up creating a basic scoring script and a custom environment (based on the environment the model training run used); a sketch follows.
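
A minimal sketch of what such a scoring script might look like, assuming a scikit-learn-style pickled model and CSV files in the input folder (the model file name, the joblib loader, and the output column names are assumptions; an AutoML model saved in MLflow format would need mlflow.sklearn.load_model instead):

import os

import joblib
import pandas as pd


def init():
    # Called once per worker: load the model from the registered model folder
    global model
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    # "model.pkl" is an assumption -- match it to your registered model's layout
    model = joblib.load(os.path.join(model_dir, "model.pkl"))


def run(mini_batch):
    # For a URI_FOLDER input, mini_batch is a list of file paths
    results = []
    for file_path in mini_batch:
        df = pd.read_csv(file_path)
        # predict_proba returns one probability column per class
        proba = model.predict_proba(df)
        out = pd.DataFrame(proba, columns=["prob_class_0", "prob_class_1"])
        out["file"] = os.path.basename(file_path)
        results.append(out)
    # Returning a dataframe appends its rows to the job's output file
    return pd.concat(results)

The custom environment then only needs the conda dependencies from the training run, and the batch deployment's code_configuration is pointed at this script.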