azure, endpoint, azure-machine-learning-service

Is it possible to alter the input variables in an Azure Machine Learning endpoint?


I have an input dataset that contains columns whose values are only known after the fact (e.g. arrival time), so they are available for training but not at prediction time. The input data is as follows: { "toAddressPostCode", "toAddressCountryCode", "unloadInDatetime", "unloadOutDatetime", "loadingMeters", "payableWeight", "name" }

I use the following Python code to convert these columns right after import:

import pandas as pd

def transform_data(df):
    if 'unloadInDatetime' in df.columns and 'unloadOutDatetime' in df.columns:
        # Convert to datetime if not already
        datetime_format = '%Y-%m-%d %H:%M:%S.%f'  # Format to include microseconds
        df['InDateTime'] = pd.to_datetime(df['unloadInDatetime'], format=datetime_format, errors='coerce')
        df['OutDateTime'] = pd.to_datetime(df['unloadOutDatetime'], format=datetime_format, errors='coerce')
        # Drop rows whose timestamps failed to parse; NaT values would make the integer casts below raise
        df = df.dropna(subset=['InDateTime', 'OutDateTime'])

        # Extract date components and hour as integer
        df['InDate'] = df['InDateTime'].dt.date
        df['InTime'] = df['InDateTime'].dt.hour  # already an integer once NaT rows are dropped
        df['OutDate'] = df['OutDateTime'].dt.date
        df['OutTime'] = df['OutDateTime'].dt.time

        # Other transformations
        df['toAddressPostCode'] = df['toAddressPostCode'].str[:5]
        df['UnloadMonth'] = df['InDateTime'].dt.month
        df['UnloadWeekday'] = df['InDateTime'].dt.weekday + 1

        # Calculate the duration
        df['UnloadingTime'] = (df['OutDateTime'] - df['InDateTime']).dt.total_seconds() / 60
        df['UnloadingTime'] = df['UnloadingTime'].astype(int)

        # Filter the data
        df = df[(df['UnloadingTime'] > 5) & (df['UnloadingTime'] < 300)]
        df.reset_index(drop=True, inplace=True)

        # Select desired columns
        df = df[['UnloadingTime', 'toAddressCountryCode', 'toAddressPostCode', 'UnloadWeekday', 'UnloadMonth', 'loadingMeters', 'payableWeight', 'InTime', 'name']]
    else:
        print("Required columns not found in the DataFrame")
    return df

def azureml_main(dataframe1=None, dataframe2=None):
    # Perform your data transformation
    if dataframe1 is not None:
        df_transformed = transform_data(dataframe1)
    else:
        df_transformed = pd.DataFrame()

    return df_transformed, None
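As a quick local sanity check of the datetime feature engineering above (the timestamps below are made-up values), the derived features come out as:

```python
import pandas as pd

# Made-up timestamps in the same format the training code expects
datetime_format = '%Y-%m-%d %H:%M:%S.%f'
in_dt = pd.to_datetime(pd.Series(['2024-05-14 08:30:00.000']), format=datetime_format)
out_dt = pd.to_datetime(pd.Series(['2024-05-14 09:15:00.000']), format=datetime_format)

print(in_dt.dt.hour[0])                              # InTime -> 8
print(in_dt.dt.month[0])                             # UnloadMonth -> 5
print(in_dt.dt.weekday[0] + 1)                       # UnloadWeekday -> 2 (Tuesday)
print((out_dt - in_dt).dt.total_seconds()[0] / 60)   # UnloadingTime -> 45.0
```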

From this I would like to use these columns as the input data for my endpoint: { "toAddressCountryCode", "toAddressPostCode", "UnloadWeekday", "UnloadMonth", "loadingMeters", "payableWeight", "InTime", "name" }

Is there any way to achieve this?

First of all, I ran the flow and deployed it as an endpoint.

I went through the entire flow to check whether the columns I wanted to use were present, and they were. In the endpoint testing window I tried to input the desired columns, but an error pops up saying the "input data are inconsistent with schema".


Solution

  • Whenever you score a model endpoint, the input must match the schema the model was trained on.

    If it does not match, transform the input data on the client side so that it matches the endpoint's input schema.

    Alternatively, modify the scoring script so that it transforms the incoming data into the shape the model expects, and create an endpoint from that.

    You can make these changes while using a registered model to create an online endpoint with a custom scoring script.


    Go to the Models tab and open the registered model. You will see a Deploy option; click it and you will be prompted for several inputs. There, upload the modified scoring script.


    Below is a sample score.py you can refer to:
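A minimal sketch of such a scoring script, assuming a scikit-learn-style model pickled as model.pkl and a request body of the form {"data": [{...}, ...]} (both assumptions to adapt to your deployment); the init()/run() contract and the AZUREML_MODEL_DIR variable are the standard Azure ML scoring-script interface. It accepts exactly the columns the question wants as the endpoint input and forwards them to the model:

```python
import json
import os

import pandas as pd

# The input contract we want the endpoint to accept (the question's desired columns)
MODEL_COLUMNS = ['toAddressCountryCode', 'toAddressPostCode', 'UnloadWeekday',
                 'UnloadMonth', 'loadingMeters', 'payableWeight', 'InTime', 'name']

model = None

def preprocess(df):
    """Any server-side transformation goes here; truncating the postcode mirrors training."""
    df = df.copy()
    df['toAddressPostCode'] = df['toAddressPostCode'].str[:5]
    return df[MODEL_COLUMNS]

def init():
    """Called once when the deployment starts; loads the registered model."""
    global model
    import joblib  # deferred import so preprocess() stays usable without joblib installed
    # AZUREML_MODEL_DIR points at the registered model folder inside the deployment
    model_path = os.path.join(os.environ['AZUREML_MODEL_DIR'], 'model.pkl')  # file name is an assumption
    model = joblib.load(model_path)

def run(raw_data):
    """Invoked per request; raw_data is the JSON request body as a string."""
    try:
        payload = json.loads(raw_data)  # expected shape: {"data": [{...}, ...]} (an assumption)
        features = preprocess(pd.DataFrame(payload['data']))
        return {'predictions': model.predict(features).tolist()}
    except Exception as exc:
        return {'error': str(exc)}
```

Because the scoring script now defines its own input contract, the request no longer has to match the raw training schema (unloadInDatetime etc.); the caller can send the engineered columns directly.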