
Azure AutoML forecasting with historical data context


I have a question about the peculiar behaviour of Azure AutoML when using forecasting with historical data context.

Basically, I want to apply this use case from the documentation:

[Image: AutoML use case from the documentation]

The idea is to train a model on historical data (say, 3 months of it) and then feed the model the current prediction context (for example, the last two weeks) so that it forecasts over a given horizon.

According to the documentation, to train the model on historical data I need a configuration like this:

    import logging

    from azureml.automl.core.forecasting_parameters import ForecastingParameters
    from azureml.train.automl import AutoMLConfig

    # prediction_horizon_hours, signalTags and Data are defined elsewhere in my script.
    forecasting_parameters = ForecastingParameters(
        time_column_name='Timestamp',
        target_aggregation_function='mean',  # aggregate target values to the 'H' frequency
        freq='H',                            # hourly series
        forecast_horizon=prediction_horizon_hours,
        target_lags='auto',
    )

    label = signalTags  # name of the target column

    automl_config = AutoMLConfig(task='forecasting',
                                 primary_metric='normalized_root_mean_squared_error',
                                 experiment_timeout_minutes=30,
                                 blocked_models=["AutoArima"],
                                 enable_early_stopping=True,
                                 training_data=Data,
                                 label_column_name=label,
                                 n_cross_validations=3,
                                 enable_ensembling=False,
                                 verbosity=logging.INFO,
                                 forecasting_parameters=forecasting_parameters)

After training, to make a prediction I need to supply the "context" as a dataframe: the target column is filled in for the context rows and left empty (NaN) for the rows I want to predict. Then I just call forecast. Something like this:

         Timestamp                 Signal
    0    2022-08-07T23:00:00Z      63.16
    1    2022-08-08T00:00:00Z      62.92
    2    2022-08-08T01:00:00Z      62.89
    3    2022-08-08T02:00:00Z      62.79
    4    2022-08-08T03:00:00Z      62.75
    ..   ...                       ...
    233  2022-08-23T17:00:00Z      nan
    234  2022-08-23T18:00:00Z      nan
    235  2022-08-23T19:00:00Z      nan
    236  2022-08-23T20:00:00Z      nan
    237  2022-08-23T21:00:00Z      nan
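For reference, a context frame of this shape can be built with pandas roughly as follows. This is a minimal sketch: the five context values are copied from the table, while the three-hour horizon is a hypothetical stand-in for `prediction_horizon_hours` (the real frame has many more rows).

```python
import numpy as np
import pandas as pd

# Known context values (first rows of the table); hypothetical short horizon.
context_values = [63.16, 62.92, 62.89, 62.79, 62.75]
horizon_hours = 3  # stand-in for prediction_horizon_hours

# Hourly UTC timestamps: first the known context, then the rows to predict.
timestamps = pd.date_range("2022-08-07 23:00", periods=len(context_values) + horizon_hours,
                           freq=pd.offsets.Hour(1), tz="UTC")

context_df = pd.DataFrame({
    "Timestamp": timestamps,
    # Real values for the context, NaN for the horizon to be forecast.
    "Signal": context_values + [np.nan] * horizon_hours,
})
print(context_df)
```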

After all this context (pun intended), here is the problem.

When I use the above dataframe to forecast ahead, I get the following error:

    ForecastingConfigException:
        Message: Expected column(s) target value column not found in y_pred.
        InnerException: None
        ErrorResponse
    {
        "error": {
            "code": "UserError",
            "message": "Expected column(s) target value column not found in y_pred.",
            "target": "y_pred",
            "inner_error": {
                "code": "BadArgument",
                "inner_error": {
                    "code": "MissingColumnsInData"
                }
            },
            "reference_code": "ac316505-87e4-4877-a855-65a24c3a796b"
        }
    }

However, if I feed it a slightly different dataframe, in which the rows to be forecast have any timestamp other than exactly on the hour (e.g. 10:30, 11:01, 10:23), it works normally. For example, if I give it this:

         Timestamp                 Signal
    0    2022-08-07T23:00:00Z      63.16
    1    2022-08-08T00:00:00Z      62.92
    2    2022-08-08T01:00:00Z      62.89
    3    2022-08-08T02:00:00Z      62.79
    4    2022-08-08T03:00:00Z      62.75
    ..   ...                       ...
    233  2022-08-23T17:00:01Z      nan
    234  2022-08-23T18:00:01Z      nan
    235  2022-08-23T19:00:01Z      nan
    236  2022-08-23T20:00:01Z      nan
    237  2022-08-23T21:00:01Z      nan

it produces good results. What gives?

I have tried resetting the index of the dataframe and replacing None with NaN, but nothing seems to work. Azure AutoML can predict any timestamp except those that fall exactly on the hour.
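The only difference between the two inputs is whether the horizon timestamps sit exactly on the hourly grid implied by `freq='H'`. A small pandas check makes that distinction explicit. This is a sketch; `aligned_to_hour` is a hypothetical helper, not part of the AzureML SDK, and it only shows which input is which, not why AutoML treats them differently.

```python
import pandas as pd

# Horizon timestamps taken from the two example inputs.
on_hour = pd.to_datetime(pd.Series(["2022-08-23T17:00:00Z", "2022-08-23T18:00:00Z"]))
off_hour = pd.to_datetime(pd.Series(["2022-08-23T17:00:01Z", "2022-08-23T18:00:01Z"]))


def aligned_to_hour(ts: pd.Series) -> pd.Series:
    """Return True where a timestamp lies exactly on the hourly grid (hypothetical helper)."""
    return ts.dt.floor(pd.offsets.Hour(1)) == ts


print(aligned_to_hour(on_hour).tolist())   # first input: every horizon row on the hour
print(aligned_to_hour(off_hour).tolist())  # second input: every horizon row one second past
```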

What can I do to fix this?

Thanks!


Solution

  • I managed to get it to work by changing how I call the forecast model.

    Taking into account these variables:

    For a univariate series, instead of using this:

    model.forecast(x, y)

    I need to call:

    model.forecast(df, y)

    Remember that forecast expects its arguments as a pandas DataFrame or a NumPy array.
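One way to shape the two arguments is sketched below. The answer does not show what `x` and `df` contained, so this assumes the first argument is a DataFrame with the time column and the second is a NumPy array of target values with NaN marking the horizon. The frame's values, the `label` name and `fitted_model` are hypothetical. The model call is left as a comment because it needs a fitted AutoML forecasting model.

```python
import numpy as np
import pandas as pd

# Hypothetical context frame in the shape shown in the question:
# known Signal values followed by NaN rows for the forecast horizon.
df = pd.DataFrame({
    "Timestamp": pd.date_range("2022-08-23 15:00", periods=4, freq=pd.offsets.Hour(1), tz="UTC"),
    "Signal": [62.1, 62.4, np.nan, np.nan],
})

label = "Signal"  # assumed name of the target column

# First argument: a DataFrame holding the time column (target removed).
X_pred = df.drop(columns=[label])
# Second argument: a NumPy array of targets, NaN where a forecast is wanted.
y_pred = df[label].to_numpy()

# With a fitted AutoML forecasting model (not constructed here):
# y_forecast, X_trans = fitted_model.forecast(X_pred, y_pred)
```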