I have a question about the peculiar behaviour of Azure AutoML when using forecasting with historical data context.
Basically, I want to apply this use case from the documentation.
The idea is to train a model with historical data (imagine, 3 months of historical data) and then feed the model the current prediction context (for example, the last two weeks) in order to predict a certain prediction horizon.
According to the documentation, to train the model with historical data, I need a configuration like this:
forecasting_parameters = ForecastingParameters(
    time_column_name='Timestamp',
    target_aggregation_function='mean',
    freq='H',
    forecast_horizon=prediction_horizon_hours,
    target_lags='auto',
)
label = signalTags
automl_config = AutoMLConfig(
    task='forecasting',
    primary_metric='normalized_root_mean_squared_error',
    experiment_timeout_minutes=30,
    blocked_models=["AutoArima"],
    enable_early_stopping=True,
    training_data=Data,
    label_column_name=label,
    n_cross_validations=3,
    enable_ensembling=False,
    verbosity=logging.INFO,
    forecasting_parameters=forecasting_parameters,
)
After training, in order to perform a prediction I need to feed in the "context" as a dataframe, where the target column is filled in for the context rows and empty for the rows I want to predict, and then just call forecast. Something like this:
                Timestamp  Signal
0    2022-08-07T23:00:00Z   63.16
1    2022-08-08T00:00:00Z   62.92
2    2022-08-08T01:00:00Z   62.89
3    2022-08-08T02:00:00Z   62.79
4    2022-08-08T03:00:00Z   62.75
..                    ...     ...
233  2022-08-23T17:00:00Z     nan
234  2022-08-23T18:00:00Z     nan
235  2022-08-23T19:00:00Z     nan
236  2022-08-23T20:00:00Z     nan
237  2022-08-23T21:00:00Z     nan
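For reference, a context frame of this shape could be assembled roughly as follows. This is only a minimal sketch using pandas: the helper name `build_forecast_context` and its parameters are hypothetical, and it assumes an hourly series with the `Timestamp` and `Signal` columns shown above.

```python
import numpy as np
import pandas as pd


def build_forecast_context(history, horizon_hours,
                           time_col="Timestamp", target_col="Signal"):
    """Append `horizon_hours` hourly rows with NaN targets after the known history.

    Hypothetical helper for illustration; not part of Azure AutoML.
    """
    step = pd.Timedelta(hours=1)
    history = history.copy()
    # Parse ISO 8601 strings such as "2022-08-07T23:00:00Z" into UTC timestamps.
    history[time_col] = pd.to_datetime(history[time_col], utc=True)
    last_known = history[time_col].max()
    # Future timestamps start one step after the last known observation.
    future_times = pd.date_range(start=last_known + step,
                                 periods=horizon_hours, freq=step)
    future = pd.DataFrame({time_col: future_times, target_col: np.nan})
    return pd.concat([history, future], ignore_index=True)


history = pd.DataFrame({
    "Timestamp": ["2022-08-07T23:00:00Z",
                  "2022-08-08T00:00:00Z",
                  "2022-08-08T01:00:00Z"],
    "Signal": [63.16, 62.92, 62.89],
})
context = build_forecast_context(history, horizon_hours=2)
print(context)
```

The rows with a filled-in `Signal` act as the context, and the trailing NaN rows mark the timestamps to be forecast.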
After all this context (pun intended) here is the question/problem.
When I use the above dataframe to forecast ahead I get an error that mentions the following:
ForecastingConfigException:
    Message: Expected column(s) target value column not found in y_pred.
    InnerException: None
    ErrorResponse
{
    "error": {
        "code": "UserError",
        "message": "Expected column(s) target value column not found in y_pred.",
        "target": "y_pred",
        "inner_error": {
            "code": "BadArgument",
            "inner_error": {
                "code": "MissingColumnsInData"
            }
        },
        "reference_code": "ac316505-87e4-4877-a855-65a24c3a796b"
    }
}
However, if I feed a slightly different dataframe, where the rows to be forecast have any timestamp other than exactly on the hour (e.g. 10:30, 11:01, 10:23), it works normally. For example, if I give it something like this:
                Timestamp  Signal
0    2022-08-07T23:00:00Z   63.16
1    2022-08-08T00:00:00Z   62.92
2    2022-08-08T01:00:00Z   62.89
3    2022-08-08T02:00:00Z   62.79
4    2022-08-08T03:00:00Z   62.75
..                    ...     ...
233  2022-08-23T17:00:01Z     nan
234  2022-08-23T18:00:01Z     nan
235  2022-08-23T19:00:01Z     nan
236  2022-08-23T20:00:01Z     nan
237  2022-08-23T21:00:01Z     nan
It outputs good results. What gives?
I have tried resetting the index of the dataframe and replacing None with nan, but nothing seems to work. Azure AutoML can predict any timestamp except ones that fall exactly on the hour.
What can I do to fix this?
Thanks!
I managed to get it to work by changing how I call the forecast model.
Taking into account these variables:
For a univariate series, instead of using this:
model.forecast(x, y)
I need to call:
model.forecast(df, y)
Remember that to call forecast you need to supply the arguments as a dataframe or as a NumPy array.
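As a rough illustration of preparing those two arguments, the sketch below splits a context dataframe into a feature dataframe (without the target column) and a NumPy array of target values, with NaN marking the rows to predict. The helper `split_context` and the variable names are hypothetical, and the final call is left as a comment because it assumes `model` is the fitted AutoML forecasting model from the question.

```python
import numpy as np
import pandas as pd


def split_context(context, target_col="Signal"):
    """Return (features, targets) from a context dataframe.

    Hypothetical helper for illustration; not part of Azure AutoML.
    features: DataFrame without the target column.
    targets:  float NumPy array, NaN where a forecast is wanted.
    """
    features = context.drop(columns=[target_col])
    targets = context[target_col].to_numpy(dtype=float)
    return features, targets


context = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2022-08-23T15:00:00Z",
                                 "2022-08-23T16:00:00Z",
                                 "2022-08-23T17:00:00Z"]),
    "Signal": [62.75, 62.80, None],  # None becomes NaN: the row to forecast
})
df_features, y = split_context(context)

# Assumed usage, mirroring the call shown above (requires the fitted model):
# model.forecast(df_features, y)
```

Keeping the target out of the feature dataframe and passing it separately makes it explicit which values are known context and which are to be predicted.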