
Error metrics for backtest and historical forecasting in darts are different


When using backtest() and historical_forecasts() in darts, I expect to get the same error. However, when I test this, I get different MAPE values for the same input parameters. Can somebody explain how this can happen, and how I can make the two methods comparable?

Example:

```python
import pandas as pd
from darts import TimeSeries
from darts.models import NaiveDrift
from darts.metrics import mape

df = pd.read_csv('AirPassengers.csv')

series = TimeSeries.from_dataframe(df, 'Month', '#Passengers')

print("Backtest MAPE: ", NaiveDrift().backtest(series,
                                              start=12,
                                              forecast_horizon=6,
                                              metric=mape))

historical_forecast = NaiveDrift().historical_forecasts(series,
                                                        start=12,
                                                        forecast_horizon=6,
                                                        verbose=False)

print("Historical Forecast MAPE: ", mape(historical_forecast, series))
```

Output:

Backtest MAPE: 16.821355933599133

Historical Forecast MAPE: 21.090231183002143

Links

Link to the documentation: https://unit8co.github.io/darts/generated_api/darts.models.forecasting.baselines.html

Link to the dataset: https://www.kaggle.com/datasets/rakannimer/air-passengers


Solution

  • The MAPEs differ because they are computed on different data. historical_forecasts() and backtest() have different default values for the parameter "last_points_only": it is True for historical_forecasts() and False for backtest().

    This means that, by default, historical_forecasts() returns one series against which the accuracy is checked, whereas backtest() scores multiple forecast series and averages the error across them. backtest() has a default "stride" of 1: after it makes one forecast of forecast_horizon steps, it moves forward one time step and makes another, moves forward one step again and makes another, and so on. The average MAPE of these many forecasts is generally different from the MAPE of the single series returned by historical_forecasts().

    If "last_points_only" is True (the default in historical_forecasts()), only the last point of each forecast, i.e. the value predicted forecast_horizon steps ahead, is kept. These points are stitched together into one series, instead of each whole forecast being scored again.

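    You can check this by setting "last_points_only" to True in backtest(): both functions then evaluate the same series and return the same MAPE, provided the historical forecasts are scored as mape(series, historical_forecast), because darts metrics take the actual series as the first argument.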
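To see why the two numbers differ, here is a minimal, self-contained sketch in plain Python (darts is not needed). The helpers `drift_forecast`, `historical_forecasts`, `backtest` and `mape` are hypothetical stand-ins written for illustration: they only mimic the idea behind NaiveDrift and the two darts methods, assuming an expanding training window, stride 1, and no forecasts running past the end of the series. The series is made-up toy data, not AirPassengers.

```python
from statistics import mean

# Made-up monthly toy series (NOT AirPassengers): linear trend plus a summer bump.
series = [100 + 2 * i + (15 if i % 12 in (5, 6, 7) else 0) for i in range(48)]


def drift_forecast(train, horizon):
    """Hypothetical NaiveDrift stand-in: extend the line from train[0] to train[-1]."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * h for h in range(1, horizon + 1)]


def mape(actual, pred):
    """Mean absolute percentage error in percent; actual values come first."""
    return 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, pred))


def historical_forecasts(series, start, horizon, stride=1):
    """Refit on series[:t], forecast `horizon` steps, then move t forward by `stride`.

    Only windows ending inside the series are kept. Returns (t, forecast) pairs,
    where each forecast covers indices t .. t + horizon - 1.
    """
    windows = []
    t = start
    while t + horizon <= len(series):
        windows.append((t, drift_forecast(series[:t], horizon)))
        t += stride
    return windows


def backtest(series, start, horizon, last_points_only):
    windows = historical_forecasts(series, start, horizon)
    if last_points_only:
        # Keep only the horizon-th point of each forecast, stitch them into
        # one series, and score that single series once.
        actual = [series[t + horizon - 1] for t, _ in windows]
        pred = [fc[-1] for _, fc in windows]
        return mape(actual, pred)
    # Score every full forecast window separately, then average the scores.
    return mean(mape(series[t:t + horizon], fc) for t, fc in windows)


per_window = backtest(series, start=12, horizon=6, last_points_only=False)
last_points = backtest(series, start=12, horizon=6, last_points_only=True)
print(f"averaged over full windows: {per_window:.3f}")
print(f"stitched last points only:  {last_points:.3f}")
```

With last_points_only=False, each of the 31 six-step windows gets its own MAPE and the scores are averaged, so errors at horizons 1 to 6 all contribute. With last_points_only=True, only the sixth-step prediction of each window survives, so the score reflects the longest-horizon errors, which tend to be the largest for a drift model.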