I am trying to build a time series forecast with the PyCaret AutoML package in Google Colab, using the data at the following link: parts_revenue_data. When I compare the models to find the best one, the code hangs and the progress stays at 20%.
The code is below:
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
import pandas as pd
import numpy as np
import pycaret
pycaret.__version__ # 3.1.0
df = pd.read_csv('parts_revenue.csv', delimiter=';')
from pycaret.utils.time_series import clean_time_index
cleaned = clean_time_index(data=df,
                           index_col='Posting Date',
                           freq='D')
# Verify the resulting DataFrame
print(cleaned.head(n=50))
# parts['MA12'] = parts['Parts Revenue'].rolling(12).mean()
# import plotly.express as px
# fig = px.line(parts, x="Posting Date", y=["Parts Revenue",
# "MA12"], template = 'plotly_dark')
# fig.show()
import time
import numpy as np
from pycaret.time_series import *
# We want to forecast the next 12 days of data and we will use 3
# fold cross-validation to test the models.
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3
# Global Figure Settings for notebook ----
# Depending on whether you are using jupyter notebook, jupyter lab,
# Google Colab, you may have to set the renderer appropriately
# NOTE: Setting to a static renderer here so that the notebook
# saved size is reduced.
fig_kwargs = {
    # "renderer": "notebook",
    "renderer": "png",
    "width": 1000,
    "height": 600,
}
"""## EDA"""
eda = TSForecastingExperiment()
eda.setup(
    cleaned,
    fh=fh,
    numeric_imputation_target=0,
    fig_kwargs=fig_kwargs,
)
eda.plot_model()
eda.plot_model(
    plot="diagnostics",
    fig_kwargs={"height": 800, "width": 1000},
)
eda.plot_model(
    plot="diff",
    data_kwargs={
        "lags_list": [[1], [1, 7]],
        "acf": True,
        "pacf": True,
        "periodogram": True,
    },
    fig_kwargs={"height": 800, "width": 1500},
)
"""## Modeling"""
exp = TSForecastingExperiment()
exp.setup(
    data=cleaned,
    fh=fh,
    numeric_imputation_target=0.0,
    fig_kwargs=fig_kwargs,
    seasonal_period=5,
)
# compare baseline models
best = exp.compare_models(errors='raise')  # CODE HANGS HERE!
# plot the forecast for the next 24 periods (days, given freq='D')
exp.plot_model(
    best,
    plot='forecast',
    data_kwargs={'fh': 24},
)
Is this caused by a bug in PyCaret, or is something wrong with my code?
Note: I do not have enough rep to comment, so I'll drop this quasi-workaround here and I can delete it later if needed or move it to a comment once I have sufficient rep
I have also found compare_models for time series to be uncannily slow (over 10 minutes of runtime on a dataset with ~4000 records) on my MacBook Pro with an M1 Max. I have not tried it in Colab.
Noticing that it was hanging on the Auto ARIMA model, I excluded it from the comparison as shown below. This reduced the run time to roughly 1 minute.
# compare baseline models, skipping Auto ARIMA (exclude is documented as a list of model IDs)
best = exp.compare_models(errors="raise", exclude=["auto_arima"])
While I'm aware this is not a fix per se, perhaps it can help you get unblocked.
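If it is not obvious which model is stalling, one option is to fit the candidates one at a time and print the model ID before each fit. The helper below is only a minimal sketch of that idea: `time_each_model` and `fit_fn` are hypothetical names of my own, not part of PyCaret, and the helper just wraps whatever fitting callable you pass in.

```python
import time


def time_each_model(model_ids, fit_fn):
    """Call fit_fn(model_id) for every ID and record wall-clock time and status.

    Hypothetical helper (not a PyCaret function). The ID is printed *before*
    fitting, so a model that never returns is the last ID shown.
    """
    timings = {}
    for model_id in model_ids:
        print(f"fitting {model_id} ...", flush=True)
        start = time.perf_counter()
        try:
            fit_fn(model_id)
            status = "ok"
        except Exception as exc:  # keep going so one failure does not stop the loop
            status = f"failed: {exc}"
        elapsed = time.perf_counter() - start
        timings[model_id] = (elapsed, status)
        print(f"  {model_id}: {elapsed:.1f}s ({status})", flush=True)
    return timings
```

With the experiment from the question, this could be called as `time_each_model(exp.models().index, lambda m: exp.create_model(m, verbose=False))`. Any ID that takes far longer than the rest is a candidate for the `exclude` list.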
Environment details:
Python 3.10.12
pycaret==3.1.0
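One more thing that may be worth checking, though I have not verified it against the linked data: the question reindexes to a daily frequency (`freq='D'`) and fills missing target values with 0, while `seasonal_period = 5` suggests business-day data. If so, every weekend becomes an imputed zero. The pandas-only sketch below counts the gaps such a reindex creates; `summarize_daily_gaps` is a hypothetical helper name, and it assumes a series with a DatetimeIndex.

```python
import pandas as pd


def summarize_daily_gaps(series: pd.Series) -> dict:
    """Reindex a date-indexed series to daily frequency and count missing days."""
    series = series.sort_index()
    full_index = pd.date_range(series.index.min(), series.index.max(), freq="D")
    daily = series.reindex(full_index)
    missing = daily.isna()
    return {
        "n_rows": len(daily),
        "n_missing": int(missing.sum()),
        "share_missing": float(missing.mean()),
        # number of missing days per weekday name, e.g. {"Saturday": 1, ...}
        "missing_by_weekday": missing.groupby(full_index.day_name()).sum().to_dict(),
    }
```

If most of the missing days fall on Saturday and Sunday, keeping a business-day frequency instead of imputing zeros might give the models a cleaner series, but that is an assumption to test rather than a confirmed cause of the hang.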