I have a dataframe with 11 columns, one of which, Weekday, is categorical.
columns = ['Series', 'Year', 'Month', 'Day', 'Weekday', 'Number1', 'Number2', 'Number3',
           'Number4', 'Number5', 'Number6']
Since the data was originally given as a time series dataframe, I am following the approach from the Time Series Forecasting Tutorial to forecast the six numerical Number columns using the pycaret Python package.
However, when setting up and comparing models in the following function:
    import numpy as np
    from pycaret.regression import compare_models, models, setup


    def ml_modelling(train, test) -> None:
        """This function models the given timeseries dataset

        Args:
            train (pd.DataFrame): Training split of the time series.
            test (pd.DataFrame): Test split of the time series.
        """
        # Now that we have done the train-test-split, we are ready to train a
        # machine learning model on the train data, score it on the test data and
        # evaluate the performance of our model. In this example, I will use
        # PyCaret; an open-source, low-code machine learning library in Python that
        # automates machine learning workflows.
        numerical_columns = list(train.select_dtypes(include=[np.number]).columns.values)
        targets = [col for col in numerical_columns if col.startswith('Number')]
        for target_var in targets:
            s = setup(data=train,
                      test_data=test,
                      target=target_var,
                      fold_strategy='timeseries',
                      numeric_features=numerical_columns,
                      fold=5,
                      transform_target=True,
                      session_id=123)
            models()
            # Now to train machine learning models, you just need to run one line
            best = compare_models(sort='MAE')
            print(f'Output from compare_models for column {target_var}: \n', best)
            print('##############################################################')
I am receiving the following error message:
    Traceback (most recent call last):
      File "c:/Users/username/OneDrive/Desktop/project/main_script.py", line 64, in <module>
        main()
      File "c:/Users/username/OneDrive/Desktop/project/main_script.py", line 56, in main
        ml_modelling(train, test)
      File "c:\Users\username\OneDrive\Desktop\project\utilities.py", line 1076, in ml_modelling
        s = setup(data=train,
      File "C:\Users\username\Anaconda3\lib\site-packages\pycaret\regression.py", line 571, in setup
        return pycaret.internal.tabular.setup(
      File "C:\Users\username\Anaconda3\lib\site-packages\pycaret\internal\tabular.py", line 607, in setup
        raise ValueError(
    ValueError: Column type forced is either target column or doesn't exist in the dataset.
I would appreciate it if you could tell me what I am doing wrong.
I had to make the following change inside the setup function call:
numeric_features=numerical_columns
to
numeric_features=[col for col in numerical_columns if col != target_var]
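because, in each iteration over the target variables, the current target is itself one of the numerical columns and must not also be listed in numeric_features. That is the "target column" case named in the error message.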
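To make the per-iteration filtering easier to see in isolation, here is a minimal sketch that does not call pycaret at all. The helper name iter_target_feature_pairs is hypothetical (it is not part of pycaret), and the list of numeric columns is an assumption based on the column names in the question; which columns pandas actually reports as numeric depends on the data.

```python
from typing import Iterator, List, Tuple


def iter_target_feature_pairs(
    numerical_columns: List[str], target_prefix: str = "Number"
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (target, numeric_features) pairs, one per target column.

    Hypothetical helper, not part of pycaret: the current target is always
    left out of the feature list that would be passed as numeric_features.
    """
    targets = [col for col in numerical_columns if col.startswith(target_prefix)]
    for target_var in targets:
        # Exclude only the current target; the other Number columns stay in.
        numeric_features = [col for col in numerical_columns if col != target_var]
        yield target_var, numeric_features


# Assumed numeric columns, using the names from the question.
numerical_columns = ["Year", "Month", "Day", "Number1", "Number2", "Number3",
                     "Number4", "Number5", "Number6"]

pairs = list(iter_target_feature_pairs(numerical_columns))
for target_var, numeric_features in pairs:
    # In ml_modelling these values would be passed as
    # setup(..., target=target_var, numeric_features=numeric_features, ...)
    print(target_var, "->", numeric_features)
```

Each target appears in its own row as the target only, while the remaining Number columns are still treated as numeric features for that iteration.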