I'm trying to model a time series for a stock price with the following code:
```python
import opendatasets as od
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.inspection import permutation_importance
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf, pacf
import datetime
import warnings
warnings.filterwarnings('ignore')
```
The resulting DataFrame `df` has the following structure:
```
                          date       close
0    2021-01-04 04:00:00-05:00  133.740000
1    2021-01-04 05:00:00-05:00  133.600000
2    2021-01-04 06:00:00-05:00  134.220000
3    2021-01-04 07:00:00-05:00  133.750000
4    2021-01-04 08:00:00-05:00  134.020000
...                        ...         ...
4070 2021-12-30 13:30:00-05:00  178.975006
4071 2021-12-30 14:30:00-05:00  178.960007
4072 2021-12-30 15:30:00-05:00  178.250000
4073 2021-12-30 16:00:00-05:00  178.190000
4074 2021-12-30 17:00:00-05:00  177.980000
```
As a result, I have:
```
0    int64
dtype: object
```
However, when I split the time series into train and test subsets and fit an Unobserved Components model like this:
```python
# splitting time series to train and test subsets
y_train = df.iloc[:-8766, :].copy()
y_test = df.iloc[-8766:, :].copy()

# Unobserved Components model definition
model = sm.tsa.UnobservedComponents(y_train,
                                    level='dtrend',
                                    irregular=True,
                                    stochastic_level=False,
                                    stochastic_trend=False,
                                    stochastic_freq_seasonal=[False, False, False],
                                    freq_seasonal=[{'period': 24, 'harmonics': 1},
                                                   {'period': 168, 'harmonics': 1},
                                                   {'period': 8766, 'harmonics': 2}])

# fitting model to train data
model_results = model.fit()

# printing statsmodels summary for model
print(model_results.summary())
```
I got the following error:
```
ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
```
I tried to convert the `date` column from string to datetime, but with no success.
`np.asarray(df)` returns the following:
```
array([['2021-01-04 04:00:00-05:00', 133.740000],
       ['2021-01-04 05:00:00-05:00', 133.600000],
       ['2021-01-04 06:00:00-05:00', 134.220000],
       ...
       ['2021-12-30 17:00:00-05:00', 177.980000]], dtype=object)
```
I don't know whether the problem lies in the `date` column or the `close` column, or how to fix it.
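To see which column is responsible, it helps to check the dtype of each column and of the array NumPy builds from the whole frame. The sketch below uses a three-row stand-in for `df` built from the rows shown above (an assumption; the real frame is not reproduced). It shows that the `date` strings force the whole array to `object`, that `close` on its own is `float64`, and that converting `date` to datetimes still leaves a mixed `object` array, which would explain why the conversion attempt did not help.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real df, built from the first rows shown in the question
df = pd.DataFrame({
    "date": ["2021-01-04 04:00:00-05:00",
             "2021-01-04 05:00:00-05:00",
             "2021-01-04 06:00:00-05:00"],
    "close": [133.74, 133.60, 134.22],
})

# 'date' holds plain strings, 'close' is float64
print(df.dtypes)

# A string column next to a float column forces the whole array to object dtype
print(np.asarray(df).dtype)           # object

# The price column on its own is already numeric
print(np.asarray(df["close"]).dtype)  # float64

# Parsing the dates is not enough: datetime64 + float64 still gives object,
# so the model needs the 'close' values only (with the dates as the index)
df["date"] = pd.to_datetime(df["date"], utc=True)
print(np.asarray(df).dtype)           # object
```

In other words, the error comes from passing a frame that still contains the `date` column as the model's `endog`; the values the model needs are in `close`.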
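If the issue still persists after using `df['date'] = pd.to_datetime(df['date'])`, some values in the `date` column may not be parseable as datetimes. You can convert the column while turning those values into `NaT` (Not a Time) instead of raising an error:
```python
df['date'] = pd.to_datetime(df['date'], errors='coerce', utc=True)
```
`utc=True` also puts every timestamp in a single timezone, which matters if the strings carry different UTC offsets (for example `-05:00` in winter and `-04:00` in summer); without it, mixed offsets may not produce a proper `datetime64` column. The problematic rows can then be located with `df[df['date'].isna()]` and handled like any other missing values.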
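Putting the pieces together, here is a minimal sketch, assuming `df` has the `date` and `close` columns shown in the question. It applies the coercion described above, reports and drops the rows that fail to parse, and then does what actually avoids the `object` dtype error: it keeps the timestamps as the index and returns only the numeric `close` values as `endog`. The helpers `prepare_close_series` and `split_by_fraction` are hypothetical names, and the proportional split is a swapped-in choice: with 4,075 rows, `df.iloc[:-8766, :]` is empty because Python slicing clamps the out-of-range offset, so the original split leaves nothing to train on.

```python
import pandas as pd


def prepare_close_series(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: turn a frame with 'date' and 'close' columns into a
    float64 Series indexed by UTC timestamps, dropping rows that do not parse."""
    out = df.copy()
    # Unparseable date strings become NaT instead of raising; utc=True puts
    # every timestamp in one timezone even if the offsets differ
    out["date"] = pd.to_datetime(out["date"], errors="coerce", utc=True)
    # Non-numeric prices become NaN
    out["close"] = pd.to_numeric(out["close"], errors="coerce")

    bad = out["date"].isna() | out["close"].isna()
    if bad.any():
        # Report the offending rows so they can be fixed at the source
        print("Dropping rows that could not be parsed:")
        print(df[bad])
    out = out[~bad]

    # The timestamps become the index; only the numeric column is returned
    return out.set_index("date")["close"].astype("float64")


def split_by_fraction(series: pd.Series, test_frac: float = 0.2):
    """Chronological split: the last `test_frac` share of rows is the test set."""
    split_at = len(series) - int(len(series) * test_frac)
    return series.iloc[:split_at], series.iloc[split_at:]


# Usage with a few rows in the question's format plus one deliberately bad date
raw = pd.DataFrame({
    "date": ["2021-01-04 04:00:00-05:00", "2021-01-04 05:00:00-05:00",
             "not a date", "2021-01-04 07:00:00-05:00",
             "2021-01-04 08:00:00-05:00"],
    "close": [133.74, 133.60, 134.22, 133.75, 134.02],
})
y = prepare_close_series(raw)
y_train, y_test = split_by_fraction(y, test_frac=0.25)
# y_train is now a plain float Series that can be passed as endog, e.g.
# sm.tsa.UnobservedComponents(y_train, level='dtrend', ...)
```

Two further points may be worth checking before fitting. The `period` values in `freq_seasonal` count observations, not hours, and the timestamps shown are not evenly spaced (half-hour marks in the afternoon, gaps between sessions), so `period=24` and `period=168` may not line up with daily and weekly cycles in this series, and `period=8766` is longer than the whole sample. For the same reason, statsmodels will likely not infer a frequency from the index and may warn about it.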