python pandas numpy datetime time-series

What does "Pandas data cast to numpy dtype of object. Check input data with np.asarray(data)" mean, and how can it be solved?


I'm trying to model a time series for a stock price with the following code:

import opendatasets as od
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import tensorflow as tf

from sklearn.preprocessing import StandardScaler
from sklearn.inspection import permutation_importance
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf, pacf
import datetime

import warnings
warnings.filterwarnings('ignore')

and my data has the following structure:

                     date         close
0   2021-01-04 04:00:00-05:00   133.740000
1   2021-01-04 05:00:00-05:00   133.600000
2   2021-01-04 06:00:00-05:00   134.220000
3   2021-01-04 07:00:00-05:00   133.750000
4   2021-01-04 08:00:00-05:00   134.020000
... ... ...
4070    2021-12-30 13:30:00-05:00   178.975006
4071    2021-12-30 14:30:00-05:00   178.960007
4072    2021-12-30 15:30:00-05:00   178.250000
4073    2021-12-30 16:00:00-05:00   178.190000
4074    2021-12-30 17:00:00-05:00   177.980000

As a result I have

0    int64
dtype: object

However, when I try to split the time series into train and test sets for an MLP model and fit it like this:

# splitting time series to train and test subsets
y_train = df.iloc[:-8766, :].copy()
y_test = df.iloc[-8766:, :].copy()

# Unobserved Components model definition
model = sm.tsa.UnobservedComponents(y_train,
                                    level='dtrend',
                                    irregular=True,
                                    stochastic_level = False,
                                    stochastic_trend = False,
                                    stochastic_freq_seasonal = [False, False, False],
                                    freq_seasonal=[{'period': 24, 'harmonics': 1},
                                                    {'period': 168, 'harmonics': 1},
                                                    {'period': 8766, 'harmonics': 2}])
# fitting model to train data
model_results = model.fit()

# printing statsmodels summary for model
print(model_results.summary())

I got the following error:

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

I tried converting the date column from string to datetime, but without success.

np.asarray(df) returns the following:

array([['2021-01-04 04:00:00-05:00', 133.740000]
       ['2021-01-04 05:00:00-05:00', 133.600000]
       ['2021-01-04 06:00:00-05:00', 134.220000]
       ...
       ['2021-31-12 17:00:00-05:00', 177.980000]], dtype = object)

I do not know whether the problem is with the date or the close column, or what to do to fix it.


Solution

  • If the issue still persists after using df['date'] = pd.to_datetime(df['date']), it is possible that some values in the 'date' column cannot be parsed as datetimes. You can flag those values with the following code:

    df['date'] = pd.to_datetime(df['date'], errors='coerce', utc=True)
    

    This sets the unparseable values to NaT (Not a Time). You can then handle them like any other missing values, for example by dropping those rows.
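
As for what the message means: when a DataFrame mixes a string (or datetime) column with a float column, np.asarray(df) can only represent it as a single array of dtype object, and that is what the error reports. A minimal sketch of one way around it follows. It assumes the model should receive only the numeric 'close' column, with the parsed dates kept as the index. The sample rows and the "not a date" entry are made up for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the question's frame, with one bad date added.
df = pd.DataFrame({
    "date": [
        "2021-01-04 04:00:00-05:00",
        "2021-01-04 05:00:00-05:00",
        "not a date",
        "2021-01-04 07:00:00-05:00",
    ],
    "close": [133.74, 133.60, 134.22, 133.75],
})

# Strings next to floats force a single object-dtype array,
# which is what the ValueError complains about.
print(np.asarray(df).dtype)  # object

# Unparseable entries become NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], errors="coerce", utc=True)

# Inspect the rows that failed to parse, then drop them.
bad_rows = df[df["date"].isna()]
print(bad_rows)
df = df.dropna(subset=["date"])

# Keep the dates as the index and pass only the numeric column on.
y = df.set_index("date")["close"].astype(float)
print(np.asarray(y).dtype)  # float64
```

A series like y (or a slice of it) could then be passed to sm.tsa.UnobservedComponents in place of the whole DataFrame. It is also worth checking the split sizes: the frame shown has 4075 rows, so df.iloc[:-8766, :] would leave an empty training set.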