I am trying to do time series forecasting on a set of classes with timestamps, but for some reason my graph looks like this. My full code is below:
from google.colab import drive
drive.mount('/content/gdrive', force_remount = True)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
data = pd.read_csv('gdrive/My Drive/Colab_Notebooks/classproject/classdata.csv', parse_dates=['time_date'], index_col='time_date')
data['date'] = data.index.date
class_id = data['class_id']
time_date = data.index.to_series()
m1 = class_id.ne(class_id.shift())
m2 = time_date.dt.date.ne(time_date.dt.date.shift())
data['count'] = data.groupby((m1 | m2).cumsum()).cumcount().add(1).values
out = data[data.groupby(data.index.date).transform('size').gt(1)]
!pip install pandas-datareader
import pandas_datareader.data as web
import datetime
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.ylabel('Amount of classes')
plt.xlabel('Date')
plt.xticks(rotation=45)
out.index = pd.to_datetime(out['date'], format='%Y-%m-%d')
plt.plot(out.index, out['count'])
The blog where I got this time series code from shows this kind of result instead.
So I'm not sure whether I should proceed or not XD
My input data is this:
timestamp / class_id
2021-09-27 06:00:00 / A
2021-09-27 03:00:00 / A
2021-09-27 01:00:00 / A
2021-09-27 08:29:00 / C
2021-05-23 08:08:49 / B
2021-05-23 03:21:49 / B
2021-05-23 01:22:11 / C
After processing it and adding the count and date columns:
count / timestamp / class_id / date
1 / 2021-09-27 06:00:00 / A / 2021-09-27
2 / 2021-09-27 03:00:00 / A / 2021-09-27
3 / 2021-09-27 01:00:00 / A / 2021-09-27
1 / 2021-09-27 08:29:00 / C / 2021-09-27
1 / 2021-05-23 08:08:49 / B / 2021-05-23
2 / 2021-05-23 03:21:49 / B / 2021-05-23
1 / 2021-05-23 01:22:11 / C / 2021-05-23
I tried the code below, but for some reason the first graph is empty:
plt.ylabel('Amount of classes')
plt.xlabel('date')
plt.xticks(rotation=45)
out.index = pd.to_datetime(out['date'], format='%Y-%m-%d')
out.groupby('class_id').plot()
plt.plot(out.index, out['count'])
You are plotting all your class_id's at the same time. Try plotting by class using something like out.groupby('class_id').plot() to see if the plots per class make sense and look like you expect.
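A minimal sketch of that per-class plot, assuming `out` has the `count`, `class_id` and `date` columns shown in the question (the frame below is rebuilt by hand from the sample rows, so it is only illustrative). It draws every class on one explicitly created axes and sets the labels on that axes. The empty first graph is most likely because the `plt.ylabel`/`plt.xlabel` calls create a figure of their own before `groupby(...).plot()` opens new ones. Sorting each group by date is an added assumption so that the line does not jump back and forth in time.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rows copied from the processed table in the question (illustrative only).
out = pd.DataFrame(
    {
        "count": [1, 2, 3, 1, 1, 2, 1],
        "class_id": ["A", "A", "A", "C", "B", "B", "C"],
        "date": ["2021-09-27"] * 4 + ["2021-05-23"] * 3,
    }
)
out.index = pd.to_datetime(out["date"], format="%Y-%m-%d")

# Create the figure and axes once, then draw one line per class on it.
fig, ax = plt.subplots()
for class_id, group in out.groupby("class_id"):
    group = group.sort_index()  # keep each line in chronological order
    ax.plot(group.index, group["count"], marker="o", label=class_id)

# Label the same axes the lines were drawn on.
ax.set_xlabel("Date")
ax.set_ylabel("Amount of classes")
ax.tick_params(axis="x", labelrotation=45)
ax.legend(title="class_id")
```

If each class should get its own figure instead, calling `plt.subplots()` inside the loop would do that while still keeping the labels attached to the right axes.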