I have a CSV file like this:
Ngày(Date),Số(Number)
07/03/2025,8
07/03/2025,9
...
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
...
(Each day has 27 numbers)
I want to predict the list of 27 numbers for the next day using an LSTM. I keep getting an error at this step:
data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())
with KeyError: 'Số' ('Số' means 'Number').
Here is my code:
import numpy as np
import pandas as pd
df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()
grouped_data = df.groupby("Ngày")[["Số"]].apply(lambda x: list(map(int, x["Số"].values))).reset_index()
grouped_data["Số"] = grouped_data["Số"].apply(lambda x: eval(x) if isinstance(x, str) else x)
data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())
First: when read_csv() reads the data it should already convert the values to integers, so there is no need to use map(int, ...). And apply(list) already creates lists, so there is no need to use eval().
The problem is that groupby().apply() followed by reset_index() created a DataFrame whose column is named 0 instead of "Số", so the error was actually raised later in grouped_data["Số"].apply(...), not in grouped_data.loc[:, "Số"].
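One way to check this is to print the column names produced by each variant. The following is only a minimal sketch with a few made-up sample rows; the names broken and fixed are hypothetical and used just for illustration.

```python
import io
import pandas as pd

# A few sample rows standing in for the real file.
text = """Ngày,Số
07/03/2025,8
07/03/2025,9
06/03/2025,6
06/03/2025,10
"""
df = pd.read_csv(io.StringIO(text))

# Same pattern as in the question: the function returns a plain list per group,
# so the result is an unnamed Series and reset_index() labels its values 0.
broken = df.groupby("Ngày")[["Số"]].apply(lambda x: list(x["Số"])).reset_index()
print(list(broken.columns))

# Selecting a single column and naming it in reset_index() keeps "Số".
fixed = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")
print(list(fixed.columns))
```

If the first list printed does not contain "Số", then any later grouped_data["Số"] lookup will raise the KeyError from the question.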
You can reduce the code to
grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")
which converts every group to a list and sets the column name back to "Số". Note that I use ["Số"] instead of [["Số"]].
Because pandas keeps the column data as a numpy.array (here a 1-D array of Python lists), you can get it with
data_matrix = grouped_data["Số"].values
Full code used for tests:
I used io.StringIO only to create a file-like object in memory, so that everyone can simply copy and run it, but you can use a filename instead.
import io
import numpy as np
import pandas as pd
# sample data kept in memory instead of the real file
text = '''Ngày,Số
07/03/2025,8
07/03/2025,9
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
'''
df = pd.read_csv(io.StringIO(text), encoding="utf-8", sep=",")
#df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()
print('----')
print(df)
print('----')
print(df.dtypes)
grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")
print('---')
print(grouped_data)
print('----')
print('type:', type(grouped_data))
print('---')
print('type:', type(grouped_data["Số"].values))
print('----')
print('values :', grouped_data["Số"].values)
print('np.array:', np.array(grouped_data["Số"]))
data_matrix = grouped_data["Số"].values
#data_matrix = np.array(grouped_data["Số"])
print('----')
print('data_matrix:', data_matrix)
Result:
----
Ngày Số
0 07/03/2025 8
1 07/03/2025 9
2 06/03/2025 6
3 06/03/2025 10
4 06/03/2025 18
5 06/03/2025 14
----
Ngày object
Số int64
dtype: object
---
Ngày Số
0 06/03/2025 [6, 10, 18, 14]
1 07/03/2025 [8, 9]
----
type: <class 'pandas.core.frame.DataFrame'>
---
type: <class 'numpy.ndarray'>
----
values : [list([6, 10, 18, 14]) list([8, 9])]
np.array: [list([6, 10, 18, 14]) list([8, 9])]
----
data_matrix: [list([6, 10, 18, 14]) list([8, 9])]
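The result above is a 1-D array of Python lists, because the sample days have different lengths. If every day really has the same count of numbers (27 in the question), the lists can probably be stacked into a regular 2-D matrix, which is the shape an LSTM pipeline usually expects. The sketch below uses 3 numbers per day as a stand-in, and it assumes the dates are in day/month/year order; parsing them is my addition so that groupby sorts the days chronologically rather than as text.

```python
import io
import numpy as np
import pandas as pd

# Tiny stand-in for the real file: 3 numbers per day instead of 27.
text = """Ngày,Số
07/03/2025,8
07/03/2025,9
07/03/2025,11
06/03/2025,6
06/03/2025,10
06/03/2025,18
"""
df = pd.read_csv(io.StringIO(text))

# Assumption: dates are day/month/year. Real datetimes sort chronologically.
df["Ngày"] = pd.to_datetime(df["Ngày"], format="%d/%m/%Y")

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")

# Equal-length lists become a list of lists, which NumPy turns into
# a 2-D integer array with one row per day.
data_matrix = np.array(grouped_data["Số"].tolist())
print(data_matrix.shape)
print(data_matrix)
```

If some day has a different count of numbers, this conversion no longer gives a clean 2-D array, so it is worth checking the list lengths first.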