I have a pandas DataFrame like the one below:
meter IDs | date | 00:00 | 00:30 | 01:00 | 01:30 | ....... | 23:00 | 23:30 |
---|---|---|---|---|---|---|---|---|
1 | 2020-09-01 | 0.30 | 0.40 | 0.41 | 0.42 | ....... | 0.47 | 0.39 |
1 | 2020-09-02 | 0.36 | 0.39 | nan | nan | ....... | 0.53 | 0.41 |
1 | 2020-09-03 | 0.26 | 0.33 | 0.38 | nan | ....... | nan | 0.45 |
1 | 2020-09-04 | 0.36 | 0.40 | 0.41 | 0.42 | ....... | 0.69 | nan |
1 | 2020-09-05 | 0.30 | 0.40 | 0.41 | 0.42 | ....... | 0.69 | 0.62 |
: | : | : | : | : | : | ....... | : | : |
1 | 2020-12-31 | 0.30 | 0.40 | 0.41 | 0.42 | ....... | 0.69 | 0.62 |
2 | 2020-09-01 | 0.55 | 0.57 | nan | 0.60 | ....... | 0.68 | 0.62 |
2 | 2020-09-02 | 0.61 | 0.59 | 0.66 | nan | ....... | nan | nan |
2 | 2020-09-03 | 0.64 | nan | 0.66 | 0 | ....... | 0.66 | 0.67 |
2 | 2020-09-04 | 0.58 | 0.65 | 0.65 | 0.71 | ....... | 0.69 | 0.66 |
2 | 2020-09-05 | 0.49 | 0.52 | 0.55 | nan | ....... | nan | 0.62 |
: | : | : | : | : | : | ....... | : | : |
2 | 2020-12-31 | 0.48 | 0.51 | nan | 0.52 | ....... | 0.63 | 0.61 |
For each meter ID, I want to fill the missing values with the mean of the previous and next 2 values on the same day and the values at the same time on the previous 2 days.
How can I do this? Thanks in advance.
pandas' fillna() function is the built-in tool for this kind of problem. You can use this reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html. You may be able to combine it with the mean of the neighbouring values. As a simple example, suppose you want to replace all NaN elements in columns 'A', 'B', 'C', and 'D' with 0, 1, 2, and 3 respectively, and you have the df below:

         A    B   C    D
    0  NaN  2.0 NaN  0.0
    1  3.0  4.0 NaN  1.0
    2  NaN  NaN NaN  NaN
    3  NaN  3.0 NaN  4.0

    values = {"A": 0, "B": 1, "C": 2, "D": 3}
    df.fillna(value=values)

Then you get:

         A    B    C    D
    0  0.0  2.0  2.0  0.0
    1  3.0  4.0  2.0  1.0
    2  0.0  1.0  2.0  3.0
    3  0.0  3.0  2.0  4.0
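As a rough sketch of how a mean could be combined with fillna(), each gap can be filled with that meter's own mean for the same time slot. The toy frame and the names toy, time_cols, slot_means and filled are made up for illustration; only the 'meter IDs' and 'date' column names come from the question. This is a simpler rule than the one asked for, shown only to illustrate the mechanics.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; the real frame has 48 time columns.
toy = pd.DataFrame({
    "meter IDs": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"] * 2),
    "00:00": [0.30, np.nan, 0.26, 0.55, 0.61, np.nan],
    "00:30": [0.40, 0.39, np.nan, np.nan, 0.59, 0.57],
})
time_cols = list(toy.columns[2:])  # every column after 'meter IDs' and 'date'

# Per-meter mean of every time slot (NaNs skipped), broadcast back to row shape.
slot_means = toy.groupby("meter IDs")[time_cols].transform("mean")

# fillna accepts a DataFrame and aligns it on index and columns.
filled = toy.copy()
filled[time_cols] = toy[time_cols].fillna(slot_means)
print(filled)
```

Because transform keeps the original index, the frame of means lines up with the data cell by cell, so no explicit loop is needed.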
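Or, a custom filling function that iterates over the data frame might help. For each gap, it takes the mean of up to two earlier and two later readings at the same time of day for the same meter:

    import pandas as pd

    # Sort so each meter's rows are in date order, and reset the index
    # so that row labels are unique.
    df = df.sort_values(by=['meter IDs', 'date']).reset_index(drop=True)

    def fill_missing(df):
        time_cols = df.columns[2:]  # every column after 'meter IDs' and 'date'
        for meter_id, group in df.groupby('meter IDs'):
            for col in time_cols:
                known = group[group[col].notna()]
                # Visit only this meter's rows that are missing a value in this slot.
                for i in group.index[group[col].isna()]:
                    day = group.at[i, 'date']
                    # Two nearest earlier and two nearest later days with a reading.
                    prev_vals = known.loc[known['date'] < day, col].tail(2)
                    next_vals = known.loc[known['date'] > day, col].head(2)
                    df.at[i, col] = pd.concat([prev_vals, next_vals]).mean()
        return df

    df = fill_missing(df)
    print(df)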
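If the rule is read literally as worded in the question (the 2 slots before and after on the same day, plus the same slot on the previous 2 days), a vectorised sketch could look like the one below. The function name fill_from_neighbours and the toy data are hypothetical. It assumes each meter has exactly one row per consecutive day, does not look across midnight into the adjacent day, and averages only original readings, so a gap with no available neighbours stays NaN.

```python
import numpy as np
import pandas as pd

def fill_from_neighbours(df, id_col="meter IDs", date_col="date"):
    """Fill each NaN with the mean of up to six original readings:
    2 slots before and 2 slots after on the same day, plus the same
    slot on the 2 previous rows (assumed to be days) of the same meter."""
    df = df.sort_values([id_col, date_col]).reset_index(drop=True)
    time_cols = [c for c in df.columns if c not in (id_col, date_col)]
    vals = df[time_cols]

    # Same-day neighbours: shift along the time columns.
    neighbours = [vals.shift(k, axis=1) for k in (1, 2, -1, -2)]
    # Same slot on earlier days: shift along the rows within each meter.
    by_meter = vals.groupby(df[id_col])
    neighbours += [by_meter.shift(k) for k in (1, 2)]

    # Element-wise mean over the six shifted frames, ignoring NaNs.
    stacked = np.stack([n.to_numpy(dtype=float) for n in neighbours])
    counts = (~np.isnan(stacked)).sum(axis=0)
    sums = np.nansum(stacked, axis=0)
    means = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

    # Keep existing readings; use the neighbour mean only where a value is missing.
    df[time_cols] = vals.where(vals.notna(), means)
    return df

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "meter IDs": [1, 1, 1],
    "date": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"]),
    "00:00": [0.30, 0.36, 0.26],
    "00:30": [0.40, 0.39, 0.33],
    "01:00": [0.41, 0.44, np.nan],
    "01:30": [0.42, 0.46, 0.40],
    "02:00": [0.47, 0.53, 0.45],
})
filled = fill_from_neighbours(toy)
print(filled)
```

Shifting whole frames avoids the nested Python loops, which matters once there are many meters with 48 slots per day. If some dates are missing for a meter, reindexing each meter to a full daily date range first would keep "previous row" equal to "previous day".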