My input is a dataframe :
import pandas as pd
df = pd.DataFrame({'ID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B', 6: 'B', 7: 'C', 8: 'C', 9: 'C', 10: 'D', 11: 'D', 12: 'D', 13: 'E', 14: 'E', 15: 'E', 16: 'E'}, 'VALUE': {0: None, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: None, 6: None, 7: 5.0, 8: None, 9: 6.0, 10: None, 11: None, 12: None, 13: None, 14: 7.0, 15: 8.0, 16: None}})
print(df)
ID VALUE
0 A NaN
1 A 1.0
2 A 2.0
3 B 3.0
4 B 4.0
5 B NaN
6 B NaN
7 C 5.0
8 C NaN
9 C 6.0
10 D NaN
11 D NaN
12 D NaN
13 E NaN
14 E 7.0
15 E 8.0
16 E NaN
I want to make a dictionary that will define the type of presence of nan values in each group of ids. I tried with the code below but I'm not sure how to handle nan in id D
and E
. My definition of mixed-none
is when a group has at least two types of nans.
result = {}
for id, df1 in df.groupby('ID'):
if pd.isna(df1.iat[0, 1]):
result[id] = 'leading-none'
elif pd.isna(df1.iat[-1, 1]):
result[id] = 'trailing-none'
elif df1['VALUE'].isna().any():
result[id] = 'inter-none'
print(result)
{'A': 'leading-none',
'B': 'trailing-none',
'C': 'inter-none',
'D': 'leading-none',
'E': 'leading-none'}
My expected output is this :
{'A': 'leading-none',
'B': 'trailing-none',
'C': 'inter-none',
'D': 'full-none',
'E': 'mixed-none'}
Do you guys have an idea how to deal with that ?
You can just use carefully crafted conditionals:
result = {}
for id_, m in df['VALUE'].isna().groupby(df['ID']):
if m.all():
result[id_] = 'full-none'
elif m.iloc[0]:
if m.iloc[-1]:
result[id_] = 'mixed-none'
else:
result[id_] = 'leading-none'
elif m.iloc[-1]:
result[id_] = 'trailing-none'
else:
result[id_] = 'inter-none'
Output:
{'A': 'leading-none',
'B': 'trailing-none',
'C': 'inter-none',
'D': 'full-none',
'E': 'mixed-none'}
Graph of decision: