pandas

Flag how missing values are present in each group


My input is a dataframe :

import pandas as pd

df = pd.DataFrame({'ID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B', 6: 'B', 7: 'C', 8: 'C', 9: 'C', 10: 'D', 11: 'D', 12: 'D', 13: 'E', 14: 'E', 15: 'E', 16: 'E'}, 'VALUE': {0: None, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: None, 6: None, 7: 5.0, 8: None, 9: 6.0, 10: None, 11: None, 12: None, 13: None, 14: 7.0, 15: 8.0, 16: None}})

print(df)
   ID  VALUE
0   A    NaN
1   A    1.0
2   A    2.0
3   B    3.0
4   B    4.0
5   B    NaN
6   B    NaN
7   C    5.0
8   C    NaN
9   C    6.0
10  D    NaN
11  D    NaN
12  D    NaN
13  E    NaN
14  E    7.0
15  E    8.0
16  E    NaN

I want to make a dictionary that will define the type of presence of nan values in each group of ids. I tried with the code below but I'm not sure how to handle nan in id D and E. My definition of mixed-none is when a group has at least two types of nans.

result = {}
for id, df1 in df.groupby('ID'):
    if pd.isna(df1.iat[0, 1]):
        result[id] = 'leading-none'
    elif pd.isna(df1.iat[-1, 1]):
        result[id] = 'trailing-none'
    elif df1['VALUE'].isna().any():
        result[id] = 'inter-none'
print(result)

{'A': 'leading-none',
 'B': 'trailing-none',
 'C': 'inter-none',
 'D': 'leading-none',
 'E': 'leading-none'}

My expected output is this :

{'A': 'leading-none',
 'B': 'trailing-none',
 'C': 'inter-none',
 'D': 'full-none',
 'E': 'mixed-none'}

Do you guys have an idea how to deal with that ?


Solution

  • You can just use carefully crafted conditionals:

    result = {}
    for id_, m in df['VALUE'].isna().groupby(df['ID']):
        if m.all():
            result[id_] = 'full-none'
        elif m.iloc[0]:
            if m.iloc[-1]:
                result[id_] = 'mixed-none'
            else:
                result[id_] = 'leading-none'
        elif m.iloc[-1]:
            result[id_] = 'trailing-none'
        else:
            result[id_] = 'inter-none'
    

    Output:

    {'A': 'leading-none',
     'B': 'trailing-none',
     'C': 'inter-none',
     'D': 'full-none',
     'E': 'mixed-none'}
    

    Graph of decision:

    enter image description here