
Find next non-blank row in pandas column


Given a pandas DataFrame:

df = pd.DataFrame({'a': [None,1,None,None,2,None], 'b': [None,None,None,3,None,4]})

     a    b
0  NaN  NaN
1  1.0  NaN
2  NaN  NaN
3  NaN  3.0
4  2.0  NaN
5  NaN  4.0

For each non-blank a, I want to find the position and value of the next non-blank b, and put them into new columns on the same row as that a, so:

     a    b position  value
0  NaN  NaN      NaN    NaN
1  1.0  NaN        3    3.0
2  NaN  NaN      NaN    NaN
3  NaN  3.0      NaN    NaN
4  2.0  NaN        5    4.0
5  NaN  4.0      NaN    NaN

If it makes any difference, the index is a Date_Time value, and the position result should be the Date_Time index label of that next b, stored on the row of a.

There will never be an a and a b on the same row; a and b are the minima and maxima of the tank level over time.


Solution

  • The idea is to use bfill on column b, and also on the index to get the position:

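    import pandas as pd

    df = pd.DataFrame({'a': [None,1,None,None,2,None], 'b': [None,None,None,3,None,4]})
    # Turn the index into a column so it can be back-filled like data
    df = df.reset_index().rename(columns={'index': 'position'})
    # Keep only the positions of rows where b is non-blank
    df.loc[df['b'].isna(), 'position'] = None
    # On rows with an a, back-fill to the position of the next non-blank b
    df.loc[df['a'].notna(), 'position'] = df['position'].bfill()
    # Blank out the leftover positions on rows without an a
    df.loc[df['a'].isna(), 'position'] = None
    # Same trick for the value of the next non-blank b
    df.loc[df['a'].notna(), 'value'] = df['b'].bfill()
    df[['a', 'b', 'position', 'value']]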
    

    output

          a    b    position    value
    --  ---  ---  ----------  -------
     0  nan  nan         nan      nan
     1    1  nan           3        3
     2  nan  nan         nan      nan
     3  nan    3         nan      nan
     4    2  nan           5        4
     5  nan    4         nan      nan
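To see why the back-fill works, it can help to print the intermediate columns. This is an illustrative sketch only (the `b_pos` and `steps` names are made up for it): it marks the index label of every non-blank b, back-fills both that label and b itself, and the rows with a non-blank a then already hold the answer. Because bfill would also pick up a b on the same row, this relies on the question's guarantee that a and b never share a row.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, None, None, 2, None],
                   'b': [None, None, None, 3, None, 4]})

# Index label of each row where b is non-blank, NaN elsewhere
b_pos = pd.Series(df.index.where(df['b'].notna()), index=df.index)

# Back-filling carries each b (and its label) up to the rows above it
steps = pd.DataFrame({
    'a': df['a'],
    'b_pos': b_pos,
    'next_b_pos': b_pos.bfill(),
    'next_b_value': df['b'].bfill(),
})
print(steps)

# Keeping only the rows where a is non-blank gives the requested result
print(steps[df['a'].notna()])
```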
    

Another solution

More elegant, but perhaps slightly less readable: the same bfill idea, now using where:

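    import pandas as pd

    df = pd.DataFrame({'a': [None,1,None,None,2,None], 'b': [None,None,None,3,None,4]})
    # Index label where b is non-blank, NaN elsewhere
    df['position'] = df.index.where(df['b'].notna())
    # Back-fill to the next b, then keep it only on rows where a is non-blank
    df['position'] = df['position'].bfill().where(df['a'].notna())
    df['value'] = df['b'].bfill().where(df['a'].notna())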
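Since the real index is a Date_Time, here is a sketch of the where-based version on a DatetimeIndex. The timestamps (15-minute steps from 2024-01-01) are hypothetical placeholders for the real data, and `b_time` is just an illustrative name. For a datetime index, Index.where fills the masked entries with NaT instead of NaN, so position comes out as the timestamp of the next non-blank b.

```python
import pandas as pd

# Hypothetical timestamps standing in for the real Date_Time index
idx = pd.date_range('2024-01-01 00:00', periods=6, freq='15min', name='Date_Time')
df = pd.DataFrame({'a': [None, 1, None, None, 2, None],
                   'b': [None, None, None, 3, None, 4]}, index=idx)

# Timestamp of each non-blank b, NaT everywhere else
b_time = pd.Series(df.index.where(df['b'].notna()), index=df.index)

# Back-fill to the next b, then keep the result only on rows with an a
df['position'] = b_time.bfill().where(df['a'].notna())
df['value'] = df['b'].bfill().where(df['a'].notna())
print(df)
```

The reset_index version would need its rename adjusted for a named index: reset_index names the new column after the index (here Date_Time) rather than index.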