python-2.7 pandas nan

How to find the last non-NaN value before each NaN in a pandas column


For example, I have some data like this:

import numpy as np
import pandas as pd
column = pd.Series([1,2,3,np.nan,4,np.nan,7])
print column

Running this prints:

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    NaN
6    7.0

Now I want to find the value immediately before each NaN: 3.0 precedes the first NaN, and 4.0 precedes the second. Is there a built-in pandas function for this, or do I need to write a for loop?


Solution

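  • Same idea as @jezrael's answer, but done with NumPy: shift the NaN mask left by one position so that it marks each value followed by a NaN.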

    column[np.append(np.isnan(column.values)[1:], False)]
    
    2    3.0
    4    4.0
    dtype: float64
    

    The same, building the result with pd.Series directly from the underlying arrays:

    m = np.append(np.isnan(column.values)[1:], False)
    pd.Series(column.values[m], column.index[m])
    
    2    3.0
    4    4.0
    dtype: float64
    

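    Not nearly as quick, but intuitive: group by the cumulative sum of isnull(), take the last value of each group, then drop the last row of the result (the final group is not followed by a NaN). Note that the result is indexed by group number, not by the original positions.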

    column.groupby(column.isnull().cumsum()).last().iloc[:-1]
    
    0    3.0
    1    4.0
    dtype: float64
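
The shifted-mask idea can also be written with pandas alone. This is a minimal sketch, assuming pandas 0.24 or later (for the `fill_value` argument of `shift`); the helper name `values_before_nan` is made up for illustration and is not a pandas built-in. Unlike the plain shifted mask, it also requires the selected value itself to be non-NaN, so a run of consecutive NaNs yields only the value before the run.

```python
import numpy as np
import pandas as pd


def values_before_nan(series):
    """Return the non-NaN entries that are immediately followed by a NaN.

    Hypothetical helper for illustration; not part of pandas.
    """
    is_nan = series.isnull()
    # Shift the NaN mask up by one so each row sees whether the NEXT row is NaN.
    # The last row has no successor, so it is filled with False.
    next_is_nan = is_nan.shift(-1, fill_value=False)
    # Keep rows that are valid themselves and are followed by a NaN.
    return series[~is_nan & next_is_nan]


column = pd.Series([1, 2, 3, np.nan, 4, np.nan, 7])
result = values_before_nan(column)
print(result)
```

For the example column this keeps the original index labels 2 and 4, with values 3.0 and 4.0, matching the NumPy version above.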