pythonpandasdataframenumpyffill

Forward fill blocks of above values pandas


I'm interested in forward filling both single and multiple values in a column in pandas. With the following dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9 

Forward fill will produce:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
df
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

However, I need an ffill-like method that will do this, or alternatively copy all above values if the values above follow one another:

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, 5, 9], [None,None,None])
df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN   5   9 
3 NaN NaN NaN

Resulting in:

df
    0   1   2
0   1   2   3
1   4   2   3
2   1   5   9 
3   4   5   9

Major edit: In my data the values will always be followed by NaNs in an unknown multiple of the length of values. Take df[0], for instance 1,4 would repeat for as long as there are NaNs. The only rule is that they will be a multiple of the length of the values (2)


Solution

  • You can create consecutive values for missing and not missing values, then create counter per columns and forward filling missing values per groups:

    df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
                       [0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])
    
    
    print (df)
         0    1     2
    0  1.0  2.0   3.0
    1  4.0  NaN   8.0
    2  NaN  5.0   9.0
    3  NaN  NaN  10.0
    4  0.0  2.0   NaN
    5  5.0  NaN   NaN
    6  NaN  5.0   NaN
    7  NaN  NaN   NaN
    

    m = df.isna()
    g = m.ne(m.shift()).cumsum()
    for c in df.columns:
        df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()
    
    print (df)
         0    1     2
    0  1.0  2.0   3.0
    1  4.0  2.0   8.0
    2  1.0  5.0   9.0
    3  4.0  5.0  10.0
    4  0.0  2.0   3.0
    5  5.0  2.0   8.0
    6  0.0  5.0   9.0
    7  5.0  5.0  10.0
    

    EDIT: New solution repeat non missing values by newxt missing values per groups creted by first non missing value, here is used numpy.tile for generate sequences:

    df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
                       [0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
                       , [None,None,None], [None,None,None]])
    print (df)
         0    1     2
    0  1.0  2.0   3.0
    1  4.0  NaN   8.0
    2  NaN  5.0   9.0
    3  7.0  NaN  10.0
    4  0.0  2.0   NaN
    5  5.0  NaN   NaN
    6  NaN  6.0   NaN
    7  NaN  8.0   NaN
    8  NaN  NaN   NaN
    9  NaN  NaN   NaN
    

    g = (df.notna() & df.shift().isna()).cumsum()
    
    def f(x):
        non_miss = x.dropna()
        return np.tile(non_miss, int(len(x) // len(non_miss) + 2))[:len(x)]
    
    df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
    print (df)
         0    1     2
    0  1.0  2.0   3.0
    1  4.0  2.0   8.0
    2  1.0  5.0   9.0
    3  7.0  5.0  10.0
    4  0.0  2.0   3.0
    5  5.0  2.0   8.0
    6  7.0  6.0   9.0
    7  0.0  8.0  10.0
    8  5.0  6.0   3.0
    9  7.0  8.0   8.0