I'm interested in forward filling both single and multiple values in a column in pandas. With the following dataframe:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df
0 1 2
0 1 2 3
1 4 NaN NaN
2 NaN NaN 9
Forward fill will produce:
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')
df
0 1 2
0 1 2 3
1 4 2 3
2 4 2 9
However, I need an ffill-like method that will do this, or alternatively copy all above values if the values above follow one another:
df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, 5, 9], [None,None,None])
df
0 1 2
0 1 2 3
1 4 NaN NaN
2 NaN 5 9
3 NaN NaN NaN
Resulting in:
df
0 1 2
0 1 2 3
1 4 2 3
2 1 5 9
3 4 5 9
Major edit: In my data the values will always be followed by NaNs in an unknown multiple of the length of values. Take df[0], for instance 1,4 would repeat for as long as there are NaNs. The only rule is that they will be a multiple of the length of the values (2)
You can create consecutive values for missing and not missing values, then create counter per columns and forward filling missing values per groups:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
[0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 NaN NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 5.0 NaN
7 NaN NaN NaN
m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 4.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 0.0 5.0 9.0
7 5.0 5.0 10.0
EDIT: New solution repeat non missing values by newxt missing values per groups creted by first non missing value, here is used numpy.tile
for generate sequences:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
[0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
, [None,None,None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 7.0 NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 6.0 NaN
7 NaN 8.0 NaN
8 NaN NaN NaN
9 NaN NaN NaN
g = (df.notna() & df.shift().isna()).cumsum()
def f(x):
non_miss = x.dropna()
return np.tile(non_miss, int(len(x) // len(non_miss) + 2))[:len(x)]
df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 7.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 7.0 6.0 9.0
7 0.0 8.0 10.0
8 5.0 6.0 3.0
9 7.0 8.0 8.0