I have the following snippet from a larger dataframe:
pd.DataFrame({
'sig_2':[False, False, True, True, True, True, True, True, True, True, True, False, False, False,
False, True, False, True, True, True, True, True, True, True, True, True, False, False,
False, False, True, True, True, True, False, True, True],
'tr_cnt':[0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
'fls_cnt':[53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
'test1':[0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
'test2':[54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
'check1':[False, False, True, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, False, False, False]})
tr_cnt
is simply the count of successive True rows in sig_2
while fls_cnt
does the same for the False rows. test1
and test2
are the total counts of the successive rows. check1
is where the following condition is True:
df['test1'] > 2) & (df['fls_cnt'].shift(1) >= 2)
What i would like to do is forward fill the True values in the check1
column by the corresponding value in the test1
column.
For example, the first True row in the check1
column has a corresponding value of 9 in the test
column. Thus, i would like to forward fill this value by another 8 rows. The next True has a correpsonding value of 4 so i want this to forward fill by another 3 rows.
I have tried the following different methods:
mask = (df['test1'] > 2) & (df['fls_cnt'].shift(1) >= 2)
# Trial 1
groups = mask.ne(mask.shift()).cumsum()
df['test1'] = df.groupby(groups)['test1'].ffill().astype(bool)
# Trial 2
df['test2'] = df.groupby(mask)['test1'].ffill().astype(bool)
# Trial 3
groups = (~mask).cumsum()
df['test3'] = df.groupby(groups)['test1'].ffill().astype(bool)
# Trial 4
df["test4"] = df.groupby(mask.cumsum())['test1'].ffill().fillna(False).astype(bool)
In all of these methods, they seem to do what i want where the condition is True, but they are also including rows that dont satisfy this condition which is odd.
As @mozway points out in the comments to your question, it is a little unclear what you are trying to do. I suggest here my interpretation of your logic. If it isn't what you were expecting, I'll simply remove the answer. But in that case, I also suggest you review you question and elaborate, be more clear:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'sig_2': [False, False, True, True, True, True, True, True, True, True, True, False, False, False,
False, True, False, True, True, True, True, True, True, True, True, True, False, False,
False, False, True, True, True, True, False, True, True],
'tr_cnt': [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
'fls_cnt': [53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
'test1': [0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
'test2': [54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
'check1': [False, False, True, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, False, False, False]
})
df['forward_fill'] = False
for i in df.index[df['check1']]:
end_index = i + df.at[i, 'test1']
if end_index > len(df):
end_index = len(df)
df.loc[i:end_index, 'forward_fill'] = True
print(df)
Which gives:
sig_2 tr_cnt fls_cnt test1 test2 check1 forward_fill
0 False 0 53 0 54 False False
1 False 0 54 0 54 False False
2 True 1 0 9 0 True True
3 True 2 0 9 0 False True
4 True 3 0 9 0 False True
5 True 4 0 9 0 False True
6 True 5 0 9 0 False True
7 True 6 0 9 0 False True
8 True 7 0 9 0 False True
9 True 8 0 9 0 False True
10 True 9 0 9 0 False True
11 False 0 1 0 4 False True
12 False 0 2 0 4 False False
13 False 0 3 0 4 False False
14 False 0 4 0 4 False False
15 True 1 0 1 0 False False
16 False 0 1 0 1 False False
17 True 1 0 9 0 False False
18 True 2 0 9 0 False False
19 True 3 0 9 0 False False
20 True 4 0 9 0 False False
21 True 5 0 9 0 False False
22 True 6 0 9 0 False False
23 True 7 0 9 0 False False
24 True 8 0 9 0 False False
25 True 9 0 9 0 False False
26 False 0 1 0 4 False False
27 False 0 2 0 4 False False
28 False 0 3 0 4 False False
29 False 0 4 0 4 False False
30 True 1 0 4 0 True True
31 True 2 0 4 0 False True
32 True 3 0 4 0 False True
33 True 4 0 4 0 False True
34 False 0 1 0 1 False True
35 True 1 0 16 0 False False
36 True 2 0 16 0 False False
Update: An alternative way (more vectorized) would be:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'sig_2': [False, False, True, True, True, True, True, True, True, True, True, False, False, False,
False, True, False, True, True, True, True, True, True, True, True, True, False, False,
False, False, True, True, True, True, False, True, True],
'tr_cnt': [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
'fls_cnt': [53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
'test1': [0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
'test2': [54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
'check1': [False, False, True, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, False, False, False]
})
indices = np.repeat(df.index[df['check1']], df['test1'][df['check1']])
ranges = np.concatenate([np.arange(start, start + length) for start, length in zip(df.index[df['check1']], df['test1'][df['check1']])])
mask = np.zeros(len(df), dtype=bool)
mask[ranges] = True
df['forward_fill'] = mask
print(df)
which retunrs the same output:
sig_2 tr_cnt fls_cnt test1 test2 check1 forward_fill
0 False 0 53 0 54 False False
1 False 0 54 0 54 False False
2 True 1 0 9 0 True True
3 True 2 0 9 0 False True
4 True 3 0 9 0 False True
5 True 4 0 9 0 False True
6 True 5 0 9 0 False True
7 True 6 0 9 0 False True
8 True 7 0 9 0 False True
9 True 8 0 9 0 False True
10 True 9 0 9 0 False True
11 False 0 1 0 4 False False
12 False 0 2 0 4 False False
13 False 0 3 0 4 False False
14 False 0 4 0 4 False False
15 True 1 0 1 0 False False
16 False 0 1 0 1 False False
17 True 1 0 9 0 False False
18 True 2 0 9 0 False False
19 True 3 0 9 0 False False
20 True 4 0 9 0 False False
21 True 5 0 9 0 False False
22 True 6 0 9 0 False False
23 True 7 0 9 0 False False
24 True 8 0 9 0 False False
25 True 9 0 9 0 False False
26 False 0 1 0 4 False False
27 False 0 2 0 4 False False
28 False 0 3 0 4 False False
29 False 0 4 0 4 False False
30 True 1 0 4 0 True True
31 True 2 0 4 0 False True
32 True 3 0 4 0 False True
33 True 4 0 4 0 False True
34 False 0 1 0 1 False False
35 True 1 0 16 0 False False
36 True 2 0 16 0 False False