I have a dataframe like the one below and would like to fill values from a previous row based on an id field: any record with 4 in colA should get the colC value of the preceding colA=3 record, while colB stays the same.
I've tried ffill and shift, but couldn't get what I want.
colA | colB | colC |
---|---|---|
3 | air | 00JTHYU1 |
4 | ground | 00JTHYU0 |
4 | ground | 00JTHYU0 |
4 | ground | 00JTHYU0 |
3 | air | 00JTHYU4 |
4 | ground | 00JTHYU0 |
4 | air | 00JTHYU0 |
3 | ground | 00JTHYU7 |
4 | ground | 00JTHYU0 |
Result:
colA | colB | colC |
---|---|---|
3 | air | 00JTHYU1 |
4 | ground | 00JTHYU1 |
4 | ground | 00JTHYU1 |
4 | ground | 00JTHYU1 |
3 | air | 00JTHYU4 |
4 | ground | 00JTHYU4 |
4 | air | 00JTHYU4 |
3 | ground | 00JTHYU7 |
4 | ground | 00JTHYU7 |
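Your question requires us to assume a lot, but assuming that every row with colA equal to 4 should take the colC value of the nearest preceding row with colA equal to 3, this is an example of a solution to your problem: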
from pandas import DataFrame
df = DataFrame({
'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
'colB': ['a', 'g', 'g', 'g', 'a', 'g', 'a', 'g', 'g'],
'colC': ['1', '0', '0', '0', '4', '0', '0', '7', '0']
})
marker = df['colC'].where(df['colA'] == 3)
filled = marker.ffill()
df.loc[df['colA'] == 4, 'colC'] = filled
print(df)
Note that I've shortened the values of 'colB' and 'colC' for brevity and ease.
Output:
colA colB colC
0 3 a 1
1 4 g 1
2 4 g 1
3 4 g 1
4 3 a 4
5 4 g 4
6 4 a 4
7 3 g 7
8 4 g 7
This works by creating a new series that only contains the value of 'colC' when 'colA' has the value 3, and NaN otherwise.
marker = df['colC'].where(df['colA'] == 3)
You can then use .ffill() to forward fill, replacing each NaN with the most recent non-NaN value above it:
filled = marker.ffill()
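To see what these two steps produce, it can help to print the intermediate series side by side. This is only a small sketch on a shortened, made-up frame; the name `steps` is introduced here for illustration and is not part of the solution.

```python
import pandas as pd

# Shortened illustrative data (not the full frame from the question)
df = pd.DataFrame({
    'colA': [3, 4, 4, 3, 4],
    'colC': ['1', '0', '0', '4', '0'],
})

# Keep colC only where colA == 3; every other row becomes missing
marker = df['colC'].where(df['colA'] == 3)

# Carry the last non-missing value forward over the missing rows
filled = marker.ffill()

# Hypothetical helper frame, used only to inspect the intermediate steps
steps = pd.DataFrame({'colA': df['colA'], 'marker': marker, 'filled': filled})
print(steps)
```

The 'marker' column is missing on every colA == 4 row, and 'filled' repeats the value of the closest colA == 3 row above.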
Finally, just update the rows where 'colA' has the value 4:
df.loc[df['colA'] == 4, 'colC'] = filled
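If you need this in more than one place, the three steps could be wrapped in a small function. The following is a sketch under the same assumption as above (rows with colA == 3 carry the value, rows with colA == 4 receive it); the function name `fill_from_previous` and its parameters are hypothetical, and it is run here on the full values from the question.

```python
import pandas as pd

def fill_from_previous(df, key_col='colA', value_col='colC',
                       source_key=3, target_key=4):
    """Hypothetical helper: copy value_col from the nearest preceding
    source_key row into each target_key row. Returns a new DataFrame."""
    out = df.copy()
    # Value only on source rows, missing elsewhere
    marker = out[value_col].where(out[key_col] == source_key)
    # Forward fill so each row sees the last source value above it
    filled = marker.ffill()
    # Overwrite only the target rows; other rows keep their own value
    mask = out[key_col] == target_key
    out.loc[mask, value_col] = filled[mask]
    return out

df = pd.DataFrame({
    'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
    'colB': ['air', 'ground', 'ground', 'ground', 'air',
             'ground', 'air', 'ground', 'ground'],
    'colC': ['00JTHYU1', '00JTHYU0', '00JTHYU0', '00JTHYU0', '00JTHYU4',
             '00JTHYU0', '00JTHYU0', '00JTHYU7', '00JTHYU0'],
})

result = fill_from_previous(df)
print(result)
```

One thing to watch for: a colA == 4 row that appears before any colA == 3 row has nothing to fill from, so it would end up with a missing value in this sketch.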