[SOLVED] fill row value based on column identifier

I have a DataFrame like the one below and would like to fill values from a previous row based on an ID field: any record with 4 in colA should get the colC value of the most recent preceding record with colA = 3, while colB stays the same.

I've tried ffill and shift, but couldn't get the result I want.

colA	colB	colC
3	air	00JTHYU1
4	ground	00JTHYU0
4	ground	00JTHYU0
4	ground	00JTHYU0
3	air	00JTHYU4
4	ground	00JTHYU0
4	air	00JTHYU0
3	ground	00JTHYU7
4	ground	00JTHYU0

Expected result:

colA	colB	colC
3	air	00JTHYU1
4	ground	00JTHYU1
4	ground	00JTHYU1
4	ground	00JTHYU1
3	air	00JTHYU4
4	ground	00JTHYU4
4	air	00JTHYU4
3	ground	00JTHYU7
4	ground	00JTHYU7

Solution

Your question leaves a lot open, so I'm assuming that:

- you are using pandas DataFrames
- column A contains 3s and 4s, but may also contain other values
- column C follows the same pattern as column A, but this is not guaranteed

This is an example of a solution to your problem:

from pandas import DataFrame

df = DataFrame({
    'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
    'colB': ['a', 'g', 'g', 'g', 'a', 'g', 'a', 'g', 'g'],
    'colC': ['1', '0', '0', '0', '4', '0', '0', '7', '0']
})

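# Keep colC only on rows where colA == 3; every other row becomes NaN
marker = df['colC'].where(df['colA'] == 3)
# Forward fill so each NaN takes the most recent colA == 3 value
filled = marker.ffill()

# Write the filled values back, but only on rows where colA == 4
df.loc[df['colA'] == 4, 'colC'] = filled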

print(df)

Note that I've shortened the values of 'colB' and 'colC' for brevity.

Output:

   colA colB colC
0     3    a    1
1     4    g    1
2     4    g    1
3     4    g    1
4     3    a    4
5     4    g    4
6     4    a    4
7     3    g    7
8     4    g    7
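For reference, here is a sketch of the same three steps run on the original, unshortened values from the question, to confirm it reproduces the expected result table exactly (the variable names are just the ones used above):

```python
import pandas as pd

# The DataFrame from the question, with the original (unshortened) values
df = pd.DataFrame({
    "colA": [3, 4, 4, 4, 3, 4, 4, 3, 4],
    "colB": ["air", "ground", "ground", "ground", "air",
             "ground", "air", "ground", "ground"],
    "colC": ["00JTHYU1", "00JTHYU0", "00JTHYU0", "00JTHYU0", "00JTHYU4",
             "00JTHYU0", "00JTHYU0", "00JTHYU7", "00JTHYU0"],
})

# Mark the colA == 3 rows, forward fill, then assign to the colA == 4 rows
marker = df["colC"].where(df["colA"] == 3)
filled = marker.ffill()
df.loc[df["colA"] == 4, "colC"] = filled

print(df)
```

colB is never touched, so it comes out exactly as it went in.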

This works by creating a new Series that contains the value of 'colC' only where 'colA' is 3, and NaN everywhere else:

marker = df['colC'].where(df['colA'] == 3)
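To make this intermediate step visible, here is a small sketch that builds only colA and colC (same shortened values as above) and inspects the marker Series:

```python
import pandas as pd

df = pd.DataFrame({
    "colA": [3, 4, 4, 4, 3, 4, 4, 3, 4],
    "colC": ["1", "0", "0", "0", "4", "0", "0", "7", "0"],
})

# where() keeps colC where the condition is True and puts NaN everywhere else
marker = df["colC"].where(df["colA"] == 3)

print(marker.isna().tolist())
print(marker.dropna().tolist())  # ['1', '4', '7']
```

Only the three colA == 3 rows survive; the other six positions are missing values.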

You can then use .ffill() to forward fill, replacing each NaN with the previous non-NaN value:

filled = marker.ffill()
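Continuing the same small example, a sketch of what ffill() produces from the marker Series:

```python
import pandas as pd

df = pd.DataFrame({
    "colA": [3, 4, 4, 4, 3, 4, 4, 3, 4],
    "colC": ["1", "0", "0", "0", "4", "0", "0", "7", "0"],
})
marker = df["colC"].where(df["colA"] == 3)

# ffill() carries the last non-NaN value forward into each following NaN
filled = marker.ffill()

print(filled.tolist())  # ['1', '1', '1', '1', '4', '4', '4', '7', '7']
```

One consequence worth knowing: if a colA == 4 row appeared before the first 3, there would be nothing to carry forward, so that position would stay NaN.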

Finally, update only the rows where 'colA' is 4. The assignment aligns filled with df by index, so each of those rows receives its own forward-filled value:

df.loc[df['colA'] == 4, 'colC'] = filled
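An equivalent way to express the same rule, offered here as an alternative technique rather than part of the answer above, is to number each block that starts at a colA == 3 row with cumsum() and take the first colC of each block via groupby().transform("first"). A sketch on the same shortened data:

```python
import pandas as pd

df = pd.DataFrame({
    "colA": [3, 4, 4, 4, 3, 4, 4, 3, 4],
    "colB": ["a", "g", "g", "g", "a", "g", "a", "g", "g"],
    "colC": ["1", "0", "0", "0", "4", "0", "0", "7", "0"],
})

# Each colA == 3 row starts a new block: cumsum over the boolean mask
# gives 1, 1, 1, 1, 2, 2, 2, 3, 3 for this data
block = (df["colA"] == 3).cumsum()

# First colC in each block, i.e. the value from that block's colA == 3 row
first_in_block = df.groupby(block)["colC"].transform("first")

# Overwrite only the colA == 4 rows; colB and any other colA values are untouched
df.loc[df["colA"] == 4, "colC"] = first_in_block

print(df)
```

The two versions differ only when a colA == 4 row comes before the first 3: those rows fall into block 0, whose "first" value is the block's own first colC rather than NaN. For data that always starts with a 3, as in the question, they give the same result.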