python pandas

Fill row values based on a column identifier


I have a DataFrame like the one below and would like to fill values from a previous row based on an ID field: every record with 4 in colA should get the colC value of the most recent preceding record with colA = 3, while colB stays the same.

I've tried ffill and shift, but couldn't get the result I want.

colA colB colC
3 air 00JTHYU1
4 ground 00JTHYU0
4 ground 00JTHYU0
4 ground 00JTHYU0
3 air 00JTHYU4
4 ground 00JTHYU0
4 air 00JTHYU0
3 ground 00JTHYU7
4 ground 00JTHYU0

Desired result:

colA colB colC
3 air 00JTHYU1
4 ground 00JTHYU1
4 ground 00JTHYU1
4 ground 00JTHYU1
3 air 00JTHYU4
4 ground 00JTHYU4
4 air 00JTHYU4
3 ground 00JTHYU7
4 ground 00JTHYU7

Solution

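  • Your question requires us to assume a lot, but assuming that every row with 4 in colA should take colC from the nearest preceding row with 3 in colA, and that the rows are already in the order shown: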

    This is an example of a solution to your problem:

    from pandas import DataFrame
    
    df = DataFrame({
        'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
        'colB': ['a', 'g', 'g', 'g', 'a', 'g', 'a', 'g', 'g'],
        'colC': ['1', '0', '0', '0', '4', '0', '0', '7', '0']
    })
    
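    # colC where colA == 3, NaN everywhere else
    marker = df['colC'].where(df['colA'] == 3)
    # carry each colA == 3 value down to the following rows
    filled = marker.ffill()
    
    # overwrite colC only on the colA == 4 rows (aligned on the index)
    df.loc[df['colA'] == 4, 'colC'] = filled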
    
    print(df)
    

    Note that I've shortened the values of 'colB' and 'colC' for brevity.

    Output:

       colA colB colC
    0     3    a    1
    1     4    g    1
    2     4    g    1
    3     4    g    1
    4     3    a    4
    5     4    g    4
    6     4    a    4
    7     3    g    7
    8     4    g    7
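
Since the answer shortens the values, here is a sketch (assuming the question's rows are exactly as posted) that runs the same three steps on the original, unshortened colB/colC values. colC should then match the question's desired result.

```python
import pandas as pd

# The question's sample, with the original (unshortened) colB/colC values.
df = pd.DataFrame({
    'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
    'colB': ['air', 'ground', 'ground', 'ground', 'air',
             'ground', 'air', 'ground', 'ground'],
    'colC': ['00JTHYU1', '00JTHYU0', '00JTHYU0', '00JTHYU0', '00JTHYU4',
             '00JTHYU0', '00JTHYU0', '00JTHYU7', '00JTHYU0'],
})

# Same steps as the answer: mark the colA == 3 rows, forward fill,
# then write the result back only where colA == 4.
filled = df['colC'].where(df['colA'] == 3).ffill()
df.loc[df['colA'] == 4, 'colC'] = filled

print(df)
```

colB is never touched, so it keeps its original values.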
    

    This works by creating a new Series that contains the value of 'colC' only where 'colA' has the value 3, and NaN otherwise.

    marker = df['colC'].where(df['colA'] == 3)
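
To see what this intermediate Series looks like, a small sketch on the answer's shortened sample data can print it on its own:

```python
import pandas as pd

# Shortened sample data from the answer.
df = pd.DataFrame({
    'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
    'colB': ['a', 'g', 'g', 'g', 'a', 'g', 'a', 'g', 'g'],
    'colC': ['1', '0', '0', '0', '4', '0', '0', '7', '0'],
})

# Series.where keeps values where the condition is True and puts NaN elsewhere,
# so only the "anchor" rows (colA == 3) keep their colC value.
marker = df['colC'].where(df['colA'] == 3)
print(marker)
```

Only index positions 0, 4 and 7 keep a value; every other position is NaN.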
    

    You can then use .ffill() to forward fill, replacing each NaN with the most recent preceding non-NaN value:

    filled = marker.ffill()
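
One consequence worth checking: ffill only looks backwards, so a row with no earlier non-NaN value stays NaN. The sketch below uses a hypothetical frame (not from the question) whose first row has colA == 4, to show that such a leading row is not filled:

```python
import pandas as pd

# Hypothetical variant of the sample: the first row has colA == 4,
# so there is no earlier colA == 3 row to copy from.
df = pd.DataFrame({
    'colA': [4, 3, 4, 4],
    'colC': ['0', '1', '0', '0'],
})

marker = df['colC'].where(df['colA'] == 3)
# Each non-NaN value is carried down until the next non-NaN value;
# NaNs before the first value are left as NaN.
filled = marker.ffill()
print(filled)
```

If the real data can start with colA == 4 rows, assigning this back with df.loc would set their colC to NaN, so those rows may need separate handling.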
    

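    Finally, update only the rows where 'colA' has the value 4. Because filled is a Series with the same index as df, pandas aligns it on the index, so each selected row receives its own forward-filled value: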

    df.loc[df['colA'] == 4, 'colC'] = filled
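
For comparison, here is a different technique (not the answer's method) that gives the same result on this sample: number the blocks that start at each colA == 3 row with a cumulative sum, then broadcast each block's first colC value to every row in it with groupby/transform. This sketch assumes the frame starts with a colA == 3 row. Otherwise the leading colA == 4 rows form block 0 and would all take the first row's colC instead.

```python
import pandas as pd

# Shortened sample data from the answer.
df = pd.DataFrame({
    'colA': [3, 4, 4, 4, 3, 4, 4, 3, 4],
    'colB': ['a', 'g', 'g', 'g', 'a', 'g', 'a', 'g', 'g'],
    'colC': ['1', '0', '0', '0', '4', '0', '0', '7', '0'],
})

# True at every colA == 3 row; the running sum gives each block its own label.
block = df['colA'].eq(3).cumsum()

# Within each block the first row is the colA == 3 row, so 'first'
# copies its colC to the colA == 4 rows that follow it.
df['colC'] = df.groupby(block)['colC'].transform('first')
print(df)
```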