python pandas

How to fill values in a Dataframe depending on values around it


I have a dataframe that looks something like this:

1   2  3  'String'
''  4  X  ''
''  5  X  ''
''  6  7  'String'
''  1  Y  ''

I want to replace the Xs and Ys (placeholders used here just for illustration) with the value from the same column in the most recent row whose last column is 'String'. So the Xs would become 3, and the Y would become 7:

1  2  3 'String'
'' 4  3 ''
'' 5  3 ''
'' 6  7 'String'
'' 1  7 ''

The reference value stays the same until the next 'parent' row appears. So the first 3 carries forward until another 'String' parent row comes up.

I tried building a second dataframe with the positions of the 'String' rows and filling from each idx to the next with that row's value, but it's too slow.

This is really similar to a forward fill (DataFrame.ffill()), but not exactly, and I don't know whether it's feasible to turn my problem into an ffill() problem.


Solution

  • Updated solution:

    This can be solved with .ffill(); you just have to replace the placeholder values with `NaN` first. Assuming the third column is named 'C' and the last one 'D':

    import numpy as np
    df.loc[df['D'] != 'String', 'C'] = np.nan
    

    This selects the rows where df['D'] is not 'String' and sets df['C'] to NaN in those rows.

    Now the last step is simple: use .ffill() to carry each remaining value forward into the NaN rows below it.

    df['C'] = df['C'].ffill()
    

    Here is the final result (only columns C and D shown; C is float because NaN was introduced):

    >>> df
       C    D
    0  3.0  String
    1  3.0        
    2  3.0        
    3  7.0  String
    4  7.0
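
For reference, here is a minimal end-to-end sketch of the same idea. The column names A to D and the placeholder value 9 (standing in for the Xs and Ys) are assumptions made up for illustration, since the question does not name its columns. It uses Series.where() instead of the .loc assignment, which masks without modifying the original column in place, and casts back to the original integer dtype when no NaN is left.

```python
import pandas as pd

# Hypothetical data: column names A-D are assumed, and 9 stands in
# for the X/Y placeholders from the question.
df = pd.DataFrame({
    "A": [1, "", "", "", ""],
    "B": [2, 4, 5, 6, 1],
    "C": [3, 9, 9, 7, 9],
    "D": ["String", "", "", "String", ""],
})

# Parent rows are the ones whose last column equals 'String'.
is_parent = df["D"] == "String"

# Keep C only on parent rows (others become NaN), then forward-fill.
filled = df["C"].where(is_parent).ffill()

# The NaN step turns the column into float; restore the original dtype
# if every row ended up with a value.
if filled.notna().all():
    filled = filled.astype(df["C"].dtype)

df["C"] = filled
print(df)  # C is now 3, 3, 3, 7, 7
```

Note that if the first row is not a parent row, the rows before the first 'String' have nothing to fill from and stay NaN, in which case the cast back to integer is skipped.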