I have a dataframe that looks something like this:
1 2 3 'String'
'' 4 X ''
'' 5 X ''
'' 6 7 'String'
'' 1 Y ''
And I want to replace the Xs and Ys (placeholders, shown here just to visualize) with the value from the same column in the nearest row above whose last column is 'String'. So the Xs would become 3, and the Y would become 7:
1 2 3 'String'
'' 4 3 ''
'' 5 3 ''
'' 6 7 'String'
'' 1 7 ''
The reference value stays the same until another 'parent' row comes along. So the first 3 carries down until the next 'String' parent row.
I tried building another dataframe that records where the 'String' rows are and filling from each such index to the next with the value, but it's too slow.
This is really similar to a forward fill (df.ffill()), but not exactly, and I don't know whether it's feasible to turn my problem into a ffill() problem.
Updated solution:
This can be solved using .ffill(),
but first you have to replace the placeholder values in the non-'String' rows with `NaN`:
df.loc[df['D'] != 'String', 'C'] = np.nan
This finds the rows where df['D']
is not 'String' and sets column C to NaN in those rows.
Now the last step is simple: just forward-fill that column with .ffill()
df['C'] = df['C'].ffill()
Here is the final result:
>>> df
     C       D
0  3.0  String
1  3.0
2  3.0
3  7.0  String
4  7.0
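For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The column names A to D and the 0 values standing in for X and Y are my assumptions, since the question doesn't name its columns. Instead of overwriting with .loc, it uses Series.where to blank out the non-parent rows, which leaves the original column untouched until the final assignment. Note that introducing NaN turns the column into floats (hence the 3.0 above), so the sketch casts back to int at the end; that cast only works if the first row is a 'String' row, otherwise a leading NaN would remain.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame; column names A-D
# and the 0 placeholders (standing in for X and Y) are assumptions.
df = pd.DataFrame({
    "A": [1, "", "", "", ""],
    "B": [2, 4, 5, 6, 1],
    "C": [3, 0, 0, 7, 0],
    "D": ["String", "", "", "String", ""],
})

# True on the 'parent' rows whose value should be carried downward
is_parent = df["D"].eq("String")

# Keep C only on parent rows (NaN elsewhere), then forward-fill the gaps
filled = df["C"].where(is_parent).ffill()

# Cast back to int; assumes the first row is a parent so no NaN is left
df["C"] = filled.astype(int)

print(df[["C", "D"]])
```

If the frame has several columns that need the same treatment, the same where/ffill pair can be applied to a list of columns at once, e.g. df[cols].where(is_parent, axis=0).ffill(), though I have only tried the single-column form shown here.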