Let's assume I have a dataframe D:
D = pd.DataFrame({'a': [1,2,3], 'b': [3,3,4], 'c': [2, 4, 0], 'd': [-1,-1,-1]})
D
   a  b  c  d
0  1  3  2 -1
1  2  3  4 -1
2  3  4  0 -1
I want to sort the values in columns a, b, c in, say, the first two rows, so the expected result would be:
   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1
To avoid the chained assignment warning, I tried to do it with df.loc:
D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1)
   min  median  max
0  1.0     2.0  3.0
1  2.0     3.0  4.0
D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1)
D
     a    b    c  d
0  NaN  NaN  NaN -1
1  NaN  NaN  NaN -1
2  3.0  4.0  0.0 -1
For some reason the values aren't set. I thought it could be because of the different column names, but
D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).rename(columns={"min": "a", "median": "b", "max": "c"})
and
D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).to_numpy()
didn't work either. Changing the values manually isn't an option because in the real example I have about 800k rows instead of 3, and I want to keep the operation vectorised. So what is the correct way of doing it?
Use numpy.sort:
D.loc[[0,1], ['a', 'b', 'c']] = np.sort(D.loc[[0,1], ['a', 'b', 'c']], axis=1)
print(D)
   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1
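For completeness, here is a self-contained sketch of the numpy.sort approach, starting from a freshly built D. The names `rows` and `cols` are only illustrative and are not part of the question. The explicit `.to_numpy()` call is optional, since np.sort already returns a plain array, but it makes it obvious that the right-hand side carries no labels.

```python
import numpy as np
import pandas as pd

D = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 3, 4], 'c': [2, 4, 0], 'd': [-1, -1, -1]})

rows = [0, 1]            # row labels to modify (illustrative name)
cols = ['a', 'b', 'c']   # columns whose values are sorted within each row

# np.sort with axis=1 sorts every selected row independently.
# The result is a plain ndarray, so it is written back by position.
D.loc[rows, cols] = np.sort(D.loc[rows, cols].to_numpy(), axis=1)

print(D)
```

Because the sort is done by NumPy on the whole block at once, the same pattern should stay vectorised when `rows` selects many more rows.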
If you need to assign the result of an aggregate function, convert the output to a NumPy array with DataFrame.to_numpy:
D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).to_numpy()
print(D)
   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1
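As a rough illustration of why the conversion matters: when the right-hand side of a .loc assignment is a DataFrame, pandas aligns it on index and column labels, and the aggregated frame is labelled min/median/max rather than a/b/c. The sketch below assumes D is rebuilt from the original data first. In the question, rows 0 and 1 already hold NaN after the first failed assignment, so re-running the aggregation on that frame would probably just aggregate NaN values. The variable names `rows`, `cols`, `agg` and `labels_match` are illustrative only.

```python
import pandas as pd

# Start from a fresh frame, not one already overwritten with NaN.
D = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 3, 4], 'c': [2, 4, 0], 'd': [-1, -1, -1]})

rows = [0, 1]
cols = ['a', 'b', 'c']

agg = D.loc[rows, cols].agg(['min', 'median', 'max'], axis=1)

# The aggregated frame uses the function names as column labels,
# so none of them match the target columns a, b, c.
labels_match = list(agg.columns) == cols
print(labels_match)

# Dropping the labels with to_numpy() makes the assignment positional.
D.loc[rows, cols] = agg.to_numpy()

print(D)
```

Depending on the pandas version, the three columns may come back as floats instead of integers, because the median makes the aggregated values floating point.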