I have a dataset which is sorted in this order:
col1 | col2 | col3 |
---|---|---|
a | 1 | r |
a | 1 | s |
a | 2 | t |
a | 2 | u |
a | 3 | v |
a | 3 | w |
b | 4 | x |
b | 4 | y |
b | 5 | z |
b | 5 | q |
b | 6 | w |
b | 6 | e |
I want it to be sorted in the following order:
col1 | col2 | col3 |
---|---|---|
a | 1 | r |
a | 2 | t |
a | 3 | v |
a | 1 | s |
a | 2 | u |
a | 3 | w |
b | 4 | x |
b | 5 | z |
b | 6 | w |
b | 4 | y |
b | 5 | q |
b | 6 | e |
I want the col2 to be in repetitive fashion, as in, for col1 'a' values, it should be 1,2,3,4 and then 1,2,3,4 again instead of 1,1,2,2,3,3,4,4. I have used the following code, but it is not working:
import pandas as pd
# Creating the DataFrame
data = {
'col1': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
'col3': ['r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'q', 'w', 'e']
}
df = pd.DataFrame(data)
# Sort by col1, then reorder col2 within each group
df_sorted = df.sort_values(by=['col1', 'col2']).reset_index(drop=True)
df_sorted = df_sorted.groupby('col1', group_keys=False).apply(lambda x: x.sort_values('col2'))
# Display the sorted dataframe
print(df_sorted)
Use groupby.cumcount
to form a secondary key for sorting:
out = (df.assign(key=lambda x: x.groupby(['col1', 'col2']).cumcount())
.sort_values(by=['col1', 'key', 'col2'])
.drop(columns='key')
)
Note that you can avoid creating the intermediate column using numpy.lexsort
:
import numpy as np
out = df.iloc[np.lexsort([df['col2'],
df.groupby(['col1', 'col2']).cumcount(),
df['col1']])]
Output:
col1 col2 col3
0 a 1 r
2 a 2 t
4 a 3 v
1 a 1 s
3 a 2 u
5 a 3 w
6 b 4 x
8 b 5 z
10 b 6 w
7 b 4 y
9 b 5 q
11 b 6 e
Intermediate (before sorting):
col1 col2 col3 key
0 a 1 r 0
1 a 1 s 1
2 a 2 t 0
3 a 2 u 1
4 a 3 v 0
5 a 3 w 1
6 b 4 x 0
7 b 4 y 1
8 b 5 z 0
9 b 5 q 1
10 b 6 w 0
11 b 6 e 1