pythonpandasdataframe

How to sort python pandas dataframe in repetitive order after groupby?


I have a dataset which is sorted in this order:

col1 col2 col3
a 1 r
a 1 s
a 2 t
a 2 u
a 3 v
a 3 w
b 4 x
b 4 y
b 5 z
b 5 q
b 6 w
b 6 e

I want it to be sorted in the following order:

col1 col2 col3
a 1 r
a 2 t
a 3 v
a 1 s
a 2 u
a 3 w
b 4 x
b 5 z
b 6 w
b 4 y
b 5 q
b 6 e

I want the col2 to be in repetitive fashion, as in, for col1 'a' values, it should be 1,2,3,4 and then 1,2,3,4 again instead of 1,1,2,2,3,3,4,4. I have used the following code, but it is not working:

import pandas as pd

# Creating the DataFrame
data = {
    'col1': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
    'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    'col3': ['r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'q', 'w', 'e']
}

df = pd.DataFrame(data)

# Sort by col1, then reorder col2 within each group
df_sorted = df.sort_values(by=['col1', 'col2']).reset_index(drop=True)

df_sorted = df_sorted.groupby('col1', group_keys=False).apply(lambda x: x.sort_values('col2'))

# Display the sorted dataframe
print(df_sorted)


Solution

  • Use groupby.cumcount to form a secondary key for sorting:

    out = (df.assign(key=lambda x: x.groupby(['col1', 'col2']).cumcount())
             .sort_values(by=['col1', 'key', 'col2'])
             .drop(columns='key')
          )
    

    Note that you can avoid creating the intermediate column using numpy.lexsort:

    import numpy as np
    
    out = df.iloc[np.lexsort([df['col2'],
                              df.groupby(['col1', 'col2']).cumcount(),
                              df['col1']])]
    

    Output:

       col1  col2 col3
    0     a     1    r
    2     a     2    t
    4     a     3    v
    1     a     1    s
    3     a     2    u
    5     a     3    w
    6     b     4    x
    8     b     5    z
    10    b     6    w
    7     b     4    y
    9     b     5    q
    11    b     6    e
    

    Intermediate (before sorting):

       col1  col2 col3  key
    0     a     1    r    0
    1     a     1    s    1
    2     a     2    t    0
    3     a     2    u    1
    4     a     3    v    0
    5     a     3    w    1
    6     b     4    x    0
    7     b     4    y    1
    8     b     5    z    0
    9     b     5    q    1
    10    b     6    w    0
    11    b     6    e    1