pythonpandasdataframesortinggroup-by

Sort all rows with a certain value in a group to the last place i the group


I try to sort all rows with a certain value in a group to the last place in every group.

data = {'a':[1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'b':[100, 300, 200, 222, 500, 300, 222, 100, 200, 222, 300, 500, 400, 100],
        'c':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}
df1 = pd.DataFrame(data)
df1

Out[29]: 
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  222   4
4   1  500   5
5   2  300   6
6   2  222   7
7   2  100   8
8   3  200   9
9   3  222  10
10  3  300  11
11  3  500  12
12  3  400  13
13  3  100  14

should be:

Out[31]: 
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  500   5
4   1  222   4
5   2  300   6
6   2  100   8
7   2  222   7
8   3  200   9
9   3  300  11
10  3  500  12
11  3  400  13
12  3  100  14
13  3  222  10

one of my attempts is:

df1 = df1['b'].eq[222].sort(position='last').groupby(df1['a'])

But I haven't found a solution yet


Solution

  • Use double DataFrame.sort_values - first by b with key parameter and then by a column with kind parameter:

    out = (df1.sort_values('b', key = lambda x: x==222)
              .sort_values('a', ignore_index=True, kind='stable'))
    print (out)
        a    b   c
    0   1  100   1
    1   1  300   2
    2   1  200   3
    3   1  500   5
    4   1  222   4
    5   2  300   6
    6   2  100   8
    7   2  222   7
    8   3  200   9
    9   3  300  11
    10  3  500  12
    11  3  400  13
    12  3  100  14
    13  3  222  10
    

    Solution with helper column should be faster - added by DataFrame.assign and removed by DataFrame.drop:

    out = df1.assign(tmp = df1['b'].eq(222)).sort_values(['a','tmp']).drop('tmp', axis=1)
    print (out)
        a    b   c
    0   1  100   1
    1   1  300   2
    2   1  200   3
    4   1  500   5
    3   1  222   4
    5   2  300   6
    7   2  100   8
    6   2  222   7
    8   3  200   9
    10  3  300  11
    11  3  500  12
    12  3  400  13
    13  3  100  14
    9   3  222  10
    

    Or use np.lexsort for positions and change order by DataFrame.iloc:

    out = df1.iloc[np.lexsort([df1['b'].eq(222), df1.a])]
    print (out)
        a    b   c
    0   1  100   1
    1   1  300   2
    2   1  200   3
    4   1  500   5
    3   1  222   4
    5   2  300   6
    7   2  100   8
    6   2  222   7
    8   3  200   9
    10  3  300  11
    11  3  500  12
    12  3  400  13
    13  3  100  14
    9   3  222  10
    

    For default index add DataFrame.reset_index with drop=True:

    out = out.reset_index(drop=True)
    print (out)
        a    b   c
    0   1  100   1
    1   1  300   2
    2   1  200   3
    3   1  500   5
    4   1  222   4
    5   2  300   6
    6   2  100   8
    7   2  222   7
    8   3  200   9
    9   3  300  11
    10  3  500  12
    11  3  400  13
    12  3  100  14
    13  3  222  10