python pandas dataframe sorting group-by

Sort all rows with a certain value in a group to the last place i the group

I try to sort all rows with a certain value in a group to the last place in every group.

data = {'a':[1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'b':[100, 300, 200, 222, 500, 300, 222, 100, 200, 222, 300, 500, 400, 100],
        'c':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}
df1 = pd.DataFrame(data)
df1

Out[29]: 
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  222   4
4   1  500   5
5   2  300   6
6   2  222   7
7   2  100   8
8   3  200   9
9   3  222  10
10  3  300  11
11  3  500  12
12  3  400  13
13  3  100  14

should be:

Out[31]: 
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  500   5
4   1  222   4
5   2  300   6
6   2  100   8
7   2  222   7
8   3  200   9
9   3  300  11
10  3  500  12
11  3  400  13
12  3  100  14
13  3  222  10

one of my attempts is:

df1 = df1['b'].eq[222].sort(position='last').groupby(df1['a'])

But I haven't found a solution yet

Solution

Use double DataFrame.sort_values - first by b with key parameter and then by a column with kind parameter:

out = (df1.sort_values('b', key = lambda x: x==222)
          .sort_values('a', ignore_index=True, kind='stable'))
print (out)
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  500   5
4   1  222   4
5   2  300   6
6   2  100   8
7   2  222   7
8   3  200   9
9   3  300  11
10  3  500  12
11  3  400  13
12  3  100  14
13  3  222  10

Solution with helper column should be faster - added by DataFrame.assign and removed by DataFrame.drop:

out = df1.assign(tmp = df1['b'].eq(222)).sort_values(['a','tmp']).drop('tmp', axis=1)
print (out)
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
4   1  500   5
3   1  222   4
5   2  300   6
7   2  100   8
6   2  222   7
8   3  200   9
10  3  300  11
11  3  500  12
12  3  400  13
13  3  100  14
9   3  222  10

Or use np.lexsort for positions and change order by DataFrame.iloc:

out = df1.iloc[np.lexsort([df1['b'].eq(222), df1.a])]
print (out)
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
4   1  500   5
3   1  222   4
5   2  300   6
7   2  100   8
6   2  222   7
8   3  200   9
10  3  300  11
11  3  500  12
12  3  400  13
13  3  100  14
9   3  222  10

For default index add DataFrame.reset_index with drop=True:

out = out.reset_index(drop=True)
print (out)
    a    b   c
0   1  100   1
1   1  300   2
2   1  200   3
3   1  500   5
4   1  222   4
5   2  300   6
6   2  100   8
7   2  222   7
8   3  200   9
9   3  300  11
10  3  500  12
11  3  400  13
12  3  100  14
13  3  222  10