pythonpandassorting

rearrange columns in dataframe depending on sorting output


I have the following data frame:

df = pd.DataFrame(
    {
        'a':[1,2,3,4,5,6],
        'b':[1,1,3,3,5,5],
        'c':[1,2,3,4,5,6],                
        'd':[1,1,1,1,1,5],
    }
)

In [1051]: df
Out[1051]: 
   a  b  c  d
0  1  1  1  1
1  2  1  2  1
2  3  3  3  1
3  4  3  4  1
4  5  5  5  1
5  6  5  6  5

If I sort the df using all the columns, I get the following:

In [1055]: columns = list(df.columns)
      ...: 
      ...: dfSorted = df.sort_values(by=columns, ascending=False)
      ...: 
      ...: print(dfSorted)
   a  b  c  d
5  6  5  6  5
4  5  5  5  1
3  4  3  4  1
2  3  3  3  1
1  2  1  2  1
0  1  1  1  1

I'd like to re arrange the order of the columns going from the column with the least differences among the rows, being the last column the one with the most differences. In my example, the expected order should be then d,b,c,a.

This is so because column d has only two different values (1 and 5) while columns c and a have all of the values different. Column b lies in between ...

In [1056]: dfSorted[['d','b','c','a']]
Out[1056]: 
   d  b  c  a
5  5  5  6  6
4  1  5  5  5
3  1  3  4  4
2  1  3  3  3
1  1  1  2  2
0  1  1  1  1

Any idea? Thanks!


Solution

  • A possible solution is to sort the columns (with sort_values) by the number of nunique values and use that to reindex:

    df = df.reindex(df.nunique().sort_values().index, axis=1)
    
       d  b  a  c
    5  5  5  6  6
    4  1  5  5  5
    3  1  3  4  4
    2  1  3  3  3
    1  1  1  2  2
    0  1  1  1  1