pandasdataframesortingbin

How to sort a pandas dataframe by a binned column


I have created the following pandas dataframe:

import pandas as pd
ds = {'col1' : ['(-9999999, 550.0]','(13700.0, 23700.0]','(23700.0, 414580.0]','(4000.0, 8000.0]','(414580.0, 9999999]','(550.0, 4000.0]','(8000.0, 13700.0]'],                                                                                                                             
      'col2' : [905317.3,   606156.5,   586349.6,   665779.1,   0,  803824.4,   628475.2]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)
                  col1      col2
0    (-9999999, 550.0]  905317.3
1   (13700.0, 23700.0]  606156.5
2  (23700.0, 414580.0]  586349.6
3     (4000.0, 8000.0]  665779.1
4  (414580.0, 9999999]       0.0
5      (550.0, 4000.0]  803824.4
6    (8000.0, 13700.0]  628475.2

I need to sort the dataframe by column col1 in ascending order. The resulting dataframe would look like this:

                  col1      col2
0    (-9999999, 550.0]  905317.3
1      (550.0, 4000.0]  803824.4
2     (4000.0, 8000.0]  665779.1
3    (8000.0, 13700.0]  628475.2
4   (13700.0, 23700.0]  606156.5
5  (23700.0, 414580.0]  586349.6
6  (414580.0, 9999999]       0.0

Can someone help me please?


Solution

  • Code

    out = df.sort_values(
        'col1', key=lambda x: x.str.extract(r'\(([^,]+),')[0].astype('float')
    )
    

    out:

                      col1      col2
    0    (-9999999, 550.0]  905317.3
    5      (550.0, 4000.0]  803824.4
    3     (4000.0, 8000.0]  665779.1
    6    (8000.0, 13700.0]  628475.2
    1   (13700.0, 23700.0]  606156.5
    2  (23700.0, 414580.0]  586349.6
    4  (414580.0, 9999999]       0.0