
Sort dataframe by string length


I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

Solution

  • You can compute the length of each name with str.len, sort those lengths with sort_values, and pass the resulting index to reindex:

    print (df.name.str.len())
    0    5
    1    2
    2    6
    3    4
    Name: name, dtype: int64
    
    print (df.name.str.len().sort_values())
    1    2
    3    4
    0    5
    2    6
    Name: name, dtype: int64
    
    # index labels of df, ordered by the length of name
    s = df.name.str.len().sort_values().index
    print (s)
    Int64Index([1, 3, 0, 2], dtype='int64')
    
    print (df.reindex(s))
         name  score
    1      Al      4
    3    Greg      3
    0   Steve      2
    2  Markus      2
    

    # reindex keeps the original labels; reset_index(drop=True) renumbers the rows from 0
    df1 = df.reindex(s)
    df1 = df1.reset_index(drop=True)
    print (df1)
         name  score
    0      Al      4
    1    Greg      3
    2   Steve      2
    3  Markus      2
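
The steps above can be wrapped into one function, sketched below. The name sort_by_length is hypothetical and only for illustration, and the sketch assumes the index labels are unique, since reindex does not work with duplicate labels. It passes kind='mergesort' so that names of equal length keep their original relative order. That is an addition to the answer above, not part of it. The last statement shows an alternative that should work on pandas 1.1 or later, where sort_values accepts a key argument.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'],
                   'score': [2, 4, 2, 3]})


def sort_by_length(frame, column):
    """Hypothetical helper: return rows ordered by the string length of `column`.

    Assumes the index of `frame` has unique labels.
    """
    # Lengths of the strings, sorted with a stable algorithm so ties keep their order
    order = frame[column].str.len().sort_values(kind='mergesort').index
    # Reorder the rows by those labels, then renumber them from 0
    return frame.reindex(order).reset_index(drop=True)


by_reindex = sort_by_length(df, 'name')
print(by_reindex)

# Alternative, assuming pandas >= 1.1: `key` receives the column as a Series
# and must return a Series of the same length to sort by.
by_key = df.sort_values('name', key=lambda s: s.str.len(),
                        kind='mergesort').reset_index(drop=True)
print(by_key)
```

Both calls return a new DataFrame and leave df unchanged.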