
Filter pandas DataFrame rows by the most common values in a column


I have a pandas DataFrame:

    import pandas as pd

    df = pd.DataFrame({'name': ['john', 'joe', 'bill', 'richard', 'sam'],
                       'cluster': ['1', '2', '3', '1', '2']})

df['cluster'].value_counts() gives the number of occurrences of each value in the cluster column.

Is it possible to retain only the rows whose cluster value occurs the maximum number of times?

The expected output is:

          name cluster
    0     john       1
    1      joe       2
    3  richard       1
    4      sam       2

Clusters 1 and 2 have the same number of occurrences, so all the rows for both clusters need to be retained.


Solution

  • You can get the count of each cluster value with df['cluster'].value_counts(), select the values whose count equals the maximum, and then use isin to filter the cluster column:

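    # Count how many times each cluster value occurs
    c = df['cluster'].value_counts()
    
    # Keep rows whose cluster value is among those with the maximum count (ties included)
    out = df[df['cluster'].isin(c[c.eq(c.max())].index)]
    
    print(out)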
    
          name cluster
    0     john       1
    1      joe       2
    3  richard       1
    4      sam       2
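
As a possible variation on the approach above, the counts can be mapped back onto each row and compared with the maximum directly, which avoids the intermediate index lookup. This is only a sketch: the names counts_per_row and out_alt are chosen for illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['john', 'joe', 'bill', 'richard', 'sam'],
                   'cluster': ['1', '2', '3', '1', '2']})

# For every row, look up how often its cluster value occurs in the column.
# counts_per_row is an illustrative name, not part of the original answer.
counts_per_row = df['cluster'].map(df['cluster'].value_counts())

# Keep the rows whose cluster count equals the largest count; tied clusters all survive.
out_alt = df[counts_per_row.eq(counts_per_row.max())]

print(out_alt)
```

For the sample frame this should select the same rows as the isin version (index 0, 1, 3 and 4), since both compare each cluster's count against the maximum count.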