pythonpandasdataframegroup-by

pandas group by get top 5 percent score


I am working with a dataframe that looks something like this:

score  name  country   date   id
1     name1  country1  date1  id1
5     name1  country1  date2  id2
6     name1  country1  date3  id3
9     name2  country2  date1  id4
5     name2  country2  date2  id5
5     name2  country2  date2  id5
7     name2  country3  date2  id5
2     name2  country3  date2  id5
1     name2  country3  date3  id6
2     name3  country4  date1  id7
6     name3  country4  date1  id8
7     name3  country4  date1  id9

I would like to get the the ids for each name grouped by country, where the score is the top 5% from that group of name. In the end it should look something like this:

name   country   score date  id
name1  country1    6   date2 id2
                   5   date3 id3
name2  country2    9   date2 id2
name2  country3    7   date2 id2
name3  country4    7   date2 id2

Solution

  • IIUC,

    g = df.groupby('country')
    g.apply(lambda r: r[r.score >= r.score.quantile(0.95)])
    
    
                    score   name    country     date    id
    country                     
    country1    2   6      name1    country1    date3   id3
    country2    3   9      name2    country2    date1   id4
    country3    6   7      name2    country3    date2   id5
    country4    11  7      name3    country4    date1   id9