pythonpandas

Count duplicates in column and add them to new col Pandas


im trying to count duplicates and then drop.cuplicates to get dataframe without duplicates but heve extra col in which we have number that represents how many duplicates there were

from

       id  letter
0  665639  AAAAAA
1  665639  AAAAAA
2  665639  AAAAAA
3  665639  AAAAAA
5  702090  BBBBBB
6  702092  CCCCCC

to

       id  letter   count
0  665639  AAAAAA     4
1  665639  AAAAAA     4
2  665639  AAAAAA     4
3  665639  AAAAAA     4
5  702090  BBBBBB     1
6  702092  CCCCCC     1

and after

df.drop.duplicates(subset=['id'] ,inplace=True)

       id  letter   count
0  665639  AAAAAA     4
5  702090  BBBBBB     1
6  702092  CCCCCC     1

my approach was:

cols = ['id','letter']
df['count'] = df.groupby(cols)['id'].transform('size')

but when im doing this i get dataframe that has only 1's in count


Solution

  • I would first create the count of duplicates

    df['Count'] = 1
    df.groupby(['id','letter']).Count.count().reset_index()
    

    And then drop the duplicates

    df.drop_duplicates()