Let's say we have the following data:
...  col1  col2  col3
0    A     1     info
1    A     2     other
2    B     3     blabla
I want to use pandas to find rows with duplicate values in col1 and sum their col2 values.
In plain Python I would do something like the following:
    l = [('A', 1), ('A', 2), ('B', 3)]
    d = {}
    for i in l:
        if i[0] not in d:
            d[i[0]] = i[1]
        else:
            d[i[0]] = d[i[0]] + i[1]
    print(d)
So the outcome would be:
{'A': 3, 'B': 3}
Is there an easy way to do the same thing using pandas?
Use DataFrame.groupby().sum():
In [1]: import pandas
In [2]: df = pandas.DataFrame({"col1":["A", "A", "B"], "col2":[1,2,3]})
In [3]: df.groupby("col1").sum()
Out[3]:
      col2
col1
A        3
B        3
In [4]: df.groupby("col1").sum().reset_index()
Out[4]:
  col1  col2
0    A     3
1    B     3

[2 rows x 2 columns]
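
If the frame also carries the text column col3 from the question, a plain .sum() over every column may drop or concatenate that column, depending on the pandas version. A minimal sketch that sidesteps this by selecting col2 before summing is shown below. The variable names (df, totals, as_dict, flat) are only illustrative, and the col3 values are copied from the question's sample data:

```python
import pandas as pd

# Sample data from the question, including the text column col3.
df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "col2": [1, 2, 3],
    "col3": ["info", "other", "blabla"],
})

# Select col2 before summing so the text column is never aggregated.
totals = df.groupby("col1")["col2"].sum()

# Same shape as the dict built by the plain-Python loop.
as_dict = totals.to_dict()

# Alternative: keep col1 as a regular column instead of the index.
flat = df.groupby("col1", as_index=False)["col2"].sum()
```

Here totals is a Series indexed by col1, so to_dict() gives the same key-to-sum mapping as the loop in the question, while flat has the same layout as the reset_index() result above.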