
Pandas Dataframe - find sums in column B across each label in column A


Let's say we have the following data:

   col1  col2  col3
0     A     1  info
1     A     2  other
2     B     3  blabla

I want to use pandas to find the duplicate entries in col1 and add up their corresponding values in col2.

In plain Python I would do something like the following:

l = [('A', 1), ('A', 2), ('B', 3)]
d = {}
for key, value in l:
    # Start a running total for a new key, otherwise add to the existing one
    if key not in d:
        d[key] = value
    else:
        d[key] += value
print(d)

So the outcome would be:

{'A': 3, 'B': 3}
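For comparison, the same running-total loop can be written more compactly in plain Python with collections.defaultdict. This is a standard-library swap-in for illustration, not something from the original question; defaultdict(int) starts every missing key at 0, so the membership check goes away:

```python
from collections import defaultdict

pairs = [("A", 1), ("A", 2), ("B", 3)]

# defaultdict(int) returns 0 for a key it has not seen yet,
# so every value can simply be added to its key's total
totals = defaultdict(int)
for key, value in pairs:
    totals[key] += value

print(dict(totals))  # {'A': 3, 'B': 3}
```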

Is there an easy way to do the same thing using pandas?


Solution

  • Use DataFrame.groupby().sum():

    In [1]: import pandas
    
    In [2]: df = pandas.DataFrame({"col1":["A", "A", "B"], "col2":[1,2,3]})
    
    In [3]: df.groupby("col1").sum()
    Out[3]: 
          col2
    col1      
    A        3
    B        3
    
    In [4]: df.groupby("col1").sum().reset_index()
    Out[4]: 
      col1  col2
    0    A     3
    1    B     3
    
    [2 rows x 2 columns]
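The example above leaves out col3. With the question's full data, a sketch like the following selects col2 before summing so the text column is not aggregated, and uses Series.to_dict() to get a dict shaped like the plain-Python result. The as_index=False variant keeps col1 as an ordinary column, similar to calling .reset_index():

```python
import pandas as pd

# Full example frame from the question, including the text column col3
df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "col2": [1, 2, 3],
    "col3": ["info", "other", "blabla"],
})

# Select col2 before summing so the string column col3 is left out
totals = df.groupby("col1")["col2"].sum()

# The result is a Series indexed by col1; convert it to a plain dict
result = totals.to_dict()
print(result)  # same totals as the loop in the question: A -> 3, B -> 3

# as_index=False returns a DataFrame with col1 as a regular column
flat = df.groupby("col1", as_index=False)["col2"].sum()
print(flat)
```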