
How to sum the counts for the same number in a CSV file using pandas


I have a CSV file with Number, Count, and Service columns. The file repeats these three columns many times; here is a small example:

Number  Count  Service      Number  Count  Service
0       13     NO SERVICE   0       10
1       14     tcpmux       1       10
2       9      compressnet  2       14

I want the result to look like this:

Number   Total Count    Service
0            23         NO SERVICE
1            24         tcpmux
2            23         compressnet
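To make the example reproducible, the sample input can be loaded from an in-memory string. This is a sketch that assumes the real file is comma-separated and that the second Service column is empty, as in the table above; `csv_text` is a hypothetical stand-in for the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: two Number/Count/Service groups,
# with the second Service column left empty as in the sample table.
csv_text = """Number,Count,Service,Number,Count,Service
0,13,NO SERVICE,0,10,
1,14,tcpmux,1,10,
2,9,compressnet,2,14,
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv renames repeated headers, so the second group becomes
# Number.1, Count.1 and Service.1 (the empty Service.1 is read as NaN).
print(df.columns.tolist())
# ['Number', 'Count', 'Service', 'Number.1', 'Count.1', 'Service.1']
```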

How can I do this in pandas? This is what I have tried:

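import pandas as pd

# load the monthly report
df = pd.read_csv("/Users/mani/Desktop/monthly report/geoip/2017-20dstipsum12.csv")
# group rows by NUMBER and sum the other columns
hasil = df.groupby(['NUMBER']).sum()
# write the result as a tab-separated file
hasil.to_csv('gotttt.txt', sep='\t', encoding='utf-8')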

Solution

  • If the Number columns are identical in every group:

    # df has repeated 'Count' labels, so df['Count'] is a DataFrame; sum it row by row
    df['Total Count'] = df['Count'].sum(axis=1)
    # select the first and third columns (Number, Service) and join the Total Count column
    df = df.iloc[:, [0, 2]].join(df['Total Count'])
    print(df)
       Number      Service  Total Count
    0       0   NO SERVICE           23
    1       1       tcpmux           24
    2       2  compressnet           23
    

    In recent versions of pandas, read_csv deduplicates repeated column names (Count, Count.1, ...), so use filter to select the Count columns:

    print(df)
       Number  Count      Service  Number.1  Count.1  Service.1
    0       0     13   NO SERVICE         0       10        NaN
    1       1     14       tcpmux         1       10        NaN
    2       2      9  compressnet         2       14        NaN
    
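    # sum every column whose name contains 'Count' (Count, Count.1, ...) row by row
    df['Total Count'] = df.filter(like='Count').sum(axis=1)
    
    # keep the first Number and Service columns together with the new total
    df = df[['Number', 'Total Count', 'Service']]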
    print(df)
       Number  Total Count      Service
    0       0           23   NO SERVICE
    1       1           24       tcpmux
    2       2           23  compressnet
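Both approaches above assume the Number values line up row by row across the groups. If they do not, one alternative (not part of the original answer) is to stack every Number/Count/Service group into rows and then group by Number. This is a sketch that assumes the columns always come in that fixed order, three at a time; the sample values are made up, with the second group's numbers deliberately shuffled.

```python
import io

import pandas as pd

# Made-up sample: the second group's Number values are in a different order.
csv_text = """Number,Count,Service,Number,Count,Service
0,13,NO SERVICE,1,10,
1,14,tcpmux,2,10,
2,9,compressnet,0,14,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: columns come in consecutive (Number, Count, Service) triples.
groups = []
for start in range(0, df.shape[1], 3):
    part = df.iloc[:, start:start + 3].copy()
    part.columns = ['Number', 'Count', 'Service']
    groups.append(part)

# One row per (Number, Count, Service) entry from every group.
long_df = pd.concat(groups, ignore_index=True)

# Sum Count per Number; 'first' keeps the first non-missing Service name.
result = (
    long_df.groupby('Number')
    .agg(**{'Total Count': ('Count', 'sum'), 'Service': ('Service', 'first')})
    .reset_index()
)
print(result)
# Total Count per Number: 0 -> 27, 1 -> 24, 2 -> 19
```

Reshaping to long format avoids relying on row positions, at the cost of assuming a strict three-column layout; if the file's column order varies, the grouping loop would need to select columns by name instead.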