I have a csv file and there are Date, count, and service column. There are many date,count and service columns but this is the example I will write below.
Number Count Service Number Count service
0 13 NO SERVICE 0 10
1 14 tcpmux 1 10
2 9 compressnet 2 14
So I want the answer like :
Number Total Count Service
0 23 NO SERVICE
1 24 tcpmux
2 23 compressnet
How do I do the code in pandas
import pandas as pd
df =pd.read_csv ("/Users/mani/Desktop/monthly report/geoip/2017-20dstipsum12.csv")
hasil = df.groupby(['NUMBER']).sum()
hasil.to_csv('gotttt.txt', sep='\t', encoding='utf-8')
If columns Number
are same in all data:
#sum all column Count
df['Total Count'] = df['Count'].sum(axis=1)
#select first and third column and join Total Count column
df = df.iloc[:, [0,2]].join(df['Total Count'])
print (df)
Number Total Count Total Service
0 0 23 NO SERVICE
1 1 24 tcpmux
2 2 23 compressnet
In newer version of pandas are deduplicated columns names in read_csv
, so need filter
for select columns:
print (df)
Number Count Service Number.1 Count.1 Service.1
0 0 13 NO SERVICE 0 10
1 1 14 tcpmux 1 10
2 2 9 compressnet 2 14
df['Total Count'] = df.filter(like='Count').sum(axis=1)
df = df[['Number','Total Count','Service']]
print (df)
Number Total Count Total Service
0 0 23 NO SERVICE
1 1 24 tcpmux
2 2 23 compressnet