I have the following dataframe:
ID customer Month Amount
0 026 201707 31,65
1 026 201708 31,65
2 026 201709 31,65
3 026 201710 31,65
4 026 201711 31,65
where Amount
is initially object
type. I want to calculate sum and average amount for each ID. First I tried to convert Amount
column from object
into float
:
df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')
but I got NaN
for all values in the Amount
column:
ID customer Month Amount
0 026 201707 NaN
How to convert column object type into float
with actual numbers, then aggregate the values for each customer (sum, mean)?
Use Series.str.replace
before pd.to_numeric
to convert ,
in .
Then you can use groupby.agg
agg_df = (df.assign(Amount = pd.to_numeric(df['Amount'].str.replace(',','.'),
errors = 'coerce'))
.groupby('ID').Amount.agg(['mean','sum']))
print(agg_df)
#if you want change the type of Amount previously
#df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
#agg_df = df.groupby('ID').Amount.agg(['mean','sum']))
mean sum
ID
0 31.65 31.65
1 31.65 31.65
2 31.65 31.65
3 31.65 31.65
4 31.65 31.65
If you want aggregate to initial dataframe use GroupBy.transform
:
groups = pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce').groupby(df['ID'])
#if you want change the type of Amount previously
#df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
#groups = df.groupby('ID')['Amount']
df['mean'] = groups.transform('mean')
df['sum'] = groups.transform('sum')
print(df)
ID customer Month Amount mean sum
0 0 26 201707 31,65 31.65 31.65
1 1 26 201708 31,65 31.65 31.65
2 2 26 201709 31,65 31.65 31.65
3 3 26 201710 31,65 31.65 31.65
4 4 26 201711 31,65 31.65 31.65