pythonpandasdataframedtype

Fix dot - comma confusion, producing NaN while converting dataframe column dtype "object" to "float"


I have the following dataframe:

    ID customer Month   Amount
0   026         201707  31,65
1   026         201708  31,65
2   026         201709  31,65
3   026         201710  31,65
4   026         201711  31,65

where Amount is initially object type. I want to calculate sum and average amount for each ID. First I tried to convert Amount column from object into float:

df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')

but I got NaN for all values in the Amount column:

    ID customer Month   Amount
0   026         201707  NaN 

How to convert column object type into float with actual numbers, then aggregate the values for each customer (sum, mean)?


Solution

  • Use Series.str.replace before pd.to_numeric to convert , in .Then you can use groupby.agg

    agg_df = (df.assign(Amount = pd.to_numeric(df['Amount'].str.replace(',','.'),
                                               errors = 'coerce'))
                .groupby('ID').Amount.agg(['mean','sum']))
    print(agg_df)
    #if you want change the type of Amount previously
    #df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
    #agg_df = df.groupby('ID').Amount.agg(['mean','sum']))
    
         mean    sum
    ID              
    0   31.65  31.65
    1   31.65  31.65
    2   31.65  31.65
    3   31.65  31.65
    4   31.65  31.65
    

    If you want aggregate to initial dataframe use GroupBy.transform:

    groups = pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce').groupby(df['ID'])
    #if you want change the type of Amount previously
    #df['Amount'] =pd.to_numeric(df['Amount'].str.replace(',','.'),errors = 'coerce')
    #groups = df.groupby('ID')['Amount']
    df['mean'] = groups.transform('mean')
    df['sum'] = groups.transform('sum')
    print(df)
       ID  customer   Month Amount   mean    sum  
    0   0        26  201707  31,65  31.65  31.65  
    1   1        26  201708  31,65  31.65  31.65  
    2   2        26  201709  31,65  31.65  31.65  
    3   3        26  201710  31,65  31.65  31.65  
    4   4        26  201711  31,65  31.65  31.65