pythonpandasdataframetype-conversion

Convert Decimal values to float64 when creating a Pandas DataFrame


I'm working with a dictionary that contains a list of decimal.Decimal values as one of its fields:

import pandas as pd
from decimal import Decimal

data = {
    'Item': ['Apple', 'Banana', 'Orange'],
    'Price': [Decimal('1.25'), Decimal('0.75'), Decimal('2.00')],
    'Quantity': [10, 20, 15]
}

When I convert this dictionary into a Pandas DataFrame, the Price column (which contains Decimal objects) is inferred as object:

df = pd.DataFrame(data)
print(df.dtypes)

# Output:
# Item        object
# Price       object
# Quantity     int64
# dtype: object

I would like the Price column to be of type float64 instead.

I tried using pd.DataFrame.from_records(data, coerce_float=True), but it didn’t change the type of the Decimal values.

I’m aware I can convert the column using .astype(float), but in my actual use case, I may not know the column name in advance.

What’s the best way to ensure that all Decimal values are automatically converted to floats when creating the DataFrame, ideally without hardcoding column names?


Solution

  • You can identify the columns to use (for example using the first row), then convert to float:

    convert = dict.fromkeys(df.columns[df.iloc[0].map(type).eq(Decimal)], float)
    # {'Price': float}
    out = df.astype(convert)
    

    If you can't rely only on the first row, you could define a threshold of a number of Decimal values (e.g. at least 1, all, 90%) to convert the column:

    # convert a column if all values are Decimal
    convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).all()], float)
    
    # convert if at least one value is Decimal
    convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).any()], float)
    
    # convert if more than 90% values are Decimal
    convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).mean().gt(0.9)],
                            float)
    out = df.astype(convert)
    

    Output dtypes:

    Item        string[python]
    Price              float64
    Quantity             Int64
    dtype: object