I'm working with a dictionary that contains a list of decimal.Decimal
values as one of its fields:
import pandas as pd
from decimal import Decimal
data = {
    'Item': ['Apple', 'Banana', 'Orange'],
    'Price': [Decimal('1.25'), Decimal('0.75'), Decimal('2.00')],
    'Quantity': [10, 20, 15]
}
When I convert this dictionary into a Pandas DataFrame, the Price column (which contains Decimal objects) is inferred as object:
df = pd.DataFrame(data)
print(df.dtypes)
# Output:
# Item        object
# Price       object
# Quantity     int64
# dtype: object
I would like the Price column to be of type float64 instead.
I tried pd.DataFrame.from_records(data, coerce_float=True), but it didn't change the type of the Decimal values.
I’m aware I can convert the column using .astype(float), but in my actual use case, I may not know the column name in advance.
What’s the best way to ensure that all Decimal values are automatically converted to floats when creating the DataFrame, ideally without hardcoding column names?
You can identify the columns to convert (for example by inspecting the types in the first row), then cast them to float:
convert = dict.fromkeys(df.columns[df.iloc[0].map(type).eq(Decimal)], float)
# {'Price': float}
out = df.astype(convert)
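Applied to the sample data from the question, this gives a self-contained sketch (only Series.map is used here, so it works on any recent pandas):

```python
import pandas as pd
from decimal import Decimal

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Price': [Decimal('1.25'), Decimal('0.75'), Decimal('2.00')],
    'Quantity': [10, 20, 15],
})

# Map each value in the first row to its type, keep columns where it's Decimal,
# and build an {column: float} mapping without hardcoding any column name
convert = dict.fromkeys(df.columns[df.iloc[0].map(type).eq(Decimal)], float)
# convert == {'Price': float}

out = df.astype(convert)
print(out.dtypes)
```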
If you can't rely on the first row alone, you could instead check every value and set a threshold for how many must be Decimal (e.g. at least one, all of them, more than 90%) before converting the column:
# convert a column if all values are Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).all()], float)
# convert if at least one value is Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).any()], float)
# convert if more than 90% of values are Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).mean().gt(0.9)], float)
out = df.astype(convert)
Output dtypes (shown here with pandas' string/nullable dtypes; depending on your pandas version and settings, Item and Quantity may instead show object and int64, as in your example):

Item        string[python]
Price              float64
Quantity             Int64
dtype: object
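Note that DataFrame.map was only added in pandas 2.1 (it replaces the deprecated applymap); on older versions the same mask can be built with applymap:

```python
import pandas as pd
from decimal import Decimal

df = pd.DataFrame({
    'Item': ['Apple', 'Banana'],
    'Price': [Decimal('1.25'), Decimal('0.75')],
})

# applymap is the pre-2.1 spelling of DataFrame.map
# (it still works on 2.1+, but emits a deprecation warning there)
mask = df.applymap(type).eq(Decimal).all()
convert = dict.fromkeys(df.columns[mask], float)

out = df.astype(convert)
print(out.dtypes)
```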