Executing polars.read_database() resulted in columns with the Decimal data type, which I'd like to cast to either Int or Float, depending on the value of the scale parameter in Decimal. Alternatively, I'd be happy if there were a way to instruct polars not to use the Decimal data type at all, so that schema inference assigns the appropriate Float or Int instead.
Is there a way to use polars.selectors to conditionally target Decimal columns based on whether scale is zero or not? Or to instruct polars.read_database not to use Decimal?
Ideally, I'd like to be able to do something like:
df.with_columns(
    pl.selectors.decimal(scale="1+").cast(pl.Float64()),
    pl.selectors.decimal(scale="0").cast(pl.Int64())
)
Of course, pl.selectors.decimal() doesn't take any arguments. An alternative would be some sort of pl.when ... expression, but I would need to extract the value of scale first, and I'm not sure how to do that. Or this could be attacked at the read_database level.
Any ideas?
A fairly explicit solution that works is:
# Decimal columns with scale == 0 hold whole numbers
int_dec_cols = [c for c, dt in df.schema.items()
                if isinstance(dt, pl.Decimal) and dt.scale == 0]
# Decimal columns with scale > 0 have fractional digits
flt_dec_cols = [c for c, dt in df.schema.items()
                if isinstance(dt, pl.Decimal) and dt.scale > 0]
df = df.with_columns(
    pl.col(int_dec_cols).cast(pl.Int64),
    pl.col(flt_dec_cols).cast(pl.Float64),
)