python, python-polars

How to cast polars' Decimal to Int or Float depending on scale parameter


Executing polars.read_database() produced columns with the Decimal data type, which I'd like to cast to either Int or Float depending on the value of the Decimal's scale parameter. Alternatively, I'd be happy with a way to tell polars not to use the Decimal data type at all, so that schema inference assigns the appropriate Float or Int instead.

Is there a way to use polars.selectors to conditionally target Decimal based on whether scale is zero or not? Or to instruct polars.read_database to not use Decimal?

Ideally, I'd like to be able to do something like:


df.with_columns(
    pl.selectors.decimal(scale="1+").cast(pl.Float64()),
    pl.selectors.decimal(scale="0").cast(pl.Int64())
)

Of course, pl.selectors.decimal() doesn't take any arguments. An alternative would be some sort of pl.when ..., but I would first need to extract the value of scale, and I'm not sure how to do that. Another option would be to tackle this at the read_database level.

Any ideas?


Solution

  • A fairly explicit solution that works is:

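    import polars as pl

    # Each Decimal dtype in the schema exposes its scale, so the columns
    # can be split into "integer-like" (scale == 0) and "fractional" (scale > 0).
    int_dec_cols = [c for c, dt in df.schema.items()
                    if isinstance(dt, pl.Decimal) and dt.scale == 0]
    flt_dec_cols = [c for c, dt in df.schema.items()
                    if isinstance(dt, pl.Decimal) and dt.scale > 0]

    # Cast each group to its target type; all other columns are left untouched.
    df = df.with_columns(
        pl.col(int_dec_cols).cast(pl.Int64),
        pl.col(flt_dec_cols).cast(pl.Float64),
    )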