
Polars DataFrame - Decimal scale doubles on mul with Integer


I have a Polars (v1.5.0) dataframe with 4 columns, as shown in the example below. When I multiply the decimal columns by the integer column, the scale of the resulting decimal columns doubles.

from decimal import Decimal
import polars as pl

df = pl.DataFrame({
    "a": [1, 2],
    "b": [Decimal('3.45'), Decimal('4.73')],
    "c": [Decimal('2.113'), Decimal('4.213')],
    "d": [Decimal('1.10'), Decimal('3.01')]
})
shape: (2, 4)
┌─────┬──────────────┬──────────────┬──────────────┐
│ a   ┆ b            ┆ c            ┆ d            │
│ --- ┆ ---          ┆ ---          ┆ ---          │
│ i64 ┆ decimal[*,2] ┆ decimal[*,3] ┆ decimal[*,2] │
╞═════╪══════════════╪══════════════╪══════════════╡
│ 1   ┆ 3.45         ┆ 2.113        ┆ 1.10         │
│ 2   ┆ 4.73         ┆ 4.213        ┆ 3.01         │
└─────┴──────────────┴──────────────┴──────────────┘
df.with_columns(pl.col("c", "d").mul(pl.col("a")))
shape: (2, 4)
┌─────┬──────────────┬──────────────┬──────────────┐
│ a   ┆ b            ┆ c            ┆ d            │
│ --- ┆ ---          ┆ ---          ┆ ---          │
│ i64 ┆ decimal[*,2] ┆ decimal[*,6] ┆ decimal[*,4] │
╞═════╪══════════════╪══════════════╪══════════════╡
│ 1   ┆ 3.45         ┆ 2.113000     ┆ 1.1000       │
│ 2   ┆ 4.73         ┆ 8.426000     ┆ 6.0200       │
└─────┴──────────────┴──────────────┴──────────────┘

I don't understand why the scale doubles when I am only multiplying a decimal by an integer. How can I keep the scale unchanged?
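The arithmetic behind the doubling can be illustrated with Python's standard `decimal` module rather than Polars. In decimal multiplication, the scale of the product is the sum of the scales of the operands. Multiplying by a true integer (scale 0) would therefore keep the scale. The doubling in the output above is consistent with Polars first promoting the integer column to the decimal column's scale before multiplying. That is an assumption inferred from the output, not something confirmed in this thread. The `scale` helper is hypothetical.

```python
from decimal import Decimal

def scale(d: Decimal) -> int:
    # Hypothetical helper: number of digits after the decimal point.
    return -d.as_tuple().exponent

c = Decimal("4.213")  # scale 3, like column "c"
a = 2                 # a plain integer, like column "a"

# Integer kept at scale 0: scales add, 3 + 0 = 3.
plain = c * a
# Assumption: the integer is first promoted to the decimal's scale ("2.000"),
# so the scales add as 3 + 3 = 6, matching the decimal[*,6] seen in Polars.
widened = c * Decimal(a).quantize(Decimal("0.001"))

print(plain, scale(plain))      # 8.426 3
print(widened, scale(widened))  # 8.426000 6
```

Both results are numerically equal. Only the number of trailing fractional digits differs.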


Solution

  • The scale indeed seems to double. You could cast back to the original dtype:

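    # Multiply each decimal column by "a", then cast the result back to that
    # column's original dtype (e.g. decimal[*,3]) to restore its scale.
    cols = ['c', 'd', 'e']
    df.with_columns(pl.col(c).mul(pl.col('a')).cast(df[c].dtype) for c in cols)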
    

    Note that there currently doesn't seem to be a way to access a column's dtype from within an Expr, which is why the dtype is read from the DataFrame with df[c].dtype. This has been discussed as a possible feature.

    Output:

    ┌─────┬─────┬──────────────┬──────────────┬──────────────┐
    │ a   ┆ b   ┆ c            ┆ d            ┆ e            │
    │ --- ┆ --- ┆ ---          ┆ ---          ┆ ---          │
    │ i64 ┆ i64 ┆ decimal[*,2] ┆ decimal[*,3] ┆ decimal[*,4] │
    ╞═════╪═════╪══════════════╪══════════════╪══════════════╡
    │ 1   ┆ 3   ┆ 2.11         ┆ 1.100        ┆ 1.1001       │
    │ 2   ┆ 4   ┆ 8.42         ┆ 6.022        ┆ 6.0004       │
    └─────┴─────┴──────────────┴──────────────┴──────────────┘
    

    Input used:

    from decimal import Decimal
    import polars as pl

    # Column "d" mixes scales 2 and 3; Polars infers decimal[*,3] for it.
    df = pl.DataFrame({
        "a": [1, 2],
        "b": [3, 4],
        "c": [Decimal('2.11'), Decimal('4.21')],
        "d": [Decimal('1.10'), Decimal('3.011')],
        "e": [Decimal('1.1001'), Decimal('3.0002')],
    })
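Casting back loses nothing here. Multiplying by an integer cannot create nonzero digits beyond the original scale, so the extra fractional digits in the widened result are all zeros. The plain-Python sketch below mimics the cast-back approach with `Decimal.quantize` on the same input data. `rescale` and `mul_column_keep_scale` are hypothetical helpers, and the widening step only emulates the Polars behaviour observed above. It is not Polars' actual implementation.

```python
from decimal import Decimal

def rescale(x: Decimal, scale: int) -> Decimal:
    # Fix the number of fractional digits, similar to casting to decimal[*, scale].
    return x.quantize(Decimal(1).scaleb(-scale))

def mul_column_keep_scale(values: list[Decimal], ints: list[int], scale: int) -> list[Decimal]:
    # Hypothetical helper mirroring pl.col(c).mul(pl.col("a")).cast(original dtype).
    out = []
    for v, n in zip(values, ints):
        # Emulate the observed widening: both operands at `scale`, product at 2 * scale.
        wide = rescale(v, scale) * rescale(Decimal(n), scale)
        # Exact: the extra fractional digits of an integer multiple are all zero.
        out.append(rescale(wide, scale))
    return out

a = [1, 2]
c = [Decimal("2.11"), Decimal("4.21")]      # decimal[*,2]
d = [Decimal("1.10"), Decimal("3.011")]     # decimal[*,3]
e = [Decimal("1.1001"), Decimal("3.0002")]  # decimal[*,4]

for name, col, s in [("c", c, 2), ("d", d, 3), ("e", e, 4)]:
    print(name, [str(x) for x in mul_column_keep_scale(col, a, s)])
# c ['2.11', '8.42']
# d ['1.100', '6.022']
# e ['1.1001', '6.0004']
```

The printed values match the Polars output table for columns c, d and e.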