python python-polars

How to convert float columns with no fractional part to Int in Polars?


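In pandas I do it like this: if a float column contains only whole values such as 1.0, 2.0, 3.0, the trailing .0 is removed by converting the values to int. How can I do the same in Polars?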

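import pandas as pd

df = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-02"],
    "a": [1.0, 2.0],
    "c": [1.0, 2.1],
})
print(df)
# Every column except "date"
columns = df.columns.difference(["date"])
# Element-wise: turn whole floats into ints (DataFrame.map needs pandas >= 2.1)
df[columns] = df[columns].map(lambda x: int(x) if x.is_integer() else x)
print(df)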
         date    a    c
0  2025-01-01  1.0  1.0
1  2025-01-02  2.0  2.1
         date  a    c
0  2025-01-01  1  1.0
1  2025-01-02  2  2.1

Solution

  • Something like this does the trick.

    Note that making the schema depend on the data itself is generally discouraged. We can, however, avoid any row-by-row iteration and use a vectorised UDF with map_batches:

    import polars as pl

    def maybe_cast_int(s: pl.Series) -> pl.Series:
        """Cast the Series to an Int64 type if all values are whole numbers."""
        # The float-to-int cast truncates, so 2.1 becomes 2 and fails the check below
        s2 = s.cast(pl.Int64)
        return s2 if (s2 == s).all() else s
    
    df = pl.DataFrame({
        "date": ["2025-01-01", "2025-01-02"],
        "a": [1.0, 2.0],
        "c": [1.0, 2.1],
    })
    
    df.with_columns(pl.col("a", "c").map_batches(maybe_cast_int))
    
    shape: (2, 3)
    ┌────────────┬─────┬─────┐
    │ date       ┆ a   ┆ c   │
    │ ---        ┆ --- ┆ --- │
    │ str        ┆ i64 ┆ f64 │
    ╞════════════╪═════╪═════╡
    │ 2025-01-01 ┆ 1   ┆ 1.0 │
    │ 2025-01-02 ┆ 2   ┆ 2.1 │
    └────────────┴─────┴─────┘
    

    This example shows the effect more clearly by not overwriting the original columns:

    df.select(
        "a",
        pl.col("a").map_batches(maybe_cast_int).alias("b"),
        "c",
        pl.col("c").map_batches(maybe_cast_int).alias("d"),
    )
    
    shape: (2, 4)
    ┌─────┬─────┬─────┬─────┐
    │ a   ┆ b   ┆ c   ┆ d   │
    │ --- ┆ --- ┆ --- ┆ --- │
    │ f64 ┆ i64 ┆ f64 ┆ f64 │
    ╞═════╪═════╪═════╪═════╡
    │ 1.0 ┆ 1   ┆ 1.0 ┆ 1.0 │
    │ 2.0 ┆ 2   ┆ 2.1 ┆ 2.1 │
    └─────┴─────┴─────┴─────┘