I have pandas code like this: if a float column contains only whole numbers such as 1.0, 2.0, 3.0, remove all the .0 by converting the column to integers.
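import pandas as pd

df = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-02"],
    "a": [1.0, 2.0],
    "c": [1.0, 2.1],
})
print(df)

# Every column except "date"
columns = df.columns.difference(["date"])
# Element-wise: whole floats become ints (DataFrame.map needs pandas >= 2.1)
df[columns] = df[columns].map(lambda x: int(x) if x.is_integer() else x)
print(df)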
         date    a    c
0  2025-01-01  1.0  1.0
1  2025-01-02  2.0  2.1
         date  a    c
0  2025-01-01  1  1.0
1  2025-01-02  2  2.1
Something like this does the trick.
Note that it is generally not advisable to have the schema depend on the data itself. We can, however, avoid any row-by-row iteration and use a vectorised UDF with map_batches:
import polars as pl

def maybe_cast_int(s: pl.Series) -> pl.Series:
    """Cast the Series to an Int64 type if all values are whole numbers."""
    # The float -> Int64 cast truncates any fractional part
    s2 = s.cast(pl.Int64)
    # Keep the integer version only if no value changed in the cast
    return s2 if (s2 == s).all() else s

df = pl.DataFrame({
    "date": ["2025-01-01", "2025-01-02"],
    "a": [1.0, 2.0],
    "c": [1.0, 2.1],
})

df.with_columns(pl.col("a", "c").map_batches(maybe_cast_int))
shape: (2, 3)
┌────────────┬─────┬─────┐
│ date       ┆ a   ┆ c   │
│ ---        ┆ --- ┆ --- │
│ str        ┆ i64 ┆ f64 │
╞════════════╪═════╪═════╡
│ 2025-01-01 ┆ 1   ┆ 1.0 │
│ 2025-01-02 ┆ 2   ┆ 2.1 │
└────────────┴─────┴─────┘
This example shows it a bit better by not overwriting the original columns:
df.select(
    "a",
    pl.col("a").map_batches(maybe_cast_int).alias("b"),
    "c",
    pl.col("c").map_batches(maybe_cast_int).alias("d"),
)
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ 1.0 ┆ 1   ┆ 1.0 ┆ 1.0 │
│ 2.0 ┆ 2   ┆ 2.1 ┆ 2.1 │
└─────┴─────┴─────┴─────┘