When applying pandas.to_numeric(), the return dtype is float64 or int64 depending on the data supplied.
Is there an equivalent way to do this in Polars?
import pandas as pd
import polars as pl
df = pl.from_repr("""
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ 1 ┆ 3.5 │
│ 2 ┆ 4.6 │
└──────┴──────┘
""")
pl.from_pandas(df.to_pandas().apply(pd.to_numeric))
# shape: (2, 2)
# ┌──────┬──────┐
# │ col1 ┆ col2 │
# │ --- ┆ --- │
# │ i64 ┆ f64 │
# ╞══════╪══════╡
# │ 1 ┆ 3.5 │
# │ 2 ┆ 4.6 │
# └──────┴──────┘
Unlike pandas, Polars is strict about data types and rarely casts automatically. (Among the reasons is performance.)
You can create a feature request for a to_numeric method, but I'm not sure how enthusiastic the response will be.
That said, here are some easy ways to accomplish this.
Perhaps the simplest way is to write a function that attempts the cast to integer and, if that raises an exception, falls back to a cast to float. For convenience, you can even attach this function to the Series class itself.
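def to_numeric(s: pl.Series) -> pl.Series:
    try:
        # A strict cast raises if any value is not a valid integer.
        result = s.cast(pl.Int64)
    except pl.exceptions.InvalidOperationError:
        # Fall back to float for values such as "3.5".
        result = s.cast(pl.Float64)
    return result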
pl.Series.to_numeric = to_numeric
Then to use it:
(
    pl.select(
        s.to_numeric()
        for s in df
    )
)
shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞══════╪══════╡
│ 1 ┆ 3.5 │
│ 2 ┆ 4.6 │
└──────┴──────┘
Another method is to write your columns to a CSV file (in a string buffer), and then have read_csv infer the types automatically. You may have to tweak the infer_schema_length parameter in some situations, because by default the types are inferred from only a limited number of leading rows.
from io import StringIO
pl.read_csv(StringIO(df.write_csv()))
shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞══════╪══════╡
│ 1 ┆ 3.5 │
│ 2 ┆ 4.6 │
└──────┴──────┘