Consider two pl.DataFrames with identical schemas. One of the columns has dtype=pl.Enum.
import polars as pl

# Two Enum dtypes with different category lists
enum_col1 = pl.Enum(["type1"])
enum_col2 = pl.Enum(["type2"])

df1 = pl.DataFrame(
    {"enum_col": "type1", "value": 10},
    schema={"enum_col": enum_col1, "value": pl.Int64},
)
df2 = pl.DataFrame(
    {"enum_col": "type2", "value": 200},
    schema={"enum_col": enum_col2, "value": pl.Int64},
)

print(df1)
print(df2)
shape: (1, 2)
┌──────────┬───────┐
│ enum_col ┆ value │
│ ---      ┆ ---   │
│ enum     ┆ i64   │
╞══════════╪═══════╡
│ type1    ┆ 10    │
└──────────┴───────┘
shape: (1, 2)
┌──────────┬───────┐
│ enum_col ┆ value │
│ ---      ┆ ---   │
│ enum     ┆ i64   │
╞══════════╪═══════╡
│ type2    ┆ 200   │
└──────────┴───────┘
If I try a simple pl.concat([df1, df2]), I get the following error:
polars.exceptions.SchemaError: type Enum(Some(local), Physical) is incompatible with expected type Enum(Some(local), Physical)
You can get around this issue by "enlarging" the enums like this:
pl.concat(
    [
        df1.with_columns(pl.col("enum_col").cast(pl.Enum(["type1", "type2"]))),
        df2.with_columns(pl.col("enum_col").cast(pl.Enum(["type1", "type2"]))),
    ]
)
shape: (2, 2)
┌──────────┬───────┐
│ enum_col ┆ value │
│ ---      ┆ ---   │
│ enum     ┆ i64   │
╞══════════╪═══════╡
│ type1    ┆ 10    │
│ type2    ┆ 200   │
└──────────┴───────┘
Is there a more Pythonic way to do this?
You can cast enum_col to the combined enum type:
enum_col = enum_col1 | enum_col2
pl.concat(
    df.with_columns(pl.col.enum_col.cast(enum_col)) for df in [df1, df2]
)
shape: (2, 2)
┌──────────┬───────┐
│ enum_col ┆ value │
│ ---      ┆ ---   │
│ enum     ┆ i64   │
╞══════════╪═══════╡
│ type1    ┆ 10    │
│ type2    ┆ 200   │
└──────────┴───────┘
You can also create the new enum_col dynamically, for example:
from functools import reduce
enum_col = reduce(lambda x,y: x | y, [df.schema["enum_col"] for df in [df1, df2]])
Enum(categories=['type1', 'type2'])