I want to concatenate three list columns in a pl.LazyFrame. However, the lists often contain NULL values, which results in NULL for pl.concat_list:
import polars as pl
# Create the data with some NULLs
data = {
    "a": [["apple", "banana"], None, ["cherry"]],
    "b": [None, ["dog", "elephant"], ["fish"]],
    "c": [["grape"], ["honeydew"], None],
}
# Create a LazyFrame
lazy_df = pl.LazyFrame(data)
list_cols = ["a", "b", "c"]
print(lazy_df.with_columns(pl.concat_list(pl.col(list_cols)).alias("merge")).collect())
┌─────────────────────┬─────────────────────┬──────────────┬───────────┐
│ a                   ┆ b                   ┆ c            ┆ merge     │
│ ---                 ┆ ---                 ┆ ---          ┆ ---       │
│ list[str]           ┆ list[str]           ┆ list[str]    ┆ list[str] │
╞═════════════════════╪═════════════════════╪══════════════╪═══════════╡
│ ["apple", "banana"] ┆ null                ┆ ["grape"]    ┆ null      │
│ null                ┆ ["dog", "elephant"] ┆ ["honeydew"] ┆ null      │
│ ["cherry"]          ┆ ["fish"]            ┆ null         ┆ null      │
└─────────────────────┴─────────────────────┴──────────────┴───────────┘
How can I concat the lists even when some values are NULL?
I've tried to fill the null values via expr.fill_null(""), expr.fill_null(pl.List("")) and expr.fill_null(pl.List([])), but none of these ran through. How do I fill in an empty list instead of NULL in columns of type pl.List[str]? And is there a better way to concat the three list columns?
You can use pl.Expr.fill_null() with an empty Python list to replace each null with an empty list before concatenating, as follows:
lazy_df.with_columns(
    pl.concat_list(
        pl.col(list_cols).fill_null([])
    ).alias("merge")
)