Tags: python, list, python-polars

concat_list with NULL values, or how to fill NULL in pl.List[str]


I want to concat three list columns in a pl.LazyFrame. However, the columns often contain NULL entries, and pl.concat_list returns NULL for every row in which any of the inputs is NULL.

MRE

import polars as pl

# Create the data with some NULLs
data = {
    "a": [["apple", "banana"], None, ["cherry"]],
    "b": [None, ["dog", "elephant"], ["fish"]],
    "c": [["grape"], ["honeydew"], None],
}

# Create a LazyFrame
lazy_df = pl.LazyFrame(data)
list_cols = ["a", "b", "c"]
print(lazy_df.with_columns(pl.concat_list(pl.col(list_cols)).alias("merge")).collect())
┌─────────────────────┬─────────────────────┬──────────────┬───────────┐
│ a                   ┆ b                   ┆ c            ┆ merge     │
│ ---                 ┆ ---                 ┆ ---          ┆ ---       │
│ list[str]           ┆ list[str]           ┆ list[str]    ┆ list[str] │
╞═════════════════════╪═════════════════════╪══════════════╪═══════════╡
│ ["apple", "banana"] ┆ null                ┆ ["grape"]    ┆ null      │
│ null                ┆ ["dog", "elephant"] ┆ ["honeydew"] ┆ null      │
│ ["cherry"]          ┆ ["fish"]            ┆ null         ┆ null      │
└─────────────────────┴─────────────────────┴──────────────┴───────────┘

Question

How can I concat the lists even when some values are NULL?

Tried solutions

I've tried to fill the NULL values via expr.fill_null(""), expr.fill_null(pl.List("")), and expr.fill_null(pl.List([])), but none of them ran. How do I fill in an empty list instead of NULL in columns of type pl.List[str]? And is there a better way to concat the three list columns?


Solution

  • You can replace each NULL with an empty list via pl.Expr.fill_null() before concatenating, so that no input to pl.concat_list is NULL:

    lazy_df.with_columns(
        pl.concat_list(
            pl.col(list_cols).fill_null([])
        ).alias("merge")
    )