I have a large (~300M rows x 44 cols) dataframe and I need to fill in null values in certain ways depending on the characteristics of each group.
For example, say we have
lf = pl.LazyFrame(
{'group':(1,1,1,2,2,2,3,3,3),
'val':('yes', None, 'no', '2', '2', '2', 'answer', None, 'answer')
}
)
┌───────┬────────┐
│ group ┆ val │
│ --- ┆ --- │
│ i64 ┆ str │
╞═══════╪════════╡
│ 1 ┆ yes │
│ 1 ┆ null │
│ 1 ┆ no │
│ 2 ┆ 2 │
│ 2 ┆ 2 │
│ 2 ┆ 2 │
│ 3 ┆ answer │
│ 3 ┆ null │
│ 3 ┆ answer │
└───────┴────────┘
I want to fill in nulls if and only if the group contains a single non-null unique value in the other cells, since in my context that's the expectation of the data and the presense of more than one unique value (or all nulls) in the group signals another issue that will be handled differently.
I'm able to fill null values for each group with the following:
filled_lf = (
lf
.with_columns(
pl.col('val')
.fill_null(pl.col('val').unique().first().over('group')).alias('filled_val')
)
)
However, for one, it seems that pl.col('val').unique() includes 'null' as one of the values, and the ordering is stochastic so choosing the first value on the list has inconsistent results. Secondly, it doesn't include the condition I need.
Desired result:
┌───────┬────────┬────────────┐
│ group ┆ val ┆ filled_val │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═══════╪════════╪════════════╡
│ 1 ┆ yes ┆ yes │
│ 1 ┆ null ┆ null │
│ 1 ┆ no ┆ no │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 3 ┆ answer ┆ answer │
│ 3 ┆ null ┆ answer │
│ 3 ┆ answer ┆ answer │
└───────┴────────┴────────────┘
Pandas 3.12 Polars 0.20.1
Thanks in advance for your advice!
You can add:
.drop_nulls()
.unique()
with maintain_order=True
when/then/otherwise
to implement the conditional count/len logicunique = pl.col("val").drop_nulls().unique(maintain_order=True)
df.with_columns(
pl.when(unique.len().over("group") == 1)
.then(pl.col("val").fill_null(unique.first().over("group")))
.otherwise(pl.col("val"))
.alias("filled")
)
shape: (9, 3)
┌───────┬────────┬────────┐
│ group ┆ val ┆ filled │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═══════╪════════╪════════╡
│ 1 ┆ yes ┆ yes │
│ 1 ┆ null ┆ null │
│ 1 ┆ no ┆ no │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 2 ┆ 2 ┆ 2 │
│ 3 ┆ answer ┆ answer │
│ 3 ┆ null ┆ answer │
│ 3 ┆ answer ┆ answer │
└───────┴────────┴────────┘