I need to backfill a column in a python polars dataframe over one of three possible columns, based on which one matches the non-null cell in the column to be backfilled.
My dataframe looks something like this:
┌─────┬─────┬─────┬─────────┐
│ id1 ┆ id2 ┆ id3 ┆ call_id │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════════╡
│ 1 ┆ 4 ┆ 9 ┆ null │
│ 1 ┆ 5 ┆ 9 ┆ null │
│ 1 ┆ 5 ┆ 9 ┆ null │
│ 2 ┆ 5 ┆ 9 ┆ null │
│ 2 ┆ 6 ┆ 9 ┆ 2 │
│ 2 ┆ 7 ┆ 10 ┆ null │
│ 3 ┆ 7 ┆ 11 ┆ null │
│ 3 ┆ 7 ┆ 12 ┆ null │
│ 3 ┆ 7 ┆ 13 ┆ 7 │
│ 3 ┆ 8 ┆ 13 ┆ null │
└─────┴─────┴─────┴─────────┘
And I want it to look like this:
┌─────┬─────┬─────┬─────────┐
│ id1 ┆ id2 ┆ id3 ┆ call_id │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════════╡
│ 1 ┆ 4 ┆ 9 ┆ null │
│ 1 ┆ 5 ┆ 9 ┆ null │
│ 1 ┆ 5 ┆ 9 ┆ null │
│ 2 ┆ 5 ┆ 9 ┆ 2 │
│ 2 ┆ 6 ┆ 9 ┆ 2 │
│ 2 ┆ 7 ┆ 10 ┆ 7 │
│ 3 ┆ 7 ┆ 11 ┆ 7 │
│ 3 ┆ 7 ┆ 12 ┆ 7 │
│ 3 ┆ 7 ┆ 13 ┆ 7 │
│ 3 ┆ 8 ┆ 13 ┆ null │
└─────┴─────┴─────┴─────────┘
If I knew which column matched, I would have used something to the effect of .with_columns(pl.col('call_id').backfill().over('id1'), but for the life of me I can't figure out how to systematically choose which column to backfill over.
Assuming that your dataframe is in a variable called df
and your backfill value is always the same (which is the case in your example)
backfill = pl.col("call_id").backward_fill()
df.with_columns(
call_id=pl.coalesce(
pl.when((pl.col(c) == pl.col("call_id")).backward_fill().over(c)).then(c)
for c in ("id1", "id2", "id3")
)
)
# shape: (10, 4)
# ┌─────┬─────┬─────┬─────────┐
# │ id1 ┆ id2 ┆ id3 ┆ call_id │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╪═════════╡
# │ 1 ┆ 4 ┆ 9 ┆ null │
# │ 1 ┆ 5 ┆ 9 ┆ null │
# │ 1 ┆ 5 ┆ 9 ┆ null │
# │ 2 ┆ 5 ┆ 9 ┆ 2 │
# │ 2 ┆ 6 ┆ 9 ┆ 2 │
# │ 2 ┆ 7 ┆ 10 ┆ 7 │
# │ 3 ┆ 7 ┆ 11 ┆ 7 │
# │ 3 ┆ 7 ┆ 12 ┆ 7 │
# │ 3 ┆ 7 ┆ 13 ┆ 7 │
# │ 3 ┆ 8 ┆ 13 ┆ null │
# └─────┴─────┴─────┴─────────┘