I have a DataFrame with two columns: the first contains lists and the second contains integer indices.
How can I get the element of each list at the index given in the second column? Even better, how can I put that element in a third column?
Input example:
import polars as pl

df = pl.DataFrame({
    "lst": [[1, 2, 3], [4, 5, 6]],
    "ind": [1, 2]
})
┌───────────┬─────┐
│ lst       ┆ ind │
│ ---       ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 1   │
│ [4, 5, 6] ┆ 2   │
└───────────┴─────┘
Expected output:
res = df.with_columns(pl.Series("list[ind]", [2, 6]))
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘
Thanks.
Update: This can now be done more easily with list.get(), which accepts an expression as the index:
df.with_columns(pl.col("lst").list.get(pl.col("ind")).alias("list[ind]"))
Original answer
You can use with_row_index() to add a row index column for grouping, then explode() the list so that each list element is on its own row. Then call gather() with the "ind" column, applied per row index group using over(), to select the element from each subgroup.
df = pl.DataFrame({"lst": [[1, 2, 3], [4, 5, 6]], "ind": [1, 2]})
df = (
    df.with_row_index()
    .with_columns(
        pl.col("lst").explode().gather(pl.col("ind")).over(pl.col("index")).alias("list[ind]")
    )
    .drop("index")
)
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘