
Rust Polars: Is it possible to explode a list column into multiple columns?


I have a function that returns a list-typed column, so one of the columns in my DataFrame is a list. I'd like to turn this list column into multiple columns. For example:

use polars::prelude::*;
use polars::df;

fn main() {
    let s0 = Series::new("a", &[1i64, 2, 3]);
    let s1 = Series::new("b", &[1i64, 1, 1]);
    let s2 = Series::new("c", &[Some(2i64), None, None]);
    // construct a new ListChunked for a slice of Series.
    let list = Series::new("foo", &[s0, s1, s2]);

    // construct a few more Series.
    let s0 = Series::new("Group", ["A", "B", "A"]);
    let s1 = Series::new("Cost", [1, 1, 1]);
    let df = DataFrame::new(vec![s0, s1, list]).unwrap();

    dbg!(df);
}

At this stage the DataFrame looks like this:

┌───────┬──────┬─────────────────┐
│ Group ┆ Cost ┆ foo             │
│ ---   ┆ ---  ┆ ---             │
│ str   ┆ i32  ┆ list [i64]      │
╞═══════╪══════╪═════════════════╡
│ A     ┆ 1    ┆ [1, 2, 3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ [1, 1, 1]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ [2, null, null] │
└───────┴──────┴─────────────────┘

From here, I'd like to get:

┌───────┬──────┬─────┬──────┬──────┐
│ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
│ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
│ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
╞═══════╪══════╪═════╪══════╪══════╡
│ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ 2   ┆ null ┆ null │
└───────┴──────┴─────┴──────┴──────┘

So I need something like .explode(), but column-wise instead of row-wise. Is there an existing function for this, or potentially a workaround?

Many thanks


Solution

  • Yes, you can. Via Polars lazy, we get access to the expression API, and we can use the list() namespace to get elements by index. (The trailing ? on collect() assumes the enclosing function returns a Result.)

    let out = df
        .lazy()
        .select([
            all().exclude(["foo"]),
            col("foo").list().get(0).alias("a"),
            col("foo").list().get(1).alias("b"),
            col("foo").list().get(2).alias("c"),
        ])
        .collect()?;
    dbg!(out);
    
    ┌───────┬──────┬─────┬──────┬──────┐
    │ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
    │ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
    │ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
    ╞═══════╪══════╪═════╪══════╪══════╡
    │ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
    │ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
    │ A     ┆ 1    ┆ 2   ┆ null ┆ null │
    └───────┴──────┴─────┴──────┴──────┘
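
To make "get elements by index" concrete, here is a minimal plain-Rust sketch of the same idea without Polars: for each index i, take element i from every row's list and collect those into one column. The names lists_to_columns and demo are hypothetical helpers made up for illustration, not part of any Polars API. Padding short rows with None is an assumption of this sketch, not a statement about how Polars handles out-of-range indices.

```rust
/// Hypothetical helper (not a Polars function): turn a list-per-row column
/// into `width` separate columns by picking element `i` from every row.
/// Rows shorter than `width` contribute `None` (an assumption of this sketch).
fn lists_to_columns(rows: &[Vec<Option<i64>>], width: usize) -> Vec<Vec<Option<i64>>> {
    (0..width)
        .map(|i| {
            rows.iter()
                // `get(i)` yields Option<&Option<i64>>; flatten the two
                // layers so a missing index and a null value both become None.
                .map(|row| row.get(i).copied().flatten())
                .collect()
        })
        .collect()
}

/// Builds the "foo" column from the question and splits it into a, b, c.
fn demo() -> Vec<Vec<Option<i64>>> {
    let foo = vec![
        vec![Some(1), Some(2), Some(3)],
        vec![Some(1), Some(1), Some(1)],
        vec![Some(2), None, None],
    ];
    lists_to_columns(&foo, 3)
}
```

Calling demo() returns three columns that line up with a, b and c in the table above: the first holds element 0 of every list, the second element 1, and the third element 2. The Polars version does the equivalent with one expression per index, so the number of output columns has to be known up front in both cases.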