
Polars add column with counted items from series of list[str]


After splitting a string into multiple 'words', I want to add a new column, aliased as "count", that holds the number of items in each resulting list.

let df = df![
    "keys" => ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"],
    "groups" => ["A", "A", "B", "C"],
]?;

First I split the string:

let out = df.lazy().with_column(col("keys").str().split(lit(" ")));

And attempt the count:

let out_2 = out.with_columns([col("keys")
      .apply(|s| Ok(s.len()), GetOutput::from_type(DataType::Int32))
      .alias("count")]).collect().unwrap();

This results in the error message:

mismatched types
expected struct `polars::prelude::Series`, found `usize`

No idea how to proceed.


Solution

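  • The closure passed to apply must return a Series, but s.len() returns a usize (the number of rows in the column, not the length of each list), hence the type error. You don't need apply here: use the .list() method to get a ListNameSpace, which provides len.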

    let out_2 = out
            .with_columns([col("keys").list().len().alias("count")])
            .collect()
            .unwrap();
    
    ┌─────────────────────────┬────────┬───────┐
    │ keys                    ┆ groups ┆ count │
    │ ---                     ┆ ---    ┆ ---   │
    │ list[str]               ┆ str    ┆ u32   │
    ╞═════════════════════════╪════════╪═══════╡
    │ ["a", "ab"]             ┆ A      ┆ 2     │
    │ ["a", "ab", "abc"]      ┆ A      ┆ 3     │
    │ ["b", "ba", ... "abcd"] ┆ B      ┆ 4     │
    │ ["b", "ba", ... "bbcd"] ┆ C      ┆ 5     │
    └─────────────────────────┴────────┴───────┘
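    As a sanity check that does not depend on Polars, the same split-and-count logic can be sketched in plain Rust. The helper name count_items is hypothetical and only illustrates what the "count" column should contain for the sample keys; it is not part of the Polars API.

```rust
/// Hypothetical helper: counts the items obtained by splitting `s`
/// on a single space, i.e. the per-row value expected in "count".
fn count_items(s: &str) -> u32 {
    s.split(' ').count() as u32
}

fn main() {
    // Same sample strings as the "keys" column in the question.
    let keys = ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"];

    // One count per row, analogous to the list length per row.
    let counts: Vec<u32> = keys.iter().map(|&k| count_items(k)).collect();

    println!("{:?}", counts); // prints [2, 3, 4, 5]
}
```

    The counts agree with the "count" column in the table above.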