After splitting a string into multiple 'words', I want to add a new column holding the number of items, named with .alias("count").
let df = df! [
"keys" => ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"],
"groups" => ["A", "A", "B", "C"],
]?;
First I split the string:
let out = df.lazy().with_column(col("keys").str().split(lit(" ")));
And attempt the count:
let out_2 = out.with_columns([col("keys")
.apply(|s| Ok(s.len()), GetOutput::from_type(DataType::Int32))
.alias("count")]).collect().unwrap();
Which results in the error message:
mismatched types
expected struct `polars::prelude::Series`, found `usize`
No idea how to proceed.
You can use the .list() method on the expression to get a ListNameSpace, which provides len.
let out_2 = out
.with_columns([col("keys").list().len().alias("count")])
.collect()
.unwrap();
┌─────────────────────────┬────────┬───────┐
│ keys                    ┆ groups ┆ count │
│ ---                     ┆ ---    ┆ ---   │
│ list[str]               ┆ str    ┆ u32   │
╞═════════════════════════╪════════╪═══════╡
│ ["a", "ab"]             ┆ A      ┆ 2     │
│ ["a", "ab", "abc"]      ┆ A      ┆ 3     │
│ ["b", "ba", ... "abcd"] ┆ B      ┆ 4     │
│ ["b", "ba", ... "bbcd"] ┆ C      ┆ 5     │
└─────────────────────────┴────────┴───────┘
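As a sanity check on the counts in the table, here is a minimal sketch in plain Rust, without polars, that splits each key on a single space and counts the pieces. The helper name count_words is hypothetical and only for illustration. It mirrors what the split followed by the list length computes per row, not how polars implements it.

```rust
/// Hypothetical helper: for each input string, split on a single space
/// and return the number of resulting pieces.
fn count_words(keys: &[&str]) -> Vec<u32> {
    keys.iter()
        .map(|s| s.split(' ').count() as u32)
        .collect()
}

fn main() {
    // Same values as the "keys" column in the question.
    let keys = ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"];
    let counts = count_words(&keys);

    // Print each key next to its item count.
    for (key, count) in keys.iter().zip(counts.iter()) {
        println!("{key:?} -> {count}");
    }
}
```

The counts should agree with the count column above. Inside a lazy query, the .list().len() expression is still the better choice, since it stays within the polars expression engine.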