Given a dataset like this:
Row │ name num_fruits fruits
│ String Int64 Array…
─────┼────────────────────────────────────────────────────────
1 │ Alice 2 ["cherry", "apple"]
2 │ Bob 3 ["grape", "apple", "elderberry"]
3 │ Charlie 5 ["apple", "apple", "elderberry",…
How would one turn each value in the array column "fruits" into it's own row, similar to pivot_longer in R? I tried stack
, but it didn't do anything, except creating a column where every value was the string "fruits"
Is there something in Julia to make the data longer, or do I have to use a for loop + append?
Desired output:
10×3 DataFrame
Row │ name num_fruits fruit
│ String Int64 String
─────┼─────────────────────────────────
1 │ Alice 2 cherry
2 │ Alice 2 apple
3 │ Bob 3 grape
4 │ Bob 3 apple
5 │ Bob 3 elderberry
6 │ Charlie 5 apple
7 │ Charlie 5 apple
8 │ Charlie 5 elderberry
9 │ Charlie 5 cherry
10 │ Charlie 5 apple
Code to reproduce the data:
using Random, DataFrames
Random.seed!(0)
df = DataFrame(name = ["Alice", "Bob", "Charlie"],
num_fruits = [2, 3, 5])
fruits = ["apple", "banana", "cherry", "durian", "elderberry", "fig", "grape"]
df[!, :fruits] = [rand(fruits, n) for n in df.num_fruits]
stack(df, :fruits) # thing I tried
You want to flatten
your data frame:
julia> flatten(df, :fruits)
10×3 DataFrame
Row │ name num_fruits fruits
│ String Int64 String
─────┼─────────────────────────────────
1 │ Alice 2 cherry
2 │ Alice 2 apple
3 │ Bob 3 grape
4 │ Bob 3 apple
5 │ Bob 3 elderberry
6 │ Charlie 5 apple
7 │ Charlie 5 apple
8 │ Charlie 5 elderberry
9 │ Charlie 5 cherry
10 │ Charlie 5 apple
Note that:
stack
stacks multiple columns into a one columnflatten
flattens a single column containing iterable values into the same columnNote that the same is done by Iterators.flatten
in Base Julia:
julia> collect(Iterators.flatten([1:2, 3:4, 5:6]))
6-element Vector{Int64}:
1
2
3
4
5
6