dataframejuliadataframes.jl

Turn each value in array column into it's own row (Equivalent of pivot_longer for Dataframes.jl)


Given a dataset like this:

 Row │ name     num_fruits  fruits                            
     │ String   Int64       Array…                            
─────┼────────────────────────────────────────────────────────
   1 │ Alice             2  ["cherry", "apple"]
   2 │ Bob               3  ["grape", "apple", "elderberry"]
   3 │ Charlie           5  ["apple", "apple", "elderberry",…

How would one turn each value in the array column "fruits" into it's own row, similar to pivot_longer in R? I tried stack, but it didn't do anything, except creating a column where every value was the string "fruits"

Is there something in Julia to make the data longer, or do I have to use a for loop + append?

Desired output:

10×3 DataFrame
 Row │ name     num_fruits  fruit      
     │ String   Int64       String     
─────┼─────────────────────────────────
   1 │ Alice             2  cherry
   2 │ Alice             2  apple
   3 │ Bob               3  grape
   4 │ Bob               3  apple
   5 │ Bob               3  elderberry
   6 │ Charlie           5  apple
   7 │ Charlie           5  apple
   8 │ Charlie           5  elderberry
   9 │ Charlie           5  cherry
  10 │ Charlie           5  apple

Code to reproduce the data:

using Random, DataFrames
Random.seed!(0)

df = DataFrame(name = ["Alice", "Bob", "Charlie"],
               num_fruits = [2, 3, 5])


fruits = ["apple", "banana", "cherry", "durian", "elderberry", "fig", "grape"]
df[!, :fruits] = [rand(fruits, n) for n in df.num_fruits]

stack(df, :fruits) # thing I tried

Solution

  • You want to flatten your data frame:

    julia> flatten(df, :fruits)
    10×3 DataFrame
     Row │ name     num_fruits  fruits
         │ String   Int64       String
    ─────┼─────────────────────────────────
       1 │ Alice             2  cherry
       2 │ Alice             2  apple
       3 │ Bob               3  grape
       4 │ Bob               3  apple
       5 │ Bob               3  elderberry
       6 │ Charlie           5  apple
       7 │ Charlie           5  apple
       8 │ Charlie           5  elderberry
       9 │ Charlie           5  cherry
      10 │ Charlie           5  apple
    

    Note that:

    Note that the same is done by Iterators.flatten in Base Julia:

    julia> collect(Iterators.flatten([1:2, 3:4, 5:6]))
    6-element Vector{Int64}:
     1
     2
     3
     4
     5
     6