
Is there any function in Julia to repeat each row of a Julia data frame n times (where n varies across rows)?

I have a Julia data frame:

df=DataFrame("Category" => ["A", "B", "C"], "n" => [1,2,3])
3×2 DataFrame
 Row │ Category  n     
     │ String    Int64 
   1 │ A             1
   2 │ B             2
   3 │ C             3

and I would like to generate a data frame, where each row of df is repeated n times like this:

df2=DataFrame("Category" => ["A", "B","B","C","C","C"])
6×1 DataFrame
 Row │ Category 
     │ String   
   1 │ A
   2 │ B
   3 │ B
   4 │ C
   5 │ C
   6 │ C

I wrote a function that works fine, but I assume there is a more elegant way to do this. Here is my function:

function repeat_df_rows(df)
    # Define a repeat method for a single DataFrameRow (evaluated at run time via @eval)
    @eval function Base.repeat(df_row::DataFrameRow{DataFrame, DataFrames.Index}; inner::Int64)
        rows = repeat(DataFrame(df_row), inner)
    end

    dfs = map(x -> repeat(x; inner = x.:n), eachrow(df))
    result = vcat(dfs..., cols=:union)
    result = result[:,Not(:n)]
end

Another problem with this function is that it always throws an error on the first attempt when run in a script. I assume this is because the expression after the @eval macro is not executed immediately.


  • Using @eval is not recommended for regular data wrangling tasks. Here is an alternative method:


    spread(vals,cnts) = 
      [v for (v,c) in zip(vals, cnts) for i in 1:c]

    and now:

    julia> combine(df, [:Category, :n] => spread => :Category)
    6×1 DataFrame
     Row │ Category 
         │ String   
       1 │ A
       2 │ B
       3 │ B
       4 │ C
       5 │ C
       6 │ C

    or (for all columns including n):

    julia> combine(df, All() .=> (x -> spread(x, df.n)) .=> All())
    6×2 DataFrame
     Row │ Category  n     
         │ String    Int64 
       1 │ A             1
       2 │ B             2
       3 │ B             2
       4 │ C             3
       5 │ C             3
       6 │ C             3
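
A related option is to apply the same comprehension to row numbers rather than column values, and then index the data frame once with the resulting vector. The sketch below assumes DataFrames.jl is loaded. The helper name `repeat_rows` is made up for illustration and is not part of any package.

```julia
using DataFrames

df = DataFrame("Category" => ["A", "B", "C"], "n" => [1, 2, 3])

# Hypothetical helper: repeat row i of `df` counts[i] times.
function repeat_rows(df::AbstractDataFrame, counts::AbstractVector{<:Integer})
    # One entry per output row; for counts = [1, 2, 3] this is [1, 2, 2, 3, 3, 3].
    idx = [i for (i, c) in enumerate(counts) for _ in 1:c]
    # Indexing with repeated row numbers copies those rows into a new data frame.
    return df[idx, :]
end

# Repeat the rows, then drop the helper column n.
df2 = select(repeat_rows(df, df.n), Not(:n))
```

This keeps every column in a single indexing step and needs no @eval. A row whose count is 0 is simply left out of the result.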