I have a Julia data frame:
df=DataFrame("Category" => ["A", "B", "C"], "n" => [1,2,3])
3×2 DataFrame
 Row │ Category  n
     │ String    Int64
─────┼─────────────────
   1 │ A             1
   2 │ B             2
   3 │ C             3
and I would like to generate a data frame in which each row of df is repeated n times, like this:
df2=DataFrame("Category" => ["A", "B","B","C","C","C"])
6×1 DataFrame
 Row │ Category
     │ String
─────┼──────────
   1 │ A
   2 │ B
   3 │ B
   4 │ C
   5 │ C
   6 │ C
I wrote a function that works fine, but I assume there is a more elegant way to do this. Here is my function:
function repeat_df_rows(df)
    @eval function Base.repeat(df_row::DataFrameRow{DataFrame, DataFrames.Index}; inner::Int64)
        rows = repeat(DataFrame(df_row), inner)
    end
    dfs = map(x -> repeat(x; inner = x.:n), eachrow(df))
    result = vcat(dfs..., cols=:union)
    result = result[:,Not(:n)]
end
Another problem with this function is that it always throws an error on the first attempt when run in a script. I assume this is because the expression after the @eval macro is not executed immediately.
Using @eval is not recommended for regular data wrangling tasks. Here is an alternative method:
Define:
spread(vals,cnts) =
[v for (v,c) in zip(vals, cnts) for i in 1:c]
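To see what spread does on its own, here is a minimal sketch that calls it on plain vectors, with no data frame involved. The input vectors are just illustrative values taken from the question:

```julia
# Same definition as above: emit each value v as many times as its count c.
spread(vals, cnts) = [v for (v, c) in zip(vals, cnts) for i in 1:c]

# Values and counts from the question's data frame.
expanded = spread(["A", "B", "C"], [1, 2, 3])
println(expanded)   # ["A", "B", "B", "C", "C", "C"]

# A count of zero drops the value, because the inner range 1:0 is empty.
println(spread(["A", "B"], [0, 2]))   # ["B", "B"]
```

The nested comprehension flattens the result, so the output is a single vector rather than a vector of vectors.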
and now:
julia> combine(df, [:Category, :n] => spread => :Category)
6×1 DataFrame
 Row │ Category
     │ String
─────┼──────────
   1 │ A
   2 │ B
   3 │ B
   4 │ C
   5 │ C
   6 │ C
or (for all columns including n):
julia> combine(df, All() .=> (x -> spread(x, df.n)) .=> All())
6×2 DataFrame
 Row │ Category  n
     │ String    Int64
─────┼─────────────────
   1 │ A             1
   2 │ B             2
   3 │ B             2
   4 │ C             3
   5 │ C             3
   6 │ C             3
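A related option, sketched here as one possibility rather than the recommended way, is to repeat row indices instead of column values and then index the data frame with them. The name idx is introduced only for this illustration; df and df2 follow the question:

```julia
using DataFrames

df = DataFrame("Category" => ["A", "B", "C"], "n" => [1, 2, 3])

# Build a vector of row numbers in which row i appears df.n[i] times.
# idx is a hypothetical helper name, not part of DataFrames.
idx = [i for (i, c) in enumerate(df.n) for _ in 1:c]

# Select the repeated rows and drop the count column n.
df2 = df[idx, Not(:n)]
println(df2.Category)   # ["A", "B", "B", "C", "C", "C"]
```

Indexing by rows repeats every column at once, so the same index vector works whether or not the n column is kept (use df[idx, :] to keep it).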