dataframejuliadataframes.jl

vcat DataFrame columns based on multiple columns in Julia


I have 3 DataFrames, each containing 3 columns: A, B, and C.

using DataFrames    
common_data = Dict("A" => [1, 2, 3], "B" => [10, 20, 30])
df1 = DataFrame(merge(common_data, Dict("C" => [100, 200, 300])))
df2 = DataFrame(merge(common_data, Dict("C" => [400, 500, 600])))
df3 = DataFrame(merge(common_data, Dict("C" => [700, 800, 900])))

I consider columns A and B as indices and want to perform an inner join on 3 DataFrames based on column C. This should be done only when the values in columns A and B of each DataFrame are the same. The column of the final output should be [A,B,C_df1,C_df2,C_df3]. How can I achieve this?


Solution

  • Just do innerjoin:

    julia> innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true)
    3×5 DataFrame
     Row │ A      B      C      C_1    C_2
         │ Int64  Int64  Int64  Int64  Int64
    ─────┼───────────────────────────────────
       1 │     1     10    100    400    700
       2 │     2     20    200    500    800
       3 │     3     30    300    600    900
    
    

    it will auto-generate different column names than you want though.

    So you can either preprocess your data frames:

    julia> innerjoin(rename.([df1, df2, df3], :C .=> [:C_df1, :C_df2, :C_df3])...,
                     on=[:A, :B])
    3×5 DataFrame
     Row │ A      B      C_df1  C_df2  C_df3
         │ Int64  Int64  Int64  Int64  Int64
    ─────┼───────────────────────────────────
       1 │     1     10    100    400    700
       2 │     2     20    200    500    800
       3 │     3     30    300    600    900
    

    or post-process the output:

    julia> rename(innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true),
                  :C => :C_df1, :C_1 => :C_df2, :C_2 => :C_df3)
    3×5 DataFrame
     Row │ A      B      C_df1  C_df2  C_df3
         │ Int64  Int64  Int64  Int64  Int64
    ─────┼───────────────────────────────────
       1 │     1     10    100    400    700
       2 │     2     20    200    500    800
       3 │     3     30    300    600    900