I have 3 DataFrames, each containing 3 columns: A, B, and C.
using DataFrames
common_data = Dict("A" => [1, 2, 3], "B" => [10, 20, 30])
df1 = DataFrame(merge(common_data, Dict("C" => [100, 200, 300])))
df2 = DataFrame(merge(common_data, Dict("C" => [400, 500, 600])))
df3 = DataFrame(merge(common_data, Dict("C" => [700, 800, 900])))
I consider columns A and B as indices and want to perform an inner join on 3 DataFrames based on column C. This should be done only when the values in columns A and B of each DataFrame are the same. The column of the final output should be [A,B,C_df1,C_df2,C_df3]
. How can I achieve this?
Just do innerjoin:
julia> innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true)
3×5 DataFrame
Row │ A B C C_1 C_2
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 10 100 400 700
2 │ 2 20 200 500 800
3 │ 3 30 300 600 900
it will auto-generate different column names than you want though.
So you can either preprocess your data frames:
julia> innerjoin(rename.([df1, df2, df3], :C .=> [:C_df1, :C_df2, :C_df3])...,
on=[:A, :B])
3×5 DataFrame
Row │ A B C_df1 C_df2 C_df3
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 10 100 400 700
2 │ 2 20 200 500 800
3 │ 3 30 300 600 900
or post-process the output:
julia> rename(innerjoin(df1, df2, df3, on=[:A, :B], makeunique=true),
:C => :C_df1, :C_1 => :C_df2, :C_2 => :C_df3)
3×5 DataFrame
Row │ A B C_df1 C_df2 C_df3
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 10 100 400 700
2 │ 2 20 200 500 800
3 │ 3 30 300 600 900