My dataframe has 3 columns A
, B
and C
and for each row only one of these columns contains a value.
I want a MERGE
column that contains the values from A
or B
or C
using DataFrames
df = DataFrame(NAME = ["a", "b", "c"], A = [1, missing, missing], B = [missing, 2, missing], C = [missing, missing, 3])
3×4 DataFrame
│ Row │ NAME │ A │ B │ C │
│ │ String │ Int64? │ Int64? │ Int64? │
├─────┼────────┼─────────┼─────────┼─────────┤
│ 1 │ a │ 1 │ missing │ missing │
│ 2 │ b │ missing │ 2 │ missing │
│ 3 │ c │ missing │ missing │ 3 │
How the best julia way to get the MERGE
column?
3×5 DataFrame
│ Row │ NAME │ A │ B │ C │ MERGE │
│ │ String │ Int64? │ Int64? │ Int64? │ Int64 │
├─────┼────────┼─────────┼─────────┼─────────┼───────┤
│ 1 │ a │ 1 │ missing │ missing │ 1 │
│ 2 │ b │ missing │ 2 │ missing │ 2 │
│ 3 │ c │ missing │ missing │ 3 │ 3 │
What I was able to work out so far is:
select(df, :, [:A, :B, :C] => ByRow((a,b,c) -> sum(skipmissing([a, b, c]))) => :MERGE)
What about a scenario for which there is a variable range of columns?
select(df, range => ??? => :MERGE)
You can write it like this:
julia> transform!(df, [:A, :B, :C] => ByRow(coalesce) => :MERGE)
3×5 DataFrame
│ Row │ NAME │ A │ B │ C │ MERGE │
│ │ String │ Int64? │ Int64? │ Int64? │ Int64 │
├─────┼────────┼─────────┼─────────┼─────────┼───────┤
│ 1 │ a │ 1 │ missing │ missing │ 1 │
│ 2 │ b │ missing │ 2 │ missing │ 2 │
│ 3 │ c │ missing │ missing │ 3 │ 3 │
Instead of [:A, :B, :C]
you can put any selector, like All()
, Between(:A, :C)
, 1:3
etc.