juliajulia-dataframe

How to get a new column that depends of a subset of dataframe columns


My dataframe has 3 columns A, B and C and for each row only one of these columns contains a value.

I want a MERGE column that contains the values from A or B or C

using DataFrames

df = DataFrame(NAME = ["a", "b", "c"], A = [1, missing, missing], B = [missing, 2, missing], C = [missing, missing, 3])

3×4 DataFrame
│ Row │ NAME   │ A       │ B       │ C       │
│     │ String │ Int64?  │ Int64?  │ Int64?  │
├─────┼────────┼─────────┼─────────┼─────────┤
│ 1   │ a      │ 1       │ missing │ missing │
│ 2   │ b      │ missing │ 2       │ missing │
│ 3   │ c      │ missing │ missing │ 3       │

How the best julia way to get the MERGE column?

3×5 DataFrame
│ Row │ NAME   │ A       │ B       │ C       │ MERGE │
│     │ String │ Int64?  │ Int64?  │ Int64?  │ Int64 │
├─────┼────────┼─────────┼─────────┼─────────┼───────┤
│ 1   │ a      │ 1       │ missing │ missing │ 1     │
│ 2   │ b      │ missing │ 2       │ missing │ 2     │
│ 3   │ c      │ missing │ missing │ 3       │ 3     │

What I was able to work out so far is:

select(df, :, [:A, :B, :C] => ByRow((a,b,c) -> sum(skipmissing([a, b, c]))) => :MERGE)

What about a scenario for which there is a variable range of columns?

select(df, range => ??? => :MERGE)

Solution

  • You can write it like this:

    julia> transform!(df, [:A, :B, :C] => ByRow(coalesce) => :MERGE)
    3×5 DataFrame
    │ Row │ NAME   │ A       │ B       │ C       │ MERGE │
    │     │ String │ Int64?  │ Int64?  │ Int64?  │ Int64 │
    ├─────┼────────┼─────────┼─────────┼─────────┼───────┤
    │ 1   │ a      │ 1       │ missing │ missing │ 1     │
    │ 2   │ b      │ missing │ 2       │ missing │ 2     │
    │ 3   │ c      │ missing │ missing │ 3       │ 3     │
    

    Instead of [:A, :B, :C] you can put any selector, like All(), Between(:A, :C), 1:3 etc.