
How to subset rows with an OR condition in Julia DataFrames


I have a DataFrame and I want to keep the rows where column During_Cabg OR column During_Pci has a value of 1. Here's what I'm doing:

pci_or_cabg = @chain df begin
    select([:During_Cabg, :During_Pci] .=> ByRow(x -> coalesce.(x, 0)); renamecols=false)
    subset(:During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)))
end

The problem is that passing two conditions, :During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)), seems to imply AND rather than OR. The result I'm getting contains only the rows where BOTH columns are 1, which is not what I want.

How can I subset a DataFrame with multiple conditions (AND or OR) across multiple columns?

Thank you!


Solution

  • subset combines multiple conditions with AND. If you want OR, you need to express it as a single condition. A general way to do this is:

    subset(df, AsTable(columns) => ByRow(x -> any(predicate, x)))
    

    or

    subset(df, columns => ByRow((x...) -> any(predicate, x)))
    

    if you want to apply the same predicate to all columns.

    If you want a shorter syntax, consider using one of the meta-packages. For example, with DataFramesMeta.jl you can write:

    @rsubset(df, :During_Cabg == 1 || :During_Pci == 1)
    

    which works nicely with chaining.
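
To make the difference concrete, here is a minimal sketch of the approaches above on a small made-up table. The data values and the variable names (cols, clean, or_rows, and so on) are assumptions for illustration only; the column names come from the question. Missing values are replaced with 0 first, as in the question, because a row-wise condition that evaluates to missing would otherwise raise an error.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data; only the column names come from the question.
df = DataFrame(During_Cabg = [1, 0, missing, 0, 1],
               During_Pci  = [0, 0, 1, missing, 1])

cols = [:During_Cabg, :During_Pci]

# Replace missing with 0 so every comparison below returns a Bool.
clean = select(df, cols .=> ByRow(x -> coalesce(x, 0)); renamecols=false)

# OR: each row arrives as a NamedTuple and any(...) checks its values.
or_rows = subset(clean, AsTable(cols) => ByRow(row -> any(==(1), row)))

# OR, equivalent form: the columns are passed as positional arguments.
or_rows2 = subset(clean, cols => ByRow((x...) -> any(==(1), x)))

# AND, for comparison: separate conditions are combined with AND.
and_rows = subset(clean, :During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)))

# DataFramesMeta version inside a chain, handling missing inline
# instead of relying on the earlier select step.
or_rows_meta = @chain df begin
    @rsubset coalesce(:During_Cabg, 0) == 1 || coalesce(:During_Pci, 0) == 1
end
```

With this toy data, the OR variants keep rows 1, 3, and 5, while the AND variant keeps only row 5. For an AND across many columns with the same predicate, replacing any with all in the single-condition form should work the same way.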