I have a DataFrame and I want to filter the rows where column `During_Cabg` OR column `During_Pci` has a value of 1. Here's what I'm doing:
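```julia
using DataFrames, Chain  # @chain comes from Chain.jl

pci_or_cabg = @chain df begin
    # Replace missing with 0 (ByRow already works element by element, so plain coalesce is enough)
    select([:During_Cabg, :During_Pci] .=> ByRow(x -> coalesce(x, 0)); renamecols=false)
    # Two separate conditions on the two columns
    subset(:During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)))
end
```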
The problem is that the two conditions, `:During_Cabg => ByRow(==(1))` and `:During_Pci => ByRow(==(1))`, seem to be combined with AND, not OR: I only get rows where BOTH columns are 1, which is not what I want.
How can I subset a DataFrame with multiple conditions (AND or OR) on multiple columns?
Thank you!
When you pass several conditions to `subset`, they are combined with AND. If you want OR, you need to express it as a single condition. A general way to do it would be:
```julia
subset(df, AsTable(columns) => ByRow(x -> any(predicate, x)))
```
or
```julia
subset(df, columns => ByRow((x...) -> any(predicate, x)))
```
if you want to apply the same predicate to all the selected columns.
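To see the difference between the default AND and a single OR condition, here is a small sketch with `predicate` set to `==(1)` and `columns` set to the question's two columns. It assumes DataFrames.jl 1.x; the `id` column and the data values are made up for illustration.

```julia
using DataFrames

# Hypothetical toy data standing in for the question's table
df = DataFrame(id = 1:4,
               During_Cabg = [1, 0, 1, 0],
               During_Pci  = [0, 1, 1, 0])

# Two separate conditions: subset keeps rows where BOTH hold (AND)
both = subset(df, :During_Cabg => ByRow(==(1)), :During_Pci => ByRow(==(1)))

# One condition via AsTable: x is a NamedTuple holding the row's two values (OR)
either = subset(df, AsTable([:During_Cabg, :During_Pci]) => ByRow(x -> any(==(1), x)))

# Same OR with the varargs form: the row's values arrive as separate arguments
either2 = subset(df, [:During_Cabg, :During_Pci] => ByRow((x...) -> any(==(1), x)))

println(both.id)    # [3]
println(either.id)  # [1, 2, 3]
```

If the columns can contain `missing`, `==(1)` may return `missing`, which `subset` rejects by default; coalescing first (as in the question) or using `isequal(1)` as the predicate avoids that.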
If you want a shorter syntax, consider using one of the meta-packages. For example, with DataFramesMeta.jl you can write:
```julia
@rsubset(df, :During_Cabg == 1 || :During_Pci == 1)
```
which works nicely with chaining.
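Dropped into the question's pipeline, the OR version might look like the sketch below. It assumes DataFramesMeta.jl is installed (it re-exports `@chain`, so Chain.jl need not be loaded separately); the data frame, including its `missing` entries, is made up for illustration.

```julia
using DataFrames, DataFramesMeta  # DataFramesMeta re-exports @chain

# Hypothetical data with missing values, standing in for the question's table
df = DataFrame(During_Cabg = [1, missing, 0, 0],
               During_Pci  = [0, 1, missing, 0])

pci_or_cabg = @chain df begin
    # Replace missing with 0 so that == and || never see a missing value
    select([:During_Cabg, :During_Pci] .=> ByRow(x -> coalesce(x, 0)); renamecols=false)
    # A single row-wise condition, so the two tests are combined with OR
    @rsubset(:During_Cabg == 1 || :During_Pci == 1)
end
```

Note that `select` keeps only the two listed columns; calling `transform` with the same arguments would replace those columns in place and keep the rest of the table.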