set.seed(0)
data = data.frame(ID = 1:1000, X1=runif(1000), X2=runif(1000), DROP1=sample(0:1,r=T),DROP2=sample(0:1,r=T),DROP3=sample(0:1,r=T))
Say this is my data. I wish to do this: count the number of values of DROP1 that equals to 1; then count the number of values of DROP2 among the cases when DROP1 equals to 1; then count the number of values of DROP3 equals to 1 among the cases when DROP2 equals to 1 and DROP1 equals to 1. I can do this manually but our real data file is large and has 80+ DROP variables. The desirable output is just a print out that looks:
DROP1, #
DROP2 (AFTER DROP1), #
DROP3 (AFTER DROP1 & DROP2), #
Here is an option with base R
where we get the 'DROP' column names ('nm1') using grep
. Then loop over the sequence of those, get the seq
of those, subset the data columns, use Reduce
to get a logical vector with &
(only TRUE if we have all the columns with 1 for a row, i.e 1=> TRUE, 0 => FALSE), and get the sum
of those elements to return the count
nm1 <- grep('^DROP', names(data), value = TRUE)
sapply(seq_along(nm1), function(i) {i1 <- seq(i)
sum(Reduce(`&`, data[nm1[i1]])) })
#[1] 503 249 137
Or with data.table
library(data.table)
setDT(data)
lapply(seq_along(nm1), function(i) {
i1 <- seq(i)
data[, sum(Reduce(`&`, .SD)), .SDcols = nm1[i1]]
})
set.seed(0)
data <- data.frame(ID = 1:1000, X1=runif(1000), X2=runif(1000),
DROP1=sample(0:1,1000, replace = TRUE),
DROP2=sample(0:1,1000, replace = TRUE),
DROP3=sample(0:1,1000,replace = TRUE))