I would like to perform a rowSums()
over specified columns. My trouble is that the desired columns to sum over are different for each row and given by another variable. The motivation is to check whether the three values before a given column are NA.
An example data frame can be built by
toy <- data.frame(id = c(rep(1, 10), rep(2, 10), rep(3, 10)),
day = c(1:10, 1:10, 1:10),
response = c(1, NA, NA, NA, 1, 1, 1, NA, NA, 1,
1, 1, NA, 1 ,1 ,1 ,1 ,1 ,1, 1,
NA, 1, NA, 1, 1, 1, NA, NA, NA, NA),
colnum = c(rep(5, 10), rep(7, 10), rep(10, 10)))
toy <- toy %>% pivot_wider(names_from = 2, values_from = 3, names_prefix = "day")
toy
In this example, the colnum
variable gives the day for which to check the previous three days. Looking at the data,
toy
# A tibble: 3 x 12
id colnum day1 day2 day3 day4 day5 day6 day7 day8 day9 day10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 5 1 NA NA NA 1 1 1 NA NA 1
2 2 7 1 1 NA 1 1 1 1 1 1 1
3 3 10 NA 1 NA 1 1 1 NA NA NA NA
I would want to create a new column in the data frame which would have values of FALSE
, TRUE
, FALSE
for ids 1, 2, and 3, respectively. Let's call it three_miss
. E.g., for id=1, we want to check if days 2 through 4 are all NA since colnum=5.
I tried a simple implementation of rowSums()
toy <- toy %>% mutate(three_miss = rowSums(select(., (colnum-1):(colnum+1)), na.rm = TRUE) == 3)
Using colnum-1:colnum+1
was deduced by which column numbers we would need. The code will run but gives FALSE
for all values of three_miss
, which is not the correct or desired output. I receive warning messages:
There were 2 warnings in `mutate()`.
The first warning was:
i In argument: `three_miss = ==...`.
Caused by warning in `x:y`:
! numerical expression has 3 elements: only the first used
I get the sense from this that either the select()
is causing the issue, or rowSums()
is producing more than one sum per row. I have tried adding group_by(id)
which does not help. I want to stay within the context of piping, but I am open to other solutions if this is not possible.
p.s. I understand there may be issues if colnum
is less than 4 or greater than 10. I was planning to amend this by using case_when()
. E.g.,
toy <- toy %>% mutate(three_miss =
case_when(
colnum <=3 ~ NA,
colnum > 10 ~ NA,
TRUE ~ code_that_works
)
)
If toy0
is the data before pivoting, you can do
toy0 %>%
filter(between(day, colnum-3, colnum-1)) %>%
summarize(three_wise = all(!is.na(response)), .by = id) %>%
left_join(toy, ., by = "id")
# # A tibble: 3 × 13
# id colnum day1 day2 day3 day4 day5 day6 day7 day8 day9 day10 three_wise
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
# 1 1 5 1 NA NA NA 1 1 1 NA NA 1 FALSE
# 2 2 7 1 1 NA 1 1 1 1 1 1 1 TRUE
# 3 3 10 NA 1 NA 1 1 1 NA NA NA NA FALSE