Let's say I have this dataframe
> df
mr_daterd mr_daterd_fu1 mr_daterd_fu2
1 2018-03-05 2018-03-05 <NA>
2 2019-05-04 <NA> 2020-03-05
3 2020-01-03 2020-06-06 2021-04-02
Each row represent a patient and the dates represent MRI scans. I want to count the number of MRI scans per row, i.e. rowSums()
of non-missing values. However, some patient had several scans at the same date. Therefore, the rowSums()
should only count unique non-missing values.
Eg, using
df_new <- df %>%
mutate(
n_mri = rowSums(!is.na(select(., contains('mr_daterd'))))
)
Gives
> df_new
mr_daterd mr_daterd_fu1 mr_daterd_fu2 n_mri
1 2018-03-05 2018-03-05 <NA> 2
2 2019-05-04 <NA> 2020-03-05 2
3 2020-01-03 2020-06-06 2021-04-02 3
The n_mri
for row 1 should be 1, and not 2, because 2018-03-05
is duplicated in mr_daterd
and mr_daterd_fu1
.
Expected output:
> df_new
mr_daterd mr_daterd_fu1 mr_daterd_fu2 n_mri
1 2018-03-05 2018-03-05 <NA> 1
2 2019-05-04 <NA> 2020-03-05 2
3 2020-01-03 2020-06-06 2021-04-02 3
Data
df <- structure(list(mr_daterd = structure(c(17595, 18020, 18264), class = "Date"),
mr_daterd_fu1 = structure(c(17595, NA, 18419), class = "Date"),
mr_daterd_fu2 = structure(c(NA, 18326, 18719), class = "Date")), class = "data.frame", row.names = c(NA,
-3L))
dplyr
solution using n_distinct
and c_across
.
df %>%
rowwise %>%
mutate(n_mri = n_distinct(
c_across(contains('mr_daterd')),
na.rm=TRUE)) %>%
ungroup()
# A tibble: 3 × 4
# Rowwise:
mr_daterd mr_daterd_fu1 mr_daterd_fu2 n_mri
<date> <date> <date> <int>
1 2018-03-05 2018-03-05 NA 1
2 2019-05-04 NA 2020-03-05 2
3 2020-01-03 2020-06-06 2021-04-02 3