Count occurrences of multiple values across multiple columns

I am having issues with making a special count variable in r. I have seen that I can use something like this to just do a general count:

 D <- D %>% 
    group_by(X1) %>% 
    mutate(S1 = n())

I would like to have four different variables (S1, S2, S3 and S4) using dplyr. Each variable must count occurrences by scanning the columns (X1-X5) and count an occurrence as ONLY 0 or 1. For example, I would like S1 to be a variable that tells me that R has scanned X1 to X5 and counts if the value of '1' occurred at all, and computes 1 for S1 if an occurrence happened in the X1 to X5 columns, and computes 0 if the occurrence did NOT happen.

Here is an example of my data:

  structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), X1 = c(1, 
NA, NA, NA, NA, NA, 1, 1, 1, 1), X2 = c(2, 1, 1, 1, 1, NA, 2, 
NA, NA, 2), X3 = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA), X4 = c(4, 
2, NA, NA, NA, NA, 3, NA, 2, 3), X5 = c(5, NA, 2, 2, 2, NA, NA, 
NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

Solution

Adapting from the approach here: https://stackoverflow.com/a/74732921/6851825

df |>
  rowwise() |>
  mutate(S1 = 1*any(na.omit(c_across(X1:X5)) == 1),
         S2 = 1*any(na.omit(c_across(X1:X5)) == 2),
         S3 = 1*any(na.omit(c_across(X1:X5)) == 3),
         S4 = 1*any(na.omit(c_across(X1:X5)) == 4)) |>
  ungroup()

Result


# A tibble: 10 × 10
      ID    X1    X2    X3    X4    X5    S1    S2    S3    S4
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     2    NA     4     5     1     1     0     1
 2     2    NA     1    NA     2    NA     1     1     0     0
 3     3    NA     1    NA    NA     2     1     1     0     0
 4     4    NA     1    NA    NA     2     1     1     0     0
 5     5    NA     1    NA    NA     2     1     1     0     0
 6     6    NA    NA     1    NA    NA     1     0     0     0
 7     7     1     2    NA     3    NA     1     1     1     0
 8     8     1    NA    NA    NA    NA     1     0     0     0
 9     9     1    NA    NA     2    NA     1     1     0     0
10    10     1     2    NA     3    NA     1     1     1     0

Alternatively, we could reshape with tidyr::pivot_longer/wider to avoid needing to make any formulas for specific values. This version creates an S5 column since 5 exists in the data.


left_join(df, df |>
  tidyr::pivot_longer(-ID, values_drop_na = TRUE) |>
  count(ID, value = paste0("S", value)) |>
  arrange(value) |>
  tidyr::pivot_wider(names_from = value, values_fn = ~1, values_fill = 0) |>
  select(-n))

Result


Joining with `by = join_by(ID)`
# A tibble: 10 × 11
      ID    X1    X2    X3    X4    X5    S1    S2    S3    S4    S5
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     2    NA     4     5     1     1     0     1     1
 2     2    NA     1    NA     2    NA     1     1     0     0     0
 3     3    NA     1    NA    NA     2     1     1     0     0     0
 4     4    NA     1    NA    NA     2     1     1     0     0     0
 5     5    NA     1    NA    NA     2     1     1     0     0     0
 6     6    NA    NA     1    NA    NA     1     0     0     0     0
 7     7     1     2    NA     3    NA     1     1     1     0     0
 8     8     1    NA    NA    NA    NA     1     0     0     0     0
 9     9     1    NA    NA     2    NA     1     1     0     0     0
10    10     1     2    NA     3    NA     1     1     1     0     0