I am having issues with making a special count variable in r. I have seen that I can use something like this to just do a general count:
D <- D %>%
group_by(X1) %>%
mutate(S1 = n())
I would like to have four different variables (S1, S2, S3 and S4) using dplyr. Each variable must count occurrences by scanning the columns (X1-X5) and count an occurrence as ONLY 0 or 1. For example, I would like S1 to be a variable that tells me that R has scanned X1 to X5 and counts if the value of '1' occurred at all, and computes 1 for S1 if an occurrence happened in the X1 to X5 columns, and computes 0 if the occurrence did NOT happen.
Here is an example of my data:
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), X1 = c(1,
NA, NA, NA, NA, NA, 1, 1, 1, 1), X2 = c(2, 1, 1, 1, 1, NA, 2,
NA, NA, 2), X3 = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA), X4 = c(4,
2, NA, NA, NA, NA, 3, NA, 2, 3), X5 = c(5, NA, 2, 2, 2, NA, NA,
NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L))
Adapting from the approach here: https://stackoverflow.com/a/74732921/6851825
df |>
rowwise() |>
mutate(S1 = 1*any(na.omit(c_across(X1:X5)) == 1),
S2 = 1*any(na.omit(c_across(X1:X5)) == 2),
S3 = 1*any(na.omit(c_across(X1:X5)) == 3),
S4 = 1*any(na.omit(c_across(X1:X5)) == 4)) |>
ungroup()
Result
# A tibble: 10 × 10
ID X1 X2 X3 X4 X5 S1 S2 S3 S4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 2 NA 4 5 1 1 0 1
2 2 NA 1 NA 2 NA 1 1 0 0
3 3 NA 1 NA NA 2 1 1 0 0
4 4 NA 1 NA NA 2 1 1 0 0
5 5 NA 1 NA NA 2 1 1 0 0
6 6 NA NA 1 NA NA 1 0 0 0
7 7 1 2 NA 3 NA 1 1 1 0
8 8 1 NA NA NA NA 1 0 0 0
9 9 1 NA NA 2 NA 1 1 0 0
10 10 1 2 NA 3 NA 1 1 1 0
Alternatively, we could reshape with tidyr::pivot_longer/wider to avoid needing to make any formulas for specific values. This version creates an S5 column since 5 exists in the data.
left_join(df, df |>
tidyr::pivot_longer(-ID, values_drop_na = TRUE) |>
count(ID, value = paste0("S", value)) |>
arrange(value) |>
tidyr::pivot_wider(names_from = value, values_fn = ~1, values_fill = 0) |>
select(-n))
Result
Joining with `by = join_by(ID)`
# A tibble: 10 × 11
ID X1 X2 X3 X4 X5 S1 S2 S3 S4 S5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 2 NA 4 5 1 1 0 1 1
2 2 NA 1 NA 2 NA 1 1 0 0 0
3 3 NA 1 NA NA 2 1 1 0 0 0
4 4 NA 1 NA NA 2 1 1 0 0 0
5 5 NA 1 NA NA 2 1 1 0 0 0
6 6 NA NA 1 NA NA 1 0 0 0 0
7 7 1 2 NA 3 NA 1 1 1 0 0
8 8 1 NA NA NA NA 1 0 0 0 0
9 9 1 NA NA 2 NA 1 1 0 0 0
10 10 1 2 NA 3 NA 1 1 1 0 0