A dataset I am using recorded respondents' ethnicity. Responses are recorded across multiple variables, and respondents were allowed to pick more than one. Example:
Black White Asian Hispanic
1 NA NA NA
NA 1 NA NA
NA NA NA 1
NA NA 1 1
^^^In the last row, the respondent would have chosen Asian and Hispanic.
What I want to do is:
A) collapse these columns into a single ethnicity variable, with different numbers representing different ethnicities (i.e., black would be 1, white would be 2 etc.)
B) have it so that anyone who reported multiple columns gets designated "multiple".
I'm a bit of an R novice, so any help would be greatly appreciated!
One way to do this is to pivot your column names as a column, group values by respondent, then drop the NA values. Then just choose the ethnicity value that remains for each group, switching to "multiple" when necessary. Here's a way to do that with tidyverse:
library(tidyverse)
df %>%
rownames_to_column("respondent") %>%
pivot_longer(-respondent) %>%
group_by(respondent) %>%
filter(!is.na(value)) %>%
summarise(eth = ifelse(n() == 1, name, "multiple"))
# A tibble: 4 x 2
respondent eth
<chr> <chr>
1 1 Black
2 2 White
3 3 Hispanic
4 4 multiple
You won't be able to store numbers, as numeric types, with a string like "variable" - so you have a choice. Either stick with the ethnicity labels (like the solution above), or convert labels to numbers and then numbers to the string representations of those numbers. That seems a little unwieldy, but if you want to do that, here's how:
df %>%
rownames_to_column("respondent") %>%
pivot_longer(-respondent) %>%
mutate(eth_num = as.character(as.numeric(fct_inorder(name)))) %>%
group_by(respondent) %>%
filter(!is.na(value)) %>%
summarise(eth = ifelse(n() == 1, eth_num, "multiple"))
# A tibble: 4 x 2
respondent eth
<chr> <chr>
1 1 1
2 2 2
3 3 4
4 4 multiple