I have a survey dataset that includes self-reported ethnicity. Participants were allowed to select as many ethnicities as they wanted to. The data structure looks like this:
Hispanic  English  Indian
1         NA       NA
NA        1        NA
NA        NA       1
NA        1        1
1         1        1
What I want to do is create a new categorical ethnicity variable where the column names take the place of the 1s above. In addition, if someone selected more than one ethnicity, then the categorical ethnicity variable should include all of them, like this:
Hispanic  English  Indian  Ethnicity
1         NA       NA      Hispanic
NA        1        NA      English
NA        NA       1       Indian
NA        1        1       English_Indian
1         1        1       Hispanic_English_Indian
We can use apply to loop over the rows (MARGIN = 1), then paste the names of the row values that are not NA:
df1$Ethnicity <- apply(df1, 1, function(x)
  paste(names(x)[!is.na(x)], collapse = "_"))
-output
df1
  Hispanic English Indian               Ethnicity
1        1      NA     NA                Hispanic
2       NA       1     NA                 English
3       NA      NA      1                  Indian
4       NA       1      1          English_Indian
5        1       1      1 Hispanic_English_Indian
Data:
df1 <- structure(list(Hispanic = c(1L, NA, NA, NA, 1L),
    English = c(NA, 1L, NA, 1L, 1L),
    Indian = c(NA, NA, 1L, 1L, 1L)),
    class = "data.frame", row.names = c(NA, -5L))
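In a real survey the data frame will usually contain other columns as well (IDs, age, and so on), and some respondents may not have selected any ethnicity. The following is a minimal sketch of the same apply/paste idea wrapped in a helper that only looks at the named indicator columns. The function name combine_selected, the eth_cols vector, and the choice to return NA for rows with no selection are assumptions made for illustration and are not part of the original answer.

```r
# Example data, same values as df1 above
df1 <- data.frame(
  Hispanic = c(1L, NA, NA, NA, 1L),
  English  = c(NA, 1L, NA, 1L, 1L),
  Indian   = c(NA, NA, 1L, 1L, 1L)
)

# Hypothetical helper: paste together the names of the non-NA columns in each row
combine_selected <- function(df, cols, sep = "_") {
  # Restrict to the indicator columns so unrelated columns are ignored
  m <- as.matrix(df[cols])
  out <- apply(m, 1, function(x) paste(names(x)[!is.na(x)], collapse = sep))
  # Rows with no selection come back as ""; recode them to NA (an assumption)
  out[out == ""] <- NA_character_
  unname(out)
}

# Assumption: these are the only columns that encode ethnicity
eth_cols <- c("Hispanic", "English", "Indian")
df1$Ethnicity <- combine_selected(df1, eth_cols)
df1
```

Because the helper takes the column names explicitly, it can be rerun after Ethnicity has been added without that column leaking into the result, which the plain apply(df1, 1, ...) call would not guarantee.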