I've got data which saved all the survey question responses to the same cell. I'm trying to find a way to split the data into multiple columns and use the part of each string until a set delimiter as the column name. The table looks like this at the moment
ID | Response |
---|---|
1 | "gender":"Female","occupation":"student","handed":"Left" |
2 | "gender":"Female","occupation":"Reporter","handed":"Right" |
df <- structure(list(ID = 1:2, Response = c("\"gender\":\"Female\",\"occupation\":\"student\",\"handed\":\"Left\"",
"\"gender\":\"Female\",\"occupation\":\"Reporter\",\"handed\":\"Right\""
)), class = "data.frame", row.names = c(NA, -2L))
and I would like it to look like this
ID | gender | occupation | handed |
---|---|---|---|
1 | "Female" | "student" | "Left" |
2 | "Female" | "Reporter" | "Right" |
I've managed to separate the question and response into one column with this
df<- df%>%
mutate(NUM = row_number()) #row number is used as a key for merge
dfdelimted<-
str_split_fixed(df$response, ',"', 50)
dfdelimted<-data.frame(dfdelimted)
dfdelimted<- dfdelimted%>%
mutate(NUM = row_number()) #row number is used as a key for merge
df <- merge(df,dfdelimted, by="NUM")
Which outputs this:
ID | v1 | v2 | v3 |
---|---|---|---|
1 | "gender":"Female" | "occupation":"student" | "handed":"Left" |
2 | "gender":"Female" | "occupation":"Reporter" | "handed":"Right" |
I can't seem to figure out how to word this for a google search so any help would be great!
Here's how you can do it with separate_longer_delim()
. Basically what we're doing is:
library("tidyverse")
df |>
separate_longer_delim(Response, delim = ",") |>
separate_wider_regex(Response, patterns = c("\"", "name" = ".*?", "\":\"","value" = ".*?", "\"")) |>
pivot_wider()
Output:
# A tibble: 2 × 4
ID gender occupation handed
<int> <chr> <chr> <chr>
1 1 Female student Left
2 2 Female Reporter Right
Maybe there's a way of creating the columns directly. Of course, you could extract them with regex, but I didn't want to presume that the values in the Response column will never change, but if are constant, then that would be the more straightforward approach.