Question: Is there a way for me to separate multiple nested columns and clean them all at the same time using base R and tidyverse? I've checked the other nested responses here, but nothing seems to fit.
My dataframe has nested data that follows the following format.
v1 is type = double, v2 is type = character
my_df
v1 v2
1 1234 {"484391115444":192,"484391115442":61}
2 1235 {"484391115432":52,"484391115443":24}
3 1236 {"484391115302":11,"484391115161":10}
I am able to separate (up to 10) using tidyr::separate as:
my_df <- tidyr::separate(my_df, col=v2, into=c("v2_01", "v2_02"), sep = ",")
In the actual data, v2 is of varying length, meaning amount the nested data is different for each row. This is not an issue as I really only need the first 10 values of v2 stored in the column, after I separate, the rest is truncated, along with the ending } character. So far so good.
This produces:
v1 v2_01 v2_02
1 1234 {"484391115444":192 "484391115442":61
2 1235 {"484391115432":52 "484391115443":24
3 1236 {"484391115302":11 "484391115161":10
Now I need to clean it (remove the { and " characters) and separate again on the colon. I can do this as individual commands:
my_df$v2_01 <- gsub("\\{","", as.character(my_df$v2_01))
my_df$v2_01 <- gsub('"',"", as.character(my_df$v2_01))
#separating on the ":" character
my_df <- tidyr::separate(my_df, col=v2_01, into=c("v2_01", "v2_01_Num" ,sep = ":"))
This produces:
v1 v2_01 v2_01_Num v2_02 v2_02_Num :
1 1234 484391115444 92 484391115442 61 :
2 1235 484391115432 52 484391115443 24 :
3 1236 484391115302 11 484391115161 10 :
So far, meh, I do get the ":" column now. This is functional at best.
My question is the following: Is there a way for me to separate multiple columns and clean them all at the same time? I'm also getting the ":" column now too as output. I'm a little unfamiliar with looping in R, but going through and separating 10 variables out column by column manually seems horribly tedious and inefficient. The above code does work, but I have a other dataframes where the columns with nested data are in the dozens.
maybe tidyverse's separate_rows?
Thank you in advance.
You essentially have JSON data in strings of dat$V2
, so you can use a package like jsonlite to import them in a sensible format:
library(jsonlite)
si <- stream_in(textConnection(dat$v2), simplifyVector=FALSE)
dat[paste("v2", seq_along(si[[1]]), sep="_")] <- t(sapply(si, names))
dat[paste("v2", seq_along(si[[1]]), "num", sep="_")] <- do.call(Map, c(c, si))
dat
## v1 v2 v2_1 v2_2 v2_1_num v2_2_num
##1 1234 {"484391115444":192,"484391115442":61} 484391115444 484391115442 192 61
##2 1235 {"484391115432":52,"484391115443":24} 484391115432 484391115443 52 24
##3 1236 {"484391115302":11,"484391115161":10} 484391115302 484391115161 11 10