
String vector to named string vector in R? Names are part of the strings


I have a string vector in R:

originalvector <- c("apple pie {we have some text here}", "banana {something{something}}", "cherry {asd9asdjsaf}", "banana {monkey}")
originalvector
[1] "apple pie {we have some text here}" "banana {something{something}}"     
[3] "cherry {asd9asdjsaf}"               "banana {monkey}"           

I would like to turn this into a named string vector. The FIRST opening curly bracket should act as the separator between the name and the corresponding element, while still remaining part of the element. If a name is duplicated, the elements under that name should be joined with a newline, so that the result is:

                 apple pie                             banana 
"{we have some text here}" "{something{something}}\n{monkey}" 
                    cherry 
           "{asd9asdjsaf}" 

This can be achieved using regular expressions and iteration (sapply, a loop, etc.):

library(dplyr)

elemNames <- originalvector %>% gsub("\\{.*", "", .) #remove "{"-character and everything after it
elems <- originalvector %>% sub(".*?\\{", "{", .) #replace "{"-character and everything before it with just "{"-character

names(elems) <- elemNames

newvector <- sapply(unique(elemNames), \(elemName) {
    elems[grep(elemName, names(elems))] %>% {paste(.,collapse = "\n")}
  }) %>% setNames(unique(elemNames))

However, I was wondering whether there is a more elegant solution (possibly a one-liner) to do this? My initial solution looks so ugly and complicated. :)


Solution

  • You can simplify this using tapply():

    elemNames <- gsub("\\s?\\{.*", "", originalvector)
    elems <- sub(".*?\\{", "{", originalvector)
    tapply(elems, elemNames, paste, collapse='\n')
    #                  apple pie                             banana 
    # "{we have some text here}" "{something{something}}\n{monkey}" 
    #                     cherry 
    #            "{asd9asdjsaf}" 
    

    I slightly modified your first regular expression so that a space is removed after the element name when present.
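
One thing to keep in mind is that tapply() returns a one-dimensional array rather than a plain vector. If a plain named character vector is needed, a possible variant is to group with split() and collapse each group with sapply(). The following is only a sketch under that assumption; the input is the vector from the question with the missing quote and space restored, and `newvector` is the name used in the question.

```r
# Sample input from the question
originalvector <- c(
  "apple pie {we have some text here}",
  "banana {something{something}}",
  "cherry {asd9asdjsaf}",
  "banana {monkey}"
)

# Name: everything before the first "{", dropping one optional whitespace
elemNames <- sub("\\s?\\{.*", "", originalvector)

# Element: everything from the first "{" onward
elems <- sub(".*?\\{", "{", originalvector)

# Group the elements by name, then join each group with a newline
newvector <- sapply(split(elems, elemNames), paste, collapse = "\n")
```

Like tapply(), split() orders the groups by sorted name rather than by first appearance. If the original order matters, grouping by `factor(elemNames, levels = unique(elemNames))` instead of `elemNames` should preserve it.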