r dplyr tidyverse stringr tidytable

Repeating a certain part of a string conditionally


I want to repeat the part of a string between ] and ; once for each of the elements separated by ; inside the preceding []. So the desired output for [A1, AB11; A2, AB22] I1, C1 would be [A1, AB11] I1, C1; [A2, AB22] I1, C1. Any hints on where to start? Thanks.

df1 <-
  data.frame(
   String = c(
    "[A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1"
  , "[A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7"
  )
  )
df1

                                                    String
1           [A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1
2 [A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7


The desired output:

df2 <-
  data.frame(
   String = c(
    "[A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1"
  , "[A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7"
  )
  )

df2

                                                                      String
1                    [A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1
2 [A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7

Solution

  • Not the tidiest solution, but it uses stringr (plus map() and map2() from purrr):

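    library(stringr); library(purrr)  # purrr supplies map() and map2()
    str_split(df1$String, ";(?= *\\[)") %>%   # split only at a ';' that is followed by '['
      map(str_match, "\\[(.+?)\\] (.+)") %>%  # col 2: bracket contents, col 3: trailing text
      map(~ paste(unlist(map2(str_split(.x[, 2], "; ?"), .x[, 3],
                              ~ paste0("[", .x, "] ", .y))), collapse = "; "))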
    

    A somewhat nicer solution:

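    library(tidyverse)  # dplyr, tidyr, purrr, stringr

    as_tibble(df1) %>%
      mutate(splits = str_split(String, "; *(?=\\[)")) %>%  # one element per "[...] tail" group
      unnest_longer(col = splits) %>%
      # each group becomes list("", <items inside the brackets>, <tail>)
      mutate(splits = map(str_split(splits, "\\[|\\] ?"), str_split, "; ?")) %>%
      # unnamed pieces are auto-named ...1, ...2, ...3 (recent tidyr versions
      # may require names_sep here, which changes these column names)
      unnest_wider(splits) %>%
      mutate(val = map2(...2, ...3, ~ paste0("[", .x, "] ", .y, collapse = "; "))) %>%
      group_by(String) %>%
      summarise(val = paste0(val, collapse = "; "))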
    # A tibble: 2 x 2
      String                             val                                        
      <fct>                              <chr>
    1 [A1, AB11; A2, AB22] I1, C1; [A3,… [A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1
    2 [A4, AB44] I4, C4; [A5, AB55; A6,… [A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7
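
For reference, the same steps (split at each ; that precedes a [, pull out the bracket contents and the trailing text, then rebuild one "[item] tail" per item) can also be sketched as a small helper that needs only stringr. The name expand_groups is hypothetical and not part of the answer above, and the sketch assumes every group has the form "[...] tail" with no nested brackets.

```r
library(stringr)

# Hypothetical helper (illustration only): expands a single string.
# Assumes each group looks like "[item; item; ...] tail".
expand_groups <- function(s) {
  # 1. Split only at a ';' followed by optional whitespace and '['.
  pieces <- str_split(s, ";\\s*(?=\\[)")[[1]]
  expanded <- vapply(pieces, function(p) {
    # 2. Capture the bracket contents (col 2) and the text after it (col 3).
    m <- str_match(p, "^\\s*\\[(.+?)\\]\\s*(.+?)\\s*$")
    # 3. Emit one "[item] tail" per ';'-separated item inside the brackets.
    items <- str_split(m[, 2], ";\\s*")[[1]]
    paste0("[", items, "] ", m[, 3], collapse = "; ")
  }, character(1), USE.NAMES = FALSE)
  paste(expanded, collapse = "; ")
}

x <- c(
  "[A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1",
  "[A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7"
)
result <- vapply(x, expand_groups, character(1), USE.NAMES = FALSE)
result
# [1] "[A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1"
# [2] "[A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7"
```

Because the helper takes a plain character vector, it could be applied to the data frame with something like vapply(as.character(df1$String), expand_groups, character(1)); the as.character() call matters if String was read in as a factor.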