r, regex, string, dataframe, tidyr

Separating an alphanumeric string using tidyr separate_wider_regex


I have the following data:

id <- c("case1", "case19", "case88", "case77")
vec <- c("One_20 (19)",
         "tWo_20 (290)",
         "Three_38 (399)",
         NA)

df <- data.frame(id, vec)

> df
      id            vec
1  case1    One_20 (19)
2 case19   tWo_20 (290)
3 case88 Three_38 (399)
4 case77           <NA>

I want to separate the vec column into two variables, txt and num. I would prefer to use tidyr, like this:

df |> tidyr::separate_wider_regex(vec, 
                                   c(txt = "[A-Za-z]+", num = "\\d+"),
                                   too_few = "align_start")
# A tibble: 4 × 3
  id     txt   num  
  <chr>  <chr> <chr>
1 case1  One   NA   
2 case19 tWo   NA   
3 case88 Three NA   
4 case77 NA    NA  

However, this is not what I want. I expect the following:

      id      txt num
1  case1   One_20  19
2 case19   tWo_20 290
3 case88 Three_38 399
4 case77     <NA>  NA

I am making a mistake in the regex part. How can I correct it so that I get the expected table as output?


Solution

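  • The patterns have to account for the whole string, in order. Your attempt stops after the letters because "_" is matched by neither [A-Za-z]+ nor \\d+. Unnamed patterns are matched but not kept in the output, so use them for the space and the parentheses. Try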

    > df %>%
    +     separate_wider_regex(vec,
    +         c(txt = "\\w+", "\\s+\\(", num = "\\d+","\\)"),
    +         too_few = "align_start"
    +     )
    # A tibble: 4 × 3
      id     txt      num  
      <chr>  <chr>    <chr>
    1 case1  One_20   19
    2 case19 tWo_20   290
    3 case88 Three_38 399
    4 case77 NA       NA
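
To see why the pieces have to cover the entire string, it can help to join them into a single anchored regex and test it in base R. This is only a rough sketch of the idea, not a description of how tidyr is implemented; the names `attempt`, `fixed`, `ok`, `txt` and `num` are illustrative and not part of the original post.

```r
# Base R sketch (no tidyr needed); all object names here are illustrative.
vec <- c("One_20 (19)", "tWo_20 (290)", "Three_38 (399)", NA)

# The question's two patterns, joined and anchored to the whole string.
attempt <- "^[A-Za-z]+\\d+$"

# The answer's four patterns, joined; the capture groups stand in for
# the named pieces (txt and num), the rest is matched and thrown away.
fixed <- "^(\\w+)\\s+\\((\\d+)\\)$"

# No row matches the attempt in full, because "_" and " (" are unaccounted for.
grepl(attempt, vec, perl = TRUE)

# Every non-missing row matches the fixed pattern; NA counts as no match.
ok <- grepl(fixed, vec, perl = TRUE)

# Pull out each capture group, leaving NA where there was no match.
txt <- ifelse(ok, sub(fixed, "\\1", vec, perl = TRUE), NA)
num <- as.integer(ifelse(ok, sub(fixed, "\\2", vec, perl = TRUE), NA))

data.frame(txt, num)
```

Note that `separate_wider_regex()` returns `num` as character; the `as.integer()` call above is an optional extra step if a numeric column is wanted.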