I have the following data:
id <- c("case1", "case19", "case88", "case77")
vec <- c("One_20 (19)",
"tWo_20 (290)",
"Three_38 (399)",
NA)
df <- data.frame(id, vec)
> df
id vec
1 case1 One_20 (19)
2 case19 tWo_20 (290)
3 case88 Three_38 (399)
4 case77 <NA>
I want to separate the vec column into two variables, txt and num. I would prefer to use tidyr, like this:
df |> tidyr::separate_wider_regex(vec,
c(txt = "[A-Za-z]+", num = "\\d+"),
too_few = "align_start")
# A tibble: 4 × 3
id txt num
<chr> <chr> <chr>
1 case1 One NA
2 case19 tWo NA
3 case88 Three NA
4 case77 NA NA
However, this is not what I want. My expected output is:
id txt num
1 case1 One_20 19
2 case19 tWo_20 290
3 case88 Three_38 399
4 case77 <NA> NA
I am making a mistake in the regex part. How can I correct it so that I get the expected table as output?
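The patterns are matched one after another from the start of the string, so "[A-Za-z]+" stops at the underscore and "\\d+" has nothing to match at that position. Match the whole label with "\\w+" and use unnamed patterns for the pieces you want to drop (the space and the parentheses); unnamed patterns are matched but not kept as columns. Try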
> library(tidyr)
> df %>%
+   separate_wider_regex(vec,
+     c(txt = "\\w+", "\\s+\\(", num = "\\d+", "\\)"),
+     too_few = "align_start"
+   )
# A tibble: 4 × 3
id txt num
<chr> <chr> <chr>
1 case1 One_20 19
2 case19 tWo_20 290
3 case88 Three_38 399
4 case77 NA NA
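If it helps, here is a self-contained sketch of the same idea as a script. It rebuilds the df from the question, checks why the original pattern finds no digits, and then converts num to integer. The name out and the as.integer() step are my own additions and not part of the question; drop the conversion if you want num to stay as character.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  id  = c("case1", "case19", "case88", "case77"),
  vec = c("One_20 (19)", "tWo_20 (290)", "Three_38 (399)", NA)
)

# The original attempt is roughly "letters, then digits right away".
# The "_" after the letters breaks that, so no row matches (all FALSE).
grepl("^[A-Za-z]+\\d+", df$vec)

# Named patterns become columns; unnamed patterns are matched and dropped.
out <- separate_wider_regex(
  df, vec,
  patterns = c(txt = "\\w+", "\\s+\\(", num = "\\d+", "\\)"),
  too_few = "align_start"
)

# Assumption: a numeric column is wanted. num is returned as character,
# so convert it explicitly.
out$num <- as.integer(out$num)
out
```

Note that "\\w+" also matches digits and the underscore, which is why txt keeps the "_20" part as in your expected output.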