I have a dataset in R with a column of strings that I want to split into multiple columns using separate_wider_delim() from the tidyr package.
I want to zero-pad the numeric suffix of the new column names so that it is always 2 digits (i.e. '01' instead of '1').
However, the new column names in the resulting data frame end in a single-digit number.
Does anyone know how to implement number padding in separate_wider_delim()?
Below is example code demonstrating what I am currently trying, along with my desired output.
Code:
library(tidyr)

# Example data
df <- data.frame(
  Group = c("A", "B", "C"),
  fruits_selected = c(
    "'Apple'+'Banana'+'Cherry'",
    "'Peach'+'Banana'+'Apple'",
    "'Orange'+'Banana'+'Cherry'"
  )
)

# Split the strings in the "fruits_selected" column into multiple columns
df2 <- df %>%
  separate_wider_delim(fruits_selected, delim = "+", names_sep = "_")
Current output:
#Current output of the result
print(df2)
#> Group fruits_selected_1 fruits_selected_2 fruits_selected_3
#> <chr> <chr> <chr> <chr>
#> 1 A 'Apple' 'Banana' 'Cherry'
#> 2 B 'Peach' 'Banana' 'Apple'
#> 3 C 'Orange' 'Banana' 'Cherry'
Desired output:
print(df2)
#> Group fruits_selected_01 fruits_selected_02 fruits_selected_03
#> <chr> <chr> <chr> <chr>
#> 1 A 'Apple' 'Banana' 'Cherry'
#> 2 B 'Peach' 'Banana' 'Apple'
#> 3 C 'Orange' 'Banana' 'Cherry'
Thank you so much for your assistance!
You could use the names_repair argument of tidyr::separate_wider_delim() along with a little regular expression magic.
In this example, sub() does a single find-and-replace on each column name. It looks for the pattern fruits_selected_(\\d), where () denotes a "capture group" and \\d matches a single digit [0-9]. If this pattern is found, it is replaced by fruits_selected_0\\1, where \\1 inserts whatever was matched by the first (and, in this example, only) capture group.
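One way to check what that replacement does on its own is to run sub() on a character vector shaped like the names produced here. The vector below is typed in by hand, assuming names_sep = "_" yields fruits_selected_1, fruits_selected_2, and so on. It also shows a caveat of this pattern: because it is not anchored and captures only one digit, a suffix that already has two digits (such as 10) would be padded again.

```r
# Names as separate_wider_delim() would generate them with names_sep = "_"
# (typed in by hand for illustration)
nms <- c("Group", "fruits_selected_1", "fruits_selected_2", "fruits_selected_3")

# Same find-and-replace as in the answer: capture one digit and prepend "0"
padded <- sub("fruits_selected_(\\d)", "fruits_selected_0\\1", nms)
# padded is c("Group", "fruits_selected_01", "fruits_selected_02", "fruits_selected_03")

# Caveat: only the first digit is captured and the pattern is not anchored,
# so a two-digit suffix gets an extra zero
long <- sub("fruits_selected_(\\d)", "fruits_selected_0\\1", "fruits_selected_10")
# long is "fruits_selected_010"
```

With three pieces per row, as in the question, the caveat never triggers, so the answer's pattern is sufficient for that data.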
library(tidyr)

data.frame(
  Group = c("A", "B", "C"),
  fruits_selected = c(
    "'Apple'+'Banana'+'Cherry'",
    "'Peach'+'Banana'+'Apple'",
    "'Orange'+'Banana'+'Cherry'"
  )
) %>%
  separate_wider_delim(
    fruits_selected,
    delim = "+",
    names_sep = "_",
    names_repair = ~ sub("fruits_selected_(\\d)", "fruits_selected_0\\1", .)
  )
#> # A tibble: 3 × 4
#> Group fruits_selected_01 fruits_selected_02 fruits_selected_03
#> <chr> <chr> <chr> <chr>
#> 1 A 'Apple' 'Banana' 'Cherry'
#> 2 B 'Peach' 'Banana' 'Apple'
#> 3 C 'Orange' 'Banana' 'Cherry'
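If the split could ever produce 10 or more pieces, a width-aware variant may be safer. The sketch below uses a hypothetical helper, pad_suffix() (not part of tidyr), that only touches names consisting of the prefix followed by a trailing run of digits, and re-pads that number with base R's formatC(). It assumes the prefix contains no regular-expression metacharacters, since it is pasted into the pattern as-is.

```r
# Hypothetical helper (not part of tidyr): zero-pad the numeric suffix of every
# name of the form <prefix><digits>, leaving all other names untouched.
# Assumes `prefix` contains no regex metacharacters.
pad_suffix <- function(nms, prefix = "fruits_selected_", width = 2) {
  # Anchored pattern: the whole name must be the prefix plus digits
  pattern <- paste0("^", prefix, "([0-9]+)$")
  hit <- grepl(pattern, nms)
  if (!any(hit)) return(nms)
  # Extract the number and re-format it with leading zeros
  num <- as.integer(sub(pattern, "\\1", nms[hit]))
  nms[hit] <- paste0(prefix, formatC(num, width = width, flag = "0"))
  nms
}

# Example: a "Group" column plus 12 split columns
nms <- c("Group", paste0("fruits_selected_", 1:12))
padded <- pad_suffix(nms)
# padded[2] is "fruits_selected_01", padded[11] is "fruits_selected_10"
```

In the pipeline it could take the place of the sub() call, e.g. names_repair = ~ pad_suffix(.x). Passing a larger width (say width = 3) would pad to three digits instead; numbers that already need more digits than width are left unpadded rather than truncated.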
Created on 2024-07-12 with reprex v2.1.0.9000