I have the following function, which does two things: 1) it takes a character column and converts all of its characters to lowercase, and 2) it removes any special characters present in that column.
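clean_string <- function(data, variable) {
  # Step 1: convert the column to lowercase
  data <- data |> dplyr::mutate({{ variable }} := tolower({{ variable }}))
  # Step 2: drop every character that is not a lowercase letter
  x <- data |> dplyr::mutate({{ variable }} := gsub("[^a-z]", "", {{ variable }}))
  return(x)
}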
Here you have some dummy data to test it:
var_1 <- rep(c("A", "B%", "C v", "B", "A", "C"), 10)
var_2 <- rep(c("VAron", "v Aron", "muJER", "Muj3er"), 15)
var_3 <- c(rep(c("1", "0"), 10), rep("0", 5), rep(c("0", "1", "0"), 10), rep("1", 5))
dat <- data.frame(var_1, var_2, var_3)
D.D_clean_1 <- dat |> clean_string(variable = var_1) |> clean_string(variable = var_2)
Now I want to be able to use this function with several columns. So I tried to pass as an argument a vector with the name of several columns. First I tried to use the quasiquotation inside a loop:
clean_string <- function(data, variables) {
  data.table::setDT(data)
  for (j in variables) {
    data <- data |> dplyr::mutate({{ j }} := tolower({{ j }}))
    x <- data |> dplyr::mutate({{ j }} := gsub("[^a-z]", "", {{ j }}))
  }
  return(x)
}
But it does not work, since I cannot produce a vector of column names without quotes ("").
So then I tried replacing the quasiquotation with [[]]. This required me to create a vector with the column names as strings. However, I got the following error:
Error: unexpected '[[' in:
" for (j in variables){
data <- data |> dplyr::mutate([["
Why are neither of my approaches working? How should I do it?
Using dplyr::across and ... you could do:
library(dplyr, warn.conflicts = FALSE)
clean_string <- function(data, ...) {
  data |>
    dplyr::mutate(
      dplyr::across(c(...), tolower),
      dplyr::across(c(...), ~ gsub("[^a-z]", "", .x))
    )
}
dat |>
  clean_string(var_1, var_2)
#>    var_1 var_2 var_3
#> 1      a varon     1
#> 2      b varon     0
#> 3     cv mujer     1
#> 4      b mujer     0
#> 5      a varon     1
#> 6      c varon     0
#> 7      a mujer     1
#> 8      b mujer     0
#> 9     cv varon     1
#> 10     b varon     0
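As for why the two attempts in the question fail: inside the loop, j holds a string rather than a bare column name passed as a function argument, so {{ j }} does not refer to the column's data. And [[ on its own is not valid R syntax; it needs an object to index, such as dplyr's .data pronoun. If you would rather pass the column names as a character vector, the following is a minimal sketch of two options. The function names clean_string_chr and clean_string_loop and the small example data frame are made up for illustration, and it assumes R >= 4.1 (for the native pipe) with a reasonably recent dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

# Option 1 (hypothetical name): select columns from a character vector
# with all_of(), then lowercase and strip non-letters in a single pass.
clean_string_chr <- function(data, variables) {
  data |>
    mutate(
      across(all_of(variables), ~ gsub("[^a-z]", "", tolower(.x)))
    )
}

# Option 2 (hypothetical name): keep the loop from the question.
# The string in j names the target column via glue syntax ("{j}" :=),
# and .data[[j]] pulls the column's values by name.
clean_string_loop <- function(data, variables) {
  for (j in variables) {
    data <- data |>
      mutate("{j}" := gsub("[^a-z]", "", tolower(.data[[j]])))
  }
  data
}

# Small illustrative data frame (not the full dummy data from the question)
small <- data.frame(
  var_1 = c("A", "B%", "C v"),
  var_2 = c("VAron", "v Aron", "Muj3er"),
  var_3 = c("1", "0", "1")
)

res_chr  <- clean_string_chr(small, c("var_1", "var_2"))
res_loop <- clean_string_loop(small, c("var_1", "var_2"))
```

Both versions should leave var_3 untouched and give the same cleaned var_1 and var_2. The across() version is usually the more idiomatic choice, while the loop is shown mainly to illustrate how the original attempt could be adapted.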