rdatabasemutate

mutate columns in R that start with a particular letter - converting character to numeric


I have a dataset with many columns.

Most of them are numeric but some of them are characters. In particular, I have a character column "names" that contains string elements (names of people) and some other character columns the names of which start with "w33" (f0r example "w33094", "w33194", etc) and with "w35".

These columns that start with "w33" and "w35" contain numbers as characters or empty cells (NAs).

I want to convert to numeric only the columns that start with "w33" and "w35".

I want to use dplyr and I tried mutate in R:

mutate(across(c(where(is.character), -c("names")), as.numeric))

but with no success. Is there a correct way to do that?


Solution

  • You can use the helper function matches, which uses a regular expression to select the desired columns:

    library(dplyr)
    library(tibble)
    
    set.seed(20240916)
    d <- as_tibble(matrix(rnorm(1000), nrow = 10), .name_repair = "minimal")
    names(d) <- paste0("w", 1:100)
    
    ## make w35 and w33 character
    d <- d %>% mutate(across(c(w33, w35), as.character))
    str(d %>% select(30:40)) ## columns w33 and w35 are indeed character
    # tibble [10 × 11] (S3: tbl_df/tbl/data.frame)
    #  $ w30: num [1:10] 1.8 -1.351 -0.909 -1.152 1.757 ...
    #  $ w31: num [1:10] 1.905 -0.0867 1.4317 -0.3107 -1.0555 ...
    #  $ w32: num [1:10] -0.4768 -0.0128 0.6862 1.1219 -0.1504 ...
    #  $ w33: chr [1:10] "-1.85680860102702" "-0.880885732095891" "0.768542254181911" "1.16839416879203" ...
    #  $ w34: num [1:10] 2.236 -0.521 0.784 1.428 -0.224 ...
    #  $ w35: chr [1:10] "-0.655761219546925" "-0.368498485014586" "-0.624157310474289" "1.02168903165015" ...
    #  $ w36: num [1:10] 1.3655 1.156 -0.0905 -0.9582 -1.0596 ...
    #  $ w37: num [1:10] 0.913 0.228 -0.244 0.791 -0.575 ...
    #  $ w38: num [1:10] -0.299 0.238 0.891 0.859 -1.849 ...
    #  $ w39: num [1:10] 0.36 -1.097 0.857 1.035 0.926 ...
    #  $ w40: num [1:10] -1.448 1.01 1.733 -1.833 0.848 ...
    
    ## use matches to select only columns whihc start with w33 or w35
    dn <- d %>% mutate(across(matches("^w3[35]"), as.numeric))
    str(dn %>% select(30:40))
    # tibble [10 × 11] (S3: tbl_df/tbl/data.frame)
    #  $ w30: num [1:10] 1.8 -1.351 -0.909 -1.152 1.757 ...
    #  $ w31: num [1:10] 1.905 -0.0867 1.4317 -0.3107 -1.0555 ...
    #  $ w32: num [1:10] -0.4768 -0.0128 0.6862 1.1219 -0.1504 ...
    #  $ w33: num [1:10] -1.8568 -0.8809 0.7685 1.1684 0.0127 ...
    #  $ w34: num [1:10] 2.236 -0.521 0.784 1.428 -0.224 ...
    #  $ w35: num [1:10] -0.656 -0.368 -0.624 1.022 -0.498 ...
    #  $ w36: num [1:10] 1.3655 1.156 -0.0905 -0.9582 -1.0596 ...
    #  $ w37: num [1:10] 0.913 0.228 -0.244 0.791 -0.575 ...
    #  $ w38: num [1:10] -0.299 0.238 0.891 0.859 -1.849 ...
    #  $ w39: num [1:10] 0.36 -1.097 0.857 1.035 0.926 ...
    #  $ w40: num [1:10] -1.448 1.01 1.733 -1.833 0.848 ...