r string zero-padding

Need to pad numbers inside a semicolon-separated vector in R


I have the following dataframe, and I need to manipulate column a to get to column a_clean:

df=data.frame(a=c("1234-12;23456-123","12345-1234",NA,"1234-013;1234-014"),a_clean=c("01234-0012;23456-0123","12345-1234",NA,"1234-0013;1234-0014"))

I need to zero-pad the number before each hyphen to five digits and the number after each hyphen to four digits.

I don't want to split a into separate rows and then concatenate them back together. My dataframe is very big, and I want the string manipulation to be as fast as possible.


Solution

  • A base R solution: use strsplit to separate the ;-delimited pieces, gsub to extract the numbers on either side of the -, sprintf to zero-pad them, and replace to restore the NAs; finally, use paste with Map to reassemble each row.

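    # Split each entry on ";", pad both halves of every piece, then re-join with ";".
    data.frame(df, a_clean_new = unlist(Map(paste, collapse=";", 
      lapply(strsplit(df$a, ";"), function(x){
        # left of "-" is padded to 5 digits, right of "-" to 4 digits
        res <- paste0(sprintf("%05d", as.numeric(gsub("-.*", "", x))), "-", 
                 sprintf("%04d", as.numeric(gsub(".*-", "", x))))
        # a missing input yields a string containing "NA"; reset it to NA
        replace(res, grep("NA", res), NA)}))))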
                      a               a_clean           a_clean_new
    1 1234-12;23456-123 01234-0012;23456-0123 01234-0012;23456-0123
    2        12345-1234            12345-1234            12345-1234
    3              <NA>                  <NA>                    NA
    4 1234-013;1234-014   1234-0013;1234-0014 01234-0013;01234-0014
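
    Since speed matters here, a possible variant is to flatten all pieces into one vector, pad them in a single sprintf call, and only then regroup by row. This is a minimal sketch, not benchmarked; pad_codes is a hypothetical helper name, and it assumes every non-missing piece has the form digits-digits. Unlike the output above, it returns a real NA (not the string "NA") for missing rows.

```r
# Hypothetical helper: pads every "left-right" piece of a ";"-separated
# character vector to 5 digits (left) and 4 digits (right).
# Assumes each non-missing piece looks like "digits-digits".
pad_codes <- function(a) {
  parts <- strsplit(a, ";", fixed = TRUE)   # one character vector per row
  flat  <- unlist(parts)                    # all pieces in a single vector

  # Extract and pad both sides for all pieces at once.
  left   <- as.integer(sub("-.*", "", flat))
  right  <- as.integer(sub(".*-", "", flat))
  padded <- sprintf("%05d-%04d", left, right)

  # Regroup the pieces by their original row and re-join with ";".
  grp <- factor(rep(seq_along(parts), lengths(parts)),
                levels = seq_along(parts))
  out <- vapply(split(padded, grp), paste, character(1), collapse = ";")
  out <- unname(out)

  out[is.na(a)] <- NA                       # keep missing rows as real NA
  out
}

# Usage on the column from the question
a <- c("1234-12;23456-123", "12345-1234", NA, "1234-013;1234-014")
result <- pad_codes(a)
```

    On the question's data this should give the same values as a_clean_new above, except that row 3 is a true NA. The regrouping step still loops over rows, but the regex and sprintf work is done once over the flattened vector rather than once per row.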