I have the following dataframe, and I need to manipulate column a to get to column a_clean:
df <- data.frame(a = c("1234-12;23456-123", "12345-1234", NA, "1234-013;1234-014"),
                 a_clean = c("01234-0012;23456-0123", "12345-1234", NA, "1234-0013;1234-0014"))
I need to zero-pad the number before each hyphen to five digits and the number after each hyphen to four digits.
I don't want to split a into separate rows and then concatenate them back together. My dataframe is very big, and I want the string manipulation to be as fast as possible.
A base R solution: use strsplit to split each value on ;, then gsub to extract the numbers on either side of the -, turn the padded NA pieces back into NA with replace, and finally use paste with Map to rebuild each row.
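To make the individual steps easier to follow, here is a minimal sketch that traces them on a single value. The variable names (x, left, right, res, out) are only illustrative and are not part of the solution itself.

```r
# One ;-separated value taken from the example data
x <- strsplit("1234-12;23456-123", ";")[[1]]    # "1234-12" "23456-123"

# Keep what is before / after the hyphen
left  <- gsub("-.*", "", x)                     # "1234" "23456"
right <- gsub(".*-", "", x)                     # "12"   "123"

# Zero-pad to 5 and 4 digits, then glue the two halves back together
res <- paste0(sprintf("%05d", as.numeric(left)), "-",
              sprintf("%04d", as.numeric(right)))

# Rebuild the original ;-separated layout
out <- paste(res, collapse = ";")
out
# "01234-0012;23456-0123"
```

The full solution that follows applies these same steps to every row with lapply and Map.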
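# Assumes df$a is a character column (the data.frame() default since R 4.0.0).
data.frame(df, a_clean_new = unlist(Map(paste, collapse = ";",
  lapply(strsplit(df$a, ";"), function(x) {
    # Zero-pad the part before the hyphen to 5 digits and the part after it to 4.
    res <- paste0(sprintf("%05d", as.numeric(gsub("-.*", "", x))), "-",
                  sprintf("%04d", as.numeric(gsub(".*-", "", x))))
    # Missing input is formatted as text containing "NA"; set those pieces back to NA.
    replace(res, grep("NA", res), NA)
  }))))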
                  a               a_clean           a_clean_new
1 1234-12;23456-123 01234-0012;23456-0123 01234-0012;23456-0123
2        12345-1234            12345-1234            12345-1234
3              <NA>                  <NA>                    NA
4 1234-013;1234-014   1234-0013;1234-0014 01234-0013;01234-0014
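Note that row 3 of a_clean_new prints as NA rather than <NA>: the final paste converts the missing value into the string "NA". Since speed matters here, the following is a possible variant, not a benchmarked one, that flattens all pieces, pads them with a single sprintf call, regroups them by row, and restores true missing values. The function name pad_codes is hypothetical, and the sketch assumes every non-missing piece has the form digits-digits.

```r
# Hypothetical helper: pad "left-right" codes inside ;-separated strings.
pad_codes <- function(x) {
  x <- as.character(x)
  parts <- strsplit(x, ";", fixed = TRUE)
  # Remember which row each piece came from (factor keeps empty rows too)
  grp  <- factor(rep(seq_along(x), lengths(parts)), levels = seq_along(x))
  flat <- unlist(parts, use.names = FALSE)

  # One vectorised pass over all pieces instead of one call per row
  left   <- as.integer(sub("-.*", "", flat))
  right  <- as.integer(sub(".*-", "", flat))
  padded <- sprintf("%05d-%04d", left, right)

  # Glue the pieces of each row back together with ;
  out <- unname(vapply(split(padded, grp), paste, character(1), collapse = ";"))

  # Rows that were missing in the input stay missing (real NA, not "NA")
  out[is.na(x)] <- NA_character_
  out
}

a <- c("1234-12;23456-123", "12345-1234", NA, "1234-013;1234-014")
a_new <- pad_codes(a)
a_new
# "01234-0012;23456-0123" "12345-1234" NA "01234-0013;01234-0014"
```

It could be applied as df$a_clean_new <- pad_codes(df$a). Like the solution above, it pads row 4 to 01234-0013;01234-0014, which follows the stated five-digit rule, whereas the a_clean column in the question shows 1234-0013;1234-0014. Whether it is actually faster depends on the data, so it is worth timing both versions on a realistic sample.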