
Letters to numbers in data frame in R


I found the code below. It works nicely, but it becomes error-prone once the full alphabet is involved.

ID = c(1,2,3)
POS1 = c('AG','GC','TT')
POS2 = c('GT','CC','TC')
POS3 = c('GG','CT','AT')
DF = data.frame(ID,POS1,POS2,POS3)
DF$POS1X <- chartr('ACGT','1234',DF$POS1)

However, I am looking for something that does not require typing every letter and number into the code. Using the same data frame, what I am after is a loop that will convert "a" into 1, "b" into 2, and so on.

Update: I tried the code below so that I modify the existing POS1 instead of creating another column. It did not work, though.

ID = c(1,2,3)
POS1 = c('AG','GC','TT')
POS2 = c('GT','CC','TC')
POS3 = c('GG','CT','AT')
DF = data.frame(ID,POS1,POS2,POS3)

Just changing POS1 from factor to character:

DF$POS1  <- as.character(as.factor(DF$POS1))

map<-data.frame(LETTERS,as.character(1:26))
names(map)<-c("letters","numbers")

let2nums <- function(string){
  splitme <- unlist(strsplit(string,""))
  returnme <- as.integer(map[map$letters %in% splitme,]$numbers)
  return(as.numeric(returnme))
}

DF$POS1 <- mapply(let2nums, DF$POS1)

The outcome is rather interesting :) Any idea why?


Solution

  • If you really want to process it in a loop, as you said, you can do something like this:

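    # For each row: split POS1 into letters, look up each letter's position
    # in the alphabet, and paste the positions back together.
    for (i in seq_len(nrow(DF)))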
    {
      DF$POS1X[i] <- paste(match(strsplit(toupper(DF$POS1[i]), "")[[1]], LETTERS), collapse = "")
    }
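
    To see what the expression inside the loop does, it can be unpacked step by step. The sketch below uses a made-up input, "GCG", which is not part of the question's data. It shows that match returns one position per character, in the original order. A %in% filter on the alphabet (here which(LETTERS %in% chars), a stand-in for the lookup in the question's let2nums) instead returns the hits in alphabetical order and drops repeated letters. Because that makes the result length vary from string to string, mapply can no longer simplify to a plain vector, which probably explains at least part of the odd outcome.

    ```r
    x <- "GCG"                                # illustrative input, not from DF
    chars <- strsplit(toupper(x), "")[[1]]    # "G" "C" "G"

    by_match <- match(chars, LETTERS)         # 7 3 7: order and repeats kept
    by_in    <- which(LETTERS %in% chars)     # 3 7: alphabetical, repeats lost

    paste(by_match, collapse = "")            # "737"
    ```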
    

    Alternatively, you could wrap this in a function and apply it with mapply, as below.

    letter.to.number <- function(x)
    {
      num <- paste(match(strsplit(toupper(x), "")[[1]],LETTERS), collapse = "")
      return(num)
    }
    
    DF$POS1X <- mapply(letter.to.number, DF$POS1)
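
    Since the update in the question asks to overwrite POS1 rather than add a column, here is a minimal sketch of one way to do that for all POS columns at once. The names letters_to_numbers, sep and pos_cols are made up for this example and are not from the code above. The sep argument is an optional extra: once letters past "I" are involved, plain concatenation becomes ambiguous ("AK" and "KA" both give "111"), so a separator may be worth using.

    ```r
    # Hypothetical helper: replace each letter in each string with its
    # position in the alphabet, joined by `sep`.
    letters_to_numbers <- function(x, sep = "") {
      x <- toupper(as.character(x))           # as.character also covers factor columns
      vapply(
        strsplit(x, ""),
        function(chars) paste(match(chars, LETTERS), collapse = sep),
        character(1)
      )
    }

    DF <- data.frame(
      ID   = c(1, 2, 3),
      POS1 = c("AG", "GC", "TT"),
      POS2 = c("GT", "CC", "TC"),
      POS3 = c("GG", "CT", "AT")
    )

    # Overwrite every column whose name starts with "POS", in place.
    pos_cols <- grep("^POS", names(DF), value = TRUE)
    DF[pos_cols] <- lapply(DF[pos_cols], letters_to_numbers)

    DF$POS1                                   # "17" "73" "2020"
    letters_to_numbers("ak", sep = "-")       # "1-11"
    ```

    vapply is used here instead of mapply only because it guarantees a character vector with one element per input string; the mapply version above should give the same values for this data.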