r movies

Converting Movie Box Office to Numbers


I have a data frame in R with box office numbers listed like $121.5M and $0.014M, and I'd like to convert them to plain numbers. I'm thinking of stripping the $ and M and then using basic multiplication. Is there a better way to do this?


Solution

  • You could do this either by matching the non-numeric characters ([^0-9.]*) and replacing them with ''

     as.numeric(gsub("[^0-9.]*", '', "$121.5M"))
     #[1] 121.5
    

    Or by specifically matching the $ and M ([$M]) and replacing them with ''

     as.numeric(gsub("[$M]", '',"$121.5M"))
     #[1] 121.5
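
    Both calls return the value still expressed in millions, so it has to be multiplied by 1e6 to get a plain dollar amount. A minimal sketch of doing that on a data frame column; the names `movies`, `box_office`, and `box_office_num` are assumptions for illustration, not from the question:

    ```r
    # Hypothetical data frame; `movies` and `box_office` are assumed names
    movies <- data.frame(box_office = c("$121.5M", "$0.014M"),
                         stringsAsFactors = FALSE)

    # Strip "$" and "M", convert to numeric, then scale millions to dollars
    movies$box_office_num <- as.numeric(gsub("[$M]", "", movies$box_office)) * 1e6
    ```

    This only works if every value ends in M; mixed suffixes need a lookup.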
    

    Update

    If you have a vector with mixed suffixes, like the one below

    v1 <- c("$1.21M", "$0.5B", "$100K", "$1T", "$0.9P", "$1.5K") 
    

    Create another vector holding the multipliers, and set its names to the corresponding abbreviations

    v2 <- setNames(c(1e3, 1e6, 1e9, 1e12, 1e15), c('K', 'M', 'B', 'T', 'P'))
    

    Use that as a lookup: index it by each element's abbreviation and multiply the result by the numeric part of the vector.

     as.numeric(gsub("[^0-9.]*", '', v1)) * v2[sub('[^A-Z]*', '', v1)]
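
    The lookup approach can also be wrapped in a small helper so it can be applied to a data frame column. This is only a sketch: the name `parse_money` is hypothetical, and it assumes each value carries at most one uppercase suffix from the table in v2. Values with no suffix are left unscaled, and an unknown suffix yields NA.

    ```r
    # Hypothetical helper; the name `parse_money` is an assumption
    parse_money <- function(x) {
      # Multipliers keyed by suffix, same values as v2
      mult <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12, P = 1e15)
      num <- as.numeric(gsub("[^0-9.]", "", x))  # numeric part
      suffix <- gsub("[^A-Z]", "", x)            # suffix letter, "" if none
      # No suffix means no scaling; an unknown suffix looks up as NA
      scale <- ifelse(suffix == "", 1, mult[suffix])
      unname(num * scale)                        # drop the suffix names
    }

    parse_money(c("$121.5M", "$0.014M", "$100K", "$250"))
    ```

    Unlike the one-liner, this returns an unnamed vector, which is usually more convenient when assigning to a column.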