r data.table gsub 2-digit-year

gsub for dealing with dates in R data


I am dealing with a very large data set of university students where dates are in the form

%d/%m/%y

I need to work out ages.

My data looks something like this as it was pulled from a database:

data <- data.table(DOB=c("12/12/01", "8/05/80", "2/11/99"), 
                  started =c("5/10/10", "4/01/12", "27/08/11"))

The problem is that for calculating ages the whole year is not specified.

I have tried changing the years to numeric:

data$DOB<-as.Date(data$DOB, "%d/%m/%y")
data$started<-as.Date(data$started, "%d/%m/%y")
data$DOB<-as.numeric(format(data$DOB,"%Y"))
data$started<-as.numeric(format(data$started,"%Y"))
data$age<-data$started-data$DOB

Obviously this does not work as I need to add in the 20 and 19.

Is there a way I can use gsub to put a '20' in front of every year of birth that is less than or equal to 15, and a '19' in front of every year of birth that is more than 15?

I don't think there are any 85 year olds in my dataset.


Solution

  • A similar approach, using the substr and nchar functions of base R:

    library(data.table)
    
    dt <-data.table(DOB=c("12/12/01", "8/05/80", "2/11/99"), 
                    started =c("5/10/10", "4/01/12", "27/08/11"))
    
    dt
    
    #         DOB  started
    # 1: 12/12/01  5/10/10
    # 2:  8/05/80  4/01/12
    # 3:  2/11/99 27/08/11
    
    
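    # Expand the two-digit year at the end of a d/m/y string to four digits:
    # years 00-15 become 20xx, years 16-99 become 19xx.
    WholeYear = function(x){
    
                v1 = substr(x, 1, nchar(x)-2)          # everything before the year, e.g. "12/12/"
                v2 = substr(x, nchar(x)-1, nchar(x))   # the last two characters, e.g. "01"
    
                ifelse(as.numeric(v2) <= 15, paste0(v1,"20",v2), paste0(v1,"19",v2)) 
    
                            }
    
    
    # substr, nchar and ifelse are vectorised, so whole columns can be passed directly
    dt$DOB = WholeYear(dt$DOB)
    dt$started = WholeYear(dt$started)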
    
    dt
    
    
    #           DOB    started
    # 1: 12/12/2001  5/10/2010
    # 2:  8/05/1980  4/01/2012
    # 3:  2/11/1999 27/08/2011
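Since the question asks about gsub and ultimately wants ages, here is a rough sketch of a regex-based variant that goes on to compute age in whole years. The helper add_century, its cutoff argument, and the students data frame are made up for illustration and are not part of the answer above. The manual step is needed because R's own %y parsing puts 00-68 in the 2000s and 69-99 in the 1900s, so it cannot apply a cutoff of 15 by itself. A plain data.frame is used so the sketch runs without extra packages; the same calls should work on data.table columns.

```r
# Sample data from the question, as a base data.frame
students <- data.frame(DOB     = c("12/12/01", "8/05/80", "2/11/99"),
                       started = c("5/10/10", "4/01/12", "27/08/11"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: prefix the trailing two-digit year with "20" when it is
# <= cutoff, otherwise with "19". Assumes d/m/yy strings separated by "/".
add_century <- function(x, cutoff = 15) {
  yy     <- sub(".*/", "", x)      # text after the last "/", e.g. "01"
  prefix <- sub("[^/]*$", "", x)   # everything up to and including the last "/"
  paste0(prefix, ifelse(as.numeric(yy) <= cutoff, "20", "19"), yy)
}

# Parse the expanded strings with a four-digit-year format
students$DOB     <- as.Date(add_century(students$DOB), "%d/%m/%Y")
students$started <- as.Date(add_century(students$started), "%d/%m/%Y")

# Age in completed years at the start date: the difference in calendar years,
# minus one if the birthday had not yet been reached in the start year
students$age <- as.integer(format(students$started, "%Y")) -
                as.integer(format(students$DOB, "%Y")) -
                (format(students$started, "%m%d") < format(students$DOB, "%m%d"))

students$age
# [1]  8 31 11
```

Subtracting only the years, as in the question, would give 9, 32 and 12 here, because none of the three students had reached their birthday by the start date.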