I am dealing with a very large data set of university students in which dates are in the form %d/%m/%y, and I need to work out ages.
My data looks something like this as it was pulled from a database:
library(data.table)
data <- data.table(DOB = c("12/12/01", "8/05/80", "2/11/99"),
                   started = c("5/10/10", "4/01/12", "27/08/11"))
The problem is that the full four-digit year, which I need for calculating ages, is not given.
I have tried changing the years to numeric:
data$DOB <- as.Date(data$DOB, "%d/%m/%y")
data$started <- as.Date(data$started, "%d/%m/%y")
data$DOB <- as.numeric(format(data$DOB, "%Y"))
data$started <- as.numeric(format(data$started, "%Y"))
data$age <- data$started - data$DOB
Obviously this does not work as intended, because I need to supply the century (20 or 19) myself.
Is there a way I can use gsub to put a '20' in front of every two-digit year that is less than or equal to 15, and a '19' in front of every two-digit year that is greater than 15?
I don't think there are any 85-year-olds in my dataset.
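A regex-based sketch of that idea follows. Only one substitution is made per string, so sub is enough and gsub is not strictly needed. The helper name expand_year and its cutoff argument are assumptions made for illustration; the cutoff of 15 is taken from the question.

```r
# Hypothetical helper: expand a two-digit year at the end of "d/m/yy" strings.
# Years <= cutoff are treated as 20yy, all other years as 19yy.
expand_year <- function(x, cutoff = 15) {
  yy      <- sub(".*/", "", x)         # trailing two digits, e.g. "01"
  prefix  <- sub("[0-9]{2}$", "", x)   # everything before them, e.g. "12/12/"
  century <- ifelse(as.integer(yy) <= cutoff, "20", "19")
  paste0(prefix, century, yy)
}

expand_year(c("12/12/01", "8/05/80", "2/11/99"))
# "12/12/2001" "8/05/1980"  "2/11/1999"
```

The result is still a character vector, so it can be passed to as.Date with the format "%d/%m/%Y" afterwards.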
A similar approach uses the substr and nchar functions of base R.
library(data.table)
dt <- data.table(DOB = c("12/12/01", "8/05/80", "2/11/99"),
                 started = c("5/10/10", "4/01/12", "27/08/11"))
dt
# DOB started
# 1: 12/12/01 5/10/10
# 2: 8/05/80 4/01/12
# 3: 2/11/99 27/08/11
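# Expand the two-digit year at the end of a "d/m/yy" string to four digits
WholeYear = function(x){
  v1 = substr(x, 1, nchar(x)-2)          # day and month part, e.g. "12/12/"
  v2 = substr(x, nchar(x)-1, nchar(x))   # two-digit year, e.g. "01"
  # years up to 15 are taken as 20xx, everything else as 19xx
  ifelse(as.numeric(v2) <= 15, paste0(v1,"20",v2), paste0(v1,"19",v2))
}
# substr, nchar and ifelse are vectorised, so sapply is not needed
dt$DOB = WholeYear(dt$DOB)
dt$started = WholeYear(dt$started)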
dt
# DOB started
# 1: 12/12/2001 5/10/2010
# 2: 8/05/1980 4/01/2012
# 3: 2/11/1999 27/08/2011
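With four-digit years in place, the strings can be parsed with "%d/%m/%Y" and the ages worked out. The following is a minimal sketch using base R only. The helper name age_in_years is hypothetical, and it assumes that "age" means completed years on the start date.

```r
# The expanded strings from the table above
dob     <- as.Date(c("12/12/2001", "8/05/1980", "2/11/1999"), "%d/%m/%Y")
started <- as.Date(c("5/10/2010", "4/01/2012", "27/08/2011"), "%d/%m/%Y")

# Hypothetical helper: completed years between two Date vectors
age_in_years <- function(dob, ref) {
  d <- as.POSIXlt(dob)
  r <- as.POSIXlt(ref)
  # TRUE where the birthday has not yet been reached in the reference year
  before_birthday <- r$mon < d$mon | (r$mon == d$mon & r$mday < d$mday)
  (r$year - d$year) - as.integer(before_birthday)
}

age_in_years(dob, started)
# 8 31 11
```

Subtracting the year numbers alone would give 9, 32 and 12 here, because it ignores whether the birthday has passed. On the table itself the result could be stored with something like dt$age = age_in_years(as.Date(dt$DOB, "%d/%m/%Y"), as.Date(dt$started, "%d/%m/%Y")).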