Process date and time imported from excel in R

I have data from hard copy papers that must be first manually inputted in excel files to be later processed in R. The data contain multiple readings of a variable (Reading) for different subjects (by ID) at different time points (e.g. 08:10, 08:20 etc.) on different dates (18.08.2014, 19.08.2014). A reference starting time (e.g., 08:00) and reference starting date (e.g., 18.08.2014) for each reading series is available.

The excel file that contains the data will look like this

ID  Reading Date        Time    Ref/Start Time  Ref/Start Date
1   12.1    18.08.2014  7:59    8:00            18.08.2014
1   26.34   18.08.2014  8:10    8:00            18.08.2014
1   35.2    18.08.2014  8:20    8:00            18.08.2014
1   30      18.08.2014  8:30    8:00            18.08.2014
1   12      19.08.2014  8:00    8:00            18.08.2014
1   13      19.08.2014  20:00   8:00            18.08.2014
1   12      20.08.2014  8:00    8:00            18.08.2014

These data have to be later processed in R. My goal is to generate a new column that contains the time of each reading in hours after the starting time point of each reading series. so to say to get (y) versus (x) where (x) is the time in hours from the start. I imported this excel file to R now (saved previously as .csv), but I don't know how should I proceed now to generate the new column in R! should I even insert data in another way in excel first?

I hope that I was successful in clarifying what I need and that I can find a helping hand from somebody.

Many thanks in advance.

Solution

There are many ways to accomplish this. Here is one.

Assuming that you have the data in a csv file called time_d.csv you can do this:

time_d.csv looks like this:

ID  Reading Date    Time    Ref_time    Ref_date
1   12.1    18.08.2014  07:59   08:00   18.08.2014
1   26.34   18.08.2014  08:10   08:00   18.08.2014
1   35.2    18.08.2014  08:20   08:00   18.08.2014
1   30      18.08.2014  08:30   08:00   18.08.2014
1   12      19.08.2014  08:00   08:00   18.08.2014
1   13      19.08.2014  20:00   08:00   18.08.2014
1   12      20.08.2014  08:00   08:00   18.08.2014

You can see that I have changed the column headings slightly. Then, with the .csv in this format you can do this:

a1=read.csv("time_d.csv") #reads data into R data frame
a1$date_read=paste(a1$Date, a1$Time, sep=" ") #adds a new col to data frame
#by merging two existing cols

a1$date_ref=paste(a1$Ref_date, a1$Ref_time, sep=" ") #adds new col
a1=subset(a1,select=-c(Date,Time)) #removes the no longer needed cols
a1=subset(a1,select=-c(Ref_date,Ref_time)) #removes the no longer needed cols

a1$date_read=as.POSIXct(strptime(a1$date_read,"%d.%m.%Y %H:%M" )) #convert 
#to date/time objects

a1$date_ref=as.POSIXct(strptime(a1$date_ref,"%d.%m.%Y %H:%M" ))

a1$Duration=difftime(a1$date_read,a1$date_ref, units="hours") #adds new col 
#calculating the time difference in hours

For your particular data the format of date is important for this line: as.POSIXct(strptime(a1$date_read,"%d.%m.%Y %H:%M" )) If you change the format of date, then you should change this line code in R, also.

The end result looks like this:

  ID Reading           date_read            date_ref          Duration
1  1   12.10 2014-08-18 07:59:00 2014-08-18 08:00:00 -0.01666667 hours
2  1   26.34 2014-08-18 08:10:00 2014-08-18 08:00:00  0.16666667 hours
3  1   35.20 2014-08-18 08:20:00 2014-08-18 08:00:00  0.33333333 hours
4  1   30.00 2014-08-18 08:30:00 2014-08-18 08:00:00  0.50000000 hours
5  1   12.00 2014-08-19 08:00:00 2014-08-18 08:00:00 24.00000000 hours
6  1   13.00 2014-08-19 20:00:00 2014-08-18 08:00:00 36.00000000 hours
7  1   12.00 2014-08-20 08:00:00 2014-08-18 08:00:00 48.00000000 hours