I have data from hard copy papers that must be first manually inputted in excel files to be later processed in R. The data contain multiple readings of a variable (Reading) for different subjects (by ID) at different time points (e.g. 08:10, 08:20 etc.) on different dates (18.08.2014, 19.08.2014). A reference starting time (e.g., 08:00) and reference starting date (e.g., 18.08.2014) for each reading series is available.
The excel file that contains the data will look like this
ID Reading Date Time Ref/Start Time Ref/Start Date
1 12.1 18.08.2014 7:59 8:00 18.08.2014
1 26.34 18.08.2014 8:10 8:00 18.08.2014
1 35.2 18.08.2014 8:20 8:00 18.08.2014
1 30 18.08.2014 8:30 8:00 18.08.2014
1 12 19.08.2014 8:00 8:00 18.08.2014
1 13 19.08.2014 20:00 8:00 18.08.2014
1 12 20.08.2014 8:00 8:00 18.08.2014
These data have to be later processed in R. My goal is to generate a new column that contains the time of each reading in hours after the starting time point of each reading series. so to say to get (y) versus (x) where (x) is the time in hours from the start. I imported this excel file to R now (saved previously as .csv), but I don't know how should I proceed now to generate the new column in R! should I even insert data in another way in excel first?
I hope that I was successful in clarifying what I need and that I can find a helping hand from somebody.
Many thanks in advance.
There are many ways to accomplish this. Here is one.
Assuming that you have the data in a csv file called time_d.csv
you can do this:
time_d.csv looks like this:
ID Reading Date Time Ref_time Ref_date
1 12.1 18.08.2014 07:59 08:00 18.08.2014
1 26.34 18.08.2014 08:10 08:00 18.08.2014
1 35.2 18.08.2014 08:20 08:00 18.08.2014
1 30 18.08.2014 08:30 08:00 18.08.2014
1 12 19.08.2014 08:00 08:00 18.08.2014
1 13 19.08.2014 20:00 08:00 18.08.2014
1 12 20.08.2014 08:00 08:00 18.08.2014
You can see that I have changed the column headings slightly. Then, with the .csv in this format you can do this:
a1=read.csv("time_d.csv") #reads data into R data frame
a1$date_read=paste(a1$Date, a1$Time, sep=" ") #adds a new col to data frame
#by merging two existing cols
a1$date_ref=paste(a1$Ref_date, a1$Ref_time, sep=" ") #adds new col
a1=subset(a1,select=-c(Date,Time)) #removes the no longer needed cols
a1=subset(a1,select=-c(Ref_date,Ref_time)) #removes the no longer needed cols
a1$date_read=as.POSIXct(strptime(a1$date_read,"%d.%m.%Y %H:%M" )) #convert
#to date/time objects
a1$date_ref=as.POSIXct(strptime(a1$date_ref,"%d.%m.%Y %H:%M" ))
a1$Duration=difftime(a1$date_read,a1$date_ref, units="hours") #adds new col
#calculating the time difference in hours
For your particular data the format of date is important for this line:
as.POSIXct(strptime(a1$date_read,"%d.%m.%Y %H:%M" ))
If you change the format of date, then you should change this line code in R, also.
The end result looks like this:
ID Reading date_read date_ref Duration
1 1 12.10 2014-08-18 07:59:00 2014-08-18 08:00:00 -0.01666667 hours
2 1 26.34 2014-08-18 08:10:00 2014-08-18 08:00:00 0.16666667 hours
3 1 35.20 2014-08-18 08:20:00 2014-08-18 08:00:00 0.33333333 hours
4 1 30.00 2014-08-18 08:30:00 2014-08-18 08:00:00 0.50000000 hours
5 1 12.00 2014-08-19 08:00:00 2014-08-18 08:00:00 24.00000000 hours
6 1 13.00 2014-08-19 20:00:00 2014-08-18 08:00:00 36.00000000 hours
7 1 12.00 2014-08-20 08:00:00 2014-08-18 08:00:00 48.00000000 hours