r time strptime difftime

What is the most efficient way to calculate the difftime of two times stored as strings?


I have a large data set (a data.table with more than 100 million rows) and want to calculate the difftime between two columns. The time strings are in the format "%H%M%OS", without leading zeros. For example:

ID  time0      time1
 1  93005220  93005320
 2  93551060  93553940
 3  93717740  93717740
 4  94840800  94840800
 5  94959510  94959510
 6 101030460 101127870
 7 101425010 101425010
 8 104728320 104728980
 9 105704050 105705890
10 135109830 135402830
11  93005220 150001330

Taking the first row as an example, I cannot use difftime and strptime directly to calculate the difference between "93005320" ("09:30:05.320") and "93005220" ("09:30:05.220"). I first have to convert "93005220" to a format like "093005.220". Then I tried:

difftime(strptime("093005.220","%H%M%OS"),strptime("093005.320","%H%M%OS"))

But doing it this way is too slow for more than 100 million rows. Is there an efficient way to do this in R?
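For reference, the padding-and-parsing route described above could be sketched roughly as follows. `to_padded` is a hypothetical helper name, the two sample values come from rows 1 and 6 of the data, and `tz = "UTC"` is an assumption made only to keep the example independent of the local time zone.

```r
# Hypothetical helper: turn an HHMMSSmmm number into a "HHMMSS.mmm" string.
to_padded <- function(x) {
  s <- sprintf("%09.0f", x)                      # restore the leading zero, e.g. "093005220"
  paste0(substr(s, 1, 6), ".", substr(s, 7, 9))  # insert the decimal point before the milliseconds
}

# Rows 1 and 6 of the sample data.
time0 <- c(93005220, 101030460)
time1 <- c(93005320, 101127870)

# Parse both columns, then take the difference in seconds.
t0 <- strptime(to_padded(time0), "%H%M%OS", tz = "UTC")
t1 <- strptime(to_padded(time1), "%H%M%OS", tz = "UTC")
elapsed <- as.numeric(difftime(t1, t0, units = "secs"))
```

Every value goes through string formatting and date parsing here, which is the cost the question is trying to avoid.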


Solution

  • One approach is to note that every digit encodes a number of seconds, just expressed in a different unit (tens of hours, hours, tens of minutes, and so on). Once each digit has been converted to seconds, the two times can be compared directly using -.

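    # Convert an HHMMSSmmm number into per-digit contributions, in seconds.
    f = function(x){
      # weight, in seconds, of each of the nine digit positions
      c(36000, 3600, 600, 60, 10, 1, .1, .01, .001) *
        # isolate each digit: %% drops the higher places, %/% drops the lower ones
        x %% c(1e9, 1e8, 1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1) %/%
        c(1e8, 1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1, 1e0)
    }
    # Difference in seconds between two such numbers, applied pair by pair.
    ff = Vectorize(function(x, y) sum(f(x) - f(y)))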
    

    To then get the difference in seconds, here using dplyr:

    library(dplyr)  # provides %>% and mutate()

    df %>%
        mutate(diff = ff(time1, time0))
           time0     time1     diff
           <dbl>     <dbl>    <dbl>
     1  93005220  93005320     0.1 
     2  93551060  93553940     2.88
     3  93717740  93717740     0   
     4  94840800  94840800     0   
     5  94959510  94959510     0   
     6 101030460 101127870    57.4 
     7 101425010 101425010     0   
     8 104728320 104728980     0.66
     9 105704050 105705890     1.84
    10 135109830 135402830   173   
    11  93005220 150001330 19796.
    

    Explanation: Using Anirban's approach, f transforms an input integer into a vector of its digits (padding with 0 where necessary). These digits are then converted to seconds by multiplication, which is a vectorized operation. That is, c(1,2) * c(3,4) evaluates to c(3, 8). ff then sums the element-wise differences between the two resulting vectors to give the difference in seconds.
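To make the decomposition concrete, the sketch below first applies f (copied from the answer) to a single value. It then shows a variant that is not part of the answer: since Vectorize calls the wrapped function once per pair of elements, splitting each number into hours, minutes and seconds with whole-column arithmetic may scale better on very large inputs. This has not been benchmarked here, and `to_seconds` is a hypothetical name.

```r
# f as defined in the answer: per-digit contributions in seconds.
f <- function(x) {
  c(36000, 3600, 600, 60, 10, 1, .1, .01, .001) *
    x %% c(1e9, 1e8, 1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1) %/%
    c(1e8, 1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1, 1e0)
}

parts <- f(93005220)  # one entry per digit position of 09:30:05.220
total <- sum(parts)   # seconds since midnight for that time

# Hypothetical alternative: operate on whole columns at once.
to_seconds <- function(x) {
  hours   <- x %/% 1e7            # leading one or two digits
  minutes <- (x %/% 1e5) %% 100   # next two digits
  seconds <- (x %% 1e5) / 1000    # last five digits, in milliseconds
  hours * 3600 + minutes * 60 + seconds
}

# Rows 1, 6 and 11 of the sample data.
time0 <- c(93005220, 101030460, 93005220)
time1 <- c(93005320, 101127870, 150001330)
diff_secs <- to_seconds(time1) - to_seconds(time0)
```

With a data.table named `dt` (an assumed name), the same idea would be written as `dt[, diff := to_seconds(time1) - to_seconds(time0)]`, which adds the column by reference without copying the table.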