rlubridatestringrchron

How can I get my time strings in a right format?


I am stuck with converting strings to times. I am aware that there are many topics on Stack regarding converting strings-to-times, however I couldn't fix this problem with the solutions.

Situation I have a file with times like this:

> dput(df$Time[1:50])
c("1744.3", "2327.54", "1718.51", "2312.3200000000002", "1414.16", 
"2046.15", "1442.5", "1912.22", "2303.2199999999998", "2146.3200000000002", 
"1459.02", "1930.15", "1856.23", "2319.15", "1451.05", "25.460000000000036", 
"1453.25", "2309.02", "2342.48", "2322.5300000000002", "2101.5", 
"2026.07", "1245.04", "1945.15", "5.4099999999998545", "1039.5", 
"1731.37", "2058.41", "2030.36", "1814.31", "1338.18", "1858.33", 
"1731.36", "2343.38", "1733.27", "2304.59", "1309.47", "1916.11", 
"1958.3", "1929.54", "1756.4", "1744.23", "1731.26", "1844.47", 
"1353.25", "1958.3", "1746.44", "1857.53", "2047.15", "2327.2199999999998", "1915"
)

In this example, the times should be like this:

"1744.3"   = 17:44:30
"2327.54"  = 23:27:54
"1718.51"  = 17:18:51
"2312.3200000000002" = 23:12:32
...
"25.460000000000036" = 00:25:46 # as you can see, the first two 00 are missing.

"1915" = 19:15:00

However, I tried multiple things (and now I am even stuck with str_replace()). Hopefully some one knows how I can transform this.

What have I tried?

format(df$Time, "%H%M.%S") # Yes I know... 

# So therefore I thought, lets replace the strings to get them in a proper format
# like HH:MM:SS. First step was to replace the "." for a ":" 

str_replace("." , ":", df$Time) # this was leading to "." (don't know why) 

And that was the point that I was so frustrated that I posted it on Stack. Hope that you guys can help me.

Many thanks in advance!


Solution

  • The main problem is the time "25.460000000000036". But I think I found a clear though somewhat verbose solution:

    library(tidyverse)
    
    df %>% 
      mutate(hours = formatC(as.numeric(Time), width = 4, format = "d", flag = "0"),
             seconds = as.numeric(str_extract(Time, "[.].+")) * 100) %>% 
      mutate(Time_new = stringi::stri_datetime_parse(paste0(hours, seconds), format = "HHmm.ss"))
    #> # A tibble: 51 x 4
    #>    Time               hours seconds Time_new           
    #>    <chr>              <chr>   <dbl> <dttm>             
    #>  1 25.460000000000036 0025     46.  2020-02-19 00:25:46  # I changed the order of the times so the weird format is on top
    #>  2 1744.3             1744     30   2020-02-19 17:44:30
    #>  3 2327.54            2327     54   2020-02-19 23:27:54
    #>  4 1718.51            1718     51   2020-02-19 17:18:51
    #>  5 2312.3200000000002 2312     32.  2020-02-19 23:12:32
    #>  6 1414.16            1414     16   2020-02-19 14:14:16
    #>  7 2046.15            2046     15   2020-02-19 20:46:15
    #>  8 1442.5             1442     50   2020-02-19 14:42:50
    #>  9 1912.22            1912     22   2020-02-19 19:12:22
    #> 10 2303.2199999999998 2303     22.0 2020-02-19 23:03:21
    #> # ... with 41 more rows
    

    If you also have times without fractions (i.e., without the dot) you could use this approach:

    normalize_time <- function(t) {
      formatC(as.numeric(t) * 100, width = 6, format = "d", flag = "0")
    }
    
    df %>% 
      mutate(Time_new = as.POSIXct(normalize_time(Time), format = "%H%M%S"))