rdatetimedplyraggregategeometric-mean

Calculating geometric mean every 10 min using dplyr or aggregte function


I am trying to calculate geometric mean of a column every 10 min.

My sample data is..

TimeDate    diam    ratio
2016-05-11 8:25 134.491 1.83074
2016-05-11 8:25 117.777 1.34712
2016-05-11 8:25 104.27  0.927635
2016-05-11 8:25 204.085 1.43079
2016-05-11 8:25 96.8011 0.991716
2016-05-11 8:25 119.152 1.09884
2016-05-11 8:25 113.871 0.932493
2016-05-11 8:26 150.468 0.710525
2016-05-11 8:26 116.576 1.11207
2016-05-11 8:26 192.257 1.61558
2016-05-11 8:26 128.071 0.756608
2016-05-11 8:26 177.667 0.73309
2016-05-11 8:27 97.7377 0.862858
2016-05-11 8:27 98.3195 1.00681
2016-05-11 8:27 91.3603 0.95051
2016-05-11 8:27 152.95  0.842145
2016-05-11 8:27 133.125 1.28365
2016-05-11 8:27 95.2516 0.573588

I've tried it using dplyr function, but the code below does not result in values every 10 min, but one geometric mean value and one geometric sd value.

mydata$TimeDate <- as.POSIXct(strptime(mydata$TimeDate, format = "%Y-%m-%d %H:%M","GMT"))

mydata %>%
    group_by(by10 = cut(TimeDate, breaks="10 min")) %>%
   summarize(Geo.Mean=exp(mean(log(diam))),
          Geo.SD=exp(sd(log(diam)))) 

Data format itself is ok, as aggregate function as below makes nicely, although it does not create geometric mean.

aggregate(mydata["diam"], 
               list(TimeDate=cut(mydata$TimeDate, "10 mins")),
                median, na.rm=T)

Solution

  • One option is use lubridate::floor_date function to create group for every 10 mins round the clock. All data between 20-30 mins will be grouped as 20th mins and so on.

    library(dplyr)
    library(lubridate)
    
    mydata %>% mutate(TimeDate = as.POSIXct(TimeDate, format = "%Y-%m-%d %H:%M")) %>%
      group_by(Diff_10 = floor_date(TimeDate, "10minute")) %>%
      summarise(Geo.Mean=exp(mean(log(diam))),
                Geo.SD=exp(sd(log(diam))))
    
    # # A tibble: 1 x 3
    #     Diff_10             Geo.Mean Geo.SD
    #     <dttm>                 <dbl>  <dbl>
    #   1 2016-05-11 08:20:00      125   1.28
    
    #Result with modified data
    # # A tibble: 6 x 3
    #   Diff_10             Geo.Mean Geo.SD
    #   <dttm>                 <dbl>  <dbl>
    # 1 2016-05-11 08:20:00    118     1.14
    # 2 2016-05-11 08:30:00    141     1.69
    # 3 2016-05-11 08:40:00    127     1.16
    # 4 2016-05-11 08:50:00    150     1.28
    # 5 2016-05-11 09:10:00     98.0   1.00
    # 6 2016-05-11 09:20:00    115     1.29
    

    cut can be used if groups data to be grouped every 10 mins from starting time. In OP, groups will be as 2016-05-11 08:25, 2016-05-11 08:35 and so on.

    Modified OP's data:

    mydata <- read.table(text = 
    "TimeDate    diam    ratio
    '2016-05-11 8:25' 134.491 1.83074
    '2016-05-11 8:25' 117.777 1.34712
    '2016-05-11 8:25' 104.27  0.927635
    '2016-05-11 8:35' 204.085 1.43079
    '2016-05-11 8:35' 96.8011 0.991716
    '2016-05-11 8:42' 119.152 1.09884
    '2016-05-11 8:45' 113.871 0.932493
    '2016-05-11 8:46' 150.468 0.710525
    '2016-05-11 8:56' 116.576 1.11207
    '2016-05-11 8:56' 192.257 1.61558
    '2016-05-11 8:56' 128.071 0.756608
    '2016-05-11 8:59' 177.667 0.73309
    '2016-05-11 9:17' 97.7377 0.862858
    '2016-05-11 9:17' 98.3195 1.00681
    '2016-05-11 9:27' 91.3603 0.95051
    '2016-05-11 9:27' 152.95  0.842145
    '2016-05-11 9:27' 133.125 1.28365
    '2016-05-11 9:27' 95.2516 0.573588",
    header = TRUE, stringsAsFactors = FALSE)