
Calculate and plot time-interval means


I would like to calculate and plot changing numbers of differently colored animals over time using dplyr and ggplot2.

I have observations of different animals on random dates, so I would first like to group those observations into 4-day brackets and then calculate the mean color for each bracket. I created the column Bracket.mean with placeholder values for the first few rows, just to show what I have in mind. If possible, I would like to add those means to the same data frame (rather than creating a new data.frame or vectors) for later analysis and plotting.

For the plot, I'm hoping to show the bracket means over time with some measure of variance around them (SD or boxplots), together with the daily observations (perhaps as a faded overlay in the background).

Below is part of the dataset I'm using, including the made-up 'Bracket.mean' column I'm hoping to calculate. 'Count' is the number of animals of a specific 'Color' observed on a given 'Date'.

    Date     Julian  Count  Color  Bracket.mean
    4/19/16  110     1      50     mean of 4/19-4/22
    4/19/16  110     1      50     mean of 4/19-4/22
    4/19/16  110     1      100    mean of 4/19-4/22
    4/20/16  111     4      50     mean of 4/19-4/22
    4/20/16  111     1      0      mean of 4/19-4/22
    4/20/16  111     2      100    mean of 4/19-4/22
    4/20/16  111     1      50     mean of 4/19-4/22
    4/20/16  111     2      100    mean of 4/19-4/22
    4/21/16  112     1      100    mean of 4/19-4/22
    4/21/16  112     2      50     mean of 4/19-4/22
    4/21/16  112     4      50     mean of 4/19-4/22
    4/21/16  112     1      100    mean of 4/19-4/22
    4/21/16  112     2      50     mean of 4/19-4/22
    4/21/16  112     1      0      mean of 4/19-4/22
    4/22/16  113     2      0      mean of 4/19-4/22
    4/22/16  113     4      50     mean of 4/19-4/22
    4/23/16  114     6      0      mean of 4/23-4/26
    4/23/16  114     1      50     mean of 4/23-4/26
    4/24/16  115     2      0      mean of 4/23-4/26
    4/26/16  117     5      0      mean of 4/23-4/26
    4/30/16  121     1      50
    5/2/16   123     1      NA
    5/2/16   123     1      50
    5/7/16   128     2      0
    5/7/16   128     3      0
    5/7/16   128     3      0
    5/8/16   129     4      0
    5/8/16   129     1      0
    5/10/16  131     1      50
    5/10/16  131     4      50
    5/12/16  133     1      0
    5/13/16  134     1      50
    5/14/16  135     1      0
    5/14/16  135     2      50
    5/14/16  135     2      0
    5/14/16  135     1      0
    5/17/16  138     1      0
    5/17/16  138     2      0
    5/23/16  144     1      0
    5/24/16  145     4      0
    5/24/16  145     1      0
    5/24/16  145     1      0
    5/27/16  148     3      NA
    5/27/16  148     1      0
    5/27/16  148     1      50

Any help would be greatly appreciated. Thanks very much in advance!


Solution

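Something like this should get you started.

    library(dplyr)
    # Parse the dates, then label each row with the start (and end) of its
    # 4-day bracket; brackets begin on the earliest date in the data.
    # Note: this overwrites df with one summary row per bracket.
    df <- df %>% mutate(Date = as.Date(Date, format = '%m/%d/%y'),
                        Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4)))) %>%
        mutate(End = Start + 3) %>%
        group_by(Start, End) %>%
        summarise(meanColor = mean(Color, na.rm = TRUE),
                  sdColor = sd(Color, na.rm = TRUE))
    df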
    #Source: local data frame [10 x 4]
    #Groups: Start [?]
    #        Start        End meanColor  sdColor
    #        <date>     <date>     <dbl>    <dbl>
    #1  2016-04-19 2016-04-22  56.25000 35.93976
    #2  2016-04-23 2016-04-26  12.50000 25.00000
    #3  2016-04-27 2016-04-30  50.00000       NA
    #4  2016-05-01 2016-05-04  50.00000       NA
    #5  2016-05-05 2016-05-08   0.00000  0.00000
    #6  2016-05-09 2016-05-12  33.33333 28.86751
    #7  2016-05-13 2016-05-16  20.00000 27.38613
    #8  2016-05-17 2016-05-20   0.00000  0.00000
    #9  2016-05-21 2016-05-24   0.00000  0.00000
    #10 2016-05-25 2016-05-28  25.00000 35.35534
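
The `summarise()` call above collapses the data to one row per bracket, so the raw observations are lost. Since the question asks for the bracket means in the same data frame, a minimal sketch is to swap `summarise()` for `mutate()` after `group_by()`: every observation keeps its row and gains its bracket's statistics. It uses only the 4/19 to 4/26 rows from the question; the object name `obs` and the columns `Bracket.mean`, `Bracket.sd` and `Bracket.wmean` are illustrative choices. `Bracket.wmean` is an assumption that each row should be weighted by `Count` (one vote per animal) rather than counted once; drop it if the unweighted mean is what is wanted.

```r
library(dplyr)

# The 4/19/16 to 4/26/16 rows from the question (Date, Count and Color only)
obs <- data.frame(
  Date  = c("4/19/16", "4/19/16", "4/19/16", "4/20/16", "4/20/16", "4/20/16",
            "4/20/16", "4/20/16", "4/21/16", "4/21/16", "4/21/16", "4/21/16",
            "4/21/16", "4/21/16", "4/22/16", "4/22/16", "4/23/16", "4/23/16",
            "4/24/16", "4/26/16"),
  Count = c(1, 1, 1, 4, 1, 2, 1, 2, 1, 2, 4, 1, 2, 1, 2, 4, 6, 1, 2, 5),
  Color = c(50, 50, 100, 50, 0, 100, 50, 100, 100, 50, 50, 100, 50, 0, 0, 50,
            0, 50, 0, 0)
)

obs <- obs %>%
  mutate(Date  = as.Date(Date, format = "%m/%d/%y"),
         # cut() labels each 4-day interval by its start date
         Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4))),
         End   = Start + 3) %>%
  group_by(Start, End) %>%
  # mutate() keeps every observation and repeats its bracket's statistics
  mutate(Bracket.mean  = mean(Color, na.rm = TRUE),
         Bracket.sd    = sd(Color, na.rm = TRUE),
         # assumption: weight each row by the number of animals it represents
         Bracket.wmean = weighted.mean(Color, Count, na.rm = TRUE)) %>%
  ungroup()

print(obs, n = Inf)
```

On these rows the first bracket's `Bracket.mean` is 56.25, the same value as in the summary output above, while the Count-weighted `Bracket.wmean` is 55. Because the result still has one row per observation, it can feed both the raw-data overlay and the bracket summaries of a plot.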
    

Then plot using:

    library(ggplot2)
    # One line through the bracket means, plotted at each bracket's start date
    ggplot(df) + geom_line(aes(Start, meanColor))
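
For the plot described in the question (bracket means with a spread measure, plus the daily observations faded in the background), one possible ggplot2 sketch draws the raw rows with a low `alpha` and layers the bracket means with ±1 SD error bars on top. It rebuilds the per-bracket summary as in the answer, reduced here to the 4/19 to 4/26 rows; the names `obs`, `brackets` and `Mid`, and the choice to plot each bracket at its midpoint, are illustrative. The mean points and the error bars are separate layers, so a bracket with a single non-NA value (`sdColor = NA`) keeps its point and only loses its bar (ggplot2 warns about the removed row).

```r
library(dplyr)
library(ggplot2)

# The 4/19/16 to 4/26/16 rows from the question (Date, Count and Color only)
obs <- data.frame(
  Date  = c("4/19/16", "4/19/16", "4/19/16", "4/20/16", "4/20/16", "4/20/16",
            "4/20/16", "4/20/16", "4/21/16", "4/21/16", "4/21/16", "4/21/16",
            "4/21/16", "4/21/16", "4/22/16", "4/22/16", "4/23/16", "4/23/16",
            "4/24/16", "4/26/16"),
  Count = c(1, 1, 1, 4, 1, 2, 1, 2, 1, 2, 4, 1, 2, 1, 2, 4, 6, 1, 2, 5),
  Color = c(50, 50, 100, 50, 0, 100, 50, 100, 100, 50, 50, 100, 50, 0, 0, 50,
            0, 50, 0, 0)
)

obs <- obs %>%
  mutate(Date  = as.Date(Date, format = "%m/%d/%y"),
         Start = as.Date(cut(Date, breaks = seq(min(Date), max(Date) + 4, by = 4))))

# One row per 4-day bracket, as in the answer
brackets <- obs %>%
  group_by(Start) %>%
  summarise(meanColor = mean(Color, na.rm = TRUE),
            sdColor   = sd(Color, na.rm = TRUE)) %>%
  mutate(Mid = Start + 1.5)  # centre of the 4-day bracket

p <- ggplot() +
  # daily observations: faded, slightly jittered, sized by Count
  geom_jitter(data = obs, aes(x = Date, y = Color, size = Count),
              alpha = 0.2, width = 0.2, height = 2) +
  # bracket mean +/- 1 SD; rows with sdColor = NA are dropped from this layer only
  geom_errorbar(data = brackets,
                aes(x = Mid, ymin = meanColor - sdColor, ymax = meanColor + sdColor),
                width = 1) +
  geom_line(data = brackets, aes(x = Mid, y = meanColor)) +
  geom_point(data = brackets, aes(x = Mid, y = meanColor), size = 3) +
  labs(x = "Date", y = "Color")

print(p)
```

To show boxplots instead of ±1 SD, the error-bar layer could be swapped for something like `geom_boxplot(data = obs, aes(x = Start + 1.5, y = Color, group = Start))`; with only a handful of observations per bracket, though, the boxes will be coarse.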