rdatetimeggstatsplot

Auto-Listing comparisons in ggbetweenstats over dates (or datetimes)


I'm measuring some physical measures over time then i would like to plot using violinplots in ggbetweenstats. I want to compare only first group with all followings, without writing, for each graph and for each day, the comparisons list and updating it every day.

My idea was to select all unique dates, create a database with two columns where in first column was earlier date then in the second column all the other dates. Then coerce as a list...but it didn't work as I was hoping.

Here my reproducible example using a flight database.

library(nycflights13)

CarrierList<-unique(flights$carrier)
i=12
a<-flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i]& departureDay<="2013-01-10"& departureDay>="2013-01-02") %>% select(departureDay) %>% unique() %>% arrange(departureDay) %>% slice(1) 
aa<-flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i]& departureDay<="2013-01-10"& departureDay>="2013-01-02") %>% select(departureDay) %>% unique() %>% arrange(departureDay)%>% slice(2:n())

ggbetweenstats(
    data = flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i] & departureDay<="2013-01-10" & departureDay>="2013-01-02"),
    x=departureDay,
    y = arr_delay,
    pairwise.display = "none",
    p.adjust.method = "holm", 
    type = "nonparametric",   
    ggtheme  = jtools::theme_apa()) +
    ggsignif::geom_signif(map_signif_level = c("***"=0.001, "**"=0.01, "*"=0.05),
                          comparisons = list(data.frame(a$departureDay,aa$departureDay)
                          ))

This will produce following error output:

    `Computation failed in `stat_signif()`
Caused by error in `mapped_discrete()`:
! Can't convert `x` <data.frame> to <double>.`

What I want to produce is something like the output of the following code:

ggbetweenstats(
    data = flights %>% mutate(departureDay = lubridate::make_date(year, month, day)) %>% dplyr::filter(origin =="JFK" & carrier==CarrierList[i] & departureDay<="2013-01-10" & departureDay>="2013-01-02"),
    x=departureDay,
    y = arr_delay,
    pairwise.display = "none",
    p.adjust.method = "holm", 
    type = "nonparametric",   
    ggtheme  = jtools::theme_apa()) +
    ggsignif::geom_signif(y_position = c(300, 310, 320, 330, 340, 350, 360, 370),
                            map_signif_level = c("***"=0.001, "**"=0.01, "*"=0.05),
                          comparisons = list(c("2013-01-02", "2013-01-03"),
                                             c("2013-01-02", "2013-01-04"),
                                             c("2013-01-02", "2013-01-05"),
                                             c("2013-01-02", "2013-01-06"),
                                             c("2013-01-02", "2013-01-07"),
                                             c("2013-01-02", "2013-01-08"),
                                             c("2013-01-02", "2013-01-09"),
                                             c("2013-01-02", "2013-01-10"))
        )

Desired output

I've tried also another way as follow, but it didn't maintain date format

b<-lapply(
    1:7,
    function(i) c(
        combn(a$departureDay, 2)[2, i],
        combn(aa$departureDay, 2)[1, i]
    )
)

Any hints or suggestions?


Solution

  • I'm a little confused at how you are trying to create your comparison list. I think I would create a filtered data frame of everything you want to plot, and convert the dates to factors:

    library(tidyverse)
    library(ggstatsplot)
    library(ggsignif)
    
    df <- nycflights13::flights %>% 
      mutate(departureDay = lubridate::make_date(year, month, day)) %>% 
      filter(origin =="JFK" & carrier == "9E" & 
             departureDay <= "2013-01-10" & departureDay >= "2013-01-02") %>% 
      select(departureDay, arr_delay) %>% 
      arrange(departureDay)%>% 
      mutate(departureDay = factor(departureDay)) 
    

    Now you can create a list of all the comparisons you want using lapply with a single one-liner rather than repeating all the dplyr code:

    cmp <- lapply(levels(df$departureDay)[-1], \(x) c(levels(df$departureDay)[1],x))
    

    Your plotting code is then much simpler too:

    ggbetweenstats(data = df, x = departureDay, y = arr_delay,
                   pairwise.comparisons = FALSE, ggtheme = jtools::theme_apa()) +
      geom_signif(map_signif_level = c("***" = 0.001, "**" = 0.01, "*" = 0.05),
                  comparisons = cmp, step_increase = 0.05)
    

    enter image description here