rr-plotlycrosstalk

My 'recent' line is being hidden and the chart is not reverting to the original state. Is this a bug with bscol?


I'm trying to add some interactivity to my plotly charts using crosstalk (specifically using filter_checkbox and filter_select) and I've run into a bit of snag. I produce my plots firstly through ggplot then I convert it to plotly using ggplot function.

While I can generate the chart fine (and there's plenty of interactivity on the markdown), I have a couple of problems. Firstly, when I wish to filter (either via filter_select or filter_checkbox), the 'recent' data disappears from the chart entirely, and cannot be recovered without refreshing the html. A similar thing happens with the actual data I'm filtering; I cannot revert the chart to its original state without refreshing the page.

Does anyone know why this might be? Copy of my code + data is below.

Below is a snippet of my data (data=historic):

structure(list(date = c("23-03-2019", "23-03-2019", "23-03-2019", 
"23-03-2019", "05-05-2020", "05-05-2020", "05-05-2020", "05-05-2020", 
"17-06-2021", "17-06-2021", "17-06-2021", "17-06-2021"), cumvol = c(0.004, 
0.034, 0.054, 0.057, 0.005, 0.048, 0.068, 0.075, 2.009, 2.029, 
2.049, 2.064), time = structure(c(26457, 26636, 26658, 27216, 
25152, 25614, 25667, 25668, 56966, 57268, 57303, 58986), units = "secs", class = c("hms", 
"difftime")), Year = c("2019", "2019", "2019", "2019", "2020", 
"2020", "2020", "2020", "2021", "2021", "2021", "2021"))

On top of this I plot another line from a separate df (data=recent).

structure(list(date = structure(c(19038, 19038, 19038, 19038), class = "Date"), 
cumvol = c(0.029, 0.034, 0.07, 0.075), time = structure(c(29674, 29674, 29691, 29719), 
class = c("hms", "difftime"), units = "secs")), Year = c("2022", "2022", "2022", "2022"))

I then convert the data to Shared data, create a ggplot using that data, then covert that plot to ggplot as follows (the variable "most_recent" refers the to the most recent entry in the 'recent' dataframe, produced by recent[nrow(recent),]):

sharedhistoric <- SharedData$new(historic, key = ~Date)
sharedrecent <- SharedData$new(recent, key = ~Date)

plot <- ggplot()+geom_line(data=sharedhistoric,aes(x=time, y=cumvol, group=date),color='#BAB0AC', alpha=0.5)+
      geom_line(data=sharedrecent ,aes(x=time, y=cumvol, group=date),size=1.2,color='#E15758')+
      geom_point(data=most_recent, aes(x=time,y=cumvol), color='#E15759',size=3)+geom_hline(yintercept = 0)+  theme(title=element_text(size=12),panel.background = element_rect(fill='white',color='black'),legend.position='right')+
        labs(title = "Vol",subtitle = "Cum Vol so far", x = "Time", y = "Vol")

Finally, I convert the chart to plotly and use the following bcols:

chartyplot <- plotly::ggplotly(plot)
bscols(widths = c(4, 9),
       list(
         crosstalk::filter_checkbox("Year", 
                         label = "Select Year",
                       sharedhistoric, 
                        group = ~Year),
         crosstalk::filter_select("Date", 
                       label = "Date",
                      sharedhistoric, 
                       group = ~Date)
         ), chartyplot)

Thanks for any assistance/advice.


Solution

  • As far as I can tell, there are two effects that contribute to this behavior

    1. (non-)unique keys of the SharedData objects
    2. anything not selected from crosstalk::filter_* is dropped from the plot

    TL;DR: The way to make this work, is by ensuring unique keys, and assigning the different datasets to the same group. Any data not part of a SharedData object is lost as soon as any filter is applied. And we can cheat some data to always remain on the plot by fixing some data via HTML tags.

    1 Keys

    Looking at the keys section of the crosstalk documentation, the keys should be unique within the dataset. Therefore date may not be a good choice for this in the given dataset. Instead, we can simply create keys based on the row number (which is also the default behavior when no key is supplied)

    sharedhistoric <- SharedData$new(historic %>% mutate(key = as.character(row_number())), key = ~key)
    sharedrecent <- SharedData$new(recent %>% mutate(key = as.character(row_number())), key = ~key)
    

    ...but now only the "recent" data shows up (for year 2019, which should select rows 1-4 of the "historic" data) (left plot). Switching the order of the geom_line statements (first "recent", then "historic") leads to correct behavior for the "historic" data, but the "recent" is gone (right plot). This essentially means, that the filtered keys are only applied to the last SharedData object added to ggplot.

    enter image description here

    The next step to making interactive use of both datasets, is assigning them to the same group, which per documentation can be used to link multiple instances of datasets.

    sharedhistoric <- SharedData$new(..., key = ~key, group = "mydata")
    sharedrecent <- SharedData$new(..., key = ~key, group = "mydata")
    

    This already looks better and we keep filtering and recovering both "historic", and "recent" data - but 2019 is now associated to the first 4 rows of both datasets (because of duplicate keys):

    enter image description here

    A possible workaround is to globally define unique keys and assign the sub-datasets to the same group:

    historic <- as.data.frame(historic) %>%
      dplyr::mutate(date = as.character(date), 
                    set = "historic")
    recent <- as.data.frame(recent) %>%
      dplyr::mutate(date = as.character(date), 
                    set = "recent")
    all <- bind_rows(historic, recent) %>%
      dplyr::mutate(key = as.character(row_number()))
    
    sharedall <- SharedData$new(all, key = ~key, group = "mydata")
    sharedhistoric <- SharedData$new(all %>% dplyr::filter(set == "historic"), key = ~key, group = "mydata")
    sharedrecent <- SharedData$new(all %>% dplyr::filter(set == "recent"), key = ~key, group = "mydata")
    
    #--- no changes to the plot ---
    # but choose filter options from full dataset
    
    bscols(widths = c(4, 8),
           list(
             crosstalk::filter_checkbox("id_year", 
                                        label = "Select Year",
                                        sharedall, 
                                        group = ~Year),
             crosstalk::filter_select("id_date", 
                                      label = "Date",
                                      sharedall, 
                                      group = ~date)
           ),
           chartyplot)
    

    Now we can correctly (de)select the historic and recent data.

    enter image description here

    2 "Lost" Data points

    If the previous example had worked with filter selection only based on "historic" data, then the "recent" data (from a different year) appears to not be recoverable after the first filter is selected. Similarly for the "most_recent" data point, which is not a SharedData object. This data point is plotted initially, but is dropped as soon as the first filter is set.

    However, this can be adressed the same way, by defining a "most recent" SharedData object belonging to the same group:

    sharedmostrecent <- SharedData$new(all %>% tail(1), key = ~key, group = "mydata")
    
    #plot adjustment
    ...
    geom_line(...) +
    geom_point(data=sharedmostrecent, aes(x=time,y=cumvol), color='#E15759',size=3) +
    ...
    

    enter image description here

    With this, we can select and deselect all data in the plot, without loosing any.

    3 Fix some data

    In order to ensure the "recent" data to stick around regardless of (manual filter values), we can manipulate the HTML output. First we separate the datasets by not filtering by year, but by recent/historic & year (or any other suitable subsets of the data):

    all <- bind_rows(historic, recent) %>%
      dplyr::mutate(key = as.character(row_number()),
                    dataset = paste0(set, " ", Year))
    ...
    out <- bscols(widths = c(4, 8),
           list(
             crosstalk::filter_checkbox("id_year", 
                                        label = "Select dataset",
                                        sharedall, 
                                        group = ~dataset)
           ),
           chartyplot) 
    

    Then we check and disable the checkbox for our recent dataset - this can probably be done more elegantly, but it works in my Rmd:

    library(htmltools)
    out_tags <- htmltools::renderTags(out)
    out_tags$html <- stringr::str_replace(
      out_tags$html, 
      '<input type="checkbox" name="id_year" value="recent 2022"/>',
      '<input type="checkbox" name="id_year" value="recent 2022" disabled="disabled" checked="checked"/>'
      )
    out_tags$html <- HTML(out_tags$html)
    as.tags(out_tags)
    

    enter image description here