Tags: r, ggplot2, plotly, ggplotly, parallel-coordinates

Compare data on hover only shows tooltips for one point per colour (in a ggplotly plot with both a colour and a group aesthetic)


[Image: the parallel coordinates plot in "Compare data on hover" mode, showing one tooltip per colour]

The image is of a simple parallel coordinates plot of 9 records from the iris dataset, created in ggplot2 and flipped to plotly via ggplotly. The plot is coloured by Species and grouped by an observation ID (so that each observation has its own line series), and the custom tooltip text attribute contains this ID (but in reality could contain all sorts of information).

In "Show closest data on hover" mode, all 9 tooltips are available for individual selection. But in "Compare data on hover" mode (illustrated), only one point per colour is shown with a tooltip, and specifically always the last observation in each colour group (3, 6 and 9).

My expectation/experience is that "Compare data on hover" mode displays the tooltips for all data points in the plot that share an x coordinate (and this is the behaviour I want). However, this expectation is clearly wrong in this case. I surmise that this has to do with the presence of two "grouping" aesthetics (colour and group) in the ggplot2 call, and their translation by ggplotly into a plotly object, but I don't have the knowledge to go further and searching has drawn a blank.

Code to reproduce this example is below. I'd be grateful for any explanation of the observed behaviour, and ideally a solution or workaround for generating the desired behaviour instead.

# data: first 3 rows of each iris species, add observation ID, reshape to long
library(data.table)
df <- data.table(iris)[, head(.SD, 3), by=Species][, ID := seq(.N)] |> melt(id.vars=c("ID", "Species")) |> as.data.frame()

# manual parallel coordinates plot in ggplot2
library(ggplot2)
gg <- ggplot(df, aes(x=variable, y=value, colour=Species, group=ID)) +
  geom_point(aes(text=ID)) + geom_line()

# flip to plotly
library(plotly)
gg |> ggplotly(tooltip="text")

Solution

  • [Image: the workaround plot in "Compare data on hover" mode, with a tooltip for every observation]

    TL;DR

    It turns out that a ggplotly() chart shows at most one tooltip per colour in "Compare data on hover" mode: the colour aesthetic, not the group aesthetic, determines the number of traces, and this mode shows at most one tooltip per trace. Below I explain this behaviour (which I struggled to find documented in one place) and show a workaround for the case in the question.

    Explanation of the observed behaviour

    After experimenting with various ggplot() calls and inspecting the resulting plotly object visually and with plotly_json(), the behaviour is consistent with the following explanation of a) how ggplotly translates ggplot2 aesthetics and b) how plotly objects display tooltips:

    1. the ggplot2 colour aesthetic translates to a distinct plotly trace per colour
    2. the ggplot2 group aesthetic does not translate to distinct traces, but instead, if the trace is a line (type: 'scatter', mode: 'lines'), a (NULL, NULL) coordinate is inserted between groups in order to prevent a line segment from connecting data from different groups
    3. "Compare data on hover" displays one tooltip per trace (as in fact clearly stated here, albeit in a Python context), apparently always that of the last point in the trace that has the relevant x-value

    For a very simple illustration of (3), consider this scatterplot of four points with IDs A, B, C and D, where B and C share the same x-value of 2. There is only one trace, and "Compare data on hover" at x=2 displays only the tooltip for point C.

    df <- data.frame(x = c(1,2,2,3),
                     y = c(2,1,3,2),
                     ID = LETTERS[1:4])
    (ggplot(df, aes(x=x, y=y)) + geom_point(aes(text=ID))) |> ggplotly(tooltip="text")
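
    The single-trace claim can be checked by inspecting the built plotly object rather than the rendered chart. This is a minimal sketch, assuming the trace list sits in `$x$data` of the object returned by plotly_build() (which is where I found it when poking around); the names `p`, `built` and `tr` are mine and purely illustrative.

    ```r
    library(ggplot2)
    library(plotly)

    df <- data.frame(x = c(1, 2, 2, 3),
                     y = c(2, 1, 3, 2),
                     ID = LETTERS[1:4])

    # geom_point() warns about the unknown 'text' aesthetic; ggplotly() still uses it
    p <- ggplot(df, aes(x = x, y = y)) + geom_point(aes(text = ID))
    built <- plotly_build(ggplotly(p, tooltip = "text"))

    n_traces <- length(built$x$data)  # number of traces in the plot
    tr <- built$x$data[[1]]
    n_points <- length(tr$x)          # number of points held by the first trace
    n_at_x2 <- sum(tr$x == 2)         # points sharing x = 2 inside that trace

    print(c(traces = n_traces, points = n_points, at_x2 = n_at_x2))
    ```

    If two points with the same x-value live in the same trace, as B and C do here, only one of them can get a tooltip in compare mode.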
    

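    In the parallel-coordinates example used in the question, each species therefore gets one marker trace and one line trace, and each of those traces holds all three observations of that species.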

    In other words, despite the appearance of 9 series in the plot (one for each logical observation), there are in fact only 3 (pairs of) traces, and only one tooltip per marker trace is shown using "Compare data on hover", consistent with the explanation arrived at above.
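
    The trace structure of the question's plot can be listed in the same way. The sketch below rebuilds the plot from the question and tabulates each trace's name, mode and whether its y-values contain NA separators (the R-side counterpart of the null coordinates seen in plotly_json()). It assumes the traces are stored in `$x$data` of the built object; `or_na` and `trace_info` are hypothetical helper names of my own.

    ```r
    library(data.table)
    library(ggplot2)
    library(plotly)

    # same data as in the question: 3 rows per species, long format
    df <- data.table(iris)[, head(.SD, 3), by = Species][, ID := seq(.N)] |>
      melt(id.vars = c("ID", "Species")) |>
      as.data.frame()

    gg <- ggplot(df, aes(x = variable, y = value, colour = Species, group = ID)) +
      geom_point(aes(text = ID)) + geom_line()

    built <- plotly_build(ggplotly(gg, tooltip = "text"))
    traces <- built$x$data

    # hypothetical helper: return a trace attribute as a string, or NA if absent
    or_na <- function(v) if (is.null(v)) NA_character_ else as.character(v)[1]

    # one row per trace: its name, its mode, and whether it contains NA gaps
    trace_info <- data.frame(
      name    = vapply(traces, function(tr) or_na(tr$name), character(1)),
      mode    = vapply(traces, function(tr) or_na(tr$mode), character(1)),
      has_gap = vapply(traces, function(tr) anyNA(tr$y), logical(1))
    )
    print(trace_info)
    ```

    If the explanation above holds, the table should have far fewer rows than the 9 visible series: one marker trace and one line trace per species.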

    Workaround to achieve the desired behaviour

    One workaround is therefore to give each logical observation its own trace by colouring by ID rather than Species. To maintain the visual colouring by species, we need to map the Species palette (n=3) to a palette applied to the distinct IDs (n=9). The result is shown at the top of this answer and the code is as follows:

    # map species colours to IDs
    species_pal <- scales::hue_pal()(length(unique(df$Species)))
    ID_pal <- species_pal[as.numeric(unique(df[c("Species", "ID")])$Species)]
      
    # generate the plot
    library(ggplot2)
    library(plotly)
    ( ggplot(df, aes(
        x = variable,
        y = value,
        colour = as.factor(ID),                # now colouring by observation ID
        group = ID
      )) +
        geom_point(aes(text = ID)) +
        geom_line() +
        scale_colour_manual(values = ID_pal) + # species colours mapped to IDs
        theme(legend.position = "none")        # suppress legend :(
    ) |> ggplotly(tooltip = "text")
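
    As an optional extra, compare mode can be made the initial hover behaviour instead of relying on the modebar toggle. This is a sketch on a small stand-in plot, not the plot above: it uses plotly's layout() with hovermode = "x", which is the setting the "Compare data on hover" button corresponds to. The data frame and the name `hover_setting` are made up for illustration.

    ```r
    library(ggplot2)
    library(plotly)

    # stand-in data: two observations, each coloured (and therefore traced) separately
    df <- data.frame(x = c(1, 2, 1, 2),
                     y = c(1, 2, 2, 1),
                     ID = c("a", "a", "b", "b"))

    p <- (ggplot(df, aes(x = x, y = y, colour = ID, group = ID)) +
            geom_point(aes(text = ID)) +
            geom_line() +
            theme(legend.position = "none")) |>
      ggplotly(tooltip = "text") |>
      layout(hovermode = "x")  # start in compare mode rather than closest-point mode

    # layout() settings are merged into the layout when the plot is built
    hover_setting <- plotly_build(p)$x$layout$hovermode
    print(hover_setting)
    ```

    The same layout() call can be appended to the workaround pipeline, so that every observation's tooltip appears on hover without any interaction with the modebar.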
    

    It isn't a perfect workaround because we have to lose the legend (which would display 9 keys if shown), but it does ensure that all tooltips are visible in "Compare data on hover" mode. (In the use case that sparked the question, which is EDA of multivariate datasets, the grouping variable is always based on statistical clustering, so the legend does not provide any extra semantic information anyway. The interesting information is the identity and attributes of each observation, contained in the tooltips.)