I ran into an issue when using Plotly and Dash to retrieve hover data by moving the cursor over points in a scatter plot. The hover data returned by the Dash app contains the same pointNumber and pointIndex for multiple points in the same plot. This makes it impossible to display the correct information for a given instance when hovering over its point.
Here is a simplified example which can be run in a Jupyter notebook. In the end I will want to display images on hovering.
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from jupyter_dash import JupyterDash
from dash import dcc, html, Input, Output, no_update
import plotly.express as px
# Loading iris data to pandas dataframe
data = load_iris()
images = data.data
labels = data.target
df = pd.DataFrame(images[:, :2], columns=["feat1", "feat2"])
df["label"] = labels
# Color for each class
color_map = {0: "setosa",
             1: "versicolor",
             2: "virginica"}
colors = [color_map[l] for l in labels]
df["color"] = colors
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(df)
# Setup plotly scatter plot
fig = px.scatter(df, x="feat1", y="feat2", color="color")
fig.update_traces(hoverinfo="none",
                  hovertemplate=None)
# Setup Dash
app = JupyterDash(__name__)
app.layout = html.Div(className="container",
                      children=[dcc.Graph(id="graph-5", figure=fig, clear_on_unhover=True),
                                dcc.Tooltip(id="graph-tooltip-5", direction="bottom")])
@app.callback(Output("graph-tooltip-5", "show"),
              Output("graph-tooltip-5", "bbox"),
              Output("graph-tooltip-5", "children"),
              Input("graph-5", "hoverData"))
def display_hover(hoverData):
    if hoverData is None:
        return False, no_update, no_update
    print(hoverData)
    hover_data = hoverData["points"][0]
    bbox = hover_data["bbox"]
    num = hover_data["pointNumber"]
    children = [html.Div([html.Img(style={"height": "50px",
                                          "width": "50px",
                                          "display": "block",
                                          "margin": "0 auto"}),
                          html.P("Feat1: {}".format(str(df.loc[num]["feat1"]))),
                          html.P("Feat2: {}".format(str(df.loc[num]["feat2"])))])]
    return True, bbox, children
if __name__ == "__main__":
    app.run_server(mode="inline", debug=True)
The problem can be observed for example with the following two instances retrieved via print(df):
index  feat1  feat2  label  color
31     5.4    3.4    0      setosa
131    7.9    3.8    2      virginica
Both are assigned the same pointNumber and pointIndex, as shown by print(hoverData):
{'points': [{'curveNumber': 2, 'pointNumber': 31, 'pointIndex': 31, 'x': 7.9, 'y': 3.8, 'bbox': {'x0': 1235.5, 'x1': 1241.5, 'y0': 152.13, 'y1': 158.13}}]}
{'points': [{'curveNumber': 0, 'pointNumber': 31, 'pointIndex': 31, 'x': 5.4, 'y': 3.4, 'bbox': {'x0': 481.33, 'x1': 487.33, 'y0': 197.38, 'y1': 203.38}}]}
This is the visualization when hovering over the two instances. The hovering information is wrong for the image on the right side.
Interestingly, the issue resolves when using
fig = px.scatter(df, x="feat1", y="feat2", color="label")
However, this turns the legend into a continuous color scale and removes the ability to show or hide the instances of individual classes in the rendered HTML.
Is this a bug or am I overlooking something? Any help is much appreciated!
It turned out that I incorrectly expected pointNumber and pointIndex to be unique across the whole figure. As soon as a non-numeric column is passed as the color parameter of px.scatter(), Plotly Express draws one trace per class, and the point numbers and indices restart at 0 within each trace. A point in the scatter plot can be uniquely identified by combining curveNumber with either pointNumber or pointIndex.
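To illustrate the renumbering, here is a minimal plain-Python sketch that rebuilds the (curveNumber, pointNumber) pairs from the class column alone. It assumes that traces are created in the order in which each class first appears and that row order is preserved within a trace; the helper name build_point_lookup is hypothetical and not part of Plotly or Dash.

```python
from typing import Dict, Hashable, Sequence, Tuple


def build_point_lookup(categories: Sequence[Hashable]) -> Dict[Tuple[int, int], int]:
    """Map (curveNumber, pointNumber) to the original row position.

    Assumption: one trace per category, numbered by first appearance,
    with points numbered from 0 inside each trace.
    """
    curve_of: Dict[Hashable, int] = {}   # category -> trace number
    next_point: Dict[int, int] = {}      # trace number -> next free point number
    lookup: Dict[Tuple[int, int], int] = {}
    for row, category in enumerate(categories):
        curve = curve_of.setdefault(category, len(curve_of))
        point = next_point.get(curve, 0)
        next_point[curve] = point + 1
        lookup[(curve, point)] = row
    return lookup


# Same layout as the iris data: 50 rows per class, in class order.
categories = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
lookup = build_point_lookup(categories)

print(lookup[(0, 31)])  # 31  (setosa row)
print(lookup[(2, 31)])  # 131 (virginica row, same pointNumber)
```

Both rows share pointNumber 31, but the differing curveNumber resolves them to rows 31 and 131, which matches the two hover events shown in the question.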
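A potential solution is to generate separate indices for each class and add them to the dataframe. This relies on the rows being grouped by class in ascending label order, as they are in the iris data:
# Number the rows 0..n-1 within each class, using the labels array from the question
curve_indices = np.array([np.arange(0, num_samples) for num_samples in np.unique(labels, return_counts=True)[1]], dtype="object")
curve_indices = np.concatenate(curve_indices).ravel()
df["curve_index"] = curve_indices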
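In the callback function, the correct dataframe index for each instance can then be identified from the curve number and the point number. Here the integer label coincides with curveNumber because the classes appear in the order 0, 1, 2:
curve = hover_data["curveNumber"]
df_index = df[(df.label == curve) & (df.curve_index == num)].index[0]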
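If the rows are not sorted by class, the same idea can be sketched with pandas groupby instead of concatenated ranges. This is only an illustration on a small made-up frame: the column name curve_number and the helper row_for_hover_point are hypothetical, and it again assumes that traces are numbered in order of first appearance of each class.

```python
import pandas as pd

# Small stand-in frame with interleaved classes (made-up values).
df = pd.DataFrame({
    "feat1": [5.1, 7.0, 6.3, 4.9, 6.4],
    "color": ["setosa", "versicolor", "virginica", "setosa", "versicolor"],
})

# Trace number per row: classes numbered by first appearance (assumption).
df["curve_number"] = df.groupby("color", sort=False).ngroup()
# Position of each row within its class, starting at 0.
df["curve_index"] = df.groupby("color", sort=False).cumcount()


def row_for_hover_point(point: dict) -> int:
    """Return the dataframe index for one entry of hoverData["points"]."""
    match = df[(df["curve_number"] == point["curveNumber"])
               & (df["curve_index"] == point["pointNumber"])]
    return int(match.index[0])


# Second versicolor point (trace 1, point 1) is row 4.
print(row_for_hover_point({"curveNumber": 1, "pointNumber": 1}))  # 4
```

In the callback, row_for_hover_point(hoverData["points"][0]) would then replace the direct use of pointNumber as a row label.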