Tags: python, plotly, plotly-express

Plotly Express scatter with facet_col does not use the correct colors - what am I doing wrong?


The following code first shows a single scatter plot with mixed positive and negative data: negative data (around -100) at the bottom left, positive data (around +100) at the top right.

[Figure: plot with -100 and +100 data in one plot]

The second plot uses the facet_col feature of the Plotly Express scatter function to separate positive and negative values into two subplots. The result looks wrong: the negative data spans the whole color scale from -100 to +100 (red AND blue). I would have expected the points in the left subplot to be only red, not a mix of blue and red.

[Figure: plot with -100 and +100 data in two subplots]

plotly.__version__= 5.18.0

What am I doing wrong?

import pandas as pd
import plotly.express as px
import numpy as np
import plotly

print("plotly.__version__=", plotly.__version__)

np.random.seed(1)
# configuration for first array
mean0 = np.array([0., 0.])
cov0 = np.array([[1., 0.], [0., 1.]])
size0 = 10000
print("size0=", size0)
# configuration for second array
mean1 = np.array([10., 10.])
cov1 = np.array([[.5, 0.], [0., .5]])
size1 = 100
# build first array
vals0 = np.random.multivariate_normal(mean0, cov0, size0)
# append another column to the right of the array
vals0 = np.append(vals0, [[-1] for x in range(size0)], axis=1)
# fill new column with randomized data (negative values)
vals0[:, 2] = -100.0 + 0.2 * np.random.random(size0)
# build second array
vals1 = np.random.multivariate_normal(mean1, cov1, size1)
# append another column to the right of the array
vals1 = np.append(vals1, [[-1] for x in range(size1)], axis=1)
# fill new column with randomized data (positive values)
vals1[:, 2] = 100.0 - 0.2 * np.random.random(size1)
# combine first and second array
vals2 = np.append(vals0, vals1, axis=0)
# convert numpy array to pandas DataFrame
df = pd.DataFrame(vals2, columns=['x', 'y', 'z'])

df['type'] = df.z.apply(lambda z: 'negative' if z < 0 else "positive")

fig1 = px.scatter(df, x='x', y='y', color='z', color_continuous_scale=["red", "blue", ])
fig2 = px.scatter(df, x='x', y='y', color='z', facet_col='type', color_continuous_scale=["red", "blue", ])
fig1.show()
fig2.show()

Solution

  • There is nothing wrong with your code: if the second subplot looks fine, the first one should too. Inspecting the plotly.js side via the browser console shows that both traces end up using the same coloraxis with the same cmin/cmax values, and that those values are correctly inferred from the entire set of marker colors (negative and positive together; i.e. cmin/cmax match the min and max of the z data, since cauto is enabled by default). So there is an inconsistency here, and this should just work for both traces.

    That said, setting range_color fixes the issue:

    fig2 = px.scatter(df, x='x', y='y', color='z', range_color=[df.z.min(), df.z.max()], facet_col='type', color_continuous_scale=["red", "blue", ])
    

    Note that this is the plotly.express way of explicitly setting cmin/cmax. The values end up the same, but cauto is turned off, which suggests that the problem is somehow related to cauto.