I would like to make a graph with a multi-level x axis like in the following picture:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(
go.Scatter(
x = [df['x'], df['x1']],
y = df['y'],
mode='markers'
)
)
But also I would like to put jitter on the x-axis like in the next picture:
So far I can make each graph independently using the next code:
import plotly.express as px
fig = px.strip(df,
x=[df["x"], df['x1']],
y="y",
stripmode='overlay')
Is it possible to combine the jitter and the multi-level axis in one plot?
Here is a code to reproduce the dataset:
import numpy as np
import pandas as pd
import random
'''Create DataFrame'''
price = np.append(
np.random.normal(20, 5, size=(1, 50)), np.random.normal(40, 2, size=(1, 10))
)
quantity = np.append(
np.random.randint(1, 5, size=(50)), np.random.randint(8, 12, size=(10))
)
firstLayerList = ['15 in', '16 in']
secondLayerList = ['1/2', '3/8']
vendorList = ['Vendor1','Vendor2','Vendor3']
data = {
'Width': [random.choice(firstLayerList) for i in range(len(price))],
'Length': [random.choice(secondLayerList) for i in range(len(price))],
'Vendor': [random.choice(vendorList) for i in range(len(price))],
'Quantity': quantity,
'Price': price
}
df = pd.DataFrame.from_dict(data)
Firstly - thanks for the challenge! There aren't many challenging Plotly questions these days.
The key elements to creating a scatter graph with jitter are:
mode: 'box'
- to create a box-plot, not a scatter plot.'boxpoints': 'all'
- so all points are plotted.'pointpos': 0
- to center the points on the x-axis.'fillcolor': 'rgba(255,255,255,0)'
'line': {'color': 'rgba(255,255,255,0)'}
This code simply splits the main DataFrame into a frame for each vendor, thus allowing a trace to be created for each, with their own colour.
df1 = df[df['Vendor'] == 'Vendor1']
df2 = df[df['Vendor'] == 'Vendor2']
df3 = df[df['Vendor'] == 'Vendor3']
The plotting code could use a for
-loop if you like. However, I've intentionally kept it more verbose, so as to increase clarity.
import plotly.io as pio
layout = {'title': 'Categorical X-Axis, with Jitter'}
traces = []
traces.append({'x': [df1['Width'], df1['Length']], 'y': df1['Price'], 'name': 'Vendor1', 'marker': {'color': 'green'}})
traces.append({'x': [df2['Width'], df2['Length']], 'y': df2['Price'], 'name': 'Vendor2', 'marker': {'color': 'blue'}})
traces.append({'x': [df3['Width'], df3['Length']], 'y': df3['Price'], 'name': 'Vendor3', 'marker': {'color': 'orange'}})
# Update (add) trace elements common to all traces.
for t in traces:
t.update({'type': 'box',
'boxpoints': 'all',
'fillcolor': 'rgba(255,255,255,0)',
'hoveron': 'points',
'hovertemplate': 'value=%{x}<br>Price=%{y}<extra></extra>',
'line': {'color': 'rgba(255,255,255,0)'},
'pointpos': 0,
'showlegend': True})
pio.show({'data': traces, 'layout': layout})
The data behind this graph was generated using np.random.seed(73)
, against the dataset creation code posted in the question.
The example code shown here uses the lower-level Plotly API, rather than a convenience wrapper such as graph_objects
or express
. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the list
s and dict
s which Plotly is constructing for the underlying graphing engine (orca).
And this use-case is a prime example of this reasoning, as it’s edging Plotly past its (current) design point.