In Plotly I can create a histogram, e.g. with this example code from the documentation:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
My question is: how do I get the data values of the histogram? As far as I can tell, this should be equivalent to asking how to access the values of a trace. (Google did not help with either.)
I could use numpy to redo the histogram:
import numpy as np
np.histogram(df.total_bill)
But this will not always result in the same buckets, and it redoes the sometimes expensive computation that goes into creating a histogram.
My understanding of your question is that you would like to get the exact intervals and counts displayed in the histogram. For a smaller subset of px.data.tips() (the last 10 rows), reading those values off the chart gives:
counts = [2, 4, 3, 1]
bins = [5, 15, 25, 35, 45]
There's no direct way to do this, but that doesn't mean it's impossible, at least if you're willing to use the awesome fig.full_figure_for_development() and a little numpy.
f = fig.full_figure_for_development(warn=False)
xbins = f.data[0].xbins
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
print(counts, bins)
[2 4 3 1] [ 5 15 25 35 45]
What I'm guessing you would like to be able to do is this:
Run:
fig.data[0].count
And get:
[2, 4, 3, 1]
But the closest you'll get is this:
Run:
fig.data[0].x
And get:
[15.53, 10.07, 12.6 , 32.83, 35.83, 29.03, 27.18, 22.67, 17.82,
18.78]
And those are just the raw values from the input df['total_bill'].tail(10). So DerekO is right in that the rest is handled by JavaScript. But fig.full_figure_for_development() will:
[...] return a new go.Figure object, prepopulated with the same values you provided, as well as all the default values computed by Plotly.js, to allow you to learn more about what attributes control every detail of your figure and how you can customize them.
So running f = fig.full_figure_for_development(warn=False), and then:
f.data[0].xbins
Will give you:
histogram.XBins({
'end': 45, 'size': 10, 'start': 5
})
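To make those three numbers concrete: as far as I can tell, start and size define consecutive intervals of equal width, running up to end. Here is a minimal sketch in plain Python with the values above hard-coded. The function name bin_intervals is made up for illustration and is not part of Plotly.

```python
def bin_intervals(start, end, size):
    """Return (left, right) pairs for equal-width bins covering start..end."""
    # Derive the number of bins once, so no running sum of floats is needed.
    n_bins = round((end - start) / size)
    return [(start + i * size, start + (i + 1) * size) for i in range(n_bins)]


# Values copied from the xbins output above.
intervals = bin_intervals(start=5, end=45, size=10)
print(intervals)
```

These are the four intervals whose counts were read off the chart earlier.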
And now you know enough to get the same values in your figure with a little numpy:
import plotly.express as px
import numpy as np
df = px.data.tips()
df = df.tail(10)
fig = px.histogram(df, x="total_bill")
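# Ask Plotly.js for the fully computed figure, including the automatic bin settings
# (this call requires the kaleido package to be installed)
f = fig.full_figure_for_development(warn=False)
xbins = f.data[0].xbins
# Rebuild the bin edges from start/end/size, then count the raw x values into them
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)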
print(counts, bins)
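One caveat with the np.arange call above: with a non-integer bin size, np.arange can return one element more or fewer than expected because of floating-point rounding. Below is a sketch of a slightly more defensive variant that derives the number of bins first and builds the edges from that. The helper name counts_from_xbins is my own, and the data and bin settings are hard-coded from the outputs shown above so that it runs without Plotly. Also note that np.histogram treats the last bin as closed on the right, which may differ from how Plotly.js handles a value lying exactly on the final edge.

```python
import numpy as np


def counts_from_xbins(values, start, end, size):
    """Count values into equal-width bins described by start/end/size."""
    # Compute the bin count as an integer to avoid np.arange float-step surprises.
    n_bins = int(round((end - start) / size))
    edges = start + size * np.arange(n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


# Raw values as returned by fig.data[0].x for df.tail(10) above.
total_bill = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]

# start/end/size as reported by f.data[0].xbins above.
counts, edges = counts_from_xbins(total_bill, start=5, end=45, size=10)
print(counts, edges)
```

For this data it gives the same four counts and five edges as the np.arange version.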