
Get the values from a histogram or the values from a trace


In Plotly I can create a histogram, e.g. with this example code from the documentation:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

which produces a histogram of total_bill.

My question is: how do I get the data values of the histogram? As far as I can tell, this is equivalent to asking how to access the values of a trace. (Google did not help with either.)

I could use numpy to redo the histogram:

import numpy as np
np.histogram(df.total_bill)

But this will not always produce the same buckets, and it redoes all of the (sometimes expensive) computation that goes into creating a histogram.
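To make the bucket mismatch concrete: called without a bins argument, np.histogram uses 10 equal-width bins spanning the data's minimum to maximum, so its edges rarely line up with the rounder edges Plotly.js picks. A minimal sketch, with the ten total_bill values from df.tail(10) hard-coded (an assumption for illustration, so it runs without Plotly) and the edges 5, 15, 25, 35, 45 read off the Plotly chart for that subset:

```python
import numpy as np

# Last ten total_bill values of px.data.tips(), hard-coded for illustration
total_bill = np.array([15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78])

# NumPy's default: 10 equal-width bins from min(total_bill) to max(total_bill)
counts, edges = np.histogram(total_bill)
print(edges)  # 11 edges starting at 10.07 and ending at 35.83, not Plotly's

# Passing the edges read off the Plotly chart reproduces its bar heights
plotly_counts, _ = np.histogram(total_bill, bins=[5, 15, 25, 35, 45])
print(plotly_counts)
```

Given the right edges, NumPy reproduces the bars exactly; the hard part is finding out which edges Plotly.js used.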



Solution

  • My understanding of your question is that you would like to get the exact intervals and counts displayed in the histogram. Take a smaller subset of px.data.tips(): the last 10 rows, df.tail(10), as used in the complete code at the end.

    Reading the values off the histogram of that subset gives:

    counts = [2, 4, 3, 1]
    bins = [5, 15, 25, 35, 45]
    

    There's no direct way to do this, but it is possible if you're willing to use the very handy fig.full_figure_for_development() and a little numpy.

    Code highlights (complete snippet at the very end)

    xbins = f.data[0].xbins
    plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
    counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
    

    Output:

    [2 4 3 1] [ 5 15 25 35 45]
    

    All the details:

    What I'm guessing you would like to be able to do is this:

    Run:

    fig.data[0].count
    

    And get:

    [2, 4, 3, 1]
    

    But the closest you'll get is this:

    Run:

    fig.data[0].x
    

    And get:

    [15.53, 10.07, 12.6 , 32.83, 35.83, 29.03, 27.18, 22.67, 17.82,
       18.78]
    

    These are just the raw values from the input, df['total_bill'].tail(10). So DerekO is right that the binning itself is handled by JavaScript (Plotly.js). But fig.full_figure_for_development() will:

    [...] return a new go.Figure object, prepopulated with the same values you provided, as well as all the default values computed by Plotly.js, to allow you to learn more about what attributes control every detail of your figure and how you can customize them.

    So running f = fig.full_figure_for_development(warn=False), and then:

    f.data[0].xbins
    

    Will give you:

    histogram.XBins({
        'end': 45, 'size': 10, 'start': 5
    })
    

    And now you know enough to get the same values in your figure with a little numpy:

    Complete code:

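    import plotly.express as px
    import numpy as np
    
    df = px.data.tips()
    df = df.tail(10)  # small subset so the counts are easy to check by eye
    fig = px.histogram(df, x="total_bill")
    
    # Let Plotly.js fill in all computed defaults, including the bins
    # (full_figure_for_development needs the kaleido package installed)
    f = fig.full_figure_for_development(warn=False)
    
    # Rebuild the edges: start, start + size, ... until an edge reaches end
    xbins = f.data[0].xbins
    plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
    
    # Bin the raw x values with those edges to get the bar heights
    counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
    print(counts, bins)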
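One refinement may be worth considering: NumPy's own documentation advises against np.arange with a non-integer step, because rounding can yield one edge too many or too few. The sketch below instead counts the bins first and then steps by size from start until an edge reaches or exceeds end, which is how I read the Plotly reference description of xbins.end. It assumes numeric (not date or category) bins, the default histfunc="count" and no histnorm. The helper names edges_from_xbins and histogram_counts are hypothetical, and the x values and xbins are hard-coded copies of the ones shown above so it runs without Plotly or kaleido.

```python
import math

import numpy as np


def edges_from_xbins(xbins):
    """Hypothetical helper: rebuild bin edges from a numeric xbins mapping.

    Steps by `size` from `start` until an edge reaches or exceeds `end`.
    The small tolerance keeps an `end` that sits exactly on an edge from
    being pushed one bin further by floating-point rounding.
    """
    start, end, size = xbins["start"], xbins["end"], xbins["size"]
    n_bins = max(1, math.ceil((end - start) / size - 1e-9))
    return start + size * np.arange(n_bins + 1)


def histogram_counts(x, xbins):
    """Hypothetical helper: bar heights and edges for one histogram trace.

    Assumes histfunc="count" (the default) and no histnorm.
    """
    edges = edges_from_xbins(xbins)
    return np.histogram(np.asarray(x, dtype=float), bins=edges)


# Stand-ins for f.data[0].x and f.data[0].xbins from the answer above
x = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]
xbins = {"start": 5, "end": 45, "size": 10}

counts, edges = histogram_counts(x, xbins)
print(counts, edges)
```

With a real figure you would pass f.data[0].x and f.data[0].xbins; the XBins object supports the same ['start'] item access that the complete code already relies on.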