pythonmatplotlib

Plot an histogram with y-axis as percentage (using FuncFormatter?)


I have a list of data in which the numbers are between 1000 and 20 000.

data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]

When I plot a histogram using the hist() function, the y-axis represents the number of occurrences of the values within a bin. Instead of the number of occurrences, I would like to have the percentage of occurrences.

Histogram of the data list above

Code for the above plot:

f, ax = plt.subplots(1, 1, figsize=(10,5))
ax.hist(data, bins = len(list(set(data))))

I've been looking at this post which describes an example using FuncFormatter but I can't figure out how to adapt it to my problem. Some help and guidance would be welcome :)

EDIT: Main issue with the to_percent(y, position) function used by the FuncFormatter. The y corresponds to one given value on the y-axis I guess. I need to divide this value by the total number of elements which I apparently can' t pass to the function...

EDIT 2: Current solution I dislike because of the use of a global variable:

def to_percent(y, position):
    # Ignore the passed in position. This has the effect of scaling the default
    # tick locations.
    global n

    s = str(round(100 * y / n, 3))
    print (y)

    # The percent symbol needs escaping in latex
    if matplotlib.rcParams['text.usetex'] is True:
        return s + r'$\%$'
    else:
        return s + '%'

def plotting_hist(folder, output):
    global n

    data = list()
    # Do stuff to create data from folder

    n = len(data)
    f, ax = plt.subplots(1, 1, figsize=(10,5))
    ax.hist(data, bins = len(list(set(data))), rwidth = 1)

    formatter = FuncFormatter(to_percent)
    plt.gca().yaxis.set_major_formatter(formatter)

    plt.savefig("{}.png".format(output), dpi=500)

EDIT 3: Method with density = True

enter image description here

Actual desired output (method with global variable):

enter image description here


Solution

  • Other answers seem utterly complicated. A histogram which shows the proportion instead of the absolute amount can easily produced by weighting the data with 1/n, where n is the number of datapoints.

    Then a PercentFormatter can be used to show the proportion (e.g. 0.45) as percentage (45%).

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.ticker import PercentFormatter
    
    data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
    
    plt.hist(data, weights=np.ones(len(data)) / len(data))
    
    plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
    plt.show()
    

    enter image description here

    Here we see that three of the 7 values are in the first bin, i.e. 3/7=43%.