python numpy matplotlib histogram normal-distribution

How do I draw a histogram for a normal distribution?


My task is: use the NumPy function np.random.randn to generate data x from a normal distribution with 100,000 points, then plot a histogram.

My code is:

x = sp.norm.pdf(np.random.randn(100000))
plt.hist(x, bins = 20, facecolor='blue', alpha=0.5)

Is something wrong here? The histogram I get does not look like a normal distribution.


Solution

  • To obtain N random samples from a standard normal distribution, you can use either np.random.randn(N) or SciPy's stats.norm.rvs(size=N). These samples can then be passed directly to plt.hist to create the histogram.

    To draw the curve, use stats.norm.pdf(y), where y is an array of evenly spaced x-values. The pdf is normalized, i.e. the area under the curve is 1. The total area of the histogram is the number of samples times the bin width, because each sample falls in exactly one bin and adds one unit of height to it. Multiplying the pdf by that factor therefore scales it to the height of the histogram.

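    The result of stats.norm.pdf(np.random.randn(N)) is an array of probability densities, one for each of the N random samples, not an array of samples. Most samples fall near the center of the curve (near 0), where the height of the pdf is about 0.40. A histogram of these values therefore piles up just below 0.40 instead of tracing a bell shape, which explains the high peak near that maximum.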
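To see this numerically, here is a minimal sketch that bins both the raw samples and their pdf values. The seed and the variable names are arbitrary choices for illustration, not part of the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # arbitrary seed, only for repeatability
samples = rng.standard_normal(100_000)  # draws from the standard normal
densities = stats.norm.pdf(samples)     # pdf evaluated AT each sample

# The pdf can never exceed its value at 0, which is 1 / sqrt(2 * pi).
peak = 1 / np.sqrt(2 * np.pi)
print("largest pdf value:", densities.max(), "theoretical peak:", peak)

# Bin both arrays the same way plt.hist(..., bins=20) would.
sample_counts, _ = np.histogram(samples, bins=20)
density_counts, _ = np.histogram(densities, bins=20)

print("fullest bin for samples:  ", np.argmax(sample_counts))
print("fullest bin for pdf values:", np.argmax(density_counts))
```

The fullest bin for the pdf values should be the last one, just below the peak height, while the fullest bin for the raw samples should sit near the middle of the range.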

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    
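    N = 100000
    # Either call draws N samples from the standard normal distribution:
    # x = np.random.randn(N)
    x = stats.norm.rvs(size=N)
    num_bins = 20
    plt.hist(x, bins=num_bins, facecolor='blue', alpha=0.5)
    
    # Overlay the pdf, scaled from area 1 to the histogram's area (N * bin_width)
    y = np.linspace(-4, 4, 1000)
    bin_width = (x.max() - x.min()) / num_bins
    plt.plot(y, stats.norm.pdf(y) * N * bin_width)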
    
    plt.show()
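An alternative to scaling the pdf is to normalize the histogram instead. The sketch below is a variation on the code above rather than part of the original answer; it assumes a Matplotlib version whose hist accepts density=True, which rescales the bars so that their total area is 1.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

N = 100000
num_bins = 20
x = np.random.randn(N)

# density=True divides each count by (N * bin_width), so the bars' area sums to 1
heights, edges, _ = plt.hist(x, bins=num_bins, density=True,
                             facecolor='blue', alpha=0.5)

# The pdf already has area 1, so no scaling factor is needed
y = np.linspace(-4, 4, 1000)
plt.plot(y, stats.norm.pdf(y))

plt.show()
```

With this approach the vertical axis shows probability density rather than counts, so use the scaled-pdf version if the raw counts matter.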
    
