Tags: random, probability, probability-distribution, triangular, gauss

Probability distribution of results from one, two and more draws


I am learning Python and found something that is not intuitive to me. I was trying to plot a Gaussian curve from the output of a simulated lottery. In the program I can set the draw range, the number of draws per game, and the number of games. I sum the draws in each game, record how many times each sum occurs, and plot the graph from that data.

When I set one draw per game, every value is equally probable. This is shown in red on the attached graph, and it is what I expected.

When I set three or more draws, the middle value has the highest probability. For example, with 3 draws in the range 0 to 100, the sum lies in the range 0 to 300 and the most probable value is 150. When I plot it, I get a Gaussian curve. It is shown in blue on the graph.

The non-intuitive case is two draws. I expected the curve to look the same as in the previous case, but the output looks triangular. It is the green curve.

--> Graph image <--

The questions are:

  1. What is the fundamental difference between two draws and more draws, and why are the output curves different?

  2. Why do I not get a Gaussian curve when I set two draws?

Python code:


import random
import matplotlib.pyplot as plt
import collections

class GaussGame():
    # Note: the draw_range keys are the built-in functions min and max, not strings.
    def __init__(self, draw_range=None, number_of_draws=5, number_of_games=100000) -> None:
        # Avoid a mutable default argument; fall back to the range 0..100.
        self.draw_range = draw_range if draw_range is not None else {min: 0, max: 100}
        self.number_of_draws = number_of_draws
        self.number_of_games = number_of_games

    def start(self):
        low, high = self.draw_range[min], self.draw_range[max]
        # Map every possible sum (key) to the number of games that produced it (value).
        win_dict = collections.OrderedDict()
        for total in range(low * self.number_of_draws, high * self.number_of_draws + 1):
            win_dict[total] = 0

        # Loop over all games
        for _ in range(self.number_of_games):
            d_sum = 0  # Sum of the drawn values in this game
            # Loop over the draws of one game
            for _ in range(self.number_of_draws):
                d_sum += random.randrange(low, high + 1)  # high is inclusive
            win_dict[d_sum] += 1
        return win_dict

def main():
    # Running the game several times with different number_of_draws values and plotting the results on one graph gives an interesting picture :-D
    g1 = GaussGame({min: 0, max: 100},1,10000000)
    g2 = GaussGame({min: 0, max: 100},2,10000000)
    g3 = GaussGame({min: 0, max: 100},3,10000000)
    g4 = GaussGame({min: 0, max: 100},4,10000000)
    g5 = GaussGame({min: 0, max: 100},5,10000000)

    d1 = g1.start()
    d2 = g2.start()
    d3 = g3.start()
    d4 = g4.start()
    d5 = g5.start()

    plt.plot(d1.keys(), d1.values(), 'r.')
    plt.plot(d2.keys(), d2.values(), 'g.')
    plt.plot(d3.keys(), d3.values(), 'b.')
    plt.plot(d4.keys(), d4.values(), 'b.')
    plt.plot(d5.keys(), d5.values(), 'b.')
    plt.show()

if __name__ == "__main__":
    main()

Solution

  • That looks about right. What you see, I believe, is the Irwin-Hall distribution, or a variation of it.

    When you sum a small number of samples, the result is not Gaussian, but it converges to a Gaussian as the number of samples grows; see the central limit theorem (CLT).
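
To make this concrete, here is a minimal sketch that computes the exact distribution of the sum instead of simulating it. It is not part of the original program: the helper names `sum_counts` and `gaussian_gap` are made up for illustration. The idea is that the distribution of a sum of independent draws is the convolution of the individual distributions. Convolving two flat distributions gives a triangle: for two draws from 0 to 100, the number of ways to reach a sum s is min(s, 200 - s) + 1. Each further convolution rounds the shape a bit more, so the curve only approaches a Gaussian gradually.

```python
import math

def sum_counts(low, high, draws):
    """Number of ways to reach each possible sum of `draws` uniform
    integer draws from low..high (inclusive).
    Index k of the result corresponds to the sum draws*low + k."""
    single = [1] * (high - low + 1)   # one way to draw each value
    counts = [1]                      # empty sum: exactly one way
    for _ in range(draws):
        # Convolve the current counts with one more uniform draw.
        new = [0] * (len(counts) + len(single) - 1)
        for i, c in enumerate(counts):
            for j, w in enumerate(single):
                new[i + j] += c * w
        counts = new
    return counts

def gaussian_gap(low, high, draws):
    """Largest absolute difference between the exact probabilities and a
    normal density with the same mean and variance."""
    counts = sum_counts(low, high, draws)
    total = sum(counts)
    n = high - low + 1
    mean = draws * (low + high) / 2
    var = draws * (n * n - 1) / 12    # variance of a discrete uniform, times draws
    gap = 0.0
    for k, c in enumerate(counts):
        s = draws * low + k
        normal = math.exp(-(s - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        gap = max(gap, abs(c / total - normal))
    return gap

# Two draws: the counts rise and fall linearly, i.e. a triangle.
two = sum_counts(0, 100, 2)
print(two[:3], two[99:102], two[-3:])

# The gap to the matching normal curve for an increasing number of draws.
for d in range(2, 6):
    print(d, gaussian_gap(0, 100, d))
```

With more draws the gap to the normal curve shrinks, which is the convergence the CLT describes. Strictly speaking, the three-draw curve is not an exact Gaussian either (it is made of parabolic pieces); it just already looks close to one on a plot.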