I am learning Python and found something that is not intuitive to me. I was trying to plot a Gaussian curve from the results of a simulated lottery. In the program I can set the draw range, the number of draws per game, and the number of games. I sum the draws in each game, record how many times each sum occurs, and plot the graph from that data.
With one draw per game, every value is equally probable. This is shown in red on the attached graph, and it is what I expected.
With three or more draws, the middle values are the most probable. For example, with 3 draws in the range 0 to 100, the sum falls between 0 and 300 and the most probable value is 150. When I plot it, I get a Gaussian-looking curve. It is blue on the graph.
The non-intuitive case is two draws. I expected the curve to look like the previous case, but the output looks triangular. It is the green curve.
My questions are:
What is the fundamental difference between two draws and more than two, and why are the output curves different?
Why don't I get a Gaussian curve with two draws?
Python code:
import collections
import random

import matplotlib.pyplot as plt


class GaussGame:
    def __init__(self, draw_range=None, number_of_draws=5, number_of_games=100000) -> None:
        # draw_range holds the inclusive bounds of a single draw.
        self.draw_range = draw_range if draw_range is not None else {"min": 0, "max": 100}
        self.number_of_draws = number_of_draws
        self.number_of_games = number_of_games

    def start(self):
        low = self.draw_range["min"]
        high = self.draw_range["max"]
        # Win dictionary: each possible sum is a key, and the number of games
        # that produced that sum is the value.
        win_dict = collections.OrderedDict()
        for total in range(low * self.number_of_draws, high * self.number_of_draws + 1):
            win_dict[total] = 0
        # Loop over all games
        for _ in range(self.number_of_games):
            # Loop over the draws of one game
            d_sum = 0  # Sum of the drawn values
            for _ in range(self.number_of_draws):
                d_sum += random.randrange(low, high + 1)
            win_dict[d_sum] += 1
        return win_dict


def main():
    # When I run the game several times with a different number_of_draws
    # and plot the results on one graph, I get an interesting picture :-D
    g1 = GaussGame({"min": 0, "max": 100}, 1, 10000000)
    g2 = GaussGame({"min": 0, "max": 100}, 2, 10000000)
    g3 = GaussGame({"min": 0, "max": 100}, 3, 10000000)
    g4 = GaussGame({"min": 0, "max": 100}, 4, 10000000)
    g5 = GaussGame({"min": 0, "max": 100}, 5, 10000000)
    d1 = g1.start()
    d2 = g2.start()
    d3 = g3.start()
    d4 = g4.start()
    d5 = g5.start()
    plt.plot(list(d1.keys()), list(d1.values()), 'r.')
    plt.plot(list(d2.keys()), list(d2.values()), 'g.')
    plt.plot(list(d3.keys()), list(d3.values()), 'b.')
    plt.plot(list(d4.keys()), list(d4.values()), 'b.')
    plt.plot(list(d5.keys()), list(d5.values()), 'b.')
    plt.show()


if __name__ == "__main__":
    main()
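That looks about right. What you see is, I believe, the Irwin-Hall distribution (the distribution of a sum of independent uniform random variables), or a variation of it. For two draws that distribution is exactly triangular.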
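To see where the triangle comes from without any randomness, one can count exactly how many combinations of draws give each sum. The sketch below is only an illustration: `count_ways` is a hypothetical helper, not part of the code in the question, and it uses a small range of 0 to 5 so the numbers are easy to read.

```python
def count_ways(low, high, number_of_draws):
    """Count how many ordered combinations of draws produce each possible sum."""
    ways = {0: 1}  # before any draw there is one way to have a sum of 0
    for _ in range(number_of_draws):
        step = {}
        for total, count in ways.items():
            for value in range(low, high + 1):
                step[total + value] = step.get(total + value, 0) + count
        ways = step
    return ways


two = count_ways(0, 5, 2)
print([two[s] for s in sorted(two)])
# [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

three = count_ways(0, 5, 3)
print([three[s] for s in sorted(three)])
# [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]
```

With two draws the count rises by exactly one per step and then falls by one per step, which is a straight-sided triangle. The curved, bell-like shape only starts to appear from the third draw on.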
When you sum a small number of samples, the result is not Gaussian, but it converges to a Gaussian as the number of samples grows; see the central limit theorem (CLT).
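A rough way to watch that convergence is to compare the exact distribution of the sum with a normal curve that has the same mean and variance. This is a minimal sketch under my own assumptions: `sum_pmf` and `max_gap_to_normal` are hypothetical helper names, and the largest pointwise gap is just one simple way to measure the mismatch.

```python
import math


def sum_pmf(low, high, number_of_draws):
    """Exact probability of each possible sum of independent uniform integer draws."""
    faces = high - low + 1
    pmf = {0: 1.0}
    for _ in range(number_of_draws):
        step = {}
        for total, p in pmf.items():
            for value in range(low, high + 1):
                step[total + value] = step.get(total + value, 0.0) + p / faces
        pmf = step
    return pmf


def max_gap_to_normal(low, high, number_of_draws):
    """Largest absolute gap between the exact probabilities and a matching normal density."""
    pmf = sum_pmf(low, high, number_of_draws)
    mean = number_of_draws * (low + high) / 2
    # Variance of one uniform integer draw on low..high is ((high - low + 1)**2 - 1) / 12.
    variance = number_of_draws * ((high - low + 1) ** 2 - 1) / 12

    def normal_pdf(x):
        return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

    return max(abs(p - normal_pdf(s)) for s, p in pmf.items())


for n in (1, 2, 3, 5):
    print(n, max_gap_to_normal(0, 100, n))
```

The printed gap is largest for one draw, much smaller for two, and smaller again for five, which matches the picture in the question: the flat line and the triangle are simply the first two steps on the way to the bell curve.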