I am having issues understanding the algorithm. Here is the most popular one seen online
for all members of population
sum += fitness of this individual
end for
for all members of population
probability = sum of probabilities + (fitness / sum)
sum of probabilities += probability
end for
loop until new population is full
do this twice
number = Random between 0 and 1
for all members of population
if number > probability but less than next probability
then you have been selected
end for
end
create offspring
end loop
for all members of population
probability = sum of probabilities + (fitness / sum)
sum of probabilities += probability
end for
^^^this piece in particular confuses me. What are the "sum of probabilities" and even "probability" in the context of an individual in a population? Are these like values individuals have on inception?
That is a very obfuscated piece of code.
In that second block of code, probability
is a variable attached to each member of the population, and sum of probabilities
is a global variable for the whole population.
Now, what the roulette wheel metaphor is saying, is that the entire population can be represented as a roulette wheel, and each member of the population has a slice in that roulette wheel proportional to its relative fitness. That code is doing the dirty work behind that metaphor-- instead of wedges on a wheel, the members are now represented by proportional intervals on the line segment [0,1], which is a customary way to represent probabilities.
To do that, you technically need two numbers, a start and an end, for each member. But the first member's start is going to be 0; the second member's start is going to be the end of the first member; etc, until the last member, which has an end of 1.
That's what that code is doing. Sum of probabilities
starts out at 0, and the first time through the loop, the probability
is what you intuitively thought it would be. It is marking the end point of the first member. Then the "sum of probabilities" is updated. The second time through the loop, the "probability" is what you intuitively thought it would be... shifted over by the "sum of probabilities." And so it goes.
So the first loop is summing fitness values as a prelude to normalizing things. The second loop, that you ask about, is normalizing and arranging those normalized probabilities in the unit interval. And the third (most complex) loop is picking two random numbers, matching them up with two members of the population, and mating them. (Note that the assumption is that those members are in some array-like format so that you can sequentially check their endpoints against the random number you've rolled.)