I'm familiar with GA within the context of strings or text but not with numerical data.
For strings, I understand how crossover and mutation would apply:
ParentA = abcdef
ParentB = uvwxyz
Using one-point crossover:
ChildA = abwxyz (pivot after 2nd gene)
ChildB = uvcdef
Using random gene mutation (after crossover):
ChildA = abwgyz (4th gene mutated)
ChildB = uvcdef (no genes mutated)
For strings, I have a discrete alphabet to go off of, but how would these operators apply to continuous numerical data?
For example, chromosomes represented as points in 4-space (each axes is a gene):
ParentA = [19, 58, 21, 54]
ParentB = [65, 21, 59, 11]
Would it be appropriate to apply crossover by switching axes of both parents for offspring?
ChildA = [19, 58, 59, 11] (pivot after 2nd gene)
ChildB = [65, 21, 21, 54]
I have a feeling this seems alright, but my naive notion of mutation, randomizing a gene, doesn't seem correct:
ChildA = [12, 58, 59, 11] (1st gene mutated)
ChildB = [65, 89, 34, 54] (2nd and 3rd genes mutated)
I'm just unsure how genetic algorithms can be applied to numeric data like this. I know what I need for GA but not how to apply the operators. For example, consider the problem being minimizing the Rastrigin function in 4-dimensions: the search space is [-512, 512]
in each dimension and the fitness function is the Rastrigin function. I don't know how the operators as I've described here can help find a more fit chromosome.
For what it's worth, elite selection and population initialization seems straightforward, my only confusion comes with crossover and mutation operators.
I did an implementation of a GA for continuous numerical data using the mutation and crossover rates as I have described here. The optimization problem is the Styblinski-Tang function in two dimensions because it is easy to graph. I'm also using standard elite and tournament selection strategies.
I find that the population best fitness does converge nicely to a solution, the average fitness doesn't really.
Here I've plotted the search space over ten generations: a black dot is a candidate solution and the red 'x' is the global optimum:
The crossover operator as I've described seems to work well but the mutation operator (randomizing both, either, or neither of the x or y positions of a chromosome) seems to create crosshair and crosshatch patterns.
I did a run in 50 dimensions to prolong convergence (since in two dimensions it converges in one generation) and plotted it:
Here the y-axis represents how close a solution was to global optimum (since the optimum is known), it's just a fraction actual output / expected output
. It's a percentage. Green line is population best (approx 96-97% target), blue is population average (fluctuates 65-85% target).
This verifies what I thought: the mutation operator doesn't really affect the population best but does mean the population average never converges and it fluctuates up and down.
So my question for the bounty is what mutation operators can be used other than randomization of a gene?
Just to add: I ask this question because I'm interested in using GA to optimize neural network weights to train a network in lieu of backpropagation. If you know anything about that, any source detailing that would also answer my question.
You may consider using polynomial mutation, which is the default operator in the very popular NSGA-II algorithm and several other real-coded genetic algorithms as well. I have also used it extensively. You can find a full description of it here (see Section 2) and a python implementation here (search for mutPolynomialBounded
).
In essence, a polynomial probability distribution is used to perturb a solution in a parent’s vicinity. It has parameter, η, that controls how likely the mutant would be close to the parent. Likelihood of the mutant's resemblance to the parent decreases with reducing η. [20, 100] is a common range for η.
You may also consider using non-uniform mutation operators, in which, as the generation number increases, mutated solutions are generated closer to the parents for better convergence. Here is an example (Page 21).