time-seriesregressiongenetic-algorithmsymbolic-mathgenetic-programming

Best way to mutate an symbolic expression tree in genetic algorithm


Im creating a GA for a timeseries forecast in python. Say I have a large symbolic expression tree such as the following:

['avg', ['diff', 'x', ['avg', 'pi', 24.90887042555513]], ['sqrt', ['max', ['mul', ['diff', 53.79502493321837, 'e'], ['mul', 0.5144978394070354, 46.36225530228578]], 44.34745373778043], ['sqrt', ['diff', ['avg', 20.424103573006004, 67.68047383230076], ['div', 35.70761733351755, 76.63350676737794]], 6.6143363501814605]]]

What is a good way to randomly mutate?

1) should I only be mutating specifically only one random node? or use probability to decide how many, if and when a mutation occurs?

2) should I mutate by adding branches, or just an individual value(leaf node)

3) how should I go about implementing this mechanism? through the recursion mechanism? or build an index of the tree shape somehow and randomly select a nest to mutate?

Thanks in advance


Solution

  • In evolutionary algorithms it is important that through repeated mutation you can transform every expression tree into any other expression tree.

    In your case that requires you to do the following 4 things:

    1. changing all numbers by some random amount.
    2. Look at each number individually and change it with a small chance to either x or replace it by a new random expression(preferably only of depth 1 to prevent the expression from getting huge).
    3. Look at all variables and change it with a small chance to a random number or a new random expression(preferably again of depth 1).
    4. Remove some expressions randomly.

    1, 2 and 3 only change the leaves and don't require you to build the tree, but can be implemented by linearly going through the string.

    With a few simplifications 4. can also be implemented without creating the tree:

    Instead of removing a random expression, it is sufficient to only remove expressions that only contain numbers(like ['mul', 4, 5]), because together with processes 1,2,3 the expression tree can still be transformed into every other possible tree. Removing expressions that contain only numbers is quite easy, because you can just calculate the expression: ['mul', 4, 5] = 4*5 = 20.

    Another advantage of this modified version is that you don't change the output, but only the structure of your tree.

    But be careful and don't replace all expressions that contain only numbers. It can be useful to have a deeper structure for future mutations.