time-seriesexpression-treesgenetic-algorithmsymbolic-mathgenetic-programming

Genetic Algorithm timeseries forcast creating an initial population


I am building a genetic algorithm that does a time series forecast in the symbolic regression analysis. I’m trying to get the algorithm to find an equation that will match the underlying trend of the data. (predict monthly beer sales)

The idea is to use lisp like expressions, which writes the equation in a tree. This allows for branch swapping in the crossover/mating stage.

5* (5 +5)

Written as:

X = '(mul 5 (add 5 5))'
Y = parser(X) 
y = ['mul', 5, ['add', 5, 5]]

I want to know how to create an initial population set where the individuals represent different expressions automatically. Where there “fitness” is related to how well each equation matches the underlying trend.

For example, one individual could be: '(add 100 (mul x (sin (mul x 3))))' Where x is time in months.

How do I automatically generate expressions for my population? I have no idea how to do this, any help would be very appreciated.


Solution

  • You can easily solve this problem with recursion and a random number generator random() which returns a (pseudo-)random float between 0 and 1. Here is some pseudocode:

    randomExp() {
        // Choose a function(like mul or add):
        func = getRandomFunction() // Just choose one of your functions randomly.
        arg1 = ""
        rand1 = random()
        // Choose the arguments. You may choose other percentages here depending how deep you want it to be and how many 'x' you want to have.
        if(rand1 < 0.2)
            arg1 = randomExp() // Here add a new expression
        else if(rand1 < 0.5)
            arg1 = "x"
        else
            arg1 = randomConstant() // Get a random constant in a predefined range.
        // Do the same for the second argument:
        arg2 = ""
        …
        …
        // Put everything together and return it:
        return "("+func+" "+arg1+" "+arg2+")"
    }
    

    You might want to also limit the recursion depth, as this might return you a theoretically infinitely long expression.