[SOLVED] How do you update the weights in function approximation with reinforcement learning?

My SARSA with gradient descent keeps escalating the weights exponentially. At episode 4, step 17 the value is already NaN:

Exception: Qa is nan

e.g:

6) Qa:
Qa = -2.00890180632e+303

7) NEXT Qa:
Next Qa with west = -2.28577776413e+303

8) THETA:
1.78032402991e+303 <= -0.1 + (0.1 * -2.28577776413e+303) - -2.00890180632e+303

9) WEIGHTS (sample)
5.18266630725e+302 <= -1.58305782482e+301 + (0.3 * 1.78032402991e+303 * 1)

I don't know where to look for the mistake I made. Here's some code FWIW:

def getTheta(self, reward, Qa, QaNext):
    """ let t = r + yQw(s',a') - Qw(s,a) """
    theta = reward + (self.gamma * QaNext) - Qa
    return theta


def updateWeights(self, Fsa, theta):
    """ wi <- wi + alpha * theta * Fi(s,a) """
    for i in range(len(self.weights)):
        self.weights[i] += (self.alpha * theta * Fsa[i])

I have about 183 binary features.

Solution

You need to normalize the weights in each trial. This keeps them in a bounded range (e.g. [0, 1]). The way you are adding to the weights each time just makes them grow, and they become useless after the first trial.

I would do something like this:

self.weights[i] += (self.alpha * theta * Fsa[i])
self.weights[i] = normalize(self.weights[i], wmin, wmax)

or see the following example (from the RL literature):

[Image: example from the RL literature]

You will need to write the normalization function yourself, though ;)
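
For what it's worth, here is a minimal sketch of what such a normalization function could look like. It assumes plain min-max rescaling of the whole weight vector into [wmin, wmax] after each update. That is only one possible reading of "normalize" above, and it takes the full list rather than a single weight. The names normalize and update_weights, and their signatures, are hypothetical and not from the original code.

```python
def normalize(weights, wmin=0.0, wmax=1.0):
    """Min-max rescale a list of weights into [wmin, wmax] (assumed scheme)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights are identical: place them in the middle of the range.
        return [(wmin + wmax) / 2.0 for _ in weights]
    span = hi - lo
    rescaled = []
    for w in weights:
        value = wmin + ((w - lo) / span) * (wmax - wmin)
        # Clamp to guard against floating-point rounding at the edges.
        rescaled.append(min(wmax, max(wmin, value)))
    return rescaled


def update_weights(weights, features, theta, alpha, wmin=0.0, wmax=1.0):
    """Apply wi <- wi + alpha * theta * Fi(s,a), then rescale all weights."""
    updated = [w + alpha * theta * f for w, f in zip(weights, features)]
    return normalize(updated, wmin, wmax)


if __name__ == "__main__":
    w = update_weights([0.0, 0.0, 0.0], [1, 0, 1], theta=2.0, alpha=0.5)
    print(w)
```

Note that rescaling changes the relative size of the Q-values from step to step, so treat this as a way to keep the numbers bounded while debugging rather than as a guaranteed fix.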