reinforcement-learning sarsa

Sarsa with a neural network to solve the Mountain Car task


I am trying to implement the "Episodic Semi-gradient Sarsa for Estimating q" algorithm described in Sutton's book to solve the Mountain Car task. To approximate q I want to use a neural network, so I came up with this code. Sadly, my agent is not really learning to solve the task: in some episodes the solution is found very quickly (100-200 steps), but in others the agent needs more than 30k steps. I think I made some elementary mistake in my implementation, but I am not able to find it myself. Can someone help me and point out the mistake?
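
For reference, the weight update that episodic semi-gradient Sarsa performs at each step can be written as below. Here $\hat{q}$ is the function approximator (the neural network in this case), $\mathbf{w}$ its weights, $\alpha$ the step size and $\gamma$ the discount factor. The second line is the special case where $S'$ is terminal.

```latex
\mathbf{w} \leftarrow \mathbf{w} + \alpha \left[ R + \gamma\, \hat{q}(S', A', \mathbf{w}) - \hat{q}(S, A, \mathbf{w}) \right] \nabla \hat{q}(S, A, \mathbf{w})

\mathbf{w} \leftarrow \mathbf{w} + \alpha \left[ R - \hat{q}(S, A, \mathbf{w}) \right] \nabla \hat{q}(S, A, \mathbf{w})
```

The gradient is taken only through $\hat{q}(S, A, \mathbf{w})$; the bootstrapped term $\hat{q}(S', A', \mathbf{w})$ is treated as a constant, which is what makes the method "semi-gradient".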


Solution

  • I solved this problem by changing the structure of the network. Instead of feeding in the (state, action) pair and predicting its Q-value, I now do it the way DQN does: the network takes a state and predicts the values of all three possible actions, and I then choose the action according to these predictions. I was not able to find the problem with my previous approach, but at least this version works.
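
A minimal NumPy sketch of that structure might look as follows. It is not the code from the post: the layer size, the `normalize` scaling (which assumes position in [-1.2, 0.6] and velocity in [-0.07, 0.07]), the step size, and all function names are assumptions made for illustration. The network maps a state to one Q-value per action, and the semi-gradient Sarsa step only pushes the output of the action that was actually taken towards its target.

```python
import numpy as np

N_ACTIONS = 3  # Mountain Car: push left, no push, push right


def init_params(rng, n_inputs=2, n_hidden=32):
    """Small one-hidden-layer network: state -> one Q-value per action."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (N_ACTIONS, n_hidden)),
        "b2": np.zeros(N_ACTIONS),
    }


def normalize(state):
    """Scale (position, velocity) to roughly [-1, 1] (assumed bounds)."""
    pos, vel = state
    return np.array([(pos + 0.3) / 0.9, vel / 0.07])


def q_values(params, state):
    """Forward pass; returns (Q-values of all actions, hidden layer, input)."""
    x = normalize(state)
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"], h, x


def epsilon_greedy(params, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    q, _, _ = q_values(params, state)
    return int(np.argmax(q))


def sarsa_update(params, s, a, r, s_next, a_next, done, alpha=0.01, gamma=1.0):
    """One semi-gradient Sarsa step; returns the TD error."""
    q, h, x = q_values(params, s)
    if done:
        target = r
    else:
        q_next, _, _ = q_values(params, s_next)
        target = r + gamma * q_next[a_next]  # held constant (semi-gradient)
    delta = target - q[a]
    # Gradient of q[a] w.r.t. the hidden pre-activations (uses the old W2).
    grad_pre = params["W2"][a] * (1.0 - h ** 2)
    # Only the output unit of the taken action is updated in the last layer.
    params["W2"][a] += alpha * delta * h
    params["b2"][a] += alpha * delta
    params["W1"] += alpha * delta * np.outer(grad_pre, x)
    params["b1"] += alpha * delta * grad_pre
    return delta
```

In an episode loop one would pick `a` with `epsilon_greedy`, step the environment, pick `a_next` the same way from `s_next`, call `sarsa_update`, and then carry `s_next, a_next` over as the new `s, a`. Note that the hidden layer is shared, so updating one action's value still shifts the others slightly.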