Tags: machine-learning, neural-network, deep-learning, reinforcement-learning, q-learning

Why doesn't my neural network Q-learner learn tic-tac-toe?


Okay, so I have created a neural network Q-learner using the same idea as DeepMind's Atari algorithm (except I feed it raw board data, not pictures (yet)).
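To illustrate the general shape of the setup (a simplified sketch, not the code from the repository; the layer sizes, board encoding, and use of PyTorch here are my own placeholders):

    import torch
    import torch.nn as nn

    class TicTacToeQNet(nn.Module):
        """Toy Q-network: 9 raw board cells in, one Q-value per square out."""
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(9, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 9),
            )

        def forward(self, board):
            return self.net(board)

    # Board encoded as -1 (opponent), 0 (empty), 1 (own mark).
    board = torch.tensor([[0., 1., -1., 0., 0., 0., 0., 0., 0.]])
    q_values = TicTacToeQNet()(board)  # shape (1, 9), one value per move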

Neural network build:

I'm 100% confident the network is built correctly because of gradient checks and lots of tests.
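(By gradient checks I mean comparing backpropagated gradients against central finite differences; the snippet below is only a generic illustration of that idea, not the checks actually used in the project.)

    import torch

    def gradient_check(model, loss_fn, x, y, eps=1e-4, tol=1e-3):
        """Compare backprop gradients against central finite differences."""
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for p in model.parameters():
            analytic = p.grad.view(-1)
            flat = p.data.view(-1)   # view: edits below perturb the real weights
            numeric = torch.zeros_like(analytic)
            for i in range(flat.numel()):
                orig = flat[i].item()
                flat[i] = orig + eps
                plus = loss_fn(model(x), y).item()
                flat[i] = orig - eps
                minus = loss_fn(model(x), y).item()
                flat[i] = orig
                numeric[i] = (plus - minus) / (2 * eps)
            assert torch.allclose(analytic, numeric, atol=tol), "gradient mismatch"

    # Works best in double precision to keep finite-difference noise small.
    net = torch.nn.Sequential(torch.nn.Linear(9, 8), torch.nn.Tanh(), torch.nn.Linear(8, 9)).double()
    gradient_check(net, torch.nn.functional.mse_loss,
                   torch.randn(4, 9, dtype=torch.float64),
                   torch.randn(4, 9, dtype=torch.float64))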

Q-parameters:

Problem

All my Q-values go to zero if I give a -1 reward when a move is made to an already occupied spot. If I don't give that penalty, the network doesn't learn that it shouldn't move to already occupied squares and seems to learn arbitrary Q-values. Also, my error doesn't seem to shrink.
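Concretely, the targets being fit look roughly like this (a sketch only; the discount factor, the zero reward for an ordinary move, and treating an illegal move as terminal are placeholder assumptions, not necessarily the project's exact settings):

    import torch

    def q_target(reward, next_q, done, gamma=0.9):
        """Standard Bellman target: r + gamma * max_a' Q(s', a'), cut off at terminal states."""
        return reward + gamma * next_q.max(dim=1).values * (1.0 - done)

    # Move onto an occupied square: reward -1, episode treated as over -> target = -1.
    illegal = q_target(torch.tensor([-1.0]), torch.zeros(1, 9), torch.tensor([1.0]))
    # Ordinary move: reward 0, bootstrap from the next state's predicted Q-values.
    normal = q_target(torch.tensor([0.0]), torch.randn(1, 9), torch.tensor([0.0]))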

Solutions that didn't work

Project on GitHub: https://github.com/Dopet/tic-tac-toe (sorry for the ugly code, mostly due to all the refactoring; this was also supposed to be a quick test to see whether the algorithm works)

Main points:


Solution

  • It was a matter of the rewards and of removing the activation function from the output layer (see the sketch below). Most of the time I had rewards in [-1, 1], while my output layer's activation function was a sigmoid, which only produces values in [0, 1]. This meant the network always had an error whenever it was rewarded with -1, because its output can never be less than zero. That drove the Q-values to zero: the network kept trying to reduce the error but never could.
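A minimal sketch of that fix (placeholder layer sizes; the alternative is to rescale the rewards into [0, 1] so the sigmoid can actually reach them):

    import torch.nn as nn

    q_net = nn.Sequential(
        nn.Linear(9, 64),
        nn.ReLU(),
        nn.Linear(64, 9),   # no output activation: Q-value estimates can go below zero
    )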