I'm building an AI with reinforcement learning and I'm getting weird results. The loss looks like this: TensorFlow loss: https://i.sstatic.net/hispR.jpg
While it's training, after each game it plays against a random player and then against a player that uses a weighted matrix, but the results go up and down: results: https://i.sstatic.net/mtWiS.jpg
Basically, I'm building a reinforcement learning agent that learns to play Othello, using epsilon-greedy exploration, experience replay and deep networks with Keras on top of TensorFlow (a simplified sketch of the epsilon-greedy/replay loop is included after the code below). I've tried different activation functions (sigmoid, relu and, in the images shown above, tanh). All of them give a similar loss, but the results differ a bit. In this example the agent is learning from 100k professional games. Here is the architecture, with the default learning rate of 0.005:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='tanh', input_shape=(64,)))  # 64 inputs: the flattened 8x8 board
model.add(Dense(units=150, activation='tanh'))
model.add(Dense(units=100, activation='tanh'))
model.add(Dense(units=64, activation='tanh'))  # one output per board square
optimizer = Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)  # lr defaults to 0.005
model.compile(loss=LOSS, optimizer=optimizer)  # LOSS and lr are variables set elsewhere in my code
Original code: https://github.com/JordiMD92/thellia/tree/keras
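For context, this is roughly what my epsilon-greedy action selection and experience-replay update look like (a simplified sketch rather than my exact code; names such as choose_action, replay_update, legal_moves and replay_buffer are illustrative):

import random
import numpy as np

def choose_action(model, state, legal_moves, epsilon):
    # Epsilon-greedy: play a random legal move with probability epsilon,
    # otherwise the legal move with the highest predicted Q-value.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    q_values = model.predict(state.reshape(1, 64))[0]
    return max(legal_moves, key=lambda move: q_values[move])

def replay_update(model, replay_buffer, batch_size=32, gamma=0.99):
    # Experience replay: sample a minibatch of stored transitions and
    # fit the network towards the bootstrapped Q-learning targets.
    batch = random.sample(replay_buffer, batch_size)
    states = np.array([s for s, a, r, s2, done in batch])
    next_states = np.array([s2 for s, a, r, s2, done in batch])
    targets = model.predict(states)
    next_q = model.predict(next_states)
    for i, (s, a, r, s2, done) in enumerate(batch):
        targets[i][a] = r if done else r + gamma * np.max(next_q[i])
    model.fit(states, targets, epochs=1, verbose=0)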
So, why do I get these results? Currently my input is 64 neurons (an 8*8 matrix), with 0 for an empty square, 1 for a black square and -1 for a white square. Is it bad to use negative inputs?
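In case the encoding matters, this is roughly how I turn the board into those 64 inputs (again just a sketch; encode_board and the board variable are illustrative names):

import numpy as np

EMPTY, BLACK, WHITE = 0, 1, -1

def encode_board(board):
    # board is an 8x8 array of EMPTY/BLACK/WHITE values;
    # the network input is that board flattened to a length-64 vector.
    return np.asarray(board, dtype=np.float32).reshape(64)

state = encode_board(np.zeros((8, 8)))  # an all-empty board, just for illustration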
It might be a problem with your activation function. Try relu instead of tanh in the hidden layers, and if you are doing deep Q-learning you probably don't want any activation on the output layer at all: Q-values aren't bounded, so a linear output is the usual choice, whereas tanh squashes them into [-1, 1]. Also keep an eye on the optimizer and make sure the weights aren't being reset.
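For example, something along these lines (just a sketch of the suggested change, keeping your layer sizes and assuming an MSE loss; I haven't run it against your code): relu in the hidden layers and a linear output layer.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=200, activation='relu', input_shape=(64,)))
model.add(Dense(units=150, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=64, activation='linear'))  # linear output layer: Q-values stay unbounded
model.compile(loss='mse', optimizer=Adam(lr=0.005))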