Tags: python, tensorflow, neural-network, artificial-intelligence, 2048

TensorFlow neural network as an API?


I am in the process of writing an AI for the game 2048. At the moment, I can pull the game state from the browser and send moves to the game, but I don't know how to integrate that with TensorFlow. The nature of the project doesn't lend itself to a pre-collected training dataset, so I was wondering if it's possible to pass in the state of the game, have the network chuck out a move, run the move, repeat until the game is over, and then have it do the training?


Solution

  • This is certainly possible, and straightforward. You'll have to set up the model you want to use first; I'll assume that's already been built.

    From the perspective of interacting with a tensorflow model, you just need to marshal your data into numpy arrays and pass them in via the feed_dict argument of sess.run.

    To pass an input to tensorflow and get a result you would run something like this:

    result = sess.run([logits], feed_dict={x: input_data})
    

    This would perform a forward pass producing the output of the model without making any update. Now you'll take the results and use them to take the next step in the game.
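For 2048 specifically, marshalling the browser's game state into that numpy array might look like the sketch below. The 4x4 layout and the log2 tile encoding are assumptions for illustration, not something the model requires:

```python
import numpy as np

def encode_board(board):
    """Encode a 4x4 2048 board (nested lists of tile values, 0 for empty)
    as a flat float32 vector of log2 tile exponents, batched for feed_dict."""
    grid = np.array(board, dtype=np.float32)                     # shape (4, 4)
    exponents = np.where(grid > 0, np.log2(np.maximum(grid, 1)), 0.0)
    return exponents.reshape(1, 16)                              # batch of one state

state = [[2, 0, 0, 0],
         [0, 4, 0, 0],
         [0, 0, 8, 0],
         [0, 0, 0, 16]]
input_data = encode_board(state)
# input_data can now be fed to the graph, e.g. feed_dict={x: input_data}
```

Any fixed-size numeric encoding works here; the only hard requirement is that the array's shape and dtype match the placeholder `x` in your graph.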

    Now that you have the result of your action (e.g. labels) you can perform an update step:

    sess.run([update_op], feed_dict={x: input_data, y: labels})
    

    It's as simple as that. Notice that your model will have an optimizer defined (update_op in this example), but if you don't ask tensorflow to compute it (as in the first code sample) no updates will occur. Tensorflow is all about a dependency graph. The optimizer is dependent on the output logits, but computing logits is not dependent on the optimizer.

    Presumably you'll initialize this model randomly, so the first results will be randomly generated, but each step after that will benefit from the previous updates being applied.
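The loop described above can be sketched end to end. To keep the example self-contained and runnable, a tiny numpy softmax "policy" stands in for the tensorflow graph: `forward` plays the role of the first `sess.run` call (logits only, no update) and `update` the role of the second; the weight matrix, learning rate, and advantage signal are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 4))   # stand-in for the model's weights

def forward(state_vec):
    """Forward pass only -- analogous to sess.run([logits], feed_dict={x: ...})."""
    logits = state_vec @ W                # (1, 16) @ (16, 4) -> (1, 4)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                # probabilities over the 4 moves

def update(state_vec, chosen_move, advantage, lr=0.01):
    """Gradient step -- analogous to sess.run([update_op], feed_dict={...})."""
    global W
    probs = forward(state_vec)
    grad = -probs.copy()
    grad[0, chosen_move] += 1.0           # d log p(move) / d logits
    W += lr * advantage * (state_vec.T @ grad)

state_vec = rng.random((1, 16))           # stand-in for an encoded board
probs = forward(state_vec)                # 1) forward pass: get a move distribution
move = int(np.argmax(probs))              # 2) pick a move (0..3)
# ... send `move` to the game, observe the outcome ...
update(state_vec, move, advantage=1.0)    # 3) apply the update step
```

The separation mirrors the dependency-graph point above: calling `forward` never touches the update logic, while `update` necessarily recomputes the forward pass it depends on.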

    If you're using a reinforcement learning model, the reward will only arrive at some indeterminate time in the future, and when you run the update will vary a little from this example, but the general nature of the problem remains the same.
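One common way to handle that delayed reward (a standard discounted-return computation, offered here as a hedged sketch rather than anything prescribed by the answer) is to record each step during the episode and, once the game ends, propagate the terminal reward backwards so every (state, move) pair gets a training target:

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Spread a sparse end-of-game reward back over earlier steps.
    Each step's return is its own reward plus the discounted future return."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# e.g. a 4-step game where only the final step produced a reward
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
```

The resulting per-step returns are what you would then feed into the update step in place of supervised labels.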