tensorflow, tensorflow-slim

What is the best practice for running training and evaluation on the same machine?


What I want to do:

  1. I only have 1 machine.

  2. I want to evaluate the model periodically.

What I have tried so far:

  1. Use a placeholder. I run 1000 steps of training by feeding in the training data, then feed in the validation dataset for evaluation, and repeat this in a loop.

    However, as Google suggests, placeholders are not a good choice for long-running training.

  2. So I switched to a slim dataset to feed in the data. Now the model is bound to the training dataset like this:

     net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                       scope='conv1')
    

    I then have to construct another model (in another graph) that is bound to the validation dataset.

Is there a better way to do this?


Solution

  • The tf.estimator.train_and_evaluate() API is designed to simplify training and evaluation on the same machine (it also supports scaling to multiple machines, either locally or using Cloud ML Engine). Internally, it builds separate graphs for training and evaluation, and connects a different input pipeline (defined as an "input function") to each graph: one from a tf.estimator.TrainSpec and one from a tf.estimator.EvalSpec. You can use the Slim API to build these input functions, but we now recommend the tf.data API, which is more flexible and efficient.
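    As a rough illustration of the approach above, here is a minimal sketch, not a definitive implementation. It assumes a TensorFlow release that still ships tf.estimator (1.x, or 2.x up to 2.15). The helper names make_input_fn and build_specs, the synthetic data, and the canned DNNClassifier (standing in for your Slim network) are all hypothetical choices made for the example. Setting save_checkpoints_steps=1000 is one way to get an evaluation roughly every 1000 training steps, since in local mode evaluation runs after a checkpoint is written.

```python
import tempfile

import numpy as np
import tensorflow as tf


def make_input_fn(features, labels, training, batch_size=32):
    """Returns an input function that builds a tf.data pipeline (hypothetical helper)."""
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices(({"x": features}, labels))
        if training:
            # Shuffle and repeat forever; TrainSpec.max_steps stops training.
            ds = ds.shuffle(1000).repeat()
        return ds.batch(batch_size)
    return input_fn


def build_specs(train_data, eval_data, max_steps=2000):
    """Pairs each dataset with its own spec (hypothetical helper)."""
    train_spec = tf.estimator.TrainSpec(
        input_fn=make_input_fn(*train_data, training=True),
        max_steps=max_steps)
    eval_spec = tf.estimator.EvalSpec(
        input_fn=make_input_fn(*eval_data, training=False),
        steps=None,          # evaluate on the whole validation set
        start_delay_secs=0,
        throttle_secs=0)     # evaluate whenever a new checkpoint exists
    return train_spec, eval_spec


if __name__ == "__main__":
    # Synthetic data, only so that the sketch runs end to end.
    rng = np.random.RandomState(0)
    x_train = rng.rand(1000, 4).astype(np.float32)
    y_train = (x_train.sum(axis=1) > 2.0).astype(np.int32).reshape(-1, 1)
    x_val = rng.rand(200, 4).astype(np.float32)
    y_val = (x_val.sum(axis=1) > 2.0).astype(np.int32).reshape(-1, 1)

    # Checkpoint every 1000 steps; each checkpoint triggers an evaluation.
    config = tf.estimator.RunConfig(
        model_dir=tempfile.mkdtemp(), save_checkpoints_steps=1000)

    # A canned estimator stands in for a custom model_fn here.
    estimator = tf.estimator.DNNClassifier(
        hidden_units=[16],
        feature_columns=[tf.feature_column.numeric_column("x", shape=[4])],
        n_classes=2,
        config=config)

    train_spec, eval_spec = build_specs((x_train, y_train), (x_val, y_val))
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```

    To use your own Slim layers, you would replace the canned estimator with a tf.estimator.Estimator built from a custom model_fn. The two input functions stay the same, so the model code no longer needs to know which dataset it is attached to.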