Tags: tensorflow, resuming-training

TensorFlow: loading and saving time


I have four models with the same structure, which are used as predictors in the "main" problem. Each time, the "main" problem calls one of them to provide a prediction. I also update each network's weights using the new observation.

Currently, in order to differentiate between the models, I save them in four different checkpoint (.ckpt) files, then load the relevant one each time I need to predict or update. When a network is updated, I save it again.

This procedure works fine. The problem is that initializing the variables, loading the model, and saving it again are too expensive. Each call to update a network takes about 10 seconds, of which only around 1 second is training; the remainder is initializing, loading, and saving.

As another approach, I tried to keep the models in memory. But since I have a single dnn.py, which I call for each of the four problems, the names of the variables, parameters, etc. are identical, so TensorFlow gets confused about them and it simply does not work. Since I may have more than four predictors (perhaps as many as 22), it is not reasonable to create a separate dnn.py with different variable names for each one.

I would appreciate any help writing this procedure in an efficient way.

Best,

Afshin


Solution

  • Reading variables from disk for every training step is inefficient; you should reorganize your network to keep those values in memory, e.g. by using variable_scope to keep the different sets of variables separate.
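A minimal sketch of that idea, assuming TF 1.x-style graph code (written here via `tf.compat.v1`) and a toy one-layer network standing in for your dnn.py; the scope names `m0`…`m3` and the dimensions are illustrative:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

def build_dnn(scope_name, input_dim, output_dim):
    """Build one copy of the network; every variable is prefixed with
    scope_name, so structurally identical models do not collide."""
    with tf.compat.v1.variable_scope(scope_name):
        x = tf.compat.v1.placeholder(tf.float32, [None, input_dim], name="x")
        w = tf.compat.v1.get_variable("w", [input_dim, output_dim])
        b = tf.compat.v1.get_variable(
            "b", [output_dim], initializer=tf.compat.v1.zeros_initializer())
        y = tf.matmul(x, w) + b
    return x, y

# Four (or 22) predictors live in one graph and one session;
# their weights stay in memory between calls.
models = {name: build_dnn(name, 3, 1) for name in ["m0", "m1", "m2", "m3"]}

sess = tf.compat.v1.Session()
sess.run(tf.compat.v1.global_variables_initializer())  # initialize once, up front

# Variables are distinct per scope: m0/w:0, m0/b:0, m1/w:0, ...
print(sorted(v.name for v in tf.compat.v1.global_variables()))

# If you still want occasional persistence, a Saver restricted to one
# scope saves and restores only that model's variables:
m0_vars = tf.compat.v1.get_collection(
    tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope="m0")
m0_saver = tf.compat.v1.train.Saver(var_list=m0_vars)
```

Prediction and weight updates then run through `sess` directly, and checkpointing becomes an occasional background task rather than part of every call.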