The following code, which uses the TF-Slim library to load and fine-tune a model, achieves 90% accuracy on a classification task (I have omitted the data loading and preprocessing):
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.nets import resnet_v1

# Build the network with slim's default ResNet argument scope
with slim.arg_scope(resnet_v1.resnet_arg_scope(weight_decay=0.0001)):
    logits, _ = resnet_v1.resnet_v1_50(images, num_classes=dataset.num_classes, is_training=True)

one_hot_labels = slim.one_hot_encoding(labels, NUM_CLASSES)
tf.losses.softmax_cross_entropy(one_hot_labels, logits)
total_loss = tf.losses.get_total_loss()

global_step = tf.train.get_or_create_global_step()
lr = tf.train.exponential_decay(LEARNING_RATE, global_step, DECAY_STEPS, GAMMA)
optimizer = tf.train.MomentumOptimizer(learning_rate=lr, momentum=MOMENTUM)
train_op = slim.learning.create_train_op(total_loss, optimizer, global_step=global_step)

# Restore the pretrained weights, then train with slim's managed loop
init_fn = slim.assign_from_checkpoint_fn("resnet_v1_50.ckpt", VARIABLES_TO_RESTORE)
final_loss = slim.learning.train(train_op, logdir=train_dir, log_every_n_steps=500,
                                 save_summaries_secs=25, init_fn=init_fn,
                                 number_of_steps=NUM_STEPS)
I rewrote the same code in vanilla TensorFlow to have more control over the training process, but for some reason I cannot reach the same performance (a 10% drop) even with the same hyperparameters (in uppercase) and the same preprocessing. The differences are in the graph definition:
lr = tf.train.exponential_decay(LEARNING_RATE, global_step, DECAY_STEPS, GAMMA)
optimizer = tf.train.MomentumOptimizer(learning_rate=lr, momentum=MOMENTUM)
full_train_op = optimizer.minimize(total_loss, global_step=global_step)
and in the training loop:
for s in range(NUM_STEPS):
    sess.run(train_init_op)  # initialize the dataset iterator
    while True:
        try:
            sess.run([full_train_op], feed_dict={is_training: True})
        except tf.errors.OutOfRangeError:
            break
Is the slim train function performing some other operations? I suspect it might be handling batch normalization updates or something else that I did not implement in my version of the code.
Is it possible to load the slim ResNet model in plain TensorFlow and train it without the slim train function? I am not interested in overriding train_step_fn.
This is most likely because you are not running the update_ops associated with ResNet's batch-norm layers. Batch norm maintains moving averages of the activation statistics for use at inference time; in TF 1.x these update ops are collected under tf.GraphKeys.UPDATE_OPS and are not executed unless you make them a dependency of the train op. slim.learning.create_train_op does this for you by default, whereas a bare optimizer.minimize does not:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
optimizer = tf.train.MomentumOptimizer(learning_rate=lr, momentum=MOMENTUM)
# Force the batch-norm moving-average updates to run on every train step
with tf.control_dependencies(update_ops):
    full_train_op = optimizer.minimize(total_loss, global_step=global_step)
# same training loop
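As for your second question: yes, you can drop slim.learning.train entirely and still reuse slim.assign_from_checkpoint_fn to load the pretrained weights, since it just returns a function that takes a session. A minimal sketch, assuming the graph above and your existing train_init_op and is_training placeholders:

init_fn = slim.assign_from_checkpoint_fn("resnet_v1_50.ckpt", VARIABLES_TO_RESTORE)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    init_fn(sess)  # overwrite the fresh initialization with the pretrained weights
    for s in range(NUM_STEPS):
        sess.run(train_init_op)  # initialize the dataset iterator
        while True:
            try:
                sess.run(full_train_op, feed_dict={is_training: True})
            except tf.errors.OutOfRangeError:
                break

Since full_train_op now carries the update_ops as control dependencies, the moving mean and variance of every batch-norm layer get refreshed at each step, which is what slim.learning.train was silently doing for you.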