python · tensorflow · tensorflow-hub · tensorflow-slim

Reproduce Tensorflow Hub module output with Tensorflow Slim


I am trying to reproduce the output from a Tensorflow Hub module that is based on a Tensorflow Slim checkpoint, using the Tensorflow Slim modules. However, I can't seem to get the expected output. For example, let us load the required libraries, create a sample input and the placeholder to feed the data:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim import nets

images = np.random.rand(1, 224, 224, 3).astype(np.float32)
inputs = tf.placeholder(shape=[None, 224, 224, 3], dtype=tf.float32)

Load the TF Hub module:

resnet_hub = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_152/feature_vector/3")
features_hub = resnet_hub(inputs, signature="image_feature_vector", as_dict=True)["resnet_v2_152/block4"]

Now, let's do the same with TF Slim and create a loader that will load the checkpoint:

with slim.arg_scope(nets.resnet_utils.resnet_arg_scope()):
    _, end_points = nets.resnet_v2.resnet_v2_152(inputs, is_training=False)
    features_slim = end_points["resnet_v2_152/block4"]
loader = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="resnet_v2_152"))

Now, once we have everything in place we can test whether the outputs are the same:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    loader.restore(sess, "resnet_v2_152_2017_04_14/resnet_v2_152.ckpt")
    slim_output = sess.run(features_slim, feed_dict={inputs: images})
    hub_output = sess.run(features_hub, feed_dict={inputs: images})
    np.testing.assert_array_equal(slim_output, hub_output)

However, the assertion fails because the two outputs differ. I assume this is because the TF Hub module applies some internal preprocessing to the inputs that the TF Slim implementation lacks.

Let me know what you think!


Solution

  • Those Hub modules scale their inputs from the canonical range [0,1] to whatever the respective slim checkpoint expects from the preprocessing it was trained with (typically [-1,+1] for "Inception-style" preprocessing). Feeding both graphs the same raw inputs can therefore explain a large difference; see the sketch below. Even after linear rescaling to fix that, a difference up to compounded numerical error wouldn't surprise me (given the many degrees of freedom inside TF), but major discrepancies might indicate a bug.
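
As a minimal sketch, reusing `inputs`, `images`, and `features_hub` from the question and assuming the checkpoint really does expect Inception-style [-1,+1] inputs (the rescaling and the tolerances below are assumptions, not verified values), you could rescale only the slim branch and compare with a tolerance instead of exact equality:

# Sketch only: the Hub module takes the canonical [0, 1] range, so keep feeding
# it `inputs` as-is; rescale to [-1, +1] for the slim graph (assumed
# Inception-style preprocessing).
scaled_inputs = inputs * 2.0 - 1.0  # [0, 1] -> [-1, +1] for the slim branch

with slim.arg_scope(nets.resnet_utils.resnet_arg_scope()):
    _, end_points = nets.resnet_v2.resnet_v2_152(scaled_inputs, is_training=False)
    features_slim = end_points["resnet_v2_152/block4"]
loader = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="resnet_v2_152"))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    loader.restore(sess, "resnet_v2_152_2017_04_14/resnet_v2_152.ckpt")
    slim_output, hub_output = sess.run([features_slim, features_hub],
                                       feed_dict={inputs: images})
    # Expect small floating-point differences rather than bit-exact equality.
    np.testing.assert_allclose(slim_output, hub_output, rtol=1e-4, atol=1e-4)

If the outputs still differ by far more than the tolerance after the rescaling, that would point to something else (e.g. comparing different endpoints), rather than just preprocessing.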