I want to profile the FLOPs of a very simple neural network that classifies the MNIST dataset with a batch size of 128. Following the official tutorials, I got the result below for the following model, but I cannot understand some parts of the output.
import tensorflow as tf  # TensorFlow 1.x API

# Weights and biases of a 784 -> 15 -> 10 network
w1 = tf.Variable(tf.random_uniform([784, 15]), name='w1')
w2 = tf.Variable(tf.random_uniform([15, 10]), name='w2')
b1 = tf.Variable(tf.zeros([15]), name='b1')
b2 = tf.Variable(tf.zeros([10]), name='b2')

# Forward pass (images_iter and labels_iter come from tf.data)
hidden_layer = tf.add(tf.matmul(images_iter, w1), b1)
logits = tf.add(tf.matmul(hidden_layer, w2), b2)

loss_op = tf.reduce_sum(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                            labels=labels_iter))
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss_op)
The images_iter and the labels_iter are the iterators of tf.data, which are similar to placeholders.
tf.profiler.profile(
    tf.get_default_graph(),
    options=tf.profiler.ProfileOptionBuilder.float_operation())
I used this code, which is equivalent to scope -min_float_ops 1 -select float_ops -account_displayed_op_only in the tfprof command-line tool, to profile the FLOPs, and got the result below.
Profile:
node name | # float_ops
_TFProfRoot (--/23.83k flops)
random_uniform (11.76k/23.52k flops)
random_uniform/mul (11.76k/11.76k flops)
random_uniform/sub (1/1 flops)
random_uniform_1 (150/301 flops)
random_uniform_1/mul (150/150 flops)
random_uniform_1/sub (1/1 flops)
Adam/mul (1/1 flops)
Adam/mul_1 (1/1 flops)
softmax_cross_entropy_with_logits_sg/Sub (1/1 flops)
softmax_cross_entropy_with_logits_sg/Sub_1 (1/1 flops)
softmax_cross_entropy_with_logits_sg/Sub_2 (1/1 flops)
My questions are:
In random_uniform_1 (150/301 flops), what are 150 and 301?
I know it is discouraging to read such a long question, but I am desperate: I cannot find the relevant information in the official documentation and would really appreciate your help.
I'll give it a try:
(1) From this example, it looks like the first number is the "self" flops and the second number is the "total" flops under the name scope. For example, the three nodes named random_uniform (if there is such a node), random_uniform/mul, and random_uniform/sub take 11.76k, 11.76k, and 1 flops respectively, for a total of 23.52k flops.
As another example, the root total works out the same way: 23.83k ≈ 23.52k + 301 + 5, where the 5 comes from the five remaining nodes that show 1 flop each.
Does this make sense?
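The arithmetic in (1) can be checked with a short plain-Python sketch. It does not call the profiler; it only re-adds the per-node numbers printed in the question. It assumes that 11.76k stands for 784 * 15 = 11760 and that random_uniform and random_uniform_1 are real ops whose own cost is the first number on their line. The helper names total_flops and root_total are made up for this illustration.

```python
# Per-op ("self") float_ops copied from the profiler output in the question.
# Assumption: 11.76k is 784 * 15 = 11760, and the scope-named nodes
# "random_uniform" / "random_uniform_1" are ops with their own cost.
SELF_FLOPS = {
    "random_uniform": 784 * 15,
    "random_uniform/mul": 784 * 15,
    "random_uniform/sub": 1,
    "random_uniform_1": 15 * 10,
    "random_uniform_1/mul": 15 * 10,
    "random_uniform_1/sub": 1,
    "Adam/mul": 1,
    "Adam/mul_1": 1,
    "softmax_cross_entropy_with_logits_sg/Sub": 1,
    "softmax_cross_entropy_with_logits_sg/Sub_1": 1,
    "softmax_cross_entropy_with_logits_sg/Sub_2": 1,
}


def total_flops(scope, self_flops=SELF_FLOPS):
    """Sum the node named `scope` plus every node nested under `scope/`."""
    prefix = scope + "/"
    return sum(
        flops
        for name, flops in self_flops.items()
        if name == scope or name.startswith(prefix)
    )


def root_total(self_flops=SELF_FLOPS):
    """The root has no self flops, so its total is the sum of all nodes."""
    return sum(self_flops.values())


if __name__ == "__main__":
    print(total_flops("random_uniform"))    # 23521, shown as 23.52k
    print(total_flops("random_uniform_1"))  # 301
    print(root_total())                     # 23827, shown as 23.83k
```

Under these assumptions the sums reproduce the second number on each line of the output, which is consistent with the "self/total" reading.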
(2) The root node is a "virtual" top-level node added by the profiler. It does not have "self" flops, or in other words, it has zero self flops.
(3) I'm not sure why it is 1. It would help if you could print the GraphDef with print(sess.graph_def) and find out what this node really is.
Hope this helps.