I was wondering if anybody has an idea why the number of FLOPs for a Conv2D
operation is reported as 2 instead of 1. In the example below, the input is a 1x1
image with 1 channel and the batch size is 1. The convolution has a single filter with no bias, so ideally the number of multiplications should be 1. But the TF profiler reports 2 FLOPs. Does the FLOP count include something other than the multiplication? Thanks.
Here is the example:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # assuming you have a gpu0
import tensorflow as tf
from tensorflow.keras import backend as K

def load_pb(pb):
    # load a frozen GraphDef from a .pb file and import it into a fresh graph
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph
def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    # convert all variables in the session's graph to constants so the
    # resulting GraphDef can be serialized and profiled on its own
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def, output_names, freeze_var_names)
        return frozen_graph
# define the model
inp = tf.keras.layers.Input(batch_shape=(1, 1, 1, 1), name='input')
x = tf.keras.layers.Conv2D(1, kernel_size=(1, 1), strides=(1, 1), padding='same', name='conv', use_bias=False)(inp)
out = tf.keras.layers.Flatten(name='output')(x)
model = tf.keras.models.Model(inputs=inp, outputs=out)
model.summary()

# freeze the model
output_graph_def = freeze_session(K.get_session(), output_names=[out.op.name for out in model.outputs])
with tf.gfile.GFile('graph.pb', "wb") as f:
    f.write(output_graph_def.SerializeToString())

# load the protobuf and perform tf profiling
g2 = load_pb('./graph.pb')
with g2.as_default():
    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(g2, run_meta=tf.RunMetadata(), cmd='scope', options=opts)
    print('FLOP', flops.total_float_ops)
The output is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (1, 1, 1, 1) 0
_________________________________________________________________
conv (Conv2D) (1, 1, 1, 1) 1
_________________________________________________________________
output (Flatten) (1, 1) 0
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
Converted 1 variables to const ops.
Parsing Inputs...
=========================Options=============================
-max_depth 10000
-min_bytes 0
-min_peak_bytes 0
-min_residual_bytes 0
-min_output_bytes 0
-min_micros 0
-min_accelerator_micros 0
-min_cpu_micros 0
-min_params 0
-min_float_ops 1
-min_occurrence 0
-step -1
-order_by float_ops
-account_type_regexes .*
-start_name_regexes .*
-trim_name_regexes
-show_name_regexes .*
-hide_name_regexes
-account_displayed_op_only true
-select float_ops
-output stdout:
==================Model Analysis Report======================
Doc:
scope: The nodes in the model graph are organized by their names, which is hierarchical like filesystem.
flops: Number of float operations. Note: Please read the implementation for the math behind it.
Profile:
node name | # float_ops
_TFProfRoot (--/2 flops)
conv/Conv2D (2/2 flops)
======================End of Report==========================
FLOP 2
Consider almost the same setup as yours, but with n input channels to the convolution. Then you would have n multiplications, and you would cumulatively sum their results. One could argue that you can initialize the sum with the result of the first multiplication and then accumulate the remaining (n-1) multiplications, but that gives the first multiplication special treatment; instead it makes more sense to initialize the sum to 0 and then accumulate all n multiplications into it. In particular, when n=1 you end up with the seemingly redundant case
sum = 0
mult = w1 * a1
sum = sum + mult
which results in 2 FLOPs, or equivalently 1 MAC (multiply-accumulate) operation.
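
To make that counting convention concrete, here is a small sketch (my own, not taken from the post or the TensorFlow source) of the per-layer formula this reasoning implies; the helper name conv2d_flops is made up for illustration:

def conv2d_flops(batch, out_h, out_w, out_c, in_c, k_h, k_w):
    # every output element needs k_h * k_w * in_c multiply-accumulates,
    # and each MAC is counted as 2 float ops (one multiply + one add)
    macs = batch * out_h * out_w * out_c * in_c * k_h * k_w
    return 2 * macs

print(conv2d_flops(1, 1, 1, 1, 1, 1, 1))   # 2, matching the profiler report above
print(conv2d_flops(1, 1, 1, 1, 64, 1, 1))  # 128 for an n=64-channel input: 64 multiplies + 64 adds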