I am building a neural net to predict the outcome of pairwise comparisons. The same encoder network is applied to both inputs before merging and computing the result in the downstream part. In my use case, I am computing all the pairwise predictions for a given set of elements, so the number of predictions grows quadratically with the set size, and I am therefore interested in speeding up the prediction process.
Doing the complete set of pairwise predictions naively means running the encoder network on each element over and over again. Since the encoder network is bigger than the downstream part (merging + downstream layers), I thought that precomputing the encoder's output for each input element and then only running the downstream part on these encoded values would lead to a significant speed-up. However, that is not what I find in practice. For the example below, both on Colab (CPU) and on my machine (CPU), I get runtime savings of only 10-15%, where I would have expected something like 50% if you count layers, and even more if you count parameters.
I feel like I am missing something: either my implementation is off, or does tensorflow/keras already do some kind of magic (caching?) given the structure of the network, leading to smaller gains?
import numpy as np # numpy will be used for mgrid to compute all the pairs of the input
import tensorflow as tf
# Encoder Network
input_a = tf.keras.Input(shape=(10,4))
x = tf.keras.layers.Flatten()(input_a)
x = tf.keras.layers.Dense(100, activation='relu')(x)
x = tf.keras.layers.Dense(20, activation='relu')(x)
x = tf.keras.layers.Dense(10, activation='relu')(x)
upstream_network = tf.keras.Model(input_a, x)
# Downstream network, from merge to final prediction
input_downstream_a = tf.keras.Input(shape=upstream_network.layers[-1].output_shape[1:])
input_downstream_b = tf.keras.Input(shape=upstream_network.layers[-1].output_shape[1:])
x = tf.keras.layers.subtract([input_downstream_a, input_downstream_b])
x = tf.keras.layers.Dense(20, activation='relu')(x)
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)
downstream_network = tf.keras.Model((input_downstream_a, input_downstream_b), x)
# Full network
input_full_a = tf.keras.Input(shape=(10,4))
input_full_b = tf.keras.Input(shape=(10,4))
intermed_a = upstream_network(input_full_a)
intermed_b = upstream_network(input_full_b)
res = downstream_network([intermed_a, intermed_b])
full_network = tf.keras.Model([input_full_a, input_full_b], res)
full_network.compile(loss='binary_crossentropy')
# Experiment
population = np.random.random((300, 10, 4))
# %%timeit 10
# 1.9s on Colab CPU
indices = np.mgrid[range(population.shape[0]), range(population.shape[0])].reshape(2, -1)
full_network.predict([population[indices[0]], population[indices[1]]])
# %%timeit 10
# 1.7s on Colab CPU
out = upstream_network.predict(population)
indices = np.mgrid[range(population.shape[0]), range(population.shape[0])].reshape(2, -1)
downstream_network.predict([out[indices[0]], out[indices[1]]])
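For reference, the parameter counts can be checked directly and confirm that the encoder dominates (its three dense layers hold 4100 + 2020 + 210 = 6330 weights, versus 220 + 21 = 241 downstream):
# Parameter counts: encoder should report 6330, downstream 241
print(upstream_network.count_params(), downstream_network.count_params())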
First, you are not going to be able to measure meaningful timing differences with such a small input population; you can try bigger input sizes (600, 700, 800), but even then the prediction time is not going to increase much.
In your case I suggest using predict_on_batch rather than predict, since predict splits your input into n batches, which is a time-consuming step; predict_on_batch is the more reasonable choice whenever your data fits into (Colab's) memory:
full_network.predict_on_batch([population[indices[0]], population[indices[1]]])
Using your test case of shape (300, 10, 4) with predict_on_batch:
array([[0.5 ],
[0.5022318 ],
[0.47754446],
...,
[0.50507313],
[0.4884554 ],
[0.5 ]], dtype=float32)
time: 216 ms
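If the whole pair set fits in memory, the same trick also combines with your precomputation idea. A minimal sketch, assuming all 90,000 encoded pairs fit in a single batch:
# Encode each of the 300 elements exactly once, then score all pairs in one call
out = upstream_network.predict_on_batch(population)
indices = np.mgrid[range(population.shape[0]), range(population.shape[0])].reshape(2, -1)
downstream_network.predict_on_batch([out[indices[0]], out[indices[1]]])
Should the pair set ever outgrow memory, predict's batch_size argument (default 32) offers a middle ground: larger batches mean fewer of the per-batch splits that dominate your measured runtime, at the cost of more memory per batch.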