Once I have loaded in a TensorflowJS model successfully, the first prediction always has a 1-2 second delay. This only occurs for the VERY first prediction globally. Say I have 2 models and I predict with model 1 and then with model 2: I will get the delay on the first prediction with model 1 but NOT with model 2's first prediction.
const prediction = model.predict(X[m][i]).dataSync()[0]
I am creating all my input tensors before I predict, so the delay must be coming exclusively from the prediction component. I assume there is some sort of initialization that's taking place. How can I remove the delay/initialize before first prediction?
The very first prediction has to initialize the weights on the backend (on the WebGL backend this also includes compiling shader programs and uploading the weights to the GPU). A warmup of the model is often recommended to avoid this delay on the first real prediction. A warmup is just a prediction on dummy data with the same shape as the real input, such as the output of tf.zeros, tf.ones, or a random tensor (for example tf.randomNormal). The output of such a prediction is of no importance, but making it forces all the weight tensors to be initialized, so the model is ready, and faster, for the predictions that follow.
const model = await tf.loadLayersModel(modelUrl);
// Warmup the model before using real data.
const warmupResult = model.predict(tf.zeros(inputShape));
warmupResult.dataSync(); // we don't care about the result
warmupResult.dispose();
// Now we can use the model for real predictions
// The second predict() will be much faster
const result = model.predict(userData);
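If you load several models, the warmup above can be wrapped in a small helper. This is only a minimal sketch: warmUp is a hypothetical name, not part of TensorFlow.js, and it assumes inputShape is the full input shape including the batch dimension. The tf namespace is passed in as an argument so the function does not depend on how the library was loaded.

```javascript
// Hypothetical helper (not part of TensorFlow.js): runs one throwaway
// prediction so the one-time setup cost is paid before real data arrives.
//   tf         - the TensorFlow.js namespace
//   model      - a loaded model exposing predict()
//   inputShape - assumed full input shape, batch dimension included, e.g. [1, 10]
function warmUp(tf, model, inputShape) {
  const dummy = tf.zeros(inputShape);
  const out = model.predict(dummy);
  // predict() can return a single tensor or an array of tensors.
  const outputs = Array.isArray(out) ? out : [out];
  for (const t of outputs) {
    t.dataSync(); // force the computation to actually run; the values are discarded
    t.dispose();  // release the output tensor's memory
  }
  dummy.dispose(); // release the dummy input as well
  return outputs.length; // number of output tensors that were warmed up
}

// Usage (the shape below is a placeholder, use your model's real input shape):
//   const model = await tf.loadLayersModel(modelUrl);
//   warmUp(tf, model, [1, 10]);
```

Calling it once per model right after loading keeps the blocking dataSync() call out of the path where the user is waiting on a real prediction.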