
TensorFlow.js first prediction delay

Once I have loaded a TensorFlow.js model successfully, the first prediction always has a 1-2 second delay. This only occurs for the VERY first prediction globally. Say I have 2 models and I predict with model 1 and then with model 2: I get the delay on the first prediction with model 1 but NOT on model 2's first prediction.

const prediction = model.predict(X[m][i]).dataSync()[0]

I am creating all my input tensors before I predict, so the delay must be coming exclusively from the prediction itself. I assume there is some sort of initialization taking place. How can I remove the delay, or run that initialization before the first prediction?


  • The very first prediction has to initialize the model's weights on the backend (with the WebGL backend, this includes uploading them to the GPU and compiling the shader programs). Warming up the model is often recommended to avoid this delay on the first real prediction. A warmup is just a prediction on dummy data such as tf.zeros, tf.ones or tf.randomNormal. The output of this prediction does not matter, but running it initializes all the weight tensors, so the model is ready and the following predictions are faster.

    const model = await tf.loadLayersModel(modelUrl);
    // Warm up the model before using real data.
    // inputShape must include the batch dimension, e.g. [1, ...featureDims].
    const warmupResult = model.predict(tf.zeros(inputShape));
    warmupResult.dataSync(); // forces execution; we don't care about the result
    warmupResult.dispose(); // free the memory held by the dummy output
    // Now we can use the model for real predictions.
    // The second predict() will be much faster.
    const result = model.predict(userData);
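
If several models need warming up, the warmup above could be wrapped in a small helper. This is only a sketch: `warmUp` is a hypothetical name, not part of TensorFlow.js, and it assumes a model with a single input and a single output tensor. It reads the input shape from `model.inputs[0].shape` and replaces every unknown (`null`) dimension, normally the batch dimension, with 1. The `tf` namespace is passed in as an argument so the helper does not depend on how the library was imported.

```javascript
// Hypothetical helper (not a TensorFlow.js API): runs one throwaway
// prediction so that later predict() calls do not pay the startup cost.
// `tf` is the TensorFlow.js namespace, `model` is a loaded tf.LayersModel.
// Assumption: the model has exactly one input and one output tensor.
function warmUp(tf, model) {
  // A shape such as [null, 28, 28, 1] has null for unknown dimensions
  // (usually the batch size); use 1 for each of them.
  const shape = model.inputs[0].shape.map((dim) => (dim == null ? 1 : dim));

  // tf.tidy disposes the dummy input and the output once the callback returns.
  tf.tidy(() => {
    const output = model.predict(tf.zeros(shape));
    output.dataSync(); // force the backend to actually run the computation
  });

  return shape; // returned only so the caller can log what was used
}
```

It would be called once per model right after loading, for example `warmUp(tf, model)` before the first call to `model.predict` on real data.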