Tags: tensorflow, keras, tensorflow2.0, tf.keras, tensorflow2.x

Does tf.keras.mixed_precision work for inference?


I am not sure I understand the idea behind TensorFlow Keras mixed precision. My goal is to run a tf.keras model with 16-bit floating point (FP16) precision to improve inference speed. Can this be done with mixed precision?

I am setting this policy before training my model:

# In TF 2.3 the mixed-precision API still lives under the experimental namespace
from tensorflow.keras.mixed_precision import experimental as mixed_precision

policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)  # applies globally to layers built afterwards

Or is this just to speed up training? If that is the case, how can I get the weights/activations of my tf.keras model in FP16 precision?

Note: I am using tensorflow==2.3.0.


Solution

  • There is mixed precision in training, as in the link you mentioned. Nvidia has more in-depth information on what that means here: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html

    Here is the paper describing the actual process, as well as how and why an FP32 "master" copy of the FP16 weights is kept (hence the mix of precisions): https://arxiv.org/pdf/1710.03740.pdf

    But there are also mixed-precision operations in hardware: these can speed up your inference when your data is FP32 and your weights/biases are FP16. On hardware that supports mixed-precision operations, this can accelerate inference considerably.

    For example, with an Nvidia T4 I saw a speedup of roughly 2x on YOLOv3, but no speedup on an Nvidia GTX 1080.
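The role of the FP32 master weights described in the paper above can be illustrated with a toy NumPy sketch (this is not TensorFlow's implementation, just a demonstration of the rounding effect): a small update added directly to an FP16 weight can be lost entirely, while the same update applied to an FP32 master copy accumulates correctly.

```python
import numpy as np

update = 1e-4  # a small (gradient * learning-rate) step

# Pure FP16: near 1.0 the FP16 spacing is ~9.8e-4, so adding 1e-4
# rounds back to the original value -- every update is lost.
w_fp16 = np.float16(1.0)
for _ in range(100):
    w_fp16 = np.float16(w_fp16 + np.float16(update))

# Mixed precision: accumulate in an FP32 master weight, and cast to
# FP16 only to produce the copy used in the forward pass.
w_master = np.float32(1.0)
for _ in range(100):
    w_master += np.float32(update)
w_for_forward = np.float16(w_master)

print(w_fp16)    # still 1.0: all 100 updates vanished
print(w_master)  # ~1.01: updates accumulated in FP32
```

This is exactly why the training scheme is called "mixed": compute happens in FP16, but the optimizer state stays in FP32.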
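As for actually getting FP16 weights for inference: one route in TF 2.3 (an assumption on my part, not something from the answer above) is post-training float16 quantization with the TFLite converter, which stores the weights in FP16. The two-layer model below is only a stand-in for your trained model.

```python
import tensorflow as tf

# Stand-in for your trained tf.keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Post-training float16 quantization: weights are stored as FP16 in
# the converted model, roughly halving its size.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()  # serialized .tflite model (bytes)
```

Whether this also speeds up inference depends on the runtime and hardware; on devices without FP16 support the weights are dequantized back to FP32 before compute.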