tensorflow, tensorboard, tensorflow-lite, quantization, quantization-aware-training

Does TensorFlow's quantization-aware training lead to an actual speedup during training?


We are looking into using quantization-aware training for a research project to determine the impact of quantization during training on convergence rates and runtimes. However, we are not yet fully convinced that this is the right tool. Could you please clarify the following points?

1) If a layer is quantized during quantization-aware training, this means inputs and weights are quantized, all operations including the activation function run in quantized form, and then, before returning, the outputs are de-quantized to a precision compatible with the next layer. Is this understanding correct? (See the sketch after this list for what we assume happens.)
2) Is quantization-aware training compatible with the TensorBoard profiler?
3) Does quantization-aware training, in principle, lead to a speedup during training in your general experience, or is this impossible because it is solely a simulation?
4) Can you point us to a resource on how to add custom quantizers and data types to TensorFlow such that they are GPU-compatible?
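To make point 1 concrete, here is roughly what we assume happens around each layer, sketched with TensorFlow's built-in fake-quant op (our own illustration, not an authoritative description):

```python
# Sketch (ours, for illustration) of the per-layer round trip assumed in
# point 1: values are rounded onto an 8-bit grid and immediately mapped back
# to float, so downstream layers still receive float32 tensors.
import tensorflow as tf

x = tf.random.uniform([4, 8], minval=-1.0, maxval=1.0)

# fake_quant_with_min_max_args quantizes x onto a uniform 8-bit grid over
# [min, max] and de-quantizes again; the result is float32, not int8.
x_fq = tf.quantization.fake_quant_with_min_max_args(x, min=-1.0, max=1.0, num_bits=8)

print(x_fq.dtype)                       # float32
print(tf.reduce_max(tf.abs(x - x_fq)))  # small rounding error introduced
```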

Thank you very much for your help!


Solution

  • After doing some research, we found that QAT does not speed up training; it only simulates quantization to prepare the model for post-training quantization. MuPPET, however, is an algorithm that actually speeds up training via quantization.
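For context, a minimal sketch of how QAT is typically applied in TensorFlow with the tensorflow_model_optimization package (the toy model below is our own illustration, not from the question). The quantize_model wrapper only inserts fake-quantization nodes that simulate int8 behaviour while the arithmetic stays in float32, which is why no training speedup is expected:

```python
# Minimal QAT sketch using the tensorflow_model_optimization (tfmot) package.
# quantize_model wraps each layer with fake-quant nodes that simulate int8
# behaviour; all training arithmetic still runs in float32.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Illustrative toy model (our own assumption, not from the question).
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

qat_model = tfmot.quantization.keras.quantize_model(base_model)

qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
qat_model.summary()  # layers show up as quant_* wrappers; weights remain float32
```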