tensorflow keras onnx quantization

Are 4-bit quantized CNN models available in ONNX or TensorFlow format?


I would like to use 4-bit quantized CNN models with reasonable accuracy. However, when I tried to quantize a network myself, the accuracy dropped significantly. Does anyone know of publicly available 4-bit quantized CNN models in TensorFlow/Keras or ONNX format?


Solution

  • The paper by Banner, R., Nahshan, Y., & Soudry, D. (2019), "Post training 4-bit quantization of convolutional networks for rapid-deployment", Advances in Neural Information Processing Systems, 32, reports effective 4-bit quantization of CNN models such as ResNet50. The authors also provide a codebase, which you can find on GitHub.

    Once you get the code up and running, you should be able to export your quantized model of choice to the ONNX format. Also note that this paper is already 4 years "old", which is fairly old in the realm of machine learning, so a newer paper that also comes with a codebase might exist. Since the paper I've mentioned is quite popular (>500 citations), any newer approach has probably cited it; you can therefore check its "Cited by" list on Google Scholar for more recent quantization articles that build on it.
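
As a rough illustration of why a do-it-yourself 4-bit quantization can lose so much accuracy, here is a minimal sketch in plain Python. It is not the paper's code or method. It only simulates symmetric uniform quantization on synthetic Gaussian "weights" with one artificial outlier. The function names, the outlier value, and the hand-picked clipping threshold of 3.0 are all assumptions made for the example. It compares scaling by the maximum magnitude with clipping at a smaller threshold before quantizing.

```python
import random


def fake_quantize(values, num_bits=4, clip=None):
    """Quantize to a signed symmetric integer grid, then map back to floats."""
    # For 4 bits the integer levels are -7..7 (15 distinct levels).
    qmax = 2 ** (num_bits - 1) - 1
    # With no explicit threshold, fall back to naive max-magnitude scaling.
    if clip is None:
        clip = max(abs(v) for v in values)
    if clip == 0:
        return [0.0 for _ in values]
    scale = clip / qmax
    out = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # saturate values beyond the threshold
        out.append(q * scale)           # dequantize back to a float
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for a weight tensor: Gaussian values plus one outlier.
weights = [random.gauss(0.0, 1.0) for _ in range(10000)]
weights[0] = 12.0

naive = fake_quantize(weights)               # scale set by the outlier
clipped = fake_quantize(weights, clip=3.0)   # hand-picked threshold (assumption)

print("MSE with max scaling:", mse(weights, naive))
print("MSE with clipping:   ", mse(weights, clipped))
```

With max scaling, the single outlier stretches the 15 available levels over a wide range, so most weights collapse onto very few values. Clipping gives up accuracy on the outlier in exchange for a finer grid everywhere else. The threshold here is picked by hand, whereas post-training methods such as the one in the paper choose it in a more principled way.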