I'm working with Keras/TensorFlow to develop an ANN that will be deployed to a low-end MCU. For this purpose, I have quantized the original ANN using the post-training quantization mechanism offered by TensorFlow Lite. While the weights are indeed quantized to int8, the biases were converted from float to int32. Since I intend to implement this ANN with CMSIS-NN, this is a problem, as it only supports int8 and int16 data.
Is it possible to configure TF Lite to also quantize the biases to int8? Below is the code I am executing:
import tensorflow as tf

def quantizeToInt8(representativeDataset):
    # Cast the dataset to float32 and wrap it in a tf.data pipeline, one sample per batch
    data = tf.cast(representativeDataset, tf.float32)
    data = tf.data.Dataset.from_tensor_slices((data)).batch(1)

    # Generator function that yields one calibration sample per iteration
    def representativeDatasetGen():
        for inputValue in data:
            yield [inputValue]

    # ANN quantization
    model = tf.keras.models.load_model("C:/Users/miguel/Documents/Universidade/PhD/Code_Samples/TensorFlow/originalModel.h5")
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representativeDatasetGen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.target_spec.supported_types = [tf.int8]
    converter.inference_type = tf.int8
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()
    return tflite_quant_model
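For context, this is a minimal sketch of how I invoke the function, assuming a NumPy array of calibration samples (calibrationData is a placeholder name; the shape must match the model's input):

import numpy as np

# Hypothetical calibration data; replace with real samples shaped like the model input
calibrationData = np.random.rand(100, 10).astype(np.float32)

tfliteModel = quantizeToInt8(calibrationData)
with open("quantizedModel.tflite", "wb") as f:
    f.write(tfliteModel)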
From Comments
It's not possible to configure TFLite to do that. Biases are intentionally int32, otherwise the quantization accuracy would not be good. To make this work, you'd have to add a new op or a custom op and then come up with custom quantization tooling altogether. (Paraphrased from Meghna Natraj.)