Tags: tensorflow, huggingface-transformers, huggingface, tflite

Why is the TFLite model's output shape different from that of the original T5ForConditionalGeneration model it was converted from?


T5ForConditionalGeneration Model to translate English to German

from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer("translate English to German: the flowers are wonderful.", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output : Die Blumen sind wunderbar.

Input Shape

input_ids.shape

Output : torch.Size([1, 11])

Output Shape

outputs.shape

Output : torch.Size([1, 7])
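Note that generate() decodes autoregressively, so the (1, 7) above is the number of generated token IDs, not the shape of a single forward pass. As a minimal sketch of the difference (reusing model, input_ids and outputs from above; feeding the generated tokens back in as decoder_input_ids is purely illustrative):

# A single forward pass returns vocabulary logits; generate() wraps many
# such passes, picking one token ID per step.
logits = model(input_ids=input_ids, decoder_input_ids=outputs[:, :-1]).logits
print(logits.shape)  # (1, 6, 32128) for t5-small: (batch, decoder_len, vocab_size)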

Save the pretrained model

!mkdir /content/test
model.save_pretrained('/content/test')

Load a TFT5Model from the pretrained weights

from transformers import TFT5Model
t5model = TFT5Model.from_pretrained('/content/test',from_pt=True)
!mkdir /content/test/t5
t5model.save('/content/test/t5')
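Note that TFT5Model is the bare encoder-decoder without the language-modeling head (unlike TFT5ForConditionalGeneration), so it returns hidden states and attention caches rather than logits. A quick sanity check, as a sketch assuming the t5model loaded above (the token IDs are arbitrary, for illustration only):

import tensorflow as tf

ids = tf.constant([[37, 5295, 7, 33, 5]])  # 5 arbitrary token IDs
out = t5model(input_ids=ids, decoder_input_ids=ids)
print(out.last_hidden_state.shape)      # (1, 5, 512) for t5-small
# Each cached key/value tensor should be (batch, num_heads=8, seq_len=5,
# head_dim=64) for t5-small, i.e. the (1, 8, 5, 64) shape seen from TFLite below.
print(out.past_key_values[0][0].shape)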

Convert the TFT5Model to TFLite

import tensorflow as tf

saved_model_dir = '/content/test/t5'
!mkdir /content/test/tflite
tflite_model_path = '/content/test/tflite/model.tflite'

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

converter.experimental_new_converter = True
converter.experimental_new_quantizer = True
converter.experimental_new_dynamic_range_quantizer = True
converter.allow_custom_ops = True

converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
#print(tflite_model)
print(type(tflite_model))


# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
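Before resizing inputs by positional index, it can help to check which signatures and tensors the converted model actually exposes. A minimal sketch using the interpreter's signature API:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
# Maps each signature name to its named input and output tensors.
print(interpreter.get_signature_list())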

Load the TFLite model

import numpy as np
import tensorflow as tf

tflite_model_path = '/content/test/tflite/model.tflite'
# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path=tflite_model_path)

interpreter.resize_tensor_input(0, [1,5], strict=True)
interpreter.resize_tensor_input(1, [1,5], strict=True)
interpreter.resize_tensor_input(2, [1,5], strict=True)
interpreter.resize_tensor_input(3, [1,5], strict=True)
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_shape = input_details[0]['shape']

# Run inference on dummy token IDs. Note: random_sample() returns floats in
# [0, 1), so casting them to int64 makes every token ID 0; randint avoids that.
input_data = np.random.randint(0, 32000, size=input_shape, dtype=np.int64)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
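With a model exported from the head-less TFT5Model, none of these outputs are logits. Printing the details makes that visible; a short sketch reusing input_details and output_details from above:

# Show tensor names, shapes, and dtypes to identify each input/output.
for d in input_details:
    print('in :', d['name'], d['shape'], d['dtype'])
for d in output_details:
    print('out:', d['name'], d['shape'], d['dtype'])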

Get the output shape

print(output_data.shape)

Output : (1, 8, 5, 64)

Expected something like : (1, 7)

Can someone let me know where I am going wrong?

The output shape of the TFLite model, (1, 8, 5, 64), is completely different from the (1, 7) returned by the original T5ForConditionalGeneration model.


Solution

  • The issue has been fixed; the code below can be used.

    The output should be keras_output.logits, as in the code below:

    import tensorflow as tf
    from transformers import TFGPT2LMHeadModel

    model = TFGPT2LMHeadModel.from_pretrained('gpt2')  # or 'distilgpt2'
    input_ids = tf.keras.Input([64], batch_size=1, dtype=tf.int32)
    keras_output = model(input_ids, training=False)
    # Wrap the logits (not the whole output object) as the Keras model's output
    model = tf.keras.Model(input_ids, keras_output.logits)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # For FP16 quantization:
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # converter.target_spec.supported_types = [tf.float16]

    tflite_model = converter.convert()

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)