keras tensorflow-lite keras-nlp

TFlite conversion not working for GPT2 model


TL;DR:

I am unable to convert the GPT2 model to TFLite in my Colab notebook. The conversion throws an error because, as far as I can tell, causal_lm.generate() can no longer be wrapped in a tf.function and traced into a concrete function: text generation postprocessing (and potentially other steps) now converts tensors to NumPy arrays via ops.convert_to_numpy(). Any idea how I can fix or circumvent this? It may be as simple as a version incompatibility.

Background details:

My notebook is based on this Google codelab notebook that demonstrates exporting GPT2 to TFLite. The original runs front to back without issues, but it is quite dated and does not run with GPU support. The specific imports and package versions used in that notebook are:

!pip install -q git+https://github.com/keras-team/keras-nlp.git@google-io-2023 tensorflow-text==2.12
import numpy as np
import keras_nlp
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_text as tf_text
from tensorflow import keras
from tensorflow.lite.python import interpreter
import time
from google.colab import files
print(tf.__version__)
print(keras.__version__)
print(keras_nlp.__version__)
2.12.1
2.12.0
0.5.0

I have modified the import as below (copied from the notebook at the top) to enable GPU support.

!pip install keras_nlp
print(tf.__version__)
print(keras.__version__)
print(keras_nlp.__version__)
2.16.1
3.3.3
0.11.1
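
A quick way to confirm whether a given TensorFlow install actually sees the GPU is sketched below. This is a minimal check, not part of the original notebook; the helper names visible_gpus and build_cuda_version are made up for illustration, and it is assumed that the CUDA version may simply be absent from the build info on CPU-only builds.

```python
import tensorflow as tf


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def build_cuda_version():
    """Return the CUDA version this TensorFlow wheel was built against, if reported."""
    # Assumption: CPU-only builds may not carry this key, hence .get().
    return tf.sysconfig.get_build_info().get("cuda_version")


print("TensorFlow:", tf.__version__)
print("Built against CUDA:", build_cuda_version())
print("Visible GPUs:", visible_gpus())
```

An empty GPU list on a GPU runtime suggests that the installed TensorFlow build and the CUDA libraries on the machine do not match.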

But now the TFLite conversion does not work and throws the error below. The error occurs while trying to convert the generate() function from GPT2CausalLM. Even if I bypass creating the concrete function, a different error is thrown while converting the model itself.

# The generate() function from GPT2CausalLM is the actual function that does the magic,
# so that is what gets converted. First, wrap generate() in a tf.function and
# trace it into a TensorFlow concrete function.
@tf.function
def generate(prompt, max_length):
    return gpt2_lm.generate(prompt, max_length)

concrete_func = generate.get_concrete_function(tf.TensorSpec([], tf.string), 100)

Error:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-10-d8866e8eac5d> in <cell line: 5>()
      3     return gpt2_lm.generate(prompt, max_length)
      4 
----> 5 concrete_func = generate.get_concrete_function(tf.TensorSpec([], tf.string), 100)

27 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor.py in __array__(***failed resolving arguments***)
    625   def __array__(self, dtype=None):
    626     del dtype
--> 627     raise NotImplementedError(
    628         f"Cannot convert a symbolic tf.Tensor ({self.name}) to a numpy array."
    629         f" This error may indicate that you're trying to pass a Tensor to"

NotImplementedError: in user code:

    File "<ipython-input-10-d8866e8eac5d>", line 3, in generate  *
        return gpt2_lm.generate(prompt, max_length)
    File "/usr/local/lib/python3.10/dist-packages/keras_nlp/src/models/causal_lm.py", line 371, in postprocess  *
        return self.preprocessor.generate_postprocess(x)
    File "/usr/local/lib/python3.10/dist-packages/keras_nlp/src/models/gpt2/gpt2_causal_lm_preprocessor.py", line 178, in generate_postprocess  *
        token_ids = ops.convert_to_numpy(token_ids)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/ops/core.py", line 512, in convert_to_numpy  **
        return backend.convert_to_numpy(x)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/core.py", line 131, in convert_to_numpy
        return np.asarray(x)

    NotImplementedError: Cannot convert a symbolic tf.Tensor (StatefulPartitionedCall:1) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.

Also posted on the keras_nlp discussion forum, but there is not much activity there.


Solution

  • Incompatibility 1: The original notebook does not work with GPU because it uses TF 2.12.1, and since Google upgraded CUDA from 11.8 to 12.2, TensorFlow versions < 2.15.0 simply do not recognise the GPU. [1]

    I tried adding !apt update && apt install cuda-11-8 to the beginning of the original Google codelab notebook, and it recognises the GPU now. However, when I run !nvidia-smi, the output still shows CUDA 12.2. So some wires seem to be crossed (nvidia-smi reports the highest CUDA version the driver supports, not the installed toolkit, which may explain it), but the GPU becomes visible somehow.

  • Incompatibility 2: Venting a bit: keras_nlp is really poor at maintenance and backward compatibility. If it wasn't for TFLite, I would not go anywhere near Keras, to be honest. I found this link, which mentions:

    You should see the example in #998. The generator.generate method is not JITable (you can't apply the tf.function decorator to it). So, you will have to use the generate_function instead which accepts a dictionary of token_ids and padding_mask generated by the tokenizer. You will then need to run the tokenizer outside the TF graph. Something like this:

    While I do not fully understand this, I tried many keras_nlp and tensorflow version combinations, and nothing works apart from keras_nlp==0.5.0 with tensorflow==2.12.1. As I understand it, the @tf.function decorator cannot trace NumPy conversions, and the Task and CausalLM classes were changed in newer versions so that tensors are converted to NumPy arrays in many places.
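
The underlying failure can be reproduced without keras_nlp at all. The sketch below is a minimal illustration, not the library's code: the function names numpy_inside_graph and tracing_error are made up for this example. It shows that converting a tensor to NumPy works eagerly but fails while a tf.function is being traced, which is the same NotImplementedError as in the traceback from the question.

```python
import numpy as np
import tensorflow as tf


@tf.function
def numpy_inside_graph(x):
    # While tracing, x is a symbolic tensor with no concrete value yet,
    # so NumPy has nothing to read.
    return np.asarray(x)


def tracing_error():
    """Return the name of the exception raised while tracing, or None."""
    try:
        numpy_inside_graph.get_concrete_function(tf.TensorSpec([], tf.float32))
    except NotImplementedError as err:
        return type(err).__name__
    return None


# Outside a tf.function the tensor is eager, so the same conversion works.
eager_value = np.asarray(tf.constant(3.0))

print("eager conversion:", float(eager_value))
print("error while tracing:", tracing_error())
```

This is why any step that needs NumPy values (such as detokenizing the generated ids) has to run outside the traced function, as the quoted comment suggests for the tokenizer.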