python tensorflow bert-language-model data-preprocessing tensorflow-hub

NotFoundError using BERT Preprocessing from TFHub

I'm trying to use the pre-trained BERT models on TensorFlow Hub to do some simple NLP. I'm on a 2021 MacBook Pro (Apple Silicon) with Python 3.9.13 and TensorFlow v2.9.2. However, preprocessing any amount of text returns a "NotFoundError" that I can't seem to resolve. The link to the preprocessing model is here: (https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3) and I have pasted my code/error messages below. Does anyone know why this is happening and how I can fix it? Thanks in advance.

Code

bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
print(bert_preprocess(["test"]))

Output

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
Cell In [42], line 3
      1 bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
      2 bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
----> 3 print(bert_preprocess(["test"]))

File ~/miniforge3/envs/tfenv/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~/miniforge3/envs/tfenv/lib/python3.9/site-packages/tensorflow_hub/keras_layer.py:237, in KerasLayer.call(self, inputs, training)
    234   else:
    235     # Behave like BatchNormalization. (Dropout is different, b/181839368.)
    236     training = False
--> 237   result = smart_cond.smart_cond(training,
    238                                  lambda: f(training=True),
    239                                  lambda: f(training=False))
    241 # Unwrap dicts returned by signatures.
    242 if self._output_key:

File ~/miniforge3/envs/tfenv/lib/python3.9/site-packages/tensorflow_hub/keras_layer.py:239, in KerasLayer.call.<locals>.<lambda>()
...
     [[StatefulPartitionedCall/StatefulPartitionedCall/bert_pack_inputs/PartitionedCall/RaggedConcat/ArithmeticOptimizer/AddOpsRewrite_Leaf_0_add_2]] [Op:__inference_restored_function_body_209194]

Call arguments received by layer "keras_layer_6" (type KerasLayer):
  • inputs=["'test'"]
  • training=None

Solution

Update: While using BERT preprocessing from TFHub, Tensorflow and tensorflow_text versions should be same so please make sure that installed both versions are same. It happens because you're using latest version for tensorflow_text but you're using other versions for python and tensorflow but there is internal dependancy with versions for Tensorflow and tensorflow_text which should be same.

!pip install -U tensorflow
!pip install -U tensorflow-text
import tensorflow as tf
import tensorflow_text as text

# Or install with a specific Version
!pip install -U tensorflow==2.11.*
!pip install -U tensorflow-text==2.11.*
import tensorflow as tf
import tensorflow_text as text

I have executed below lines of code in Google Colab and It's working fine,

bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
print(bert_preprocess(["test"]))

Here is output:

{'input_type_ids': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32)>, 'input_mask': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32)>, 'input_word_ids': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
array([[ 101, 3231,  102,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0]], dtype=int32)>}

I hope it will help you to resolve your issue, Thank You!