deep-learning, nlp, bert-language-model, word-embedding, tensorflow-hub

Is there a faster way to convert sentences to TFHUB embeddings?


So I am working on a project that feeds a combination of text embeddings and image vectors into a DNN to produce a result. For the word-embedding part I am using TFHub's ELECTRA, while for the image part I am using a NASNet Mobile network.

However, when I run the word-embedding part using the code shown below, it just keeps running nonstop. It has been over 2 hours now, and my training dataset has only 14900 rows of tweets.

Note - The input to the function is just a list of 14900 tweets.

tfhub_handle_encoder = "https://tfhub.dev/google/electra_small/2"
tfhub_handle_preprocess = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"

# Load Models 
bert_preprocess_model = hub.KerasLayer(tfhub_handle_preprocess) 
bert_model = hub.KerasLayer(tfhub_handle_encoder)

def get_text_embedding(text):

  preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='Preprocessor')
  encoder_inputs = preprocessing_layer(text)
  encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='Embeddings')
  outputs = encoder(encoder_inputs)
  text_repr = outputs['pooled_output']
  text_repr = tf.keras.layers.Dense(128, activation='relu')(text_repr)

  return text_repr

text_repr = get_text_embedding(train_text)

Is there a faster way to get text representation using these models?

Thanks for the help!


Solution

  • The operation performed in the code is quadratic in nature. While I managed to execute your snippet with 10000 samples within a few minutes, a 14900-sample input ran out of memory on a runtime with 32 GB of RAM. Is it possible that your runtime is swapping?

    It is also not clear what the snippet is trying to achieve. Do you intend to train a model? In that case you can define the text input as an Input layer, build a tf.keras.Model once, and use fit to train it. Here is an example: https://www.tensorflow.org/text/tutorials/classify_text_with_bert#define_your_model
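    To make the idea concrete, here is a minimal sketch of that pattern: the hub layers are built once inside a Keras functional model, and the whole list of tweets is then run through `predict` in batches instead of in one giant call. This assumes the same TFHub handles as the question; `train_text` and the `batch_size` value are placeholders you would substitute for your own data.

    ```python
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401 -- registers the ops used by the BERT preprocessor

    tfhub_handle_preprocess = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
    tfhub_handle_encoder = "https://tfhub.dev/google/electra_small/2"

    def build_embedding_model():
        # Build the hub layers exactly once, so the SavedModels are loaded a single time.
        text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
        preprocessed = hub.KerasLayer(tfhub_handle_preprocess, name='Preprocessor')(text_input)
        outputs = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='Embeddings')(preprocessed)
        text_repr = tf.keras.layers.Dense(128, activation='relu')(outputs['pooled_output'])
        return tf.keras.Model(text_input, text_repr)

    model = build_embedding_model()

    # predict() batches the input for you, keeping memory usage bounded.
    train_text = ["example tweet one", "example tweet two"]  # placeholder for your 14900 tweets
    embeddings = model.predict(tf.constant(train_text), batch_size=32)
    ```

    The same `model` object can also be extended with a classification head and trained with `fit`, which is what the linked tutorial does.
    
    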