I recently came across a warning in TensorFlow that caused some head-scratching and took a while to fix. Since I didn't find a solution online, I wanted to share.
I am building a transformer (encoder-decoder) architecture, but my training results are really bad: the transformer always gives the same answer no matter the input, although the training accuracy looks very good (above 0.95). On top of that, I get this warning:
WARNING:tensorflow:Gradients do not exist for variables ['embedding/embeddings:0'] when minimizing the loss. If you're using 'model.compile()', did you forget to provide a 'loss' argument?
Both the encoder and decoder have a keras.layers.Embedding layer and a keras_nlp.layers.PositionEmbedding layer. Here is the encoder code:
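# Imports assumed; the original post did not show them.
from keras import Input, Model
from keras.layers import Embedding
from keras_nlp.layers import PositionEmbedding, TransformerEncoder
encoder_inputs = Input(shape=(encoder_inputs_size,), name="encoder_inputs")
token_embeddings = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim)(encoder_inputs)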
position_embeddings = PositionEmbedding(sequence_length=encoder_inputs_size)(token_embeddings)
encoder_outputs = TransformerEncoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(inputs=position_embeddings)
encoder = Model(encoder_inputs, encoder_outputs, name="encoder")
There is also keras_nlp.layers.TokenAndPositionEmbedding, which combines the two embeddings into a single layer, and using it makes the problem disappear. But since I want to use other forms of embedding, like patch embedding for image processing, I can't use this combined layer.
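The cause is that, unlike regular Keras layers, PositionEmbedding does not pass its input through when layers are chained. It uses only the shape of the incoming tensor and returns the positional embeddings themselves, so the token embeddings never reach the encoder and the Embedding weights receive no gradient, which is what the warning reports. The token embeddings and the positional embeddings must be added up manually, so the following code fixes the problem: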
encoder_inputs = Input(shape=(encoder_inputs_size,), name="encoder_inputs")
token_embeddings = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim)(encoder_inputs)
position_embeddings = PositionEmbedding(sequence_length=encoder_inputs_size)(token_embeddings)
# this line adds up the embeddings and fixes the problem
embeddings = token_embeddings + position_embeddings
encoder_outputs = TransformerEncoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(inputs=embeddings)
encoder = Model(encoder_inputs, encoder_outputs, name="encoder")
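To see why chaining drops the token information, here is a minimal NumPy sketch. It is not the keras_nlp implementation: toy_position_embedding and position_table are hypothetical stand-ins, written under the assumption that a position embedding layer only looks at the shape of its input and returns the position vectors broadcast to that shape.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, embedding_dim = 4, 3

# Hypothetical stand-in for a learned position table (one row per position).
position_table = rng.normal(size=(seq_len, embedding_dim))


def toy_position_embedding(token_embeddings):
    """Return position vectors broadcast to the input's shape.

    Only the shape of `token_embeddings` is used, never its values.
    """
    return np.broadcast_to(position_table, token_embeddings.shape).copy()


# Two different batches of token embeddings: (batch, seq_len, embedding_dim).
tokens_a = rng.normal(size=(2, seq_len, embedding_dim))
tokens_b = rng.normal(size=(2, seq_len, embedding_dim))

# Chained: the next layer sees the same tensor whatever the tokens are.
chained_a = toy_position_embedding(tokens_a)
chained_b = toy_position_embedding(tokens_b)
print(np.array_equal(chained_a, chained_b))  # True

# Summed: the result now depends on the token embeddings.
summed_a = tokens_a + toy_position_embedding(tokens_a)
summed_b = tokens_b + toy_position_embedding(tokens_b)
print(np.array_equal(summed_a, summed_b))  # False
```

In the chained case the output is identical for both inputs, which matches the symptom of the model giving the same answer no matter the input. Once the two embeddings are summed, the output varies with the tokens, so a loss computed from it can produce gradients for the token embedding weights.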