python, tensorflow, keras, transformer-model

Warning: Gradients do not exist for variables


I recently came across a warning in Tensorflow that caused some head-scratching and took a while to fix. Since I didn't find a solution online, I wanted to share.

I am building a transformer (encoder-decoder) architecture, but my training results are really bad: the model always gives the same answer regardless of the input, even though the training accuracy looks very good (above 0.95). On top of that, I get this warning:

WARNING:tensorflow:Gradients do not exist for variables ['embedding/embeddings:0'] when minimizing the loss. If you're using 'model.compile()', did you forget to provide a 'loss' argument?

Both the encoder and decoder have the same embedding setup: a token embedding followed by a positional embedding.

Here is the encoder code:

from tensorflow.keras.layers import Embedding, Input
from tensorflow.keras.models import Model
from keras_nlp.layers import PositionEmbedding, TransformerEncoder

encoder_inputs = Input(shape=(encoder_inputs_size,), name="encoder_inputs")
token_embeddings = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim)(encoder_inputs)
position_embeddings = PositionEmbedding(sequence_length=encoder_inputs_size)(token_embeddings)
encoder_outputs = TransformerEncoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(inputs=position_embeddings)
encoder = Model(encoder_inputs, encoder_outputs, name="encoder")
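
A quick gradient check already hints at the problem: PositionEmbedding only uses its input to work out the sequence positions, so the encoder output never depends on the token embedding values, and their gradient comes back as None. Here is a minimal diagnostic sketch (the dummy batch of size 2 is made up purely for illustration):

import tensorflow as tf

# run a random token batch through the encoder above and inspect the gradients
dummy_tokens = tf.random.uniform(
    shape=(2, encoder_inputs_size), maxval=vocabulary_size, dtype=tf.int32
)
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(encoder(dummy_tokens))

grads = tape.gradient(loss, encoder.trainable_variables)
for var, grad in zip(encoder.trainable_variables, grads):
    print(var.name, "-> no gradient" if grad is None else "-> ok")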

There is keras_nlp.layers.TokenAndPositionEmbedding, which combines the two embeddings into a single layer, and using it makes the problem disappear. But since I want to use other forms of embedding, such as patch embedding for image processing, I can't rely on this combined layer.
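
For reference, this is roughly what the combined layer looks like in use (a sketch following the keras_nlp argument names as I understand them, not my exact code):

from keras_nlp.layers import TokenAndPositionEmbedding

# token + position embedding in one layer: the addition happens inside the layer,
# so the token embedding weights stay on the gradient path
embeddings = TokenAndPositionEmbedding(
    vocabulary_size=vocabulary_size,
    sequence_length=encoder_inputs_size,
    embedding_dim=embedding_dim,
)(encoder_inputs)
encoder_outputs = TransformerEncoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(inputs=embeddings)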


Solution

  • The solution: unlike regular Keras layers, which simply pass their output on to the next layer when you chain them, the token embedding and the positional embedding must be added together explicitly. The following code fixes the problem for the encoder; the decoder needs the same change (sketched in the bullet below):

    encoder_inputs = Input(shape=(encoder_inputs_size,), name="encoder_inputs")
    token_embeddings = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim)(encoder_inputs)
    position_embeddings = PositionEmbedding(sequence_length=encoder_inputs_size)(token_embeddings)
    
    # this line adds up the embeddings and fixes the problem
    embeddings = token_embeddings + position_embeddings
    
    encoder_outputs = TransformerEncoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(inputs=embeddings)
    encoder = Model(encoder_inputs, encoder_outputs, name="encoder")
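
  • The decoder uses the same token + position embedding pattern, so it needs the same manual addition. A minimal sketch of the decoder side (the input names, decoder_inputs_size, and the cross-attention wiring are assumptions for illustration, not copied from my code):

    from keras_nlp.layers import TransformerDecoder

    decoder_inputs = Input(shape=(decoder_inputs_size,), name="decoder_inputs")
    encoded_seq_inputs = Input(shape=(encoder_inputs_size, embedding_dim), name="encoder_sequence")

    dec_token_embeddings = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim)(decoder_inputs)
    dec_position_embeddings = PositionEmbedding(sequence_length=decoder_inputs_size)(dec_token_embeddings)

    # same manual addition as in the encoder, so gradients reach the token embedding
    dec_embeddings = dec_token_embeddings + dec_position_embeddings

    decoder_outputs = TransformerDecoder(intermediate_dim=intermediate_dim, num_heads=num_heads)(
        decoder_sequence=dec_embeddings, encoder_sequence=encoded_seq_inputs
    )
    decoder = Model([decoder_inputs, encoded_seq_inputs], decoder_outputs, name="decoder")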