machine-learning, pytorch, nlp, huggingface-transformers, bart

Why is the token embedding different from the embedding produced by the BartForConditionalGeneration model?


Why are the two embeddings different even when I generate them using the same BartForConditionalGeneration model?

The first embedding is generated by combining the token embedding and the positional embedding from

# input_ids here is the tokenizer's output (a BatchEncoding),
# so .input_ids is the actual tensor of token ids
embed_pos = modelBART.model.encoder.embed_positions(input_ids.input_ids)
inputs_embeds = modelBART.model.encoder.embed_tokens(input_ids.input_ids)

The second embedding is produced by the model via

output = modelBART(input_ids.input_ids)
print("\n\n output: \n\n",output.encoder_last_hidden_state)

Shouldn't the first and second embeddings be the same? What can I do so that the difference between them is zero?


Solution

  • The first embeddings (token + position) come from the model's input layer: they simply map each token to a vector before any attention is applied. (In BART, this sum is also passed through a layer norm, `layernorm_embedding`, before entering the encoder stack.)

    The second set of embeddings (`encoder_last_hidden_state`) is the output of the final layer of the model's encoder, i.e. the input embeddings after they have been transformed by every encoder layer.

    These embeddings are supposed to be different. If you want something that matches your hand-computed embeddings, compare against the encoder's first hidden state instead of its last: call the model with `output_hidden_states=True` and look at `output.encoder_hidden_states[0]`.
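To see why the two must differ, here is a minimal sketch in plain PyTorch with a randomly initialized toy encoder (not the real BART weights; the layer sizes and the use of `nn.TransformerEncoder` are assumptions for illustration). The "first" embedding is the token + positional sum, and the "second" is what comes out after the encoder layers have transformed it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dimensions (hypothetical, chosen only for the sketch)
vocab_size, max_positions, d_model = 100, 32, 16

# Input layer: token embeddings + positional embeddings + layer norm,
# mirroring the structure of BART's encoder input
embed_tokens = nn.Embedding(vocab_size, d_model)
embed_positions = nn.Embedding(max_positions, d_model)
layernorm_embedding = nn.LayerNorm(d_model)

# A small stack of transformer encoder layers
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

input_ids = torch.randint(0, vocab_size, (1, 10))
positions = torch.arange(input_ids.size(1)).unsqueeze(0)

with torch.no_grad():
    # "First" embedding: what the question computes by hand
    inputs_embeds = embed_tokens(input_ids) + embed_positions(positions)
    first = layernorm_embedding(inputs_embeds)

    # "Second" embedding: the last hidden state, i.e. the input
    # embeddings after passing through every encoder layer
    last_hidden_state = encoder(first)

# The encoder layers transform the representations, so the two differ
print(torch.allclose(first, last_hidden_state))  # False
```

The same comparison with the real model would show the same thing: `encoder_last_hidden_state` is `encoder_hidden_states[0]` pushed through all the encoder's self-attention layers, so a nonzero difference is expected behavior, not a bug.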