pytorch stanford-nlp

How do I get word indices for GloVe embeddings in PyTorch?


I am trying to use GloVe embeddings in a PyTorch model. I have the following code:

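from torchtext.vocab import GloVe
import torch.nn

# Load the pretrained GloVe vectors (default settings).
glove = GloVe()
# Wrap the GloVe weight matrix in a frozen (non-trainable) embedding layer.
my_embeddings = torch.nn.Embedding.from_pretrained(glove.vectors, freeze=True)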

However, I don't understand how to get the embedding for a specific word from this. my_embeddings only accepts a tensor of integer indices, not text. I could just use:

from torchtext.data import get_tokenizer
tokenizer = get_tokenizer("basic_english")
glove.get_vecs_by_tokens(tokenizer("Hello, How are you?"))

But then I am confused about why I need torch.nn.Embedding at all, as most tutorials suggest I do.


Solution

  • I believe this is done using glove.stoi, which maps each token string to its row index in glove.vectors:

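    sentence = "Hello, How are you?"
    # basic_english lowercases its input, so the first token is "hello".
    tokenized_sentence = tokenizer(sentence)
    # Look up the row index of the first token and wrap it in a LongTensor.
    torch_tensor_first_word = torch.tensor(glove.stoi[tokenized_sentence[0]], dtype=torch.long)
    # The embedding layer returns the matching row of glove.vectors.
    embeddings_for_first_word = my_embeddings(torch_tensor_first_word)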
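
The same idea extends to a whole sentence: map every token to its index, then pass the index tensor through the embedding layer in one call. Below is a minimal sketch of that, not a definitive implementation. To keep it runnable without downloading GloVe, a small toy `stoi` dict and a random `vectors` tensor stand in for `glove.stoi` and `glove.vectors`; the helper `tokens_to_indices` and the extra all-zero row for unknown words are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Toy stand-ins for glove.stoi and glove.vectors (assumptions for illustration;
# with real GloVe you would use glove.stoi and glove.vectors instead).
stoi = {"hello": 0, ",": 1, "how": 2, "are": 3, "you": 4, "?": 5}
vectors = torch.randn(len(stoi), 4)

# Append one all-zero row that serves as the vector for unknown words.
unk_index = vectors.shape[0]
vectors = torch.cat([vectors, torch.zeros(1, vectors.shape[1])], dim=0)

# Frozen embedding layer backed by the (extended) pretrained matrix.
embedding = nn.Embedding.from_pretrained(vectors, freeze=True)


def tokens_to_indices(tokens):
    """Hypothetical helper: map tokens to a LongTensor of row indices,
    falling back to unk_index for tokens missing from stoi."""
    return torch.tensor([stoi.get(t, unk_index) for t in tokens], dtype=torch.long)


# Already-tokenized, lowercased input; the last token is deliberately unknown.
tokens = ["hello", ",", "how", "are", "you", "?", "zzzunknown"]
indices = tokens_to_indices(tokens)
sentence_embeddings = embedding(indices)  # one row per token
```

Using `stoi.get` with a fallback index avoids the KeyError that a plain `glove.stoi[token]` lookup would raise for a word outside the vocabulary. The layer also accepts a 2-D index tensor, so a padded batch of sentences could be looked up the same way.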