nlp torchtext

AttributeError: 'Field' object has no attribute 'vocab' prevents me from running the code


I found this code and I want to see what object the last line prints. I'm new to the field of NLP, so please help me fix this code: it raises AttributeError: 'Field' object has no attribute 'vocab'. By the way, I have found out that torchtext has changed, so the error is probably related to those changes, and the code probably worked before.

import spacy
from torchtext.legacy.data import Field
spacy_eng = spacy.load("en")
def tokenize_eng(text):
    return [tok.text for tok in spacy_eng.tokenizer(text)]

english = Field(
    tokenize=tokenize_eng, lower=True, init_token="<sos>", eos_token="<eos>"
)
print([english.vocab.stoi["<sos>"]])

Solution

  • The vocab attribute only exists once the vocabulary has been built, so you have to call english.build_vocab(...) on the english Field before you try to access english.vocab. You will need a dataset to build the vocabulary from, typically the dataset you are going to train your model on. Here are the docs for build_vocab.

    Also, if you would like to learn how to migrate what you are doing to the new version of torchtext, here is a good resource.
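To make it clearer why the vocabulary has to be built first, here is a minimal sketch in plain Python of roughly what building a vocabulary does: count the tokens in a dataset, then assign indices with the special tokens first. This is a simplified stand-in, not torchtext's implementation; the build_vocab function, the whitespace tokenizer and the three sentences are assumptions made up for illustration. With torchtext itself you would instead call something like english.build_vocab(train_data), where train_data stands in for your own dataset.

```python
from collections import Counter


def build_vocab(token_lists, specials=("<unk>", "<pad>", "<sos>", "<eos>")):
    """Hypothetical helper: map tokens to indices, specials first, then by frequency."""
    # Count every token across all tokenized examples.
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    # Special tokens get fixed slots at the front, so drop them from the counts.
    for tok in specials:
        counter.pop(tok, None)
    # Most frequent tokens first; ties are broken alphabetically.
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]  # index -> string
    stoi = {tok: idx for idx, tok in enumerate(itos)}    # string -> index
    return stoi, itos


# Toy "dataset" (made up); lowercased and split on whitespace instead of spaCy.
sentences = ["A dog barks", "A cat sleeps", "the dog sleeps"]
token_lists = [s.lower().split() for s in sentences]

stoi, itos = build_vocab(token_lists)
print([stoi["<sos>"]])  # prints [2]
```

Until a step like this has run on some data, there is simply no string-to-index mapping to look up, which is what the AttributeError is telling you.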