pytorch torchtext

Torchtext 0.7 shows Field is being deprecated. What is the alternative?


It looks like the previous paradigm of declaring Fields and Examples and using BucketIterator is deprecated and will move to legacy in 0.8. However, I can't find an example of the new paradigm for custom datasets (that is, ones not included in torchtext.datasets) that doesn't use Field. Can anyone point me to an up-to-date example?

Reference for deprecation:

https://github.com/pytorch/text/releases


Solution

  • It took me a little while to find the solution myself. The new paradigm is like so for prebuilt datasets:

    from torchtext.experimental.datasets import AG_NEWS
    train, test = AG_NEWS(ngrams=3)
    

    or like so for custom-built datasets, where train is any dataset that yields (label, text) pairs:

    from torch.utils.data import DataLoader

    def collate_fn(batch):
        # Each item in the batch is a (label, text) pair; split them into two lists.
        texts, labels = [], []
        for label, txt in batch:
            texts.append(txt)
            labels.append(label)
        return texts, labels

    # train is the dataset to batch; it must yield (label, text) pairs.
    dataloader = DataLoader(train, batch_size=8, collate_fn=collate_fn)
    for idx, (texts, labels) in enumerate(dataloader):
        print(idx, texts, labels)
    

    I've copied the examples from the Source
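
Since the question is about custom datasets specifically, here is a minimal sketch of what one could look like without Field, using only torch.utils.data. It is not taken from the torchtext docs: RAW_DATA, tokenize, build_vocab and TextClassificationDataset are hypothetical names made up for illustration, and the whitespace tokenizer and hand-rolled vocabulary are simplifying assumptions. The only library calls are Dataset, DataLoader and torch.nn.utils.rnn.pad_sequence.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

# Hypothetical in-memory data: (label, text) pairs standing in for your own files.
RAW_DATA = [
    (0, "the match ended in a draw"),
    (1, "shares rose after the earnings report"),
    (0, "the striker scored twice"),
    (1, "the central bank held rates"),
]

PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    # Assumption: a plain whitespace tokenizer; swap in any tokenizer you prefer.
    return text.lower().split()


def build_vocab(samples, min_freq=1):
    # Map each token to an integer id, reserving 0 for padding and 1 for unknowns.
    counts = Counter(tok for _, text in samples for tok in tokenize(text))
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


class TextClassificationDataset(Dataset):
    """Map-style dataset yielding (label, token_id_tensor) pairs."""

    def __init__(self, samples, vocab):
        self.samples = samples
        self.vocab = vocab

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        label, text = self.samples[index]
        ids = [self.vocab.get(tok, self.vocab[UNK]) for tok in tokenize(text)]
        return label, torch.tensor(ids, dtype=torch.long)


def collate_fn(batch):
    # Pad variable-length sequences to the longest one in the batch.
    labels = torch.tensor([label for label, _ in batch], dtype=torch.long)
    texts = pad_sequence([ids for _, ids in batch], batch_first=True, padding_value=0)
    return texts, labels


vocab = build_vocab(RAW_DATA)
train = TextClassificationDataset(RAW_DATA, vocab)
dataloader = DataLoader(train, batch_size=2, shuffle=False, collate_fn=collate_fn)

for idx, (texts, labels) in enumerate(dataloader):
    print(idx, texts.shape, labels)
```

The collate function here takes over the padding and numericalization work that Field used to do, so the dataset itself only has to return one (label, tensor) pair per index. Padding to the longest sequence in each batch is one design choice among several; it does not group examples of similar length the way BucketIterator did.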