[SOLVED] Torchtext 0.7 shows Field is being deprecated. What is the alternative?

It looks like the previous paradigm of declaring Fields and Examples and using BucketIterator is deprecated and will move to legacy in 0.8. However, I can't find an example of the new paradigm for custom datasets (that is, ones not included in torchtext.datasets) that doesn't use Field. Can anyone point me to an up-to-date example?

Reference for deprecation:

https://github.com/pytorch/text/releases

Solution

It took me a little while to find the solution myself. For prebuilt datasets, the new paradigm looks like this:

from torchtext.experimental.datasets import AG_NEWS
train, test = AG_NEWS(ngrams=3)

or like so for custom-built datasets, where train is any dataset that yields (label, text) pairs:

from torch.utils.data import DataLoader

def collate_fn(batch):
    # Each item in the batch is a (label, text) pair
    texts, labels = [], []
    for label, txt in batch:
        texts.append(txt)
        labels.append(label)
    return texts, labels

# train is the dataset loaded earlier, or your own Dataset object
dataloader = DataLoader(train, batch_size=8, collate_fn=collate_fn)
for idx, (texts, labels) in enumerate(dataloader):
    print(idx, texts, labels)

I've copied the examples from the Source
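
For a dataset made of raw strings, the work Field used to do (tokenizing, building a vocabulary, numericalizing, padding) has to happen somewhere, and the collate function is a natural place for it. Here is a minimal sketch in plain Python, assuming whitespace tokenization and (label, text) pairs. The names build_vocab, collate_batch, PAD and UNK are my own placeholders for illustration, not torchtext APIs.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, not torchtext names


def build_vocab(texts, min_freq=1):
    """Map each whitespace token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def collate_batch(batch, vocab):
    """Turn a list of (label, text) pairs into padded id lists plus labels."""
    labels = [label for label, _ in batch]
    # Unknown tokens fall back to the UNK id
    ids = [[vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]
           for _, text in batch]
    # Pad every sequence on the right to the longest one in this batch
    max_len = max(len(seq) for seq in ids)
    padded = [seq + [vocab[PAD]] * (max_len - len(seq)) for seq in ids]
    return padded, labels


# Toy data standing in for a custom dataset of (label, text) pairs
data = [(0, "the cat sat"), (1, "dogs bark"), (0, "the dog sat down")]
vocab = build_vocab(text for _, text in data)
padded, labels = collate_batch(data, vocab)
print(padded, labels)
```

To plug this into the DataLoader above, something like collate_fn=lambda b: collate_batch(b, vocab) should work, with the padded lists wrapped in torch.tensor(...) if the model expects tensors.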