Tags: nlp, pytorch, torchtext

How to pad text after building the vocab in PyTorch


I used a torchtext vocab to convert text to indices. Which function should I use to make all the index lists the same length before I send them to the network?

For example, I have two texts:

I am a good man
I would like a coffee please

After applying the vocab:

[1, 3, 2, 5, 7]
[1, 9, 6, 2, 4, 8]

And what I want is:

[1, 3, 2, 5, 7, 0]
[1, 9, 6, 2, 4, 8]

Solution

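  • Use torch.nn.utils.rnn.pad_sequence. It takes a list of tensors of different lengths and pads them to the length of the longest one, using 0 as the padding value by default. The following example shows how it works.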

    Code:

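    import torch
    from torch.nn.utils.rnn import pad_sequence

    # Three index sequences of different lengths
    v = [
        [0, 2],
        [0, 1, 2],
        [3, 3, 3, 3],
    ]

    # Convert each list to a tensor, then pad every sequence to the longest length.
    # batch_first=True gives a result of shape (batch, max_len).
    padded = pad_sequence([torch.tensor(p) for p in v], batch_first=True)
    print(padded)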
    

    Result:

    tensor([[0, 2, 0, 0],
            [0, 1, 2, 0],
            [3, 3, 3, 3]])
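
Applied to the two index lists from the question, the same call might look like the sketch below. It assumes that index 0 is reserved for padding in the vocab (PAD_IDX is an illustrative name, not something from torchtext), and collate_batch is a hypothetical helper that also returns the original lengths, which is often useful if the padded batch later goes into an RNN.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Assumption: index 0 is the padding index in the vocab.
PAD_IDX = 0

# Index lists from the question.
encoded = [
    [1, 3, 2, 5, 7],
    [1, 9, 6, 2, 4, 8],
]


def collate_batch(batch):
    """Hypothetical collate function: pad a batch of index lists to equal length."""
    tensors = [torch.tensor(seq, dtype=torch.long) for seq in batch]
    # Keep the unpadded lengths so padding can be ignored later if needed.
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(tensors, batch_first=True, padding_value=PAD_IDX)
    return padded, lengths


padded, lengths = collate_batch(encoded)
print(padded)
# tensor([[1, 3, 2, 5, 7, 0],
#         [1, 9, 6, 2, 4, 8]])

# The same function can be passed to a DataLoader so each batch is padded
# only to the longest sequence in that batch.
loader = DataLoader(encoded, batch_size=2, collate_fn=collate_batch)
```

Padding per batch inside collate_fn, rather than padding the whole dataset up front, keeps short batches short. If the padding index in your vocab is not 0, change PAD_IDX to match.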