Tags: python, neural-network, pytorch, recurrent-neural-network, embedding

How to embed a sequence of sentences in an RNN?


I am trying to build an RNN model (in PyTorch) that takes a few sentences and classifies them as either Class 0 or Class 1.

For the sake of this question, let's assume that the maximum sentence length (max_len) is 4 and the maximum number of time steps is 5. Thus, each data point has the following form (0 is the padding value):

    x[1] = [
    # Input features at timestep 1
    [1, 48, 91, 0],
    # Input features at timestep 2
    [20, 5, 17, 32],
    # Input features at timestep 3
    [12, 18, 0, 0],
    # Input features at timestep 4
    [0, 0, 0, 0],
    # Input features at timestep 5
    [0, 0, 0, 0]
    ]
    y[1] = [1]

When I have just one sentence per target, I simply pass each word to the embedding layer and then to the LSTM or GRU. But I am stuck on what to do when I have a sequence of sentences per target.

How do I build an embedding that can handle sentences?


Solution

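  • The simplest way is to stack two LSTMs: a word-level LSTM that encodes each sentence into a vector, and a sentence-level LSTM that encodes the resulting sequence of sentence vectors.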

    Prepare the toy dataset

    import torch
    import torch.nn as nn

    xi = [
    # Input features at timestep 1
    [1, 48, 91, 0],
    # Input features at timestep 2
    [20, 5, 17, 32],
    # Input features at timestep 3
    [12, 18, 0, 0],
    # Input features at timestep 4
    [0, 0, 0, 0],
    # Input features at timestep 5
    [0, 0, 0, 0]
    ]
    yi = 1
    
    x = torch.tensor([xi, xi])
    y = torch.tensor([yi, yi])
    
    print(x.shape)
    # torch.Size([2, 5, 4])
    
    print(y.shape)
    # torch.Size([2])
    

    Here x is a batch of inputs of shape [batch_size, num_sentences, max_len], with batch_size = 2.

    Embed the input

    vocab_size = 1000
    embed_size = 100
    hidden_size = 200
    embed = nn.Embedding(vocab_size, embed_size)
    
    # shape [2, 5, 4, 100]
    x = embed(x)
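
Since 0 is the padding value in the question, one optional refinement (not part of the answer above) is to pass `padding_idx=0` to `nn.Embedding`, so that padded positions map to an all-zero vector that receives no gradient updates. A minimal sketch, reusing the sizes above:

```python
import torch
import torch.nn as nn

vocab_size, embed_size = 1000, 100

# padding_idx=0 pins the embedding of token 0 to zeros
# and excludes it from gradient updates.
embed = nn.Embedding(vocab_size, embed_size, padding_idx=0)

tokens = torch.tensor([[12, 18, 0, 0]])  # one padded sentence
vectors = embed(tokens)                  # [1, 4, 100]

print(vectors.shape)
print(vectors[0, 2].abs().sum().item())  # padded position sums to zero
```

Note that this only zeroes the input vectors; the LSTM still steps over the padded positions unless the sequences are packed.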
    

    The first LSTM (word-level) encodes each sentence into a vector

    # Flatten the batch and sentence dimensions so that every sentence
    # becomes its own sequence: [2, 5, 4, 100] -> [10, 4, 100]
    bs = x.size(0)  # batch size, 2 here
    x = x.view(bs * 5, 4, 100)
    
    wlstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
    # keep only the final hidden state of each sequence
    
    _, (hn, _) = wlstm(x)
    
    # hn shape [num_layers, bs * 5, hidden_size] = [1, 10, 200]
    
    # take the hidden state of the last (here, the only) layer
    hn = hn[0] # [10, 200]
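
One caveat with the step above: the final hidden state is taken after the LSTM has also consumed the padding tokens. A possible way around this, sketched here as an assumption rather than as part of the original answer, is to pack the sentences with `pack_padded_sequence` so that the LSTM stops at the last real token. Sentence slots that consist only of padding have length 0, which packing does not accept, so this sketch treats them as length 1 and zeroes their vectors afterwards.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

tokens = torch.tensor([
    [1, 48, 91, 0],
    [20, 5, 17, 32],
    [12, 18, 0, 0],
    [0, 0, 0, 0],   # a sentence slot that is entirely padding
])

embed = nn.Embedding(1000, 100, padding_idx=0)
wlstm = nn.LSTM(100, 200, batch_first=True)

# Number of real (non-zero) tokens per sentence
lengths = (tokens != 0).sum(dim=1)

# pack_padded_sequence rejects zero lengths, so all-padding rows are
# treated as length 1 here and zeroed out afterwards.
packed = pack_padded_sequence(
    embed(tokens), lengths.clamp(min=1).cpu(),
    batch_first=True, enforce_sorted=False,
)
_, (hn, _) = wlstm(packed)

# hn[0] holds one vector per sentence, in the original order;
# zero the vectors that belong to all-padding sentences.
sent_vecs = hn[0] * (lengths > 0).unsqueeze(1).to(hn.dtype)

print(sent_vecs.shape)
```

The same idea applies one level up: the sentence-level LSTM could be packed by the number of non-empty sentences per data point.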
    

    The second LSTM (sentence-level) encodes the sequence of sentence vectors into a single vector

    # Reshape hn into [bs, num_seq, hidden_size]
    hn = hn.view(2, 5, 200)
    
    # Pass to another LSTM and get the final state hn
    slstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
    _, (hn, _) = slstm(hn) # [1, 2, 200]
    
    # Similarly, get the hidden state of the last layer
    hn = hn[0] # [2, 200]
    

    Add a classification layer

    pred_linear = nn.Linear(hidden_size, 1)
    
    # [2, 1]
    output = torch.sigmoid(pred_linear(hn))
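
For convenience, the steps above can be collected into a single module. This is only a sketch: the class name `HierarchicalRNN` is hypothetical, the default sizes are the toy values used above, and padding is not handled specially.

```python
import torch
import torch.nn as nn

class HierarchicalRNN(nn.Module):
    """Word-level LSTM followed by a sentence-level LSTM (hypothetical name)."""

    def __init__(self, vocab_size=1000, embed_size=100, hidden_size=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.wlstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.slstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.pred_linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: [batch, num_sentences, max_len] of token ids
        bs, num_sent, max_len = x.shape
        x = self.embed(x)                       # [bs, num_sent, max_len, embed]
        x = x.view(bs * num_sent, max_len, -1)  # one row per sentence
        _, (hn, _) = self.wlstm(x)              # hn: [1, bs * num_sent, hidden]
        hn = hn[-1].view(bs, num_sent, -1)      # [bs, num_sent, hidden]
        _, (hn, _) = self.slstm(hn)             # hn: [1, bs, hidden]
        return torch.sigmoid(self.pred_linear(hn[-1]))  # [bs, 1]

xi = [[1, 48, 91, 0], [20, 5, 17, 32], [12, 18, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
x = torch.tensor([xi, xi])
y = torch.tensor([1.0, 1.0])  # float targets for binary cross-entropy

model = HierarchicalRNN()
output = model(x)             # [2, 1], probabilities of Class 1
loss = nn.functional.binary_cross_entropy(output.squeeze(1), y)
print(output.shape, loss.item())
```

Reading the shapes from `x.shape` instead of hard-coding 2, 5 and 4 lets the same module accept other batch sizes and sequence lengths.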