deep-learning, theano, lasagne, theano-cuda

Lasagne: use image features as the initial hidden state of an LSTMLayer


I am working on an image-captioning project. I want to use a batch of image features with shape=(batch_size, 512) as the initial hidden state of an LSTMLayer in Lasagne (Theano). The sequence input to the LSTMLayer is a batch of text sequences with shape=(batch_size, max_sequence_length, 512). I noticed that LSTMLayer in Lasagne has a hid_init parameter. Does anyone know how to use it? Or do I need to implement a custom LSTMLayer myself?


Solution

  • You don't need to set hid_init (h_0), because h_0 is computed from c_0 (look at an LSTM cell diagram and trace the connection from c_0 to h_0), so you only have to set the cell_init (c_0) parameter:

    # `cell_init` accepts a Layer whose output shape is (batch_size, num_units),
    # so LSTM_UNITS must be 512 here to match the image features
    decoder = LSTMLayer(l_word_embeddings,
                        num_units=LSTM_UNITS,
                        cell_init=your_image_features_layer_512_shape,  # this is c_0
                        mask_input=l_mask)
    

    You can set cell_init as a Layer or as an array (see the Lasagne LSTMLayer documentation for the accepted types).

    Happy to discuss further if anything is unclear.
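As a sanity check on the claim above, here is a toy pure-Python sketch of a single LSTM step (scalar values, tied weights, no biases — a deliberate simplification, not Lasagne's actual implementation). It shows why initializing only the cell state can be enough: the hidden output h is computed from c through the output gate, so features placed in c_0 already influence the very first hidden state even when h_0 starts at Lasagne's default of zero:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=0.5):
    """One scalar LSTM step; all weights tied to `w` for illustration."""
    i = sigmoid(w * x + w * h_prev)    # input gate
    f = sigmoid(w * x + w * h_prev)    # forget gate
    o = sigmoid(w * x + w * h_prev)    # output gate
    g = math.tanh(w * x + w * h_prev)  # candidate cell value
    c = f * c_prev + i * g             # new cell state
    h = o * math.tanh(c)               # hidden state is computed FROM c
    return h, c

# Same word input, h_0 = 0 in both cases; only c_0 differs
h1_zero, _ = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0)   # default init
h1_img, _  = lstm_step(x=1.0, h_prev=0.0, c_prev=3.0)   # "image feature" in c_0
print(h1_zero, h1_img)
```

The two first-step hidden states differ, which is exactly the effect you want: the decoder's output is conditioned on the image from the first word onward.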