pythontensorflowkeraslstmrecurrent-neural-network

What does Keras do when the number of inputs to an LSTM layer is greater than or less than the number of LSTM cells in that layer?


Please see python code below, I put comments in the code where I felt emphasis on information is required.

import keras
import numpy

def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.LSTM(3, input_shape = (3, 1), activation = 'elu'))# Number of LSTM cells in this layer = 3.
    return model

def build_data():
    inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    outputs = [10, 11, 12, 13, 14, 15, 16, 17, 18]
    inputs = numpy.array(inputs)
    outputs = numpy.array(outputs)
    inputs = inputs.reshape(3, 3, 1)# Number of samples = 3, Number of input vectors in each sample  = 3, size of each input vector = 3.
    outputs = outputs.reshape(3, 3)# Number of target samples = 3, Number of outputs per target sample = 3.
    return inputs, outputs

def train():
    model = build_model()
    model.summary()
    model.compile(optimizer= 'adam', loss='mean_absolute_error', metrics=['accuracy'])
    x, y = build_data()
    model.fit(x, y, batch_size = 1, epochs = 4000)
    model.save("LSTM_testModel")

def apply():
    model = keras.models.load_model("LSTM_testModel")
    input = [[[7], [8], [9]]]
    input = numpy.array(input)
    print(model.predict(input))

def main():
    train()

main()

My understanding is that for each input sample there are 3 input vectors. Each input vector goes to an LSTM cell. i.e. For sample 1, input vector 1 goes to LSTM cell 1, input vector 2 goes to LSTM cell 2 and so on.

Looking at tutorials on the internet, I've seen that the number of LSTM cells is much greater than the number of input vectors e.g. 300 LSTM cells.

So say for example I have 3 input vectors per sample what input goes to the 297 remaining LSTM cells?

I tried compiling the model to have 2 LSTM cells and it still accepted the 3 input vectors per sample, although I had to change the target outputs in the training data to accommodate for this(change the dimensions) . So what happened to the third input vector of each sample...is it ignored?

Image taken from: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I believe the above image shows that each input vector (of an arbitrary scenario) is mapped to a specific RNN cell. I may be misinterpreting it. Above image taken from the following URL: http://karpathy.github.io/2015/05/21/rnn-effectiveness/


Solution

  • I will try to answer some of your questions and then will consolidate the information provided in the comments for completeness, for the benefit of you as well as for the Community.

    As mentioned by Matias in the comments, irrespective of whether the Number of Inputs are more than or less than the Number of Units/Neurons, it will be connected like a Fully Connected Network, as shown below.

    LSTM Fully Connected Network

    To understand how RNN/LSTM work internally, let's assume we have

    Number of Input Features => 3 => F1, F2 and F3

    Number of Timesteps => 2 => 0 and 1

    Number of Hidden Layers => 1

    Number of Neurons in each Hidden Layer => 5

    Then what actually happens inside can be represented in the screenshots shown below:

    Understanding RNN/LSTM

    enter image description here

    You have also asked about words being assigned to LSTM Cell. Not sure which link you are referring to and whether it is correct or not but in simple terms (words in this screenshot actually will be replaced by Embedding Vectors), you can understand how LSTM handles the Text as shown in the screenshot below:

    enter image description here

    For more information, please refer Beautiful Explanation by OverLordGoldDragon and Daniel Moller.

    Hope this helps. Happy Learning!