How do N_u units of LSTM work on data of length N_x? I know there are many similar questions asked before, but the answers are full of contradictions and confusion, so I am trying to clear my doubts by asking specific questions. I am following this simple blog:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Q0) Is the Keras implementation consistent with the above blog?
Please consider the following code.
import tensorflow as tf

N_u, N_x = 1, 1
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(N_u, stateful=True, batch_input_shape=(32, 1, N_x))
])
model.summary()
For simplicity, my input here is just a scalar and I use a single time step. The output shape is (32, 1) and the number of parameters is 12.
Q1) I have one LSTM unit or cell, right? The following represents a cell, right?
I understand from the picture that there should be 12 parameters: forget gate = 2 weights + 1 bias; input gate and candidate = 2 × (2 weights + 1 bias); output gate = 2 weights + 1 bias. So everything is fine up to this point.
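One way to verify this count is to inspect the layer's weight arrays directly. This is a minimal sketch that builds the layer standalone rather than inside a Sequential model, which should not change the parameter count:

```python
import tensorflow as tf

# Build the same single-unit LSTM on scalar inputs (N_u = N_x = 1).
layer = tf.keras.layers.LSTM(1)
layer.build((None, 1, 1))  # (batch, timesteps, features)

# Keras stores the LSTM weights as three arrays; each packs the four
# gates (input, forget, candidate, output) side by side.
kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)            # input weights: (N_x, 4 * N_u) -> (1, 4)
print(recurrent_kernel.shape)  # recurrent weights: (N_u, 4 * N_u) -> (1, 4)
print(bias.shape)              # biases: (4 * N_u,) -> (4,)
# 4 + 4 + 4 = 12 parameters in total
```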
Q2) Now let us set N_u, N_x = 1, 2. I expect the same cell to be applied to the two elements of x. But I found that the total number of parameters is now 16! Why? Is it because I get 4 additional weight parameters for the connections between x_2 and the LSTM unit?
Q3) Now let us set N_u, N_x = 2, 1. I now have two LSTM units. My understanding was that the two cells would operate in parallel on the same data (a scalar in this case). Are these two units completely independent, or do they influence each other? I expected the number of parameters to be 2 × 12 = 24, but in reality I got 32. Why 32?
Q4) If I set N_u, N_x = 2, 2, the number of parameters is 40. I think I can work this out once I understand the two points above.
Q5) Finally, is there documentation or a paper that the Keras implementation is based on?
The number of parameters can be computed with this formula:
LSTM parameter number = 4 × ((x + h) × h + h)
where x is the dimension of the input vector and h is the size of the output space.
See this link for an explanation: https://www.kaggle.com/code/kmkarakaya/lstm-understanding-the-number-of-parameters
N_u=1, N_x=1 means the output space size (h) is 1 and the input size (x) is 1, so P = 4 × ((1 + 1) × 1 + 1) = 12.
N_u=1, N_x=2 means you have changed the input size (x) to 2, so the formula gives 16: each of the four gates gains one weight for x_2.
N_u=2, N_x=1 means you have doubled the output space; the formula gives 32. The extra parameters beyond 2 × 12 = 24 come from the recurrent connections: each unit also receives the other unit's previous output, so the units are not independent.
The formula also holds for N_u=2, N_x=2: P = 4 × ((2 + 2) × 2 + 2) = 40.
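As a quick sanity check, you can compare the formula with what Keras itself reports for all four cases from the question (a minimal sketch; the layer is built standalone, which does not affect the count):

```python
import tensorflow as tf

def lstm_param_count(x, h):
    # 4 gates, each with: x input weights + h recurrent weights + 1 bias, per unit
    return 4 * ((x + h) * h + h)

# Compare the formula with Keras for all four cases from the question.
for N_u, N_x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    layer = tf.keras.layers.LSTM(N_u)
    layer.build((None, 1, N_x))  # (batch, timesteps, features)
    print(f"N_u={N_u}, N_x={N_x}: "
          f"keras={layer.count_params()}, formula={lstm_param_count(N_x, N_u)}")
```

The two columns should agree: 12, 16, 32, and 40 parameters respectively.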
I did not find a paper used as a reference for the Keras LSTM implementation, but this source-code walkthrough may help: https://blog.softmaxdata.com/keras-lstm/
Please note the source code link is not working. Use this instead: https://github.com/keras-team/keras/blob/master/keras/src/layers/rnn/lstm.py