I am reading about LSTMs in deep learning. In Prof. Andrew Ng's course, each LSTM cell has three inputs.
The inputs are the cell state from the previous cell, c^(t-1), the output of the previous LSTM cell, a^(t-1), and the current input x^(t).
The outputs of the LSTM cell are the current cell state c^(t) and the output of the LSTM cell a^(t).
How do we pass/initialize these parameters for an LSTM cell in Keras, given the inputs mentioned above?
Thanks for the help. A simple example would be helpful.
By default you don't have to specify an initial state for the LSTM layer in Keras. If you want to specify the initial state, you can do it like this: LSTM(units)(input, initial_state=initial_state), where initial_state is a list of tensors [hidden_state, cell_state]. The hidden_state and cell_state correspond, in your notation, to a^(t-1) and c^(t-1) respectively. There is one hidden state and one cell state for each sample in the batch, so at training time each should have shape (batch_size, units).
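To see how your notation maps onto Keras objects, here is a rough single-step sketch using keras.layers.LSTMCell (the sizes units, features and batch are just illustrative, and I'm assuming TF 2.x with eager execution; this is not part of the full example below):

from tensorflow import keras
import numpy as np

units = 4     # number of LSTM units (illustrative)
features = 3  # size of x^(t) (illustrative)
batch = 2     # number of samples (illustrative)

cell = keras.layers.LSTMCell(units)

x_t = keras.backend.constant(np.random.random((batch, features)))  # x^(t)
a_prev = keras.backend.constant(np.zeros((batch, units)))          # a^(t-1)
c_prev = keras.backend.constant(np.zeros((batch, units)))          # c^(t-1)

# one step of the cell: (x^(t), [a^(t-1), c^(t-1)]) -> (a^(t), [a^(t), c^(t)])
a_t, states_t = cell(x_t, [a_prev, c_prev])
a_t_again, c_t = states_t  # the new hidden state a^(t) and cell state c^(t)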
See below for a minimal working example of how to do this in tf.keras (it should be the same in standalone keras, but I haven't tested that code).
from tensorflow import keras
import numpy as np
n_features = 3
n_timelag = 10
n_pred = 1
batch_size = 32
lstm_size = 30

# make a single initial hidden state and cell state vector
single_hidden_state = np.random.random(lstm_size)
single_cell_state = np.random.random(lstm_size)

# clone them for each sample in the batch -> shape (batch_size, lstm_size)
hidden_state = np.tile(single_hidden_state, (batch_size, 1))
cell_state = np.tile(single_cell_state, (batch_size, 1))

# numpy arrays to tensorflow constants
initial_state = [keras.backend.constant(hidden_state), keras.backend.constant(cell_state)]

# create random training data
X = np.random.random((batch_size, n_timelag, n_features))
Y = np.random.random((batch_size, n_pred))

# create the network, passing the initial state to the LSTM layer
inp = keras.Input((n_timelag, n_features))
lstm_l1 = keras.layers.LSTM(lstm_size)(inp, initial_state=initial_state)
pred = keras.layers.Dense(n_pred)(lstm_l1)

# create and compile the model
model = keras.models.Model(inputs=inp, outputs=pred)
model.compile(loss='mse', optimizer='adam')

# train the model
model.fit(X, Y)
For more about how to handle LSTM initial states and sequence-to-sequence models in Keras, see this link.
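If you also need to read the final states back out (for example to initialize a decoder in a sequence-to-sequence model), one option is return_state=True. A rough sketch of that pattern (the encoder/decoder naming is mine, not from the example above):

from tensorflow import keras

n_features = 3
n_timelag = 10
lstm_size = 30

# encoder: return_state=True makes the layer also return its final hidden and cell state
encoder_inp = keras.Input((n_timelag, n_features))
encoder_out, final_hidden_state, final_cell_state = keras.layers.LSTM(
    lstm_size, return_state=True)(encoder_inp)

# decoder: reuse the encoder's final states as its initial_state
decoder_inp = keras.Input((n_timelag, n_features))
decoder_out = keras.layers.LSTM(lstm_size, return_sequences=True)(
    decoder_inp, initial_state=[final_hidden_state, final_cell_state])

model = keras.models.Model(inputs=[encoder_inp, decoder_inp], outputs=decoder_out)
model.compile(loss='mse', optimizer='adam')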