keras, lstm, lstm-stateful

Providing inputs to LSTM cell in keras API


I am reading about LSTMs in deep learning. In Prof. Andrew Ng's course, an LSTM cell has three inputs.

The inputs are the cell state from the previous cell, i.e. "c" superscript (t-1), the output of the previous LSTM cell, "a" superscript (t-1), and the current input "x" superscript (t).

The outputs of an LSTM cell are the current cell state, i.e. "c" superscript (t), and the output of the LSTM cell, "a" superscript (t).

How do I pass/initialize these inputs for an LSTM cell in the Keras API?

Thanks for the help. A simple example would be appreciated.


Solution

  • By default you don't have to specify an initial state for the LSTM layer in Keras.

    If you want to specify the initial state, you can do it like this: LSTM(units)(input, initial_state=initial_state), where initial_state is a list of tensors [hidden_state, cell_state]. The hidden_state and cell_state correspond, in your notation, to "a" superscript (t-1) and "c" superscript (t-1) respectively. There is one hidden state and one cell state per sample, so during training each should have shape (batch_size, units).

    See below for a minimal working example of how to do this in tf.keras (it should be the same in standalone Keras, but I haven't tested that).

    from tensorflow import keras
    import numpy as np
    
    n_features = 3
    n_timelag = 10
    n_pred = 1
    batch_size = 32
    lstm_size = 30
    
    # make the initial state
    single_hidden_state = np.random.random(lstm_size)
    single_cell_state = np.random.random(lstm_size)
    # clone it for each sample in the batch
    hidden_state = np.tile(single_hidden_state, (batch_size, 1))
    cell_state = np.tile(single_cell_state, (batch_size, 1))
    # convert the numpy arrays to tensorflow constants
    initial_state = [keras.backend.constant(hidden_state),
                     keras.backend.constant(cell_state)]
    
    # create training data
    X = np.random.random((batch_size, n_timelag, n_features))
    Y = np.random.random((batch_size, n_pred))
    
    # create the network
    inp = keras.Input((n_timelag, n_features))
    lstm_l1 = keras.layers.LSTM(lstm_size)(inp, initial_state=initial_state)
    pred = keras.layers.Dense(n_pred)(lstm_l1)
    
    # create and compile the model
    model = keras.models.Model(inputs=inp, outputs=pred)
    model.compile(loss='mse', optimizer='adam')
    
    # train the model
    model.fit(X, Y)
    

    For more on how to handle LSTM initial states and sequence-to-sequence models in Keras, see this link
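Since the question also asks about the cell's outputs, "a" superscript (t) and "c" superscript (t): you can get those back from the layer by passing return_state=True, in which case the LSTM returns the output plus the final hidden and cell states. A minimal sketch of this (variable names mirror the example above; with return_sequences=False the output is the same tensor as the final hidden state):

```python
from tensorflow import keras
import numpy as np

n_features = 3
n_timelag = 10
batch_size = 32
lstm_size = 30

inp = keras.Input((n_timelag, n_features))
# with return_state=True the layer returns [output, hidden_state, cell_state]
lstm_out, state_h, state_c = keras.layers.LSTM(lstm_size, return_state=True)(inp)

# expose all three tensors as model outputs
model = keras.models.Model(inputs=inp, outputs=[lstm_out, state_h, state_c])

X = np.random.random((batch_size, n_timelag, n_features))
out, a_t, c_t = model.predict(X)
# a_t is "a" superscript (t), c_t is "c" superscript (t),
# each with shape (batch_size, lstm_size)
```

These state arrays can then be fed back in as initial_state for a subsequent call, which is the usual pattern in sequence-to-sequence encoder/decoder setups.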