This is the code I wrote for a seq2seq model with an embedding layer:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed

# Body of the model-building function (the enclosing def is not shown).
# embed_layer, MAX_LEN, HIDDEN_DIM and VOCAB_SIZE are defined elsewhere.

# training model: encoder
encoder_inputs = Input(shape=(MAX_LEN,), dtype='int32')
encoder_embedding = embed_layer(encoder_inputs)
encoder_LSTM = LSTM(HIDDEN_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder_LSTM(encoder_embedding)
encoder_states = [state_h, state_c]

# training model: decoder, initialised with the encoder's final states
decoder_inputs = Input(shape=(MAX_LEN,))
decoder_embedding = embed_layer(decoder_inputs)
decoder_LSTM = LSTM(HIDDEN_DIM, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_LSTM(
    decoder_embedding, initial_state=encoder_states)
# keep a reference to the output layer so the inference decoder can reuse it
decoder_dense = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))
outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], outputs)

# defining inference model
encoder_model = Model(encoder_inputs, encoder_states)
# each state input holds one value per LSTM unit
decoder_state_input_h = Input(shape=(HIDDEN_DIM,))
decoder_state_input_c = Input(shape=(HIDDEN_DIM,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_LSTM(
    decoder_embedding, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
# reuse the trained output layer; a new Dense here would start untrained
outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs, [outputs] + decoder_states)
return model, encoder_model, decoder_model
We use the inference models (encoder_model and decoder_model) for predictions, but I am not sure where the training of the encoder and decoder happens.
The code is built upon https://keras.io/examples/lstm_seq2seq/,
with an added embedding layer and a TimeDistributed dense layer.
For more information on the issue, see the GitHub repo.
The encoder and decoder are trained simultaneously. More precisely, the model composed of the two is trained, which in turn trains both of them (this is not a GAN, where you need a special training cycle).
If you look closely at the provided link, there is a section where the model is trained:
# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
If you look more closely, the "new" models that you define after fit consist of layers that have already been trained in the previous step. For example, in Model(encoder_inputs, encoder_states), both encoder_inputs and encoder_states were used during the initial training; you are just repackaging them.
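The repackaging point can be checked directly. The following is a minimal sketch, not the asker's actual setup: it assumes TensorFlow's bundled Keras (tensorflow.keras), toy sizes, random integer data, and a sparse_categorical_crossentropy loss chosen only so that no one-hot targets are needed. It builds a training model and an encoder-only model from the same layer objects, fits only the training model, and then checks that the encoder LSTM's weights have moved.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Toy sizes, chosen only for illustration (assumptions, not from the question).
VOCAB_SIZE, MAX_LEN, HIDDEN_DIM = 20, 5, 8

embed_layer = Embedding(VOCAB_SIZE, 4)  # stand-in for the asker's embed_layer

# Encoder
encoder_inputs = Input(shape=(MAX_LEN,), dtype="int32")
encoder_LSTM = LSTM(HIDDEN_DIM, return_state=True)
_, state_h, state_c = encoder_LSTM(embed_layer(encoder_inputs))
encoder_states = [state_h, state_c]

# Decoder, initialised with the encoder's final states
decoder_inputs = Input(shape=(MAX_LEN,), dtype="int32")
decoder_LSTM = LSTM(HIDDEN_DIM, return_state=True, return_sequences=True)
decoder_outputs, _, _ = decoder_LSTM(
    embed_layer(decoder_inputs), initial_state=encoder_states)
outputs = TimeDistributed(
    Dense(VOCAB_SIZE, activation="softmax"))(decoder_outputs)

# Two Model objects wrapping the same encoder layer object
model = Model([encoder_inputs, decoder_inputs], outputs)
encoder_model = Model(encoder_inputs, encoder_states)

shares_layer = (any(layer is encoder_LSTM for layer in model.layers)
                and any(layer is encoder_LSTM for layer in encoder_model.layers))

# Snapshot the encoder weights, then train ONLY the combined model.
weights_before = [w.copy() for w in encoder_LSTM.get_weights()]

rng = np.random.default_rng(0)
enc_x = rng.integers(0, VOCAB_SIZE, size=(32, MAX_LEN)).astype("int32")
dec_x = rng.integers(0, VOCAB_SIZE, size=(32, MAX_LEN)).astype("int32")
dec_y = rng.integers(0, VOCAB_SIZE, size=(32, MAX_LEN)).astype("int32")

model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
model.fit([enc_x, dec_x], dec_y, epochs=1, verbose=0)

# encoder_model was never compiled or fitted, yet its LSTM weights changed.
weights_after = encoder_LSTM.get_weights()
encoder_weights_changed = any(
    not np.array_equal(before, after)
    for before, after in zip(weights_before, weights_after))

print(shares_layer, encoder_weights_changed)
```

Both flags are expected to be True: encoder_model never sees a call to fit, but it holds the same LSTM object as model, so training model updates the weights that encoder_model later uses for prediction.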