Let,
Sample Size = 100
(X1, X2, ..., X100)
Timesteps = 5
Input Features = 10
Error Calculation:
How is the error calculated when batch size = sample size?
My understanding: I feed X1, X2, X3, X4, X5 into the LSTM and, after 5 timesteps, get an output, say Y1. The error is E1 = X6 - Y1. Sliding the window forward by one sample, I similarly calculate E2, E3, ..., E95.
Total Error = E1 + E2 + ... + E95
This total is then used to update the weights. Is this correct?
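The understanding above can be sketched in plain NumPy. This is a toy setup, not a real model: `predict` is a hypothetical stand-in for the trained LSTM, and the data is random.

```python
import numpy as np

rng = np.random.default_rng(0)
samples, timesteps, features = 100, 5, 10
X = rng.normal(size=(samples, features))      # X1..X100, one feature vector each

def predict(window):
    """Stand-in for the LSTM: maps a 5-step window to a prediction Y.
    A real model would be trained; here we just average the window."""
    return window.mean(axis=0)

# Sliding windows: (X1..X5) -> target X6, ..., (X95..X99) -> target X100
errors = []
for i in range(samples - timesteps):          # i = 0 .. 94, so 95 errors
    window = X[i:i + timesteps]               # shape (5, 10)
    target = X[i + timesteps]                 # the next raw sample
    Y = predict(window)
    errors.append(target - Y)                 # E_{i+1} = X_{i+6} - Y_{i+1}

total_error = np.sum(errors)                  # E1 + E2 + ... + E95
print(len(errors))                            # 95 windows in total
```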
Error for a Batch:
Based on the above understanding: if batch size = 10, then only E1, E2, E3, E4 and E5 would be used to calculate the error for that batch, and that error would be used to update the weights.
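The reading in the question, where a "batch" is 10 consecutive raw samples containing 5 complete (window, target) pairs, can be sketched as follows. Note that in Keras the batch size actually counts sequences (windows), not raw samples, so a batch of 10 would normally mean 10 windows; the code below deliberately follows the question's reading. `predict` is again a hypothetical stand-in for the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
samples, timesteps, batch_samples = 100, 5, 10
X = rng.normal(size=(samples, 10))

def predict(window):
    # stand-in for the LSTM forward pass (a real model would be trained)
    return window.mean(axis=0)

# One "batch" = 10 consecutive raw samples X1..X10, which contain
# 10 - 5 = 5 complete (window, target) pairs.
batch = X[:batch_samples]
errors = [batch[i + timesteps] - predict(batch[i:i + timesteps])
          for i in range(batch_samples - timesteps)]   # E1..E5
batch_error = np.sum(errors)                           # drives one weight update
```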
Batching in Stateful LSTM:
Batches allow parallelism: each sequence in the batch computes its own error, and the errors are then summed. How does an LSTM achieve parallelism within a batch if it is stateful? (Is my understanding of stateful correct, i.e. that the hidden states from the previous sequence are used to initialize the hidden states of the next sequence?)
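A minimal sketch of what "stateful" means, using a toy recurrent cell in NumPy (a plain tanh RNN rather than a real LSTM; the weight names are made up): every row of the batch keeps its own hidden state, all rows are processed in parallel via matrix operations, and in stateful mode the final states of one batch initialize the next batch instead of being reset to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, features, hidden = 4, 10, 8
Wx = rng.normal(size=(features, hidden)) * 0.1   # input weights (illustrative)
Wh = rng.normal(size=(hidden, hidden)) * 0.1     # recurrent weights

def run_batch(x, h):
    """Process one batch of shape (batch, timesteps, features).
    All `batch` rows advance in parallel at each timestep; row k keeps
    its own hidden state h[k]."""
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ Wx + h @ Wh)
    return h

x1 = rng.normal(size=(batch, 5, features))
x2 = rng.normal(size=(batch, 5, features))

# Stateless: state is reset to zeros before every batch
h = run_batch(x1, np.zeros((batch, hidden)))
h_stateless = run_batch(x2, np.zeros((batch, hidden)))

# Stateful: the final state of batch 1 initializes batch 2, so row k of
# x2 is treated as the continuation of the sequence in row k of x1
h_stateful = run_batch(x2, h)

print(np.allclose(h_stateless, h_stateful))      # False: the carried state matters
```

Parallelism is therefore unaffected by statefulness: within a batch the rows are independent of each other, and the sequential dependency runs across batches, row by row.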
References:
Understanding Keras LSTMs: Role of Batch-size and Statefulness
Batch size effect on LSTM: For batch size 1, the model takes 1 input at each timestep. For batch size n, the model takes n inputs at each timestep.
Error calculation part mentioned in the question: That is the error calculation for batch size 1.
Error for a batch: Sum up the errors of each element of the batch to get the final error.
Batching in Stateful LSTM: My understanding of parallelism was incorrect. Parallelism happens within a batch, not across batches.
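The per-batch error rule above can be sketched with toy numbers (illustrative values only; in practice Keras computes a proper loss such as mean squared error and by default averages over the batch rather than summing, but the per-element-then-combine structure is the same):

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # true next values (toy data)
preds   = np.array([0.9, 2.1, 2.7, 4.4, 5.0])   # model outputs for one batch

per_element = targets - preds                    # E1..E5, one per batch element
batch_error = per_element.sum()                  # single value driving the update
```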