I'm trying to solve the following problem:
My data is shaped (number of device recordings, 3000, 4).
I'm trying to produce a vector of length 3000 where each data point is one of 3 labels (y1, y2, y3), so my desired output shape is (number of device recordings, 3000, 1). I have labeled data for training.
I'm trying to use an LSTM model for this, since "classification as I move along time-series data" seems like an RNN type of problem.
I have my network set up like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(3, input_shape=(3000, 4), return_sequences=True))
model.add(LSTM(3, activation='softmax', return_sequences=True))
model.summary()
and the summary looks like this:
Model: "sequential_23"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_29 (LSTM) (None, 3000, 3) 96
_________________________________________________________________
lstm_30 (LSTM) (None, 3000, 3) 84
=================================================================
Total params: 180
Trainable params: 180
Non-trainable params: 0
_________________________________________________________________
All looks well in the output space, since I can use the softmax output at each time step to determine which of my three categories belongs to that particular step (I think).
But I only have 180 trainable parameters, so I'm guessing that I am doing something horribly wrong.
Questions:
In a simplistic viewpoint, you can consider an LSTM layer as an augmented Dense layer with a memory (hence enabling efficient processing of sequences). So the concept of "units" is the same for both: the number of neurons or feature units of these layers, in other words, the number of distinctive features these layers can extract from the input.
Therefore, when you set the number of units to 3 for the LSTM layer, it more or less means that this layer can only extract 3 distinctive features from the input timesteps (note that the number of units has nothing to do with the length of the input sequence, i.e. the entire input sequence will be processed by the LSTM layer no matter what the number of units or the sequence length is).
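As a side note, this also explains the parameter counts you are seeing: an LSTM layer's parameter count depends only on the number of units and the number of input features, never on the sequence length. A quick sanity check, assuming the standard Keras LSTM formulation (4 gates, each with an input kernel, a recurrent kernel, and a bias):

```python
# Keras LSTM parameter count: 4 gates, each with an input kernel
# (input_dim x units), a recurrent kernel (units x units), and a bias (units).
def lstm_params(input_dim, units):
    return 4 * (input_dim * units + units * units + units)

print(lstm_params(4, 3))  # first layer (4 input features, 3 units): 96
print(lstm_params(3, 3))  # second layer (3 input features, 3 units): 84
```

So your 180 total parameters are exactly what a 3-unit + 3-unit stack should have; the model isn't broken, it's just very small.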
Usually, this is sub-optimal (though it really depends on the difficulty of the specific problem and dataset you are working on; maybe 3 units are enough for your problem/dataset, and you should experiment to find out). Therefore, a higher number is often chosen for the number of units (common choices: 32, 64, 128, 256), and the classification task is delegated to a dedicated Dense layer (sometimes called the "softmax layer") at the top of the model.
For example, considering the description of your problem, a model with 3 stacked LSTM layers and a Dense classification layer on top might look like this:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))
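Since a Dense layer on a 3-D input is applied independently to every timestep, this outputs one softmax distribution over your 3 classes per timestep, i.e. shape (None, 3000, 3). To train against your (number of recordings, 3000, 1) labels, sparse_categorical_crossentropy lets you keep them as integer class ids (0, 1, 2) instead of one-hot vectors. A minimal sketch, using random stand-in data in place of your real recordings:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical stand-in data: 8 recordings, 3000 timesteps, 4 features,
# and one integer class id (0, 1, or 2) per timestep.
num_recordings = 8
x = np.random.rand(num_recordings, 3000, 4).astype("float32")
y = np.random.randint(0, 3, size=(num_recordings, 3000, 1))

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(3000, 4)))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(Dense(3, activation='softmax'))

# sparse_categorical_crossentropy takes integer labels, applied per timestep.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=2)
```

At prediction time, `model.predict(x).argmax(axis=-1)` recovers the per-timestep class ids.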