I am training an LSTM using TFLearn in TensorFlow. I have split the data into training (90%) and validation (10%) sets. As I understand it, a model usually fits the training data better than the validation data, but I am getting the opposite result: loss is lower and accuracy is higher on the validation set.
As I have read in other answers, this can be caused by dropout not being applied during validation. However, when I remove dropout from my LSTM architecture, validation loss is still lower than training loss (although the difference is smaller).
Also, the loss shown at the end of each epoch is not an average of the losses over the batches (as it is in Keras); it is the loss for the last batch. I thought this could also be a reason for my results, but it turned out not to be.
Training samples: 783
Validation samples: 87
--
Training Step: 4 | total loss: 1.08214 | time: 1.327s
| Adam | epoch: 001 | loss: 1.08214 - acc: 0.7549 | val_loss: 0.53043 - val_acc: 0.9885 -- iter: 783/783
--
Training Step: 8 | total loss: 0.41462 | time: 1.117s
| Adam | epoch: 002 | loss: 0.41462 - acc: 0.9759 | val_loss: 0.17027 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 12 | total loss: 0.15111 | time: 1.124s
| Adam | epoch: 003 | loss: 0.15111 - acc: 0.9984 | val_loss: 0.07488 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 16 | total loss: 0.10145 | time: 1.114s
| Adam | epoch: 004 | loss: 0.10145 - acc: 0.9950 | val_loss: 0.04173 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 20 | total loss: 0.26568 | time: 1.124s
| Adam | epoch: 005 | loss: 0.26568 - acc: 0.9615 | val_loss: 0.03077 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 24 | total loss: 0.11023 | time: 1.129s
| Adam | epoch: 006 | loss: 0.11023 - acc: 0.9863 | val_loss: 0.02607 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 28 | total loss: 0.07059 | time: 1.141s
| Adam | epoch: 007 | loss: 0.07059 - acc: 0.9934 | val_loss: 0.01882 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 32 | total loss: 0.03571 | time: 1.122s
| Adam | epoch: 008 | loss: 0.03571 - acc: 0.9977 | val_loss: 0.01524 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 36 | total loss: 0.05084 | time: 1.120s
| Adam | epoch: 009 | loss: 0.05084 - acc: 0.9948 | val_loss: 0.01384 - val_acc: 1.0000 -- iter: 783/783
--
Training Step: 40 | total loss: 0.22283 | time: 1.132s
| Adam | epoch: 010 | loss: 0.22283 - acc: 0.9714 | val_loss: 0.01227 - val_acc: 1.0000 -- iter: 783/783
The network used (note that dropout has been removed):

    def get_network_wide(frames, input_size, num_classes):
        """Create a one-layer LSTM."""
        net = tflearn.input_data(shape=[None, frames, input_size])
        net = tflearn.lstm(net, 256)  # dropout=0.2 removed
        net = tflearn.fully_connected(net, num_classes, activation='softmax')
        net = tflearn.regression(net, optimizer='adam',
                                 loss='categorical_crossentropy',
                                 metric='default', name='output1')
        return net
This is not necessarily a problematic phenomenon in itself. It can occur for several reasons, described below.
First, the training loss is recorded while the model is still improving over the course of the epoch, whereas the validation loss is computed only once the epoch has finished. The per-epoch numbers can therefore look like:

    Training losses:   [0.60, 0.59, ..., 0.30]  (0.30 is the loss on the training set at the end of the epoch)
    Validation losses: [0.30, 0.29, 0.35]       (lower, because by validation time the model has already trained a lot compared to the start of the epoch)
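This effect can be reproduced with a toy calculation. The batch losses below are made-up numbers for illustration only, not values from your run:

```python
# Within one epoch, the model keeps improving, so early batch losses are
# higher than late ones. The validation loss is measured only AFTER the
# epoch, on the already-improved model, so it can come out lower than the
# averaged (or even the last-batch) training loss.
batch_losses = [0.60, 0.52, 0.45, 0.38, 0.30]  # training loss per batch
epoch_avg_loss = sum(batch_losses) / len(batch_losses)
last_batch_loss = batch_losses[-1]
val_loss = 0.29  # measured once, after the last batch

print(f"averaged train loss:   {epoch_avg_loss:.3f}")   # 0.450
print(f"last-batch train loss: {last_batch_loss:.3f}")  # 0.300
print(f"validation loss:       {val_loss:.3f}")         # 0.290

# The validation loss is lower even though the model was never actually
# "better" on unseen data than on the data it was just trained on.
assert val_loss < epoch_avg_loss
```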
However, your training set is very small, and so is your validation set. A 90%/10% train/validation split is appropriate only when there is a lot of data (tens or even hundreds of thousands of samples), whereas your entire dataset (train + validation) has fewer than 1000 samples. You need much more data: LSTMs are well known for requiring large amounts of training data.
You could also try k-fold cross-validation, or stratified k-fold cross-validation. That way you ensure you have not accidentally created a very 'easy' validation set that you always test on; instead, each of the k folds serves once as the validation set while the remaining k-1 are used for training, which avoids the situation where a single fixed split happens to flatter the model.
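As a sketch, scikit-learn's StratifiedKFold can produce such splits. The array shapes and the number of classes below are assumptions for illustration, not values taken from your question:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-in data: roughly your train + validation sample count (870),
# with 10 features and 3 hypothetical classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(870, 10))
y = rng.integers(0, 3, size=870)

# Each fold preserves the class proportions of y in both splits.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
    # Train a fresh model on X[train_idx] and evaluate on X[val_idx] here.
```

Every sample ends up in the validation set exactly once across the 5 folds, so one unlucky (or lucky) split can no longer dominate the reported metrics.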
Ultimately, the answer lies in the data. Prepare it carefully, as the results depend heavily on its quality: preprocessing, cleaning, and constructing representative training/validation/test sets.