python, keras, deep-learning, loss-function, multiple-outputs

Calculation of val_loss in Keras' multiple output


I have a question about how to calculate val_loss in Keras' multiple output. Here is an excerpt of my code.

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense
from keras.optimizers import Nadam
from keras.callbacks import CSVLogger

nBatchSize  = 200
nTimeSteps  = 1
nInDims     = 17
nHiddenDims = 10
nFinalDims  = 10
nOutNum     = 24
nTraLen     = 300
nMaxEP      = 20
nValLen     = 50
sHisCSV     = "history.csv"

oModel = Sequential()
oModel.add(LSTM(nHiddenDims, return_sequences=True,  stateful=True,
                batch_input_shape=(nBatchSize, nTimeSteps, nInDims)))
oModel.add(LSTM(nHiddenDims, return_sequences=False, stateful=True))
oModel.add(Dense(nFinalDims, activation="relu"))
oModel.add(Dense(nOutNum,    activation="linear"))
oModel.compile(loss="mse", optimizer=Nadam())

oModel.reset_states()
oHis = oModel.fit_generator(oDataGen, steps_per_epoch=nTraLen,
                            epochs=nMaxEP, shuffle=False,
                            validation_data=oDataGen, validation_steps=nValLen,
                            callbacks=[CSVLogger(sHisCSV, append=True)])

# number of cols is nOutNum(=24), number of rows is len(oEvaGen)
oPredDF = pd.DataFrame(oPredModel.predict_generator(oEvaGen, steps=len(oEvaGen)))

# GTDF is a dataframe of Ground Truth
nRMSE   = np.sqrt(np.nanmean(np.array(np.power(oPredDF - oGTDF, 2))))

In history.csv, val_loss is reported as 3317.36. The RMSE calculated from the prediction result is 66.4.

By my understanding of the Keras specification, the val_loss written in history.csv is the MSE averaged over the 24 outputs. Assuming that is correct, the RMSE can be computed from history.csv as sqrt(3317.36/24) = 11.76, which is quite different from the value of nRMSE (= 66.4), whereas sqrt(3317.36) = 57.6 is rather close to it.

Is my understanding of the Keras specification on val_loss incorrect?


Solution

  • Your first assumption is correct, but the further derivation went slightly wrong:
    the MSE is already the mean of the squared errors over the model's outputs, as you can see in the Keras documentation:

    mean_squared_error
    keras.losses.mean_squared_error(y_true, y_pred)

    and in the Keras source code:

    K.mean(K.square(y_pred - y_true), axis=-1)
    

    thus the RMSE is the square root of this value:

    K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
    

    What you wrote divides by the number of outputs a second time, even though K.mean already averages over them.

    So from your actual example:
    sqrt(3317.36/24) = 11.76 averages over the 24 outputs twice
    RMSE can be computed as sqrt(3317.36) = 57.6

    Thus the RMSE derived from val_loss (57.6) and your nRMSE (66.4) are consistent.
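This can be checked numerically with a standalone NumPy sketch (made-up shapes and random data, not the model above): because the per-sample MSE already averages over the output dimension, averaging it again over the batch gives the same value as the global elementwise mean, so RMSE is simply sqrt(val_loss) with no extra division by 24.

```python
import numpy as np

# Hypothetical data: 1000 samples, 24 outputs (like nOutNum above)
rng = np.random.default_rng(0)
y_true = rng.normal(size=(1000, 24))
y_pred = y_true + rng.normal(size=(1000, 24))

# Keras-style MSE: K.mean(K.square(y_pred - y_true), axis=-1),
# i.e. mean over the 24 outputs, then averaged over the batch
per_sample_mse = np.mean((y_pred - y_true) ** 2, axis=-1)
val_loss = per_sample_mse.mean()

# Elementwise RMSE, as computed from the prediction dataframe
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# sqrt(val_loss) equals the elementwise RMSE directly
assert np.isclose(np.sqrt(val_loss), rmse)
```

Dividing val_loss by 24 before taking the square root would average over the outputs a second time, which is exactly the mismatch observed in the question.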