[SOLVED] Why Keras MAPE metric is exploding during training but MSE loss is not?

Why Keras MAPE metric is exploding during training but MSE loss is not?

I implemented an LSTM with attention in Keras to reproduce this paper. The strange behavior is simple: I have an MSE loss function and an MAPE and MAE as metrics. During training the MAPE is exploding but the MSE and MAE seem to train normally:

Epoch 1/20
275/275 [==============================] - 191s 693ms/step - loss: 0.1005 - mape: 15794.8682 - mae: 0.2382 - val_loss: 0.0334 - val_mape: 24.9470 - val_mae: 0.1607
Epoch 2/20
275/275 [==============================] - 184s 669ms/step - loss: 0.0099 - mape: 6385.5464 - mae: 0.0725 - val_loss: 0.0078 - val_mape: 11.3268 - val_mae: 0.0803
Epoch 3/20
275/275 [==============================] - 186s 676ms/step - loss: 0.0025 - mape: 5909.3735 - mae: 0.0369 - val_loss: 0.0131 - val_mape: 14.9827 - val_mae: 0.1061
Epoch 4/20
275/275 [==============================] - 187s 678ms/step - loss: 0.0015 - mape: 4746.2788 - mae: 0.0278 - val_loss: 0.0142 - val_mape: 16.1894 - val_mae: 0.1122
Epoch 5/20
 30/275 [==>...........................] - ETA: 2:38 - loss: 0.0012 - mape: 9.3647 - mae: 0.0246

The MAPE is exploding at the end of each epoch. What could be the cause of this specific behavior?

The MAPE is still decreasing with each epoch so is this not really an issue since it is not hindering the training process?

Solution

Your loss and MAPE are decreasing so it sounds good. But if you fear the high values in MAPE you can tell if there is a Y value near zero. Because MAPE is a percentage error.

MAPE results can be misleading. From Wikipedia:

Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application, and there are many studies on shortcomings and misleading results from MAPE.

It cannot be used if there are zero values (which sometimes happens for example in demand data) because there would be a division by zero.

For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.

MAPE puts a heavier penalty on negative errors, than on positive errors.

To overcome these issues with MAPE, there are some other measures proposed in literature:

Mean Absolute Scaled Error (MASE)

Symmetric Mean Absolute Percentage Error (sMAPE)

Mean Directional Accuracy (MDA)

Mean Arctangent Absolute Percentage Error (MAAPE)