I'm training a Keras model with a custom loss function which I had already tested successfully before. Recently I started training it with a new dataset and got a strange result: the model trains fine, but val_loss gives nan.
Here is the loss:
from keras import backend as k
from keras.activations import relu
from keras.layers import Lambda, add

def Loss(y_true, y_pred):
    y_pred = relu(y_pred)
    z = k.maximum(y_true, y_pred)
    y_pred_negativo = Lambda(lambda x: -x)(y_pred)
    w = k.abs(add([y_true, y_pred_negativo]))
    if k.sum(z) == 0:
        error = 0
    elif k.sum(y_true) == 0 and k.sum(z) != 0:
        error = 100
    elif k.sum(y_true) == 0 and k.sum(z) == 0:
        error = 0
    else:
        error = (k.sum(w) / k.sum(z)) * 100
    return error
I have tried many things. Someone told me that it could be a problem with the CUDA installation, but I'm not sure. Any idea what the problem is, or how I can diagnose it?
The problem turned out to be division by zero, but the reason it was happening was a little tricky. As you can see, the definition above has conditionals that were supposed to prevent division by zero. However, they were written to handle NumPy objects, not the symbolic tensors that Keras actually passes to the loss, so the Python if/elif branches were resolved once when the graph was built and never acted on the batch values. As a result, division by zero was happening very often.
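A quick way to see this (a minimal sketch, assuming the TensorFlow 1.x graph-mode Keras backend; the placeholder just stands in for whatever tensor Keras passes as y_true):

from keras import backend as K

y_true = K.placeholder(shape=(None,))  # symbolic tensor, no data attached yet
total = K.sum(y_true)
print(total)       # a symbolic tensor, not a number, so `if total == 0:` is
                   # decided once at graph-construction time, never per batch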
To fix it, I had to rewrite the loss in terms of Keras conditionals (remembering not to mix pure Keras with tf.keras), just as I've posted here.
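For illustration, here is a minimal sketch of the idea using K.switch; it is not necessarily identical to the code linked above, and the K.epsilon() guard on the division is an extra precaution I would add:

from keras import backend as K

def Loss(y_true, y_pred):
    y_pred = K.relu(y_pred)
    z = K.maximum(y_true, y_pred)
    w = K.abs(y_true - y_pred)      # same as the Lambda/add construction above

    sum_z = K.sum(z)
    sum_true = K.sum(y_true)

    # Keep the ratio finite even when its branch is not selected.
    ratio = 100.0 * K.sum(w) / (sum_z + K.epsilon())

    # Same logic as the original if/elif chain, but built into the graph,
    # so it is evaluated on the actual batch values at training time:
    #   sum(z) == 0      -> 0
    #   sum(y_true) == 0 -> 100
    #   otherwise        -> 100 * sum(w) / sum(z)
    return K.switch(K.equal(sum_z, 0.0),
                    K.constant(0.0),
                    K.switch(K.equal(sum_true, 0.0),
                             K.constant(100.0),
                             ratio))

Because the branching now lives inside the graph, it is applied to every batch, so the zero-denominator cases are actually handled instead of being silently skipped. Any further comments are more than welcome!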