tensorflow, keras, deep-learning, neural-network, keras-tuner

Got NaN loss in Keras Tuner, but the model works when I train it normally


I trained my network several times and already got some results. Then I found out about Keras Tuner and wanted to use it to find the best hyperparameters, but the loss in the tuner always becomes NaN (it never becomes NaN when I train the model regularly). I'm using MobileNetV3Small as the backbone and want to find the optimal number of layers and units. Here is my model-building function:

import keras
from keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(base)  # MobileNetV3Small backbone, defined outside this function
    # Optionally add a global max pooling layer before flattening.
    if hp.Boolean('globalMax'):
        model.add(layers.GlobalMaxPool2D())
    model.add(layers.Flatten())
    # Tune the number of dense layers.
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(
            layers.Dense(
                # Tune the number of units of each layer separately.
                units=hp.Int(f"units_{i}", min_value=3, max_value=12, step=1),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.1))
    model.add(layers.Dense(3))
    model.compile(loss='mae', optimizer='sgd', metrics=['mae'])
    return model

and I'm creating the tuner with:

import keras_tuner as kt

tuner = kt.RandomSearch(
    hypermodel=build_model,
    objective="val_loss",
    executions_per_trial=2,
    overwrite=True,
)
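
For completeness, I launch the search like this (x_train, y_train, and the validation pair are placeholders for my actual data, and the epoch count is arbitrary here; everything passed to search is forwarded to model.fit):

# Placeholder names for my actual dataset; tuner.search forwards
# these arguments straight to model.fit for every trial.
tuner.search(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=5,
)

# Retrieve the best model found by the search.
best_model = tuner.get_best_models(num_models=1)[0]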

and this is the output:

    Best val_loss So Far: nan
    Total elapsed time: 00h 02m 28s
    INFO:tensorflow:Oracle triggered exit

What is the problem? I have already tried other optimizers (even though the model trains perfectly with .fit), tried removing the dropout, and even tried normalization.


Solution

  • So I finally found the problem. It happened because Keras Tuner was evaluating the validation loss with a small batch size, and in my situation the loss value was nearly infinite, so it came out as NaN. After switching to a bigger batch size and changing the loss function, it stopped being NaN all the time and found some results.
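
    A minimal sketch of the fix. build_model_fixed is a hypothetical wrapper name, and Huber is just my pick of a replacement loss (the exact choice is an assumption here; the point is moving away from the setup that blew up). The larger batch_size passed to tuner.search is the key change:

    import keras
    import keras_tuner as kt

    # Reuse the architecture from build_model above, but compile it
    # with a different loss (Huber is one reasonable replacement for MAE).
    def build_model_fixed(hp):
        model = build_model(hp)
        model.compile(loss=keras.losses.Huber(), optimizer='sgd', metrics=['mae'])
        return model

    tuner = kt.RandomSearch(
        hypermodel=build_model_fixed,
        objective="val_loss",
        executions_per_trial=2,
        overwrite=True,
    )

    # The larger batch size is what stopped val_loss from going NaN;
    # batch_size is forwarded to model.fit (whose default is 32).
    tuner.search(
        x_train, y_train,                  # placeholders for the actual data
        validation_data=(x_val, y_val),
        epochs=5,
        batch_size=128,
    )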