python, tensorflow, machine-learning, keras, keras-tuner

Understanding Keras Tuner


I am trying to understand how to use Keras Tuner to obtain optimal values of the hyperparameters for a simple MLP model. The code that I am using is as follows:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras_tuner import RandomSearch

def build_model2(hp):
    model = keras.Sequential()
    # Tune the number of hidden layers, and the width and activation of each one
    for i in range(hp.Int('layers', 2, 6)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i), 32, 512, step=128),
                               activation=hp.Choice('act_' + str(i), ['relu', 'sigmoid', 'tanh'])))
    model.add(layers.Flatten())
    model.add(layers.Dense(5, activation='softmax'))
    # Tune the learning rate on a log scale
    learning_rate = hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")
    model.compile(keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model
 
tuner2 = RandomSearch(build_model2, objective = 'val_accuracy', max_trials = 5,
                      executions_per_trial = 3, overwrite=True)
  
tuner2.search_space_summary()
  
tuner2.search(X_train, Y_train, epochs=25, validation_data=(X_train, Y_train), verbose=1)

tuner2.results_summary()

# Get the optimal hyperparameters
best_hps=tuner2.get_best_hyperparameters(num_trials=1)[0]
print("The optimal parameters are:")
print(best_hps.values)  

# Build the model with the optimal hyperparameters and train it on the data for 50 epochs
model = tuner2.hypermodel.build(best_hps)
history = model.fit(X_train, Y_train, epochs=50, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))
  
hypermodel = tuner2.hypermodel.build(best_hps)

# Retrain the model
hypermodel.fit(X_train, Y_train, epochs=best_epoch)
  
eval_result = hypermodel.evaluate(X_test, Y_test)
print("[test loss, test accuracy]:", eval_result)

The parameters that I am tuning are: the number of hidden layers (2 - 6), the number of neurons in each hidden layer (min = 32, max = 512, step size = 128), the activation function ('relu', 'sigmoid', 'tanh') and the learning rate (min_value=1e-4, max_value=1e-2, sampling="log").

For different combinations of the above parameters, I obtained different values as shown below:

[screenshot of the tuner results omitted]

I have the following doubts:

  1. How can I conclude that a particular combination gives me the optimal value?
  2. If the optimal number of hidden layers is, say, 2, then why do I still get values for units_2, units_3 and units_4?
  3. If I repeat the search, the values of the hyperparameters change; how then should I conclude that a particular combination is optimal?

Solution

  • Here's my take:

    1. Please note that in your search call you set validation_data to (X_train, Y_train), whereas it should be (X_test, Y_test). I think that is why your accuracy is near perfect and misleading. Correcting that will give you a meaningful 'val_accuracy', which tells you whether or not the model generalizes well. That would be my go-to metric. A sketch of the corrected call is included after this list.

    2. Parameters for all layers will be displayed regardless of the actual number of layers tuned. This is expected (although a bit confusing) behaviour; only the first 'layers' sets of values are actually used by the best model. More comments here: https://github.com/keras-team/keras-tuner/issues/66#issuecomment-525923517. A sketch of how to filter out the inactive values is included after this list.

    3. To have reproducible results and be able to compare them, you would need to set the random seed. It is not as straightforward as it may seem, but it is not hard to implement. Check this answer: https://stackoverflow.com/a/52897216/19135414. A sketch of the kind of seeding involved is included after this list.
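
Regarding point 1, here is a minimal sketch of the corrected search call; it only changes the validation_data argument from the question and assumes X_test / Y_test are held-out arrays that are not used for training:

tuner2.search(X_train, Y_train, epochs=25,
              validation_data=(X_test, Y_test),  # was (X_train, Y_train)
              verbose=1)

If you would rather keep X_test strictly for the final evaluation, passing validation_split=0.2 instead of validation_data would carve a validation set out of the training data, since search forwards these arguments to model.fit.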
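
Regarding point 2, if the extra units_i / act_i entries are confusing, you can filter the reported values down to the layers that were actually used. A small sketch, assuming the hyperparameter names from build_model2 in the question:

best_hps = tuner2.get_best_hyperparameters(num_trials=1)[0]
n_layers = best_hps.get('layers')

# Only units_0 ... units_{n_layers-1} (and the matching act_i) were used to
# build the best model; the remaining entries are inactive leftovers.
active = {'layers': n_layers, 'lr': best_hps.get('lr')}
for i in range(n_layers):
    active['units_' + str(i)] = best_hps.get('units_' + str(i))
    active['act_' + str(i)] = best_hps.get('act_' + str(i))
print(active)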
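
Regarding point 3, a rough sketch of the seeding the linked answer describes. Note that keras_tuner's RandomSearch also accepts a seed argument; even so, runs on a GPU may need additional settings to be fully deterministic:

import os
import random
import numpy as np
import tensorflow as tf
from keras_tuner import RandomSearch

SEED = 42
os.environ['PYTHONHASHSEED'] = str(SEED)  # as recommended in the linked answer
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

tuner2 = RandomSearch(build_model2,
                      objective='val_accuracy',
                      max_trials=5,
                      executions_per_trial=3,
                      overwrite=True,
                      seed=SEED)  # fixes which hyperparameter combinations get sampled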