tensorflow, keras

Better understanding of the training parameter for the Keras Model call method needed


I'd like to get a better understanding of the training parameter when calling a Keras model.

In all tutorials (like here) it is explained that when you write a custom train step, you should call the model like this (because some layers may behave differently depending on whether you are doing training or inference):

pred = model(x, training=True)

and when you want to do inference, you should set training to False:

pred = model(x, training=False)

What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, base_model and head_model, and I want to create a new model out of those two, where base_model should always be called with training=False (because I plan on freezing it, as in this tutorial here):

inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)

What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False for base_model be overruled? Or will training now always be set to True for base_model, regardless of what I pass to new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(inputs, training=True), that part of the new model would always run in training mode? And how does it work out if I don't pass any specific value for training and just call new_model(x_new)?

Thanks in advance!


Solution

  • training is a boolean argument that determines whether the call runs in training mode or inference mode. For example, the Dropout layer is used as a regularizer during model training: it randomly zeroes activations, but at inference or prediction time we don't want that to happen.

    y = Dropout(0.5)(x, training=True)  # run this Dropout call in training mode
    

    This sets training=True for the Dropout layer at training time. When we call .fit(), Keras sets this flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. The same applies to a custom training loop: when we pass the input tensor to the model within the GradientTape scope, we can set this argument explicitly; if we don't set it manually, the layers fall back to their default behaviour. The same goes for inference time. So the training argument is set to True or False depending on whether we want the layers to operate in training mode or inference mode, respectively.

    # training mode 
    with tf.GradientTape() as tape:
       logits = model(x, training=True) # forward pass
    
    # inference mode 
    val_logits = model(x, training=False)  # forward pass in inference mode
    
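    As a quick, minimal sketch (the tensor shape and dropout rate here are just illustrative), the effect of the flag can be seen directly on a single Dropout layer: with training=True roughly half of the values are zeroed and the survivors are rescaled, while with training=False the input passes through unchanged.

    import tensorflow as tf
    from tensorflow.keras.layers import Dropout

    x = tf.ones((1, 10))        # dummy batch of ones
    dropout = Dropout(0.5)

    print(dropout(x, training=True))   # ~half the entries zeroed, survivors scaled by 2
    print(dropout(x, training=False))  # identity: all ones come back unchanged
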

    Now coming to your question. After defining the model

    # Freeze the base_model
    base_model.trainable = False
    
    inputs = keras.Input(shape=(150, 150, 3))
    x = base_model(inputs, training=False)
    outputs = head_model(x)
    
    new_model = keras.Model(inputs, outputs)
    

    Now, whether you run this new model with .fit() or with a custom training loop, base_model will always run in inference mode, because training=False was fixed when it was called to build the functional graph, and that value takes precedence over whatever you pass to new_model. Layers that were called without an explicit value (head_model here) still follow the training argument you pass to new_model.
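
    If it helps to see this concretely, here is a minimal sketch; the layer contents of base_model and head_model are made up for illustration, and only the wiring mirrors the code above. Because training=False is baked into new_model's graph, the Dropout inside base_model stays off even when the outer call passes training=True, so repeated calls give identical outputs:

    import numpy as np
    import tensorflow as tf
    from tensorflow import keras

    # toy stand-ins: the Dropout layer makes training/inference behaviour visible
    base_model = keras.Sequential([keras.layers.Dense(4), keras.layers.Dropout(0.5)])
    head_model = keras.layers.Dense(2)

    inputs = keras.Input(shape=(3,))
    x = base_model(inputs, training=False)   # hard-coded into new_model's graph
    outputs = head_model(x)
    new_model = keras.Model(inputs, outputs)

    x_new = np.ones((1, 3), dtype="float32")

    # identical outputs: base_model's Dropout never fires despite training=True
    print(new_model(x_new, training=True))
    print(new_model(x_new, training=True))

    # calling base_model directly with training=True does apply Dropout
    print(base_model(x_new, training=True))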