tensorflow keras neural-network transfer-learning

How does TensorFlow or Keras handle model weight initialization, and when does it happen?


After reading the answer to this question, I am a bit confused about when exactly TensorFlow initializes the weight and bias variables. As per the answers there, compile() defines the loss function, the optimizer, and the metrics. That's all.

Since the compile() method doesn't initialize them, that would suggest that initialization happens during the fit() run.

However, the issue with that is: when loading models or loading weights, how would fit() know that the weights it is presented with are actually useful and should not be thrown away and replaced with random values?

We pass the type of initializer via the kernel_initializer argument when declaring the layer. For example:

dense02 = tf.keras.layers.Dense(units=10,
                                kernel_initializer='glorot_uniform',
                                bias_initializer='zeros')

So an obvious question is whether the weights are initialized layer by layer during the first epoch's forward pass, or whether it happens for all layers before the first epoch.

(What I am trying to say is: if there are, say, 5 Dense layers in the model, does the initialization happen one layer at a time, i.e. the first Dense layer gets initialized, then the forward pass happens for that layer, then the second layer is initialized and the forward pass for the second Dense layer happens, and so on?)

Another aspect is transfer learning: when stacking custom layers on top of a trained model, the trained model's layers already have weights, while the layers I added wouldn't have any useful weights. So how would TensorFlow know to initialize only the variables of the layers I added and not mess up the layers of the transferred model (provided I don't set trainable=False)?
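
To make the setup concrete, I mean something like this (using MobileNetV2 with pre-trained ImageNet weights as a stand-in for the trained model):

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights='imagenet')

x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
outputs = tf.keras.layers.Dense(units=5)(x)  # my new, untrained layer
model = tf.keras.models.Model(inputs=base_model.input, outputs=outputs)

Here base_model already carries trained weights, but the final Dense layer is brand new.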

How does TensorFlow or Keras handle weight initialization?


Solution

  • The weights are initialized when the model is created (when each layer in the model is built), i.e. before compile() and fit(). Each layer's variables are created and filled by its initializer the moment the layer is built, so all layers are initialized before the first forward pass, not one layer at a time during training:

    import tensorflow as tf
    from tensorflow.keras import models, layers
    
    inputs = layers.Input((3, ))
    outputs = layers.Dense(units=10, 
                    kernel_initializer='glorot_uniform',
                    bias_initializer='zeros')(inputs)
    
    model = models.Model(inputs=inputs, outputs=outputs)
    
    # The Dense layer's kernel and bias already hold values here,
    # even though compile() and fit() have not been called yet.
    for layer in model.layers:
        print("Config:\n{}\nWeights:\n{}\n".format(layer.get_config(), layer.get_weights()))
    

    Outputs:

    Config:
    {'batch_input_shape': (None, 3), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}
    Weights:
    []
    
    Config:
    {'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 10, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}
    Weights:
    [array([[-0.60352975,  0.08275259, -0.6521113 , -0.5860774 , -0.42276743,
            -0.3142944 , -0.28118378,  0.07770532, -0.5644444 , -0.47069687],
           [ 0.4611913 ,  0.35170448, -0.62191975,  0.5837332 , -0.3390234 ,
            -0.4033073 ,  0.03493106, -0.06078851, -0.53159714,  0.49872506],
           [ 0.43685734,  0.6160207 ,  0.01610583, -0.3673877 , -0.14144647,
            -0.3792309 ,  0.05478126,  0.602067  , -0.47438127,  0.36463356]],
          dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
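
  • Because initialization happens when the layers are built, compile() and fit() never re-draw the weights; they simply use whatever values the variables currently hold. Loading a saved model or calling load_weights() overwrites those freshly initialized values in place, which is also why transfer learning works: the pre-trained layers keep the weights they were loaded with, and only the newly added layers start from their initializers. A minimal check, continuing from the model above (the optimizer, loss, and file name are just placeholders):

    import numpy as np
    
    weights_before = model.layers[1].get_weights()
    model.compile(optimizer='adam', loss='mse')
    weights_after = model.layers[1].get_weights()
    
    # compile() leaves the variables untouched
    assert all(np.array_equal(b, a)
               for b, a in zip(weights_before, weights_after))
    
    # Saving and loading weights overwrites the variables in place;
    # fit() then trains from whatever values they currently hold.
    model.save_weights('dense_weights.h5')
    model.load_weights('dense_weights.h5')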