Tags: python, tensorflow, oop, tf.keras

Reusability of Layers in TensorFlow Subclassing for Model Creation


So I'm currently learning how to create a model in TensorFlow using subclassing. According to the tutorial, the following snippet of code should run perfectly:

from tensorflow.keras.layers import Input, Layer, Conv2D, BatchNormalization, MaxPool2D

#Defining the class
class FeatureExtractor(Layer):
    def __init__(self):
        super().__init__()

        self.conv_1 = Conv2D(filters = 6, kernel_size = 4, padding = "valid", activation = "relu")
        self.batchnorm_1 = BatchNormalization()
        self.maxpool_1 = MaxPool2D(pool_size = 2, strides=2)

        self.conv_2 = Conv2D(filters = 16, kernel_size = 4, padding = "valid", activation = "relu")
        self.batchnorm_2 = BatchNormalization()
        self.maxpool_2 = MaxPool2D(pool_size = 2, strides=2)


    def call(self, x):
        x = self.conv_1(x)
        x = self.batchnorm_1(x)
        x = self.maxpool_1(x)

        x = self.conv_2(x)
        x = self.batchnorm_2(x)
        x = self.maxpool_2(x)

        return x

#Calling and using the class
feature_extractor = FeatureExtractor()

func_input = Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="Input_Image")  # IMG_SIZE is defined elsewhere in the tutorial code

x = feature_extractor(func_input)

And it does indeed run flawlessly. But then I noticed that in __init__(), the BatchNormalization() and MaxPool2D() layers look identical yet are defined twice instead of being reused. So I did some research, asked on Stack Overflow, and came to the conclusion that when layers are called in call(), they adapt to the dimensions of their input and then keep those dimensions for later calls, which prevents them from being reused.

But as Fynn pointed out in my previous question, the maxpool member can actually be reused, while batchnorm cannot. So do some layers adapt to the dimensions of the input (like batchnorm) while others don't (like maxpool)? Or is there a property of these layers that causes this behavior, one I can look up in the documentation?
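
To illustrate what I mean, here is a minimal sketch of the behavior I'm describing (my own experiment, not from the tutorial, assuming the standard tf.keras layers):

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
mp = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)

bn(tf.zeros((1, 28, 28, 6)))    # builds batchnorm against a 6-channel input
mp(tf.zeros((1, 28, 28, 6)))    # pooling just works

mp(tf.zeros((1, 28, 28, 16)))   # still works with a different channel count
bn(tf.zeros((1, 28, 28, 16)))   # raises an error: the layer is already built for 6 channels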


Solution

  • Generally speaking, some layers have parameters, and those layers cannot be re-used (except in a few edge cases, and even then they probably shouldn't be). In this concrete example, batch normalization has learnable beta and gamma parameters, as well as non-trainable moving statistics (mean and variance) that are used at inference time (if this doesn't mean anything to you, I suggest looking up how batch norm works in detail).

    Now, one layer object only has one set of parameters, and if you were to re-use that object, the same parameters would be applied in multiple places. Consider this:

    self.conv_1 = Conv2D(filters = 6, kernel_size = 4, padding = "valid", activation = "relu")
    self.batchnorm = BatchNormalization()
    self.maxpool_1 = MaxPool2D(pool_size = 2, strides=2)
    
    self.conv_2 = Conv2D(filters = 16, kernel_size = 4, padding = "valid", activation = "relu")
    self.maxpool_2 = MaxPool2D(pool_size = 2, strides=2)
    

    and assume batchnorm would be used after both convolutions. In this example, batchnorm would be built with 6 betas/gammas, because that is the number of filters in the first convolution; trying to re-use it after the second convolution would not work, since that one has 16 filters and would require 16 betas/gammas. This is why creating only one batchnorm layer doesn't work.
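
    To make the sizes concrete, here is a small check (my sketch, just inspecting the weights the batchnorm layer creates once it is built):

    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    bn(tf.zeros((1, 28, 28, 6)))            # calling the layer builds it against a 6-channel input
    print([w.shape for w in bn.weights])    # four vectors of shape (6,): gamma, beta and the moving statistics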

    The following should actually run:

    self.conv_1 = Conv2D(filters = 16, kernel_size = 4, padding = "valid", activation = "relu")
    self.batchnorm = BatchNormalization()
    self.maxpool_1 = MaxPool2D(pool_size = 2, strides=2)
    
    self.conv_2 = Conv2D(filters = 16, kernel_size = 4, padding = "valid", activation = "relu")
    self.maxpool_2 = MaxPool2D(pool_size = 2, strides=2)
    

    I changed it such that both convolutions use 16 filters. However, this would re-use the same beta/gamma after both layers (since we use the same batchnorm object), which is probably a bad idea.
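
    If you want to verify the sharing, count the variables of the shared object after using it twice (again a small sketch):

    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = tf.zeros((1, 28, 28, 16))
    y = bn(x)                # the first call builds one set of variables for 16 channels
    z = bn(y)                # the second call re-applies exactly the same variables
    print(len(bn.weights))   # 4, not 8: both call sites share one gamma/beta and one set of moving statistics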

    For maxpool, this is not an issue, since it doesn't have any parameters. Pooling with the same window size and stride works the same for any input.
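
    Putting this together, a version of the question's FeatureExtractor that re-uses only the pooling layer could look like the sketch below (keeping the original layer arguments; each batchnorm still gets its own object because its parameters are tied to the channel count):

    from tensorflow.keras.layers import Layer, Conv2D, BatchNormalization, MaxPool2D

    class FeatureExtractor(Layer):
        def __init__(self):
            super().__init__()
            self.conv_1 = Conv2D(filters=6, kernel_size=4, padding="valid", activation="relu")
            self.batchnorm_1 = BatchNormalization()           # parameters sized for 6 channels
            self.conv_2 = Conv2D(filters=16, kernel_size=4, padding="valid", activation="relu")
            self.batchnorm_2 = BatchNormalization()           # parameters sized for 16 channels
            self.maxpool = MaxPool2D(pool_size=2, strides=2)  # no parameters, safe to re-use

        def call(self, x):
            x = self.maxpool(self.batchnorm_1(self.conv_1(x)))
            x = self.maxpool(self.batchnorm_2(self.conv_2(x)))
            return x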