I am trying to build a custom 1D convolution layer in TensorFlow. I have checked that the layer does what it is supposed to do. However, when I insert it into a sequential Keras model, I get a warning that gradients do not exist for the variables in the custom layer.
Can you please explain why this happens and how I can fix it?
This is the code:
import tensorflow as tf
import numpy as np

class customC1DLayer(tf.keras.layers.Layer):
    def __init__(self, filter_size=1, activation=None, **kwargs):
        super(customC1DLayer, self).__init__(**kwargs)
        self.filter_size = filter_size
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Trainable filter weights, e.g. [a, b, c]
        self.filter = self.add_weight('filter', shape=[self.filter_size, ], trainable=True, dtype=tf.float32)
        # Non-trainable zeros used to pad the filter up to the input width
        self.padding = tf.Variable(initial_value=tf.zeros(shape=[input_shape[-1] - self.filter_size, ], dtype=tf.float32), trainable=False)
        padded_filter = tf.concat([self.filter, self.padding], axis=0)
        col = tf.concat([padded_filter[:1], tf.zeros_like(padded_filter[1:])], axis=0)
        # Dense banded Toeplitz matrix built from the padded filter
        self.augmented_filter = tf.linalg.LinearOperatorToeplitz(padded_filter, col).to_dense()

    def call(self, inputs):
        outputs = tf.transpose(tf.matmul(self.augmented_filter, inputs, transpose_b=True))
        if self.activation is not None:
            outputs = self.activation(outputs)
        return outputs
To explain the code: in the build method I initialize some weights, for example [a b c], and then augmented_filter encodes the banded Toeplitz matrix [[a b c 0 0], [0 a b c 0], [0 0 a b c]].
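For concreteness, this is a small standalone sketch of that construction, using illustrative values a=1, b=2, c=3 and an input width of 5:

import tensorflow as tf

# Illustrative stand-ins for the learned filter [a, b, c], padded to width 5
padded_filter = tf.constant([1.0, 2.0, 3.0, 0.0, 0.0])
col = tf.concat([padded_filter[:1], tf.zeros_like(padded_filter[1:])], axis=0)
M = tf.linalg.LinearOperatorToeplitz(padded_filter, col).to_dense()
print(M.numpy())
# M is lower triangular with the filter running down each column; since call
# multiplies with transpose_b and transposes back, each input row is in effect
# multiplied by M^T, whose rows are the shifted filters [a b c 0 0], [0 a b c 0], ...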
I know that such mistakes can occur when non-differentiable functions are used. However, in this case I am only using matrix operations, which should be differentiable as far as I know.
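A minimal check that reproduces the warning outside of fit (a sketch, assuming inputs of shape [batch, width] with width 5):

import tensorflow as tf

layer = customC1DLayer(filter_size=3)
x = tf.random.normal([4, 5])
_ = layer(x)  # builds the layer outside the tape, as Keras does before training

with tf.GradientTape() as tape:
    y = layer(x)
    loss = tf.reduce_sum(y ** 2)

# Prints [None]: no gradient reaches the trainable filter
print(tape.gradient(loss, layer.trainable_variables))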
The issue is that, in the call function, there is no path from augmented_filter back to filter -- and I presume you want the gradients for the latter. augmented_filter is a plain tensor computed once in build, so the trainable variable is effectively not used in the forward computation and no gradients can be computed for it. You will need to do this transformation within call:
class customC1DLayer(tf.keras.layers.Layer):
    def __init__(self, filter_size=1, activation=None, **kwargs):
        super(customC1DLayer, self).__init__(**kwargs)
        self.filter_size = filter_size
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.filter = self.add_weight('filter', shape=[self.filter_size, ], trainable=True, dtype=tf.float32)
        self.padding = tf.Variable(initial_value=tf.zeros(shape=[input_shape[-1] - self.filter_size, ], dtype=tf.float32), trainable=False)

    def call(self, inputs):
        # Rebuild the Toeplitz matrix from the current filter on every call,
        # so the gradient tape can trace a path back to self.filter
        padded_filter = tf.concat([self.filter, self.padding], axis=0)
        col = tf.concat([padded_filter[:1], tf.zeros_like(padded_filter[1:])], axis=0)
        augmented_filter = tf.linalg.LinearOperatorToeplitz(padded_filter, col).to_dense()
        outputs = tf.transpose(tf.matmul(augmented_filter, inputs, transpose_b=True))
        if self.activation is not None:
            outputs = self.activation(outputs)
        return outputs
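With that change, the same gradient check as above (a sketch, again assuming an input width of 5) returns a real gradient for the filter, and the layer can be dropped into a sequential model without the warning:

import tensorflow as tf

layer = customC1DLayer(filter_size=3)
x = tf.random.normal([4, 5])
_ = layer(x)  # build outside the tape, as before

with tf.GradientTape() as tape:
    y = layer(x)
    loss = tf.reduce_sum(y ** 2)

# Now prints a tensor of shape (3,) instead of None
print(tape.gradient(loss, layer.trainable_variables))

model = tf.keras.Sequential([tf.keras.Input(shape=(5,)), customC1DLayer(filter_size=3)])
model.compile(optimizer='adam', loss='mse')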