python, tensorflow, gradient-descent, graph-neural-network

TensorFlow/Spektral graph neural network gradient descent issue


I am running into a problem when trying to run gradient descent with graph neural networks in an interactive learning setting. My goal is to use a graph neural network to pick an action, use the action value to compute a loss, and use that loss to perform gradient descent. However, the gradient descent step is causing issues.

I have created a self-contained version of the problem, shown in the code below, and also copied the error message that I get during execution.

import random

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from spektral.data import DisjointLoader, Graph
from spektral.layers import GINConv, GlobalAvgPool

# PLconfig_grid and CircuitDataset2 used below are project-specific
# modules from my codebase.

class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        x = self.dense1(x)
        x = self.dropout(x)
        return self.dense2(x)


class IGDQN(object):
    def __init__(self,
                 number_of_outputs,
                 layers,
                 alpha,
                 gamma,
                 epsilon
        ):
        self.number_of_outputs = number_of_outputs
        self.layers = layers
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.opt = Adam(learning_rate=alpha)
        self.model = GIN0(number_of_outputs, layers)

    def choose_action(self, state, debug=False):
        if np.random.rand() < self.epsilon:
            return random.randrange(self.number_of_outputs)
        q = self.model.predict(state)
        if debug:
            print('q=',q)
            print('action_code=',np.argmin(q[0]))
        return np.argmin(q[0])

    @tf.function
    def update(self, loss):
        with tf.GradientTape(persistent=True) as tape:
            #the gin0 network weights are updated
            gradients = tape.gradient(loss, self.model.trainable_variables)
            print(gradients)
            self.opt.apply_gradients(zip(gradients, self.model.trainable_variables))

def get_inputs():
    indices = [
     [0, 1],
     [0, 2],
     [0, 4],
     [1, 0],
     [1, 2],
     [1, 3],
     [1, 5],
     [2, 0],
     [2, 1],
     [2, 3],
     [2, 4],
     [3, 1],
     [3, 2],
     [3, 7],
     [4, 0],
     [4, 2],
     [4, 5],
     [4, 6],
     [5, 1],
     [5, 4],
     [5, 6],
     [6, 4],
     [6, 5],
     [6, 7],
     [7, 3],
     [7, 6]]
    values = [1.0] * len(indices)  # one unit weight per edge
    dense_shape = [8,8]
    adjacency_matrix = tf.sparse.SparseTensor(
        indices, values, dense_shape
    )
    matrix = [
        [0., 0., 0., 1., 0., 6., 1.],
        [0., 0., 0., 1., 0., 7., 0.],
        [0., 0., 0., 1., 0., 1., 2.],
        [0., 0., 0., 1., 0., 1., 3.],
        [0., 0., 0., 1., 0., 6., 0.],
        [0., 0., 0., 1., 0., 7., 1.],
        [0., 0., 0., 1., 0., 0., 3.],
        [0., 0., 0., 1., 0., 0., 2.],
    ]
    properties_matrix = np.array(matrix)
    am = tf.sparse.to_dense(adjacency_matrix)
    g = Graph(x=properties_matrix, a=am.numpy(), e=None, y=[456])
    ds = [g]
    design_name = PLconfig_grid.designName
    dsr = CircuitDataset2(design_name, ds, False, path="/home/xx/CircuitAttributePrediction/dataset")
    loader = DisjointLoader(dsr, batch_size=1)
    inputs, target = next(loader)
    return inputs

def check_IGDQN(designName, inputDir):
    number_of_outputs = 128
    layers = 3
    alpha = 5e-4
    gamma = 0.2
    epsilon = 0.3
    dqn = IGDQN(
            number_of_outputs,
            layers,
            alpha,
            gamma,
            epsilon
    )

    inputs = get_inputs()
    next_state = state = inputs
    action = dqn.choose_action(state)
    #loss calculation steps simplified for debug purposes
    loss = tf.constant(100, dtype=tf.float32)
    dqn.update(loss)

I get the following errors when running the code above: tape.gradient returns None for every trainable variable (given the placeholder loss value), which then makes the weight update fail. I am using TensorFlow in imperative (eager) style because of the dependency on graph neural networks and the Spektral library.

I am not sure what is going wrong here. I have used gradient descent with graph neural networks for regression before, and it worked fine.

[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
Traceback (most recent call last):
  File "test_PLKerasNetworks_GIN0.py", line 142, in <module>
    main()
  File "test_PLKerasNetworks_GIN0.py", line 136, in main
    check_IGDQN(designName, inputDir)    
  File "test_PLKerasNetworks_GIN0.py", line 130, in check_IGDQN
    dqn.update(loss)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3887, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    test_PLKerasNetworks_GIN0.py:56 update  *
        self.opt.apply_gradients(zip(gradients, self.model.trainable_variables))
    /home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:598 apply_gradients  **
        grads_and_vars = optimizer_utils.filter_empty_gradients(grads_and_vars)
    /home/xx/.local/share/virtualenvs/xx-TxBsk36Y/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/utils.py:79 filter_empty_gradients
        ([v.name for _, v in grads_and_vars],))

    ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_3/kernel:0', 'dense_3/bias:0', 'dense_4/kernel:0', 'dense_4/bias:0', 'dense_5/kernel:0', 'dense_6/kernel:0', 'dense_6/bias:0', 'dense_7/kernel:0', 'dense_7/bias:0', 'dense_8/kernel:0', 'gi_n0/dense/kernel:0', 'gi_n0/dense/bias:0', 'gi_n0/dense_1/kernel:0', 'gi_n0/dense_1/bias:0'].



Solution

  • The question uses the Keras model-subclassing API, and the training step is defined with tf.function, which traces the Python code into a computation graph. For tf.GradientTape to record how the loss depends on the model's trainable variables, the loss must be computed inside the update function, while the tape is active.

    The original code computed the loss outside the update function (as a plain constant), so the tape never recorded any connection between the loss and the trainable variables. tape.gradient therefore returned None for every variable, and apply_gradients raised "No gradients provided for any variable".
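
    To see the failure mode in isolation, here is a minimal sketch (independent of Spektral, using a plain Dense model; all names in it are illustrative) contrasting a loss created outside the tape with one computed inside it:

        import tensorflow as tf

        model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
        x = tf.random.normal((2, 3))
        model(x)  # build the model so trainable_variables exist

        # Loss created outside the tape: nothing connects it to the
        # variables, so every gradient is None (as in the question).
        loss_outside = tf.constant(100.0)
        with tf.GradientTape() as tape:
            pass
        print(tape.gradient(loss_outside, model.trainable_variables))
        # -> [None, None]

        # Loss computed inside the tape: real gradients come back.
        with tf.GradientTape() as tape:
            y = model(x)
            loss_inside = tf.reduce_mean(tf.square(y))
        print(tape.gradient(loss_inside, model.trainable_variables))
        # -> [<tf.Tensor ...>, <tf.Tensor ...>]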

    Here is the relevant TensorFlow documentation on writing a training loop from scratch: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch

    Here is a corrected version of the update function that computes the loss and the gradients inside the tf.function-decorated method; this works well:

        @tf.function
        def update(self, state, next_state):
            # self.mse is assumed to be defined in __init__, e.g.
            # self.mse = tf.keras.losses.MeanSquaredError()
            with tf.GradientTape() as tape:
                # Both forward passes run while the tape is recording, so
                # the loss is connected to the GIN0 network's weights.
                v1 = self.model(state)
                v2 = self.model(next_state)
                action = tf.math.argmin(v2[0])
                actions = tf.reshape([0, action], [1, 2])
                v2_updated = tf.tensor_scatter_nd_update(
                    v1, actions, tf.constant([4.0]))
                loss = self.mse(v1, v2_updated)
            # Gradient computation and the optimizer step happen after the
            # forward pass, outside the recording context.
            gradients = tape.gradient(loss, self.model.trainable_variables)
            self.opt.apply_gradients(zip(gradients, self.model.trainable_variables))
            return loss, actions, v1, self.model.trainable_variables
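
    With the loss computed inside update, the caller passes states rather than a precomputed loss. As a sketch, check_IGDQN above would end like this (assuming IGDQN also defines self.mse = tf.keras.losses.MeanSquaredError() in its __init__, since the example uses it):

        inputs = get_inputs()
        next_state = state = inputs
        loss, actions, v1, _ = dqn.update(state, next_state)
        print(float(loss))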