Tags: python, tensorflow, gradient, recurrent-neural-network, hessian

TypeError computing gradients with GradientTape.gradient


Hello,

I'm currently trying to compute gradients in TensorFlow 1.13.1 using the GradientTape class as explained in the official documentation, but I am getting a TypeError: Fetch argument None has invalid type <class 'NoneType'>.
Below, I include two simple cases where I get this error, using only out-of-the-box TensorFlow functions: the first is the simpler minimal working example, and the second is the problem I actually need to solve or work around. For completeness, I am using Python 3.6.8.

Simpler one

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    f1 = tf.map_fn(lambda a: a**2, x)
    f2 = x*x

# Compute gradients
d_fx1 = gg.gradient(f1, x)  # line that causes the error
d_fx2 = gg.gradient(f2, x)  # no error
del gg  # delete the persistent GradientTape

with tf.Session() as sess:
    d1, d2 = sess.run((d_fx1, d_fx2))
print(d1, d2)

In this code, f1 and f2 are computed in different ways but yield the same array. However, when I try to compute the gradients associated with them, the first line gives the following error, whereas the second line works flawlessly. I report the stack trace of the error below.

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9c59a2cf2d9b> in <module>()
     15 
     16 with tf.Session() as sess:
---> 17     d1, d2 = sess.run((d_fx1, d_fx2))
     18 print(d1, d2)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1135     # Create a fetch handler to take care of the structure of fetches.
   1136     fetch_handler = _FetchHandler(
-> 1137         self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
   1138 
   1139     # Run request and get response.

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
    469     """
    470     with graph.as_default():
--> 471       self._fetch_mapper = _FetchMapper.for_fetch(fetches)
    472     self._fetches = []
    473     self._targets = []

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.
--> 261       return _ListFetchMapper(fetch)
    262     elif isinstance(fetch, collections.Mapping):
    263       return _DictFetchMapper(fetch)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, fetches)
    368     """
    369     self._fetch_type = type(fetches)
--> 370     self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
    371     self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
    372 

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in <listcomp>(.0)
    368     """
    369     self._fetch_type = type(fetches)
--> 370     self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
    371     self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
    372 

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    256     if fetch is None:
    257       raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258                                                                  type(fetch)))
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.

TypeError: Fetch argument None has invalid type <class 'NoneType'>

Please note that I also tried computing only one gradient at a time, i.e. with persistent=False, and got the same results.
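
For reference, this is a minimal sketch of what I mean by computing one gradient at a time (my paraphrase of the variant I tried, reusing the same x as above):

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])

# Non-persistent tape: usable for a single gradient() call only
with tf.GradientTape() as g1:
    g1.watch(x)
    f1 = tf.map_fn(lambda a: a**2, x)

d_fx1 = g1.gradient(f1, x)  # still None, so fetching it raises the same TypeError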

Actual need

Below, I also include a minimal working example that reproduces the same error, this time arising from the problem I am actually working on.

In this code, I'm using an RNN to compute an output from some inputs, and I need to compute the Jacobian of this output w.r.t. the inputs.

import tensorflow as tf
from tensorflow.keras.layers import RNN, GRUCell

# Define size of variable. TODO: adapt to data
inp_dim = 2
num_units = 50
batch_size = 100
timesteps = 10

# Reset the graph, so as to avoid errors
tf.reset_default_graph()

# Building the model
inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))

# Follow gradient computations
with tf.GradientTape() as g:
    g.watch(inputs)
    cells = [GRUCell(num_units), GRUCell(num_units)]
    rnn = RNN(cells, time_major=True, return_sequences=True)
    f = rnn(inputs)
d_fx = g.batch_jacobian(f, inputs)

# Run graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(d_fx)
grads.shape

Regarding the stack trace, I get the same error but with fewer frames (one fewer for_fetch, <listcomp>, and __init__ each). For completeness, I still include it below.

TypeError                                 Traceback (most recent call last)
<ipython-input-5-bb2ce4eebe87> in <module>()
     25 with tf.Session() as sess:
     26     sess.run(tf.global_variables_initializer())
---> 27     grads = sess.run(d_fx)
     28 grads.shape

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1135     # Create a fetch handler to take care of the structure of fetches.
   1136     fetch_handler = _FetchHandler(
-> 1137         self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
   1138 
   1139     # Run request and get response.

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
    469     """
    470     with graph.as_default():
--> 471       self._fetch_mapper = _FetchMapper.for_fetch(fetches)
    472     self._fetches = []
    473     self._targets = []

C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
    256     if fetch is None:
    257       raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258                                                                  type(fetch)))
    259     elif isinstance(fetch, (list, tuple)):
    260       # NOTE(touts): This is also the code path for namedtuples.

TypeError: Fetch argument None has invalid type <class 'NoneType'>

I feel like there is a bug in some TensorFlow function that causes this error, but I am not sure. In the end, what interests me is getting a tensor containing the Jacobian of the output of my network w.r.t. the inputs. How can I achieve that with other tools, or by correcting my code?

EDIT: OK, so I took into account the comments by danyfang and looked into the issue he quoted on GitHub about tf.gradients returning None instead of 0, due to some implementation design in low-level TensorFlow.
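
For context, here is a minimal sketch (my own illustration, not taken from the issue) of the behavior discussed there: tf.gradients returns None rather than a zero tensor when the output does not depend on the input at all.

import tensorflow as tf

tf.reset_default_graph()
z = tf.constant([1., 2.])
y = tf.reduce_sum(tf.constant([3., 4.]))  # y does not depend on z

# tf.gradients yields [None] for unconnected tensors, not zeros
print(tf.gradients(y, z))  # [None]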

Therefore, I tried to create a simple case where I am sure the gradients are different from 0, by computing tf.matmul(x, tf.transpose(x)). I am posting an MWE below.

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    y = tf.matmul(x, tf.transpose(x))
    f1 = tf.map_fn(lambda a: a, y)

# Compute gradients
d_fx1 = gg.gradient(f1, x)
d_yx = gg.gradient(y, x)
del gg  # delete the persistent GradientTape

with tf.Session() as sess:
    # d1 = sess.run(d_fx1)  # same TypeError: Fetch argument None
    d2 = sess.run(d_yx)  # works flawlessly, returns array([[2., 4., 6.]], dtype=float32)
d2

This shows (at least in my opinion) that the error does not arise from the behavior reported in that issue, but from something else in the lower-level implementation.
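
Consistent with that, the following sketch (my own check) suggests that tf.gradients differentiates through the same tf.map_fn construct on which GradientTape.gradient returns None:

import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])
f1 = tf.map_fn(lambda a: a**2, x)

# Symbolic differentiation through the while_loop underlying map_fn
[d_fx1] = tf.gradients(f1, x)

with tf.Session() as sess:
    print(sess.run(d_fx1))  # [2. 4. 6.]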


Solution

  • EDIT: Below, I report how I computed the gradients of my output w.r.t. the inputs, and at the bottom what I found out about computing tf.hessians.

    I succeeded in computing the gradients using the function tf.gradients. However, according to the documentation, this function uses symbolic differentiation, whereas GradientTape.gradient uses automatic differentiation. The papers I am reading talk about automatic differentiation, so I don't know whether I'll run into problems later on, but at least my code runs.

    Below, I post an MWE with the RNN code I already used.

    import tensorflow as tf
    from tensorflow.keras.layers import RNN, GRUCell, Dense
    
    # Define size of variable. TODO: adapt to data
    inp_dim = 2
    num_units = 50
    batch_size = 100
    timesteps = 10
    
    # Reset the graph, so as to avoid errors
    tf.reset_default_graph()
    
    inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))
    
    ### Building the model
    cells = [GRUCell(num_units), GRUCell(num_units)]
    rnn = RNN(cells, time_major=True, return_sequences=True)
    final_layer = Dense(1, input_shape=(num_units,))
    
    # Apply to inputs
    last_state = rnn(inputs)
    f = final_layer(last_state)
    
    [derivs] = tf.gradients(f, inputs)
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        grads = sess.run(derivs)
    

    Just to warn any interested bystander who would like to compute second-order derivatives: nesting the calls as tf.gradients(tf.gradients(func, vars)) did not work for this model. There is also a function called tf.hessians, but replacing tf.gradients with tf.hessians in the code above did not work either, and led to an error so long that I will not include it here. I will most likely open an issue on GitHub and link it here for anyone interested. For the moment, since I have only found an unsatisfying workaround, I will mark my own answer as solving my problem. For simple graphs, a manual row-by-row construction still works; a sketch is given after the link below.

    Computing second-order derivatives

    See this issue on GitHub.
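
    Here is a minimal sketch of the manual row-by-row construction mentioned above (my own workaround; it assumes a scalar output and a flat 1-D input, and requires every op involved to have a registered second-order gradient, which was not the case for my RNN):

    import tensorflow as tf

    tf.reset_default_graph()
    x = tf.constant([1., 2., 3.])
    f = tf.reduce_sum(x * x * x)  # toy scalar function, f = sum(x**3)

    [grad] = tf.gradients(f, x)  # first-order gradient, shape (3,)

    # Differentiate each gradient component separately, then stack
    # the rows to obtain the full Hessian matrix
    hessian = tf.stack([tf.gradients(grad[i], x)[0] for i in range(3)])

    with tf.Session() as sess:
        print(sess.run(hessian))  # diagonal matrix diag([6., 12., 18.])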