Hello,
I'm currently trying to compute gradients in TensorFlow 1.13.1 using the GradientTape class as explained in the official documentation, but I am getting TypeError: Fetch argument None has invalid type <class 'NoneType'>.
Below, I include two simple cases where I get this error, using only out-of-the-box TensorFlow functions: the first is the simpler minimal working example, and the second is the one I actually need to solve or work around. For completeness, I am using Python 3.6.8.
import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    f1 = tf.map_fn(lambda a: a**2, x)
    f2 = x * x

# Compute gradients
d_fx1 = gg.gradient(f1, x)  # Line that causes the error
d_fx2 = gg.gradient(f2, x)  # No error
del gg  # delete the persistent GradientTape

with tf.Session() as sess:
    d1, d2 = sess.run((d_fx1, d_fx2))
    print(d1, d2)
In this code, f1 and f2 are computed in different ways but yield the same array. However, when computing the gradients associated with them, the first gg.gradient line raises the following error, whereas the second works flawlessly. I report the stack trace of the error below.
TypeError Traceback (most recent call last)
<ipython-input-1-9c59a2cf2d9b> in <module>()
15
16 with tf.Session() as sess:
---> 17 d1, d2 = sess.run((d_fx1, d_fx2))
18 print(d1, d2)
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
927 try:
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
931 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1135 # Create a fetch handler to take care of the structure of fetches.
1136 fetch_handler = _FetchHandler(
-> 1137 self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
1138
1139 # Run request and get response.
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
469 """
470 with graph.as_default():
--> 471 self._fetch_mapper = _FetchMapper.for_fetch(fetches)
472 self._fetches = []
473 self._targets = []
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
259 elif isinstance(fetch, (list, tuple)):
260 # NOTE(touts): This is also the code path for namedtuples.
--> 261 return _ListFetchMapper(fetch)
262 elif isinstance(fetch, collections.Mapping):
263 return _DictFetchMapper(fetch)
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, fetches)
368 """
369 self._fetch_type = type(fetches)
--> 370 self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
371 self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
372
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in <listcomp>(.0)
368 """
369 self._fetch_type = type(fetches)
--> 370 self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
371 self._unique_fetches, self._value_indices = _uniquify_fetches(self._mappers)
372
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
256 if fetch is None:
257 raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258 type(fetch)))
259 elif isinstance(fetch, (list, tuple)):
260 # NOTE(touts): This is also the code path for namedtuples.
TypeError: Fetch argument None has invalid type <class 'NoneType'>
Please note that I also tried computing only one gradient at a time, i.e. with persistent=False, and got the same results.
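In fact, the None does not come from the Session at all: a quick check I added here for illustration (not part of the original script) shows that gg.gradient(f1, x) itself returns the Python value None at graph-construction time, and sess.run merely refuses to fetch it.
# Illustrative check, run right after the gradient calls above
print(d_fx1)  # prints: None -- no gradient op was built for the map_fn output
print(d_fx2)  # prints a Tensor of shape (3,)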
Below, I also include a minimal working example that reproduces the same error, this time in the setting of the problem I am actually trying to solve. In this code, I use an RNN to compute an output from some inputs, and I need to compute the Jacobian of this output w.r.t. the inputs.
import tensorflow as tf
from tensorflow.keras.layers import RNN, GRUCell

# Define sizes of the variables. TODO: adapt to data
inp_dim = 2
num_units = 50
batch_size = 100
timesteps = 10

# Reset the graph, so as to avoid errors
tf.reset_default_graph()

# Build the model
inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))

# Follow gradient computations
with tf.GradientTape() as g:
    g.watch(inputs)
    cells = [GRUCell(num_units), GRUCell(num_units)]
    rnn = RNN(cells, time_major=True, return_sequences=True)
    f = rnn(inputs)
    d_fx = g.batch_jacobian(f, inputs)

# Run the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(d_fx)
grads.shape
Regarding the stack trace, I get the same error but with fewer frames (there is one for_fetch, one <listcomp> and one __init__ fewer in this trace). For completeness, I still include it below.
TypeError Traceback (most recent call last)
<ipython-input-5-bb2ce4eebe87> in <module>()
25 with tf.Session() as sess:
26 sess.run(tf.global_variables_initializer())
---> 27 grads = sess.run(d_fx)
28 grads.shape
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
927 try:
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
931 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1135 # Create a fetch handler to take care of the structure of fetches.
1136 fetch_handler = _FetchHandler(
-> 1137 self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
1138
1139 # Run request and get response.
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in __init__(self, graph, fetches, feeds, feed_handles)
469 """
470 with graph.as_default():
--> 471 self._fetch_mapper = _FetchMapper.for_fetch(fetches)
472 self._fetches = []
473 self._targets = []
C:\HOMEWARE\Miniconda3-Windows-x86_64\envs\rdwsenv\lib\site-packages\tensorflow\python\client\session.py in for_fetch(fetch)
256 if fetch is None:
257 raise TypeError('Fetch argument %r has invalid type %r' % (fetch,
--> 258 type(fetch)))
259 elif isinstance(fetch, (list, tuple)):
260 # NOTE(touts): This is also the code path for namedtuples.
TypeError: Fetch argument None has invalid type <class 'NoneType'>
I feel like there is a bug in some TensorFlow function that causes this error, but I am not sure. In the end, what interests me is getting a tensor containing the Jacobian of the output of my network w.r.t. the inputs. How can I achieve that, either with other tools or by correcting my code?
EDIT: Ok, so I took into account the comments by danyfang and looked into the GitHub issue he quoted about tf.gradients returning None instead of 0 due to an implementation design in low-level TensorFlow.
Therefore, I tried to create a simple case where I am sure the gradients differ from 0, by computing tf.matmul(x, tf.transpose(x)). I am posting a MWE below.
import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape(persistent=True) as gg:
    gg.watch(x)
    y = tf.matmul(x, tf.transpose(x))
    f1 = tf.map_fn(lambda a: a, y)

# Compute gradients
d_fx1 = gg.gradient(f1, x)
d_yx = gg.gradient(y, x)
del gg  # delete the persistent GradientTape

with tf.Session() as sess:
    #d1 = sess.run(d_fx1)  # Same NoneType error
    d2 = sess.run(d_yx)  # Works flawlessly: returns array([[2., 4., 6.]], dtype=float32)
d2
This shows (at least in my opinion) that the error arises not from the behavior reported in that issue, but from something else in the lower-level implementation.
EDIT: Below, I report how I computed the gradients of my output w.r.t. the inputs.
I succeeded in computing the gradients using the function tf.gradients. However, according to the documentation, this function uses symbolic differentiation, whereas GradientTape.gradient uses automatic differentiation. The papers I am reading talk about automatic differentiation, so I don't know if I'll encounter problems later on, but at least my code runs.
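As a quick illustration of the API difference (a minimal standalone sketch, independent of my model): tf.gradients is a free function that takes graph tensors and returns a list with one entry per tensor in its second argument, whereas GradientTape.gradient is a method on the tape object.
import tensorflow as tf

tf.reset_default_graph()
x = tf.constant([1., 2., 3.])
y = x * x
[dy_dx] = tf.gradients(y, [x])  # returns a list; here a single Tensor of shape (3,)

with tf.Session() as sess:
    print(sess.run(dy_dx))  # [2. 4. 6.]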
Below, I post a MWE with the RNN code I used above.
import tensorflow as tf
from tensorflow.keras.layers import RNN, GRUCell, Dense

# Define sizes of the variables. TODO: adapt to data
inp_dim = 2
num_units = 50
batch_size = 100
timesteps = 10

# Reset the graph, so as to avoid errors
tf.reset_default_graph()

inputs = tf.ones(shape=(timesteps, batch_size, inp_dim))

### Building the model
cells = [GRUCell(num_units), GRUCell(num_units)]
rnn = RNN(cells, time_major=True, return_sequences=True)
final_layer = Dense(1, input_shape=(num_units,))

# Apply the model to the inputs
last_state = rnn(inputs)
f = final_layer(last_state)

[derivs] = tf.gradients(f, inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(derivs)
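One caveat: tf.gradients(f, inputs) differentiates sum(f), so derivs above is the sum of the Jacobian rows rather than the full Jacobian. A minimal sketch of recovering individual rows, assuming the code above has just run and that the flattened output is small enough to loop over (num_rows = 3 is only for illustration; the full output has timesteps * batch_size entries):
# Sketch, not part of my original code: differentiate one scalar output
# component at a time; each tf.gradients call yields one row of the Jacobian.
f_flat = tf.reshape(f, [-1])
num_rows = 3  # illustration only
jac_rows = [tf.gradients(f_flat[i], inputs)[0] for i in range(num_rows)]
jacobian = tf.stack(jac_rows)  # shape (num_rows, timesteps, batch_size, inp_dim)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(jacobian).shape)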
Just to warn any interested bystander who would like to compute second-order derivatives: nesting the calls as tf.gradients(tf.gradients(func, vars), vars) is not supported. There is also a function called tf.hessians, but replacing tf.gradients with tf.hessians in the code above did not work, and led to an error so long that I will not include it here. I will most likely open an issue on GitHub, which I will link here for anyone interested. For the moment, since I have found a workaround, unsatisfying as it may be, I will mark my own answer as solving my problem.
See this issue on GitHub.