I am trying to compute the Hessian of the output of a neural network with respect to its inputs. To give you an idea, this is the matrix I am trying to compute (the mass matrix, i.e. the Hessian of the kinetic energy T with respect to the velocities dq): M_ij = ∂²T / ∂dq_i ∂dq_j.
I am running TensorFlow 2.5.0 and my code to calculate the M matrix looks like this:
def get_Mass_Matrix(self, q, dq):
    nDof = dq.shape[1]
    with tf.GradientTape(persistent = True) as t2:
        t2.watch(dq)
        with tf.GradientTape(persistent = True) as t1:
            t1.watch(dq)
            T = self.kinetic(q, dq)
        g = t1.gradient(T, dq)
    h = t2.batch_jacobian(g, dq)
    return h
The function self.kinetic() calls a multilayer perceptron. When I compute M like this, I get the correct answer but my neural network training slows down significantly, even when running on a GPU.
Is there a more efficient way to perform the same computation that doesn't result in so much overhead? Thank you.
For reference, I am using the subclassing approach to build the model (it inherits from tf.keras.Model).
Edit:
Adding more details about the self.kinetic function:
def kinetic(self, q, qdot):
    nDof = q.shape[1]
    qdq = tf.concat([tf.reshape(q, ((-1, nDof))),
                     tf.reshape(qdot, ((-1, nDof)))], axis = -1)
    return self.T_layers(qdq)
T_layers is defined as:
self.T_layers = L(nlayers = 4, n = 8, input_dim = (latent_dim, 1), nlact = 'swish', oact = 'linear')
Which is calling:
class L(tf.keras.layers.Layer):
    def __init__(self, nlayers, n, nlact, input_dim, oact = 'linear'):
        super(L, self).__init__()
        self.layers = nlayers
        self.dense_in = tf.keras.layers.Dense(n, activation = nlact, input_shape = input_dim)
        self.dense_lays = []
        for lay in range(nlayers):
            self.dense_lays.append(tf.keras.layers.Dense(n, activation = nlact, kernel_regularizer = 'l1'))
        self.dense_out = tf.keras.layers.Dense(1, activation = oact, use_bias = False)

    def call(self, inputs):
        x = self.dense_in(inputs)
        for lay in range(self.layers):
            x = self.dense_lays[lay](x)
        return self.dense_out(x)
I suspect part of the problem might be that I am not "building" the layers? Any advice is appreciated!
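In case it helps, here is a minimal sketch of what I mean by "building", using the L class above and calling it once on a dummy batch (latent_dim = 2 is only an assumed placeholder, not my real setting):

import tensorflow as tf

# Sketch only: latent_dim = 2 is an assumed placeholder.
latent_dim = 2
T_layers = L(nlayers=4, n=8, input_dim=(latent_dim, 1), nlact='swish', oact='linear')

# Keras creates the weights lazily on the first call, so calling the layer
# once on a dummy [q, dq] batch (2 * latent_dim features) forces the build.
dummy_qdq = tf.zeros((1, 2 * latent_dim), dtype=tf.float32)
_ = T_layers(dummy_qdq)
print([w.shape for w in T_layers.weights])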
In order to get reasonable performance from TensorFlow, especially when computing gradients, you have to decorate your get_Mass_Matrix with @tf.function to make sure it runs in graph mode. To do this, everything inside the function has to be graph-mode compatible.
In the call function of class L, it is better to iterate over the list directly instead of indexing it, i.e.:
class L(tf.keras.layers.Layer):
    ...
    def call(self, inputs):
        x = self.dense_in(inputs)
        for l in self.dense_lays:
            x = l(x)
        return self.dense_out(x)
Then, you can decorate your get_Mass_Matrix:
@tf.function
def get_Mass_Matrix(self, q, dq):
    with tf.GradientTape() as t2:
        t2.watch(dq)
        with tf.GradientTape() as t1:
            t1.watch(dq)
            T = self.kinetic(q, dq)
        g = t1.gradient(T, dq)
    return t2.batch_jacobian(g, dq)
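To illustrate the effect, here is a self-contained sketch of the same pattern (ToyModel, its tiny MLP, and nDof = 2 are stand-ins, not your actual model): the first call traces and compiles the graph, and later calls with the same input shapes reuse it.

import tensorflow as tf

# Stand-in model for illustration only; just the get_Mass_Matrix pattern matters.
class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.mlp = tf.keras.Sequential([
            tf.keras.layers.Dense(8, activation='swish'),
            tf.keras.layers.Dense(1, use_bias=False),
        ])

    def kinetic(self, q, dq):
        return self.mlp(tf.concat([q, dq], axis=-1))

    @tf.function
    def get_Mass_Matrix(self, q, dq):
        with tf.GradientTape() as t2:
            t2.watch(dq)
            with tf.GradientTape() as t1:
                t1.watch(dq)
                T = self.kinetic(q, dq)
            g = t1.gradient(T, dq)
        return t2.batch_jacobian(g, dq)

nDof = 2                               # assumed number of degrees of freedom
model = ToyModel()
q  = tf.random.normal((16, nDof))
dq = tf.random.normal((16, nDof))

M = model.get_Mass_Matrix(q, dq)       # first call: traced and compiled
M = model.get_Mass_Matrix(q, dq)       # same shapes: cached graph is reused
print(M.shape)                         # (16, 2, 2)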
Remark: the q and dq that are passed into get_Mass_Matrix must be tensors of constant shape (constant between calls); otherwise, tf.function will retrace every time it sees a new shape, and things will slow down instead.
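If you know nDof up front, one way to make that constant-shape contract explicit is to pin an input_signature so that only the batch dimension is allowed to vary; this should also work on the method version above (with self excluded from the signature). A sketch, with nDof = 2, float32 inputs, and a stand-in quadratic kinetic energy assumed so it runs on its own:

import tensorflow as tf

nDof = 2   # assumed; set this to your actual number of degrees of freedom

# Only the batch dimension is left free, so tf.function traces exactly once.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None, nDof], dtype=tf.float32),   # q
    tf.TensorSpec(shape=[None, nDof], dtype=tf.float32),   # dq
])
def get_mass_matrix(q, dq):
    with tf.GradientTape() as t2:
        t2.watch(dq)
        with tf.GradientTape() as t1:
            t1.watch(dq)
            # Stand-in kinetic energy, only so this sketch is runnable by itself.
            T = tf.reduce_sum(0.5 * dq * dq, axis=-1, keepdims=True)
        g = t1.gradient(T, dq)
    return t2.batch_jacobian(g, dq)

print(get_mass_matrix(tf.zeros((4, nDof)), tf.ones((4, nDof))).shape)   # (4, 2, 2)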