I would like to compute the first and the second derivatives (diagonal part of Hessian) of my specified Loss with respect to each feature map of a vgg16 conv4_3 layer's kernel which is a 3x3x512x512 dimensional matrix. I know how to compute derivatives if it is respected to a low-rank one according to How to compute all second derivatives (only the diagonal of the Hessian matrix) in Tensorflow? However, when it turns to higher-rank, I got completed lost.
# Inspecting variables under Ipython notebook
In : Loss
Out : <tf.Tensor 'local/total_losses:0' shape=() dtype=float32>
In : conv4_3_kernel.get_shape()
Out : TensorShape([Dimension(3), Dimension(3), Dimension(512), Dimension(512)])
## Compute derivatives
Grad = tf.compute_gradients(Loss, conv4_3_kernel)
Hessian = tf.compute_gradients(Grad, conv4_3_kernel)
In : Grad
Out : [<tf.Tensor 'gradients/vgg/conv4_3/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 512, 512) dtype=float32>]
In : Hessian
Out : [<tf.Tensor 'gradients_2/vgg/conv4_3/Conv2D_grad/Conv2DBackpropFilter:0' shape=(3, 3, 512, 512) dtype=float32>]
Please help me to check my understandings. So, for conv4_3_kernel
, each dim stand for [Kx, Ky, in_channels, out_channels], so Grad
should be partial derivatives of Loss
with respect to each element(pixel) in the each feature maps. And Hessian
is the second derivatives.
But, Hessian
computes all the derivatives, how can I only compute only the diagonal part? should I use tf.diag_part()
?
tf.compute_gradients
computes derivative of a scalar quantity. If the quantity provided isn't scalar, it turns it into scalar by summing up the components which is what's happening in your example
To compute full Hessian you need n
calls to tf.gradients
, The example is here. If you want just the diagonal part, then modify arguments to i
th call to tf.gradients
to differentiate with respect to i
th variable, rather than all variables.