I am trying to understand DNNs with MatConvNet's DagNN wrapper. I have a question based on the following last two layers of a net, which uses a Euclidean (L2) loss for regression:
net.addLayer('fc9', dagnn.Conv('size', [1 1 4096 1], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop8'}, {'prediction'}, {'conv10f' 'conv10b'});
net.addLayer('l2_loss', dagnn.L2Loss(), {'prediction', 'label'}, {'objective'});
where the code for L2Loss is
function Y = vl_nnL2(X, c, dzdy)
% VL_NNL2  Euclidean (L2) loss between predictions X and labels c.
c = reshape(c, size(X)) ;                  % bring the labels into the same shape as the predictions
if nargin == 2 || (nargin == 3 && isempty(dzdy))
    % forward mode: element-wise squared error
    diff_xc = bsxfun(@minus, X, c) ;
    Y = diff_xc.^2 ;
elseif nargin == 3 && ~isempty(dzdy)
    % backward mode: derivative w.r.t. X, scaled by the incoming derivative dzdy
    Y = (X - c) .* dzdy ;
end
end
X is the output of the fc9 layer, i.e. one scalar prediction per image, so a vector of length 100 (the batch size), and c holds the corresponding labels.
Here is my new, modified L2 regression function:
function Y = vl_nnL2_(X, c, dzdy)
% Modified L2 "regression" function: compares the arg-max positions of the
% prediction and of the label along the 3rd dimension.
c = reshape(c, size(X)) ;
[~, chat]  = max(X, [], 3) ;               % index of the maximal prediction per sample
[~, lchat] = max(c, [], 3) ;               % index of the maximal label per sample
if nargin == 2 || (nargin == 3 && isempty(dzdy))
    % forward mode: sum of squared differences between the two index maps
    t = (chat - lchat).^2 ;
    Y = sum(sum(t)) ;
elseif nargin == 3 && ~isempty(dzdy)
    % backward mode: replicate the index maps back to the size of X
    % (the 35 is hard-coded to the length of the 3rd dimension of X)
    ch  = squeeze(chat) ;
    aa1 = repmat(ch', 35, 1) ;
    lch = squeeze(lchat) ;
    aa2 = repmat(lch', 35, 1) ;
    t = (chat - lchat) ;                   % (computed but not used below)
    Y = dzdy .* (aa1 - aa2) * 2 ;
    Y = single(reshape(Y, size(X))) ;
end
end
"if nargin == 2 || (nargin == 3 && isempty(dzdy))" checks if it's forward mode.
In the forward mode, you compute (prediction - label).^2:
diff_xc = bsxfun(@minus, X, c) ;
Y = diff_xc.^2 ;
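As a quick numeric sanity check (the values are made up):

X = single([2 5]) ;    % two predictions
c = single([1 7]) ;    % the corresponding labels
Y = vl_nnL2(X, c)      % forward pass: returns [1 4]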
The derivative of L2 loss w.r.t. prediction is 2*(prediction - label). Thus we have
Y=(X-c).*dzdy;
in your code. Here the author of your code omitted the constant factor 2. In general it will still work, since it is just a constant scaling factor on your gradients (it is effectively absorbed into the learning rate). dzdy is the derivative coming from the downstream layers; if this layer is the last one, dzdy = 1, which MatConvNet provides automatically.
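If you want the backward pass to match the analytical derivative exactly, a minimal sketch would look like this (vl_nnL2_exact is just a name I am using here, not something from MatConvNet):

function Y = vl_nnL2_exact(X, c, dzdy)
% L2 loss with the factor 2 kept in the derivative.
c = reshape(c, size(X)) ;
if nargin == 2 || (nargin == 3 && isempty(dzdy))
    Y = (X - c).^2 ;               % forward: element-wise squared error
else
    Y = 2 * (X - c) .* dzdy ;      % backward: d/dX (X - c).^2 = 2*(X - c)
end
end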
c must be of the same size as X (or at least reshapable to it), since it's regression.
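In practice that means the labels fed to the network must be arranged to the same H x W x C x N shape as the prediction; for the fc9 layer above that is 1 x 1 x 1 x 100. For example (labels here is a hypothetical plain 1 x 100 vector):

labels = single(labels) ;
labels = reshape(labels, 1, 1, 1, []) ;   % 1 x 1 x 1 x batchSize, matching the prediction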
More comments coming. Let me know if you have other questions. I'm pretty familiar with MatConvNet.