I am trying to recreate the results reported in "Reducing the Dimensionality of Data with Neural Networks" by autoencoding the Olivetti face dataset with an adapted version of the MNIST digits MATLAB code, but am having some difficulty. No matter how much I tweak the number of epochs, the learning rates, or the momentum, the stacked RBMs enter the fine-tuning stage with a large reconstruction error, and consequently the fine-tuning stage fails to improve them much. I am also experiencing a similar problem on another real-valued dataset.
For the first layer I am using an RBM with a smaller learning rate (as described in the paper) and with the visible units reconstructed linearly:
negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);
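If I am reading the original rbm.m correctly, this is the only line of the negative phase that changes: the binary-visible version squashes the same pre-activation through a logistic, while the linear (Gaussian) version uses the mean directly, optionally adding unit-variance noise to sample:

% binary visible units (original rbm.m):
% negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1)));
% linear (Gaussian) visible units: use the mean, or add noise to sample
negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);
% negdata = negdata + randn(numcases,numdims);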
I'm fairly confident I am following the instructions in the supporting material, but I cannot reproduce the reported errors.
Is there something I am missing? Below is the code I'm using for the real-valued visible-unit RBM, followed by the whole deep-training script. The rest of the code can be found here.
rbmvislinear.m:
epsilonw = 0.001; % Learning rate for weights
epsilonvb = 0.001; % Learning rate for biases of visible units
epsilonhb = 0.001; % Learning rate for biases of hidden units
weightcost = 0.0002;
initialmomentum = 0.5;
finalmomentum = 0.9;
[numcases numdims numbatches]=size(batchdata);
if restart ==1,
  restart=0;
  epoch=1;

  % Initializing symmetric weights and biases.
  vishid = 0.1*randn(numdims, numhid);
  hidbiases = zeros(1,numhid);
  visbiases = zeros(1,numdims);

  poshidprobs = zeros(numcases,numhid);
  neghidprobs = zeros(numcases,numhid);
  posprods = zeros(numdims,numhid);
  negprods = zeros(numdims,numhid);
  vishidinc = zeros(numdims,numhid);
  hidbiasinc = zeros(1,numhid);
  visbiasinc = zeros(1,numdims);
  sigmainc = zeros(1,numhid);
  batchposhidprobs = zeros(numcases,numhid,numbatches);
end
for epoch = epoch:maxepoch,
  fprintf(1,'epoch %d\r',epoch);
  errsum=0;

  for batch = 1:numbatches,
    if (mod(batch,100)==0)
      fprintf(1,' %d ',batch);
    end

    %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    data = batchdata(:,:,batch);
    poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));
    batchposhidprobs(:,:,batch) = poshidprobs;
    posprods = data' * poshidprobs;
    poshidact = sum(poshidprobs);
    posvisact = sum(data);
    %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    poshidstates = poshidprobs > rand(numcases,numhid);

    %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    negdata = poshidstates*vishid' + repmat(visbiases,numcases,1); % + randn(numcases,numdims) if not using mean
    neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));
    negprods = negdata'*neghidprobs;
    neghidact = sum(neghidprobs);
    negvisact = sum(negdata);
    %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    err = sum(sum( (data-negdata).^2 ));
    errsum = err + errsum;

    if epoch>5,
      momentum=finalmomentum;
    else
      momentum=initialmomentum;
    end;

    %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
    visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
    hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);

    vishid = vishid + vishidinc;
    visbiases = visbiases + visbiasinc;
    hidbiases = hidbiases + hidbiasinc;
    %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  end
  fprintf(1, '\nepoch %4i error %f \n', epoch, errsum);
end
dofacedeepauto.m:
clear all
close all
maxepoch=200; % The Science paper used maxepoch=50.
numhid=2000; numpen=1000; numpen2=500; numopen=30;
fprintf(1,'Pretraining a deep autoencoder. \n');
fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);
load fdata
%makeFaceData;
[numcases numdims numbatches]=size(batchdata);
fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);
restart=1;
rbmvislinear;
hidrecbiases=hidbiases;
save mnistvh vishid hidrecbiases visbiases;
maxepoch=50;
fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);
batchdata=batchposhidprobs;
numhid=numpen;
restart=1;
rbm;
hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;
save mnisthp hidpen penrecbiases hidgenbiases;
fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);
batchdata=batchposhidprobs;
numhid=numpen2;
restart=1;
rbm;
hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;
save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2;
fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);
batchdata=batchposhidprobs;
numhid=numopen;
restart=1;
rbmhidlinear;
hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;
save mnistpo hidtop toprecbiases topgenbiases;
backpropface;
Thanks for your time.
Silly me, I had forgotten to change the back-propagation fine-tuning script (backprop.m). One has to change the output layer (where the faces get reconstructed) to use real-valued (linear) units, i.e.
dataout = w7probs*w8;
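For context, with the variable names used in Hinton's backprop/CG fine-tuning code (w6probs, w7probs, w8, dataout, N), the end of the reconstruction pass now looks roughly like this; the exact surrounding lines may differ in your copy:

w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs ones(N,1)];
% original logistic output layer for binary MNIST pixels:
% dataout = 1./(1 + exp(-w7probs*w8));
% linear output layer for real-valued face pixels:
dataout = w7probs*w8;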