Mnistauto 4
1. An Analysis of backprop.m of Hinton's
mnistdeepauto example
by Ali Riza SARAL
arsaral((at))yahoo.com
References:
Hinton’s «Lecture 12C _ Restricted Boltzmann Machines»
Hugo Larochelle’s «Neural networks [5.2] _ Restricted Boltzmann machine – inference»
Hugo Larochelle’s «Neural networks [5.4] _ Restricted Boltzmann machine - contrastive divergence»
2. @copyright
• % Version 1.000
• % Code provided by Ruslan Salakhutdinov and Geoff Hinton
• % Permission is granted for anyone to copy, use, modify, or distribute this
• % program and accompanying programs and documents for any purpose, provided
• % this copyright notice is retained and prominently displayed, along with
• % a note saying that the original programs are available from our web page.
• % The programs and documents are distributed without any warranty, express
• % or implied. As the programs were written for research purposes only, they
• % have not been tested to the degree that would be advisable in any important
• % application. All use of these programs is entirely at the user's own risk.
7. Initialization 4.
• %%%%%%%%%% END OF PREINITIALIZATION OF WEIGHTS
• l1=size(w1,1)-1; % 784
• l2=size(w2,1)-1; % 1000
• l3=size(w3,1)-1; % 500
• l4=size(w4,1)-1; % 250
• l5=size(w5,1)-1; % 30
• l6=size(w6,1)-1; % 250
• l7=size(w7,1)-1; % 500
• l8=size(w8,1)-1; % 1000
• l9=l1; % 784
• test_err=[];
• train_err=[];
• The weights are bidirectional: when the autoencoder is unrolled, the 4 encoder layers become 8 (encoder plus decoder), and the decoder layer sizes mirror the encoder sizes for the reverse (reconstruction) pass.
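The unrolling can be sketched numerically (a Python sketch, not the original MATLAB; the names mirror the l1..l9 variables above):

```python
# Encoder layer sizes of the 784-1000-500-250-30 autoencoder.
encoder = [784, 1000, 500, 250, 30]

# Unrolling mirrors the encoder to build the decoder, so 4 weight
# matrices become 8; the decoder sizes are the encoder sizes reversed.
layers = encoder + encoder[-2::-1]   # [784,1000,500,250,30,250,500,1000,784]
l1, l2, l3, l4, l5, l6, l7, l8, l9 = layers
```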
8. Epoch loop
• for epoch = 1:maxepoch
• %%%%%%%%%%%% COMPUTE TRAINING RECONSTRUCTION ERROR
• %%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS
• %%%%%%%%%%% COMPUTE TEST RECONSTRUCTION ERROR
• %%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS
• PERFORM CONJUGATE GRADIENT LOOP
• end
9. Conjugate Gradient Loop
• for batch = 1:numbatches/10
• %%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH
• %%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES
• end
• save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8
• save mnist_error test_err train_err;
• end (of the epoch loop)
10. Call mnistdisp.m
• %%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS
• fprintf(1,'Displaying in figure 1: Top row - real data, Bottom row -- reconstructions\n');
• output=[]; % concatenate the digits into output
• for ii=1:15 % take only the first 15 digits (30 columns in fact: data + reconstruction)
• output = [output data(ii,1:end-1)' dataout(ii,:)']; % append one real digit and its reconstruction, 784x1 each
• end % the training digit comes first, then the corresponding reconstruction
• if epoch==1 %Manage figure positioning etc.
• close all
• figure('Position',[100,600,1000,200]);
• else
• figure(1)
• end
• mnistdisp(output); % prepare data to be displayed and display
• drawnow;
11. Mnistdisp.m 1
• function [err] = mnistdisp(digits); % 784x30
• % display a group of MNIST images
• col=28;row=28;
• [dd,N] = size(digits); % 784x30 N=30;
• imdisp=zeros(2*28,ceil(N/2)*28);
• % 56 x 420 pixel picture: 56 rows = two digits stacked over each other (28+28), 420 columns = 15 digits adjacent to each other (15*28 = 420)
12. Mnistdisp.m 2
• for nn=1:N % 1:30
• ii=rem(nn,2);
• if(ii==0) ii=2; end % ii is the tile row: nn=1->1, 2->2, 3->1, 4->2
• jj=ceil(nn/2); % jj is the digit's column position in the picture: nn=1->1, 2->1, 3->2, 4->2
• img1 = reshape(digits(:,nn),row,col);
• % reshape((784x1),28,28) = 28x28; reshapes digit nn in the loop, there are 30 digit columns of length 784
• img2(((ii-1)*row+1):(ii*row),((jj-1)*col+1):(jj*col))=img1';
• % img2(row range, column range) = img1' -- the row and column ranges to be updated with the reshaped digit image
• % nn=1 -> img2(1:row, 1:col) = img1'
• % nn=2 -> img2(row+1:2*row, 1:col) = img1'
• end
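The tiling loop above can be mirrored in numpy (a sketch, with random stand-in data; `order='F'` reproduces MATLAB's column-major reshape, and the `ii`/`jj` arithmetic copies the MATLAB expressions):

```python
import numpy as np

np.random.seed(0)
row = col = 28
N = 30                                   # 15 real digits + 15 reconstructions
digits = np.random.rand(row * col, N)    # stand-in for the 784x30 input

# 56 x 420 canvas: 2 tile rows, ceil(N/2) tile columns.
canvas = np.zeros((2 * row, ((N + 1) // 2) * col))
for nn in range(1, N + 1):               # 1-based, like the MATLAB loop
    ii = nn % 2 or 2                     # rem(nn,2), with 0 mapped to 2 (tile row)
    jj = -(-nn // 2)                     # ceil(nn/2) (tile column)
    img1 = digits[:, nn - 1].reshape(row, col, order='F')  # column-major reshape
    canvas[(ii - 1) * row:ii * row, (jj - 1) * col:jj * col] = img1.T
```

Digit 1 lands in the top-left tile, digit 2 directly below it, digit 3 in the next tile column, and so on.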
14. Minimize.m
• function [X, fX, i] = minimize(X, f, length, varargin)
• % Minimize a differentiable multivariate function.
• %
• % Usage: [X, fX, i] = minimize(X, f, length, P1, P2, P3, ...)
• %
• % where the starting point is given by "X" (D by 1), and the function named in
• % the string "f" must return a function value and a vector of partial
• % derivatives of f wrt X.
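The calling convention can be illustrated with a toy objective (a Python sketch; minimize.m's actual conjugate-gradient line searches are not reproduced here, only the "f returns value and gradient" contract, exercised with plain gradient descent):

```python
import numpy as np

def quad(X):
    """Toy objective in minimize.m's style: return f(X) and df/dX."""
    f = float(np.sum((X - 3.0) ** 2))    # minimum at X = 3
    df = 2.0 * (X - 3.0)                 # partial derivatives of f wrt X
    return f, df

# Stand-in for the optimizer loop: plain gradient descent on (f, df).
X = np.zeros(4)
for _ in range(200):
    f, df = quad(X)
    X -= 0.1 * df
```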
15. Backprop.m calling minimize.m
• %%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES
• max_iter=3;
• VV = [w1(:)' w2(:)' w3(:)' w4(:)' w5(:)' w6(:)' w7(:)' w8(:)']';
• Dim = [l1; l2; l3; l4; l5; l6; l7; l8; l9];
• [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data);
• VV is the starting point "X" passed to minimize.m.
• The function named in the string "f" is 'CG_MNIST'.
16. Minimize.m
• There is an older version of this program in the coursera-ml-master package's mlclass-ex5 (and others) under the name fmincg:
• % Copyright (C) 2001 and 2002 by Carl Edward Rasmussen. Date 2002-02-13
• Minimize.m is the newer and better-documented version:
• % Copyright (C) 2001 - 2006 by Carl Edward Rasmussen (2006-09-08).
17. Minimize.m 1
• function [X, fX, i] = minimize(X, f, length, varargin)
• The function returns the found solution "X", a vector of function values "fX" indicating the progress made, and "i", the number of iterations used.
23. CG_MNIST 3.2
• w1probs, w2probs and w3probs are computed with the sigmoid function, whereas the code layer w4probs is linear (Gaussian units).
• A forward pass and a backward pass are done for the reconstruction.
• Processing is done for 1000 batch input items (10 minibatches of 100).
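The forward reconstruction pass can be sketched in numpy (toy layer sizes stand in for 784-1000-500-250-30-...-784; the bias-column handling and the linear fourth layer follow the description above, the names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
sizes = [6, 5, 4, 3, 2, 3, 4, 5, 6]      # stand-in for the unrolled l1..l9
ws = [np.random.randn(a + 1, b) * 0.1    # +1 row per matrix for the bias
      for a, b in zip(sizes[:-1], sizes[1:])]

N = 10
XX = np.random.rand(N, sizes[0])
acts = np.hstack([XX, np.ones((N, 1))])  # append a bias column, as in CG_MNIST
for k, w in enumerate(ws):
    z = acts @ w
    # the 4th layer (the code layer) is linear ("Gaussian"); the rest sigmoid
    out = z if k == 3 else sigmoid(z)
    acts = np.hstack([out, np.ones((N, 1))])
XXout = acts[:, :-1]                     # reconstruction of XX
```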
25. CG_MNIST 4.1
• f = -1/N*sum(sum( XX(:,1:end-1).*log(XXout) + (1-XX(:,1:end-1)).*log(1-XXout)));
• % (1000x784) .* (1000x784) --> 1x1
• This is the function value that the 'function' CG_MNIST returns to minimize for this data batch of 1000 cases.
• Note the similarity of this algorithm to lrCostFunction.m of coursera-ml-master mlclass-ex3.
26. CG_MNIST 4.2
• f = -1/N*sum(sum( XX(:,1:end-1).*log(XXout) + (1-XX(:,1:end-1)).*log(1-XXout)));
• vs.
• h_of_x = sigmoid(X * theta);
• J = 1 / m * sum( -1 * y' * log(h_of_x) - (1-y') * log(1 - h_of_x) );
• This is practically a cost function.
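The parallel can be checked numerically (a numpy sketch with random stand-in data; the CG_MNIST cost is the logistic-regression cost summed over all 784 output pixels instead of a single output):

```python
import numpy as np

np.random.seed(2)
N = 1000
XX = (np.random.rand(N, 784) > 0.5).astype(float)         # stand-in binary data
XXout = np.clip(np.random.rand(N, 784), 1e-6, 1 - 1e-6)   # stand-in reconstruction

# Cross-entropy reconstruction error, as in CG_MNIST (bias column already dropped):
f = -1.0 / N * np.sum(XX * np.log(XXout) + (1 - XX) * np.log(1 - XXout))

# The logistic-regression cost has the same shape for one output column:
y, h_of_x = XX[:, 0], XXout[:, 0]
J = 1.0 / N * np.sum(-y * np.log(h_of_x) - (1 - y) * np.log(1 - h_of_x))
```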
27. CG_MNIST 5.1
• IO = 1/N*(XXout-XX(:,1:end-1)); % 1000x784
• Ix8=IO; % 1000x784
• dw8 = w7probs'*Ix8; % 1001x1000 * 1000x784 = 1001x784
• The difference between XXout and XX (the data) is divided by the number of cases, and this error signal is multiplied backward with w7probs, which gives dw8, the gradient contributed by this layer. The error signal of each step is similarly multiplied backward with each layer's activations and weights, and the gradient of that layer is found.
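That (output − target)/N times the previous layer's activations is indeed the cross-entropy gradient for a sigmoid output layer can be verified against a numerical gradient (a numpy sketch with tiny stand-in sizes; the variable names echo w7probs, Ix8 and dw8 above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(3)
N, d_prev, d_out = 8, 5, 4
w7probs = np.hstack([np.random.rand(N, d_prev), np.ones((N, 1))])  # activations + bias
w8 = np.random.randn(d_prev + 1, d_out) * 0.1
XX = (np.random.rand(N, d_out) > 0.5).astype(float)                # targets

def cost(w):
    XXout = sigmoid(w7probs @ w)
    return -1.0 / N * np.sum(XX * np.log(XXout) + (1 - XX) * np.log(1 - XXout))

# Analytic gradient, exactly as in CG_MNIST:
XXout = sigmoid(w7probs @ w8)
Ix8 = (XXout - XX) / N
dw8 = w7probs.T @ Ix8

# Numerical gradient for one weight, to confirm the formula:
eps = 1e-6
wp, wm = w8.copy(), w8.copy()
wp[0, 0] += eps
wm[0, 0] -= eps
num = (cost(wp) - cost(wm)) / (2 * eps)
```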
30. CG_MNIST 5.4
• df = [dw1(:)' dw2(:)' dw3(:)' dw4(:)' dw5(:)' dw6(:)' dw7(:)' dw8(:)']'; % 2837314x1
• Finally, the df return parameter of CG_MNIST is set.
31. Minimize.m to backprop.m connection
• Minimize.m returns the minimized version of VV in the variable X:
• [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data);
32. End of backprop.m
• w1 = reshape(X(1:(l1+1)*l2),l1+1,l2); xxx = (l1+1)*l2;
• w2 = reshape(X(xxx+1:xxx+(l2+1)*l3),l2+1,l3); xxx = xxx+(l2+1)*l3;
• w3 = reshape(X(xxx+1:xxx+(l3+1)*l4),l3+1,l4); xxx = xxx+(l3+1)*l4;
• w4 = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5); xxx = xxx+(l4+1)*l5;
• w5 = reshape(X(xxx+1:xxx+(l5+1)*l6),l5+1,l6); xxx = xxx+(l5+1)*l6;
• w6 = reshape(X(xxx+1:xxx+(l6+1)*l7),l6+1,l7); xxx = xxx+(l6+1)*l7;
• w7 = reshape(X(xxx+1:xxx+(l7+1)*l8),l7+1,l8); xxx = xxx+(l7+1)*l8;
• w8 = reshape(X(xxx+1:xxx+(l8+1)*l9),l8+1,l9);
• Resetting the weight values according to the X value returned by minimize.m.
• save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8
• save mnist_error test_err train_err;
• Save the w values and the error values for this epoch.
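The pack/unpack pair (VV = [w1(:)' ... w8(:)']' on the way in, the reshape chain on the way out) can be checked as a roundtrip in numpy (a sketch with toy layer sizes; `order='F'` mirrors MATLAB's column-major `(:)` and `reshape`):

```python
import numpy as np

np.random.seed(4)
sizes = [6, 5, 4, 3, 2, 3, 4, 5, 6]                # stand-in for l1..l9
ws = [np.random.randn(a + 1, b)                    # +1 row for the bias
      for a, b in zip(sizes[:-1], sizes[1:])]

# Pack: w(:) in MATLAB is column-major, i.e. order='F' in numpy.
X = np.concatenate([w.ravel(order='F') for w in ws])

# Unpack, mirroring the reshape chain at the end of backprop.m:
unpacked, xxx = [], 0
for a, b in zip(sizes[:-1], sizes[1:]):
    n = (a + 1) * b
    unpacked.append(X[xxx:xxx + n].reshape(a + 1, b, order='F'))
    xxx += n
```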