1. Backpropagation Derivation in Convolutional Neural Networks
Punnoose A K
punnoose07@gmail.com
August 10, 2020
Punnoose A K (punnoose07@gmail.com), Backpropagation Derivation in Convolutional Neural Networks, August 10, 2020, 1 / 30
2. Contents
1 A simple CNN
2 Differential for various functions
3 Update equations for various CNN parameters
4 General order of update
3. A simple 2 Layered, Depth 1 CNN
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
4. Notations
1 X : Input feature. Assuming only 1 feature stream
2 K : Kernel at the first layer. Depth 1
3 f : Rectified Linear Unit
4 J1 : First convolution output, before ReLU
5 C1 : First convolution output, after ReLU
6 S1 : Max pooled output, first layer
7 G : Kernel at the second layer
8 J2 : Second convolution output, before ReLU
9 C2 : Second convolution output, after ReLU
10 S2 : Max pooled output, second layer
11 A : Flattened S2. I is the input size to the fully connected network, A1 to AI
12 uij : weight between input node i and hidden node j
13 vjk : weight between hidden node j and output node k
14 yj : net input to hidden layer node j
15 zk : net input to output layer node k
16 bj : output of hidden layer node j
17 ck : output of output layer node k
5. What is to be learned here?
1 uij for 1 ≤ i ≤ I, 1 ≤ j ≤ H
2 vjk for 1 ≤ j ≤ H, 1 ≤ k ≤ O
3 Kernel G
4 Kernel K
Ignore the bias for the time being.
6. CNN Operations
1 J1 = Convolution(X, K)
2 C1 = relu(J1)
3 S1 = MaxPooling(C1)
4 J2 = Convolution(S1, G)
5 C2 = relu(J2)
6 S2 = MaxPooling(C2)
7 A = flatten(S2)
8 Feed A as the input to the fully connected network
9 Sigmoid function at the hidden layer, softmax function at the output layer, cross entropy as the loss function
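The pipeline above can be sketched numerically. A minimal NumPy sketch (array sizes and helper names are illustrative, not from the slides; conv is implemented as cross-correlation, the convention most CNN libraries use):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution, implemented as cross-correlation."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (assumes even dimensions)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

relu = lambda x: np.maximum(0.0, x)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))   # input feature map (size is illustrative)
K = rng.standard_normal((3, 3))   # first-layer kernel
G = rng.standard_normal((2, 2))   # second-layer kernel

S1 = max_pool2x2(relu(conv2d_valid(X, K)))    # 8x8 -> 6x6 -> 3x3
S2 = max_pool2x2(relu(conv2d_valid(S1, G)))   # 3x3 -> 2x2 -> 1x1
A = S2.flatten()                              # input to the fully connected net
```

A is then fed to the fully connected sigmoid/softmax network described in the next slides.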
7. Differential of Sigmoid Function
Let f(x) = \frac{1}{1 + e^{-x}}. Then

f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
      = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
      = \frac{1}{1 + e^{-x}} \left(1 - \frac{1}{1 + e^{-x}}\right)
      = f(x)\,(1 - f(x))
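The identity can be checked numerically against a central difference (a quick sketch, assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))                # f(x)(1 - f(x))
assert abs(numeric - analytic) < 1e-8
```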
8. Differential of Softmax Function
The softmax function is defined as

f_i = f(i; c_1, c_2, \ldots, c_O) = \frac{e^{c_i}}{\sum_j e^{c_j}}

For finding the differential, there are 2 cases: \frac{df_i}{dc_k} where i = k, and \frac{df_i}{dc_k} where i \neq k.
9. Differential of Softmax Function
For the sake of simplicity consider a 2 class output, i.e., O = 2.

f_1 = f(1; c_1, c_2) = \frac{e^{c_1}}{e^{c_1} + e^{c_2}}

Case i = k:

\frac{df_1}{dc_1} = \frac{(e^{c_1} + e^{c_2})\,e^{c_1} - e^{c_1} e^{c_1}}{(e^{c_1} + e^{c_2})^2}
= \frac{e^{c_1}}{e^{c_1} + e^{c_2}} - \left(\frac{e^{c_1}}{e^{c_1} + e^{c_2}}\right)^2
= \frac{e^{c_1}}{e^{c_1} + e^{c_2}}\left(1 - \frac{e^{c_1}}{e^{c_1} + e^{c_2}}\right)
= f_1(1 - f_1)
10. Differential of Softmax Function
Case i \neq k:

\frac{df_1}{dc_2} = -\frac{e^{c_1} e^{c_2}}{(e^{c_1} + e^{c_2})^2}
= -\frac{e^{c_1}}{e^{c_1} + e^{c_2}} \cdot \frac{e^{c_2}}{e^{c_1} + e^{c_2}}
= -f_1 f_2
11. Differential of Softmax Function
Generalizing,

\frac{df_i}{dc_k} = \begin{cases} f_i(1 - f_i) & i = k \\ -f_i f_k & i \neq k \end{cases}
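This Jacobian can be verified against finite differences (a sketch assuming NumPy; the input values are arbitrary):

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max())          # shift for numerical stability
    return e / e.sum()

c = np.array([0.5, -1.2, 2.0])
f = softmax(c)

# Analytic Jacobian: J[i, k] = f_i (1 - f_i) if i == k, else -f_i f_k
J = np.diag(f) - np.outer(f, f)

# Central-difference check, one input coordinate at a time
h = 1e-6
J_num = np.zeros((3, 3))
for k in range(3):
    d = np.zeros(3)
    d[k] = h
    J_num[:, k] = (softmax(c + d) - softmax(c - d)) / (2 * h)

assert np.allclose(J, J_num, atol=1e-6)
```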
12. Differential of Cross Entropy Loss
Cross entropy loss between the output vector y and the target vector t of size O is given by

L(y, t) = -\sum_{i=1}^{O} t_i \log(y_i)

\frac{dL}{dy_i} = -\frac{t_i}{y_i}
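A quick numerical check of the gradient dL/dy_i = -t_i / y_i (a sketch assuming NumPy; the vectors are illustrative):

```python
import numpy as np

y = np.array([0.7, 0.2, 0.1])    # predicted distribution
t = np.array([1.0, 0.0, 0.0])    # one-hot target

L = lambda y: -np.sum(t * np.log(y))
analytic = -t / y                 # dL/dy_i = -t_i / y_i

h = 1e-7
numeric = np.array([
    (L(y + h * np.eye(3)[i]) - L(y - h * np.eye(3)[i])) / (2 * h)
    for i in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```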
13. Differential of Rectified Linear Unit (ReLU)
ReLU is defined as

f(x) = \max(0, x)

\frac{df}{dx} = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases}

In practice leaky ReLU is used, which, for a small constant 0 < c < 1, is defined as

f(x) = \max(cx, x)

\frac{df}{dx} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}
14. Differential of Max Pooling
f(10, 20, 30, 40) = 40.
If we increase 10 to 10.1, still f(10.1, 20, 30, 40) = 40.
If we increase 30 to 30.1, still f(10.1, 20, 30.1, 40) = 40.
If we increase 40 to 40.1, f(10.1, 20, 30.1, 40.1) = 40.1.
If we increase 40 to 50, f(10.1, 20, 30.1, 50) = 50.
The definition of the differential is f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.
For the function above, the output changes linearly with the maximum input and is unaffected by the others. So the derivative of f(a, b, c, d) with respect to the maximum argument is 1, and 0 with respect to the rest.
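The backward pass of max pooling therefore routes the upstream gradient to the argmax entry only. A minimal sketch, assuming NumPy:

```python
import numpy as np

window = np.array([10.0, 20.0, 30.0, 40.0])
m = window.argmax()              # index of the max (here 3)

# Backward pass: the upstream gradient flows only to the argmax entry;
# every other entry gets zero gradient.
upstream = 1.0
grad = np.zeros_like(window)
grad[m] = upstream
assert grad.tolist() == [0.0, 0.0, 0.0, 1.0]
```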
15. Backpropagation at Hidden to Output Layer
[Figure: fully connected output layer; input X, weighted sums z, softmax outputs c, targets t]
The output layer is divided into 2 sublayers, shown in green and red. z is the weighted sum, which is passed to all the output nodes. The layer in red performs the softmax operation, and the softmax output is c.
17. Derivation Continued
\frac{dL}{dz_k} = -t_k + \sum_{o \neq k} t_o c_k + t_k c_k
= -t_k + \sum_o t_o c_k
= -t_k + c_k \sum_o t_o
= -t_k + c_k \cdot 1

\frac{dL}{dz_k} = c_k - t_k

\frac{dz_k}{dv_{jk}} = b_j

\frac{dL}{dv_{jk}} = (c_k - t_k)\, b_j
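The key result dL/dz_k = c_k - t_k (softmax combined with cross entropy) can be verified numerically. A sketch assuming NumPy, with illustrative values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.3, -0.8, 1.5])   # net inputs z_k (illustrative values)
t = np.array([0.0, 1.0, 0.0])    # one-hot target

loss = lambda z: -np.sum(t * np.log(softmax(z)))
c = softmax(z)
analytic = c - t                  # dL/dz_k = c_k - t_k

h = 1e-6
numeric = np.array([
    (loss(z + h * np.eye(3)[k]) - loss(z - h * np.eye(3)[k])) / (2 * h)
    for k in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-6)
```

The hidden-to-output weight gradient then follows by multiplying each component by b_j.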
18. Backpropagation at Input to Hidden Layer
\frac{dL}{du_{ij}} = \sum_{k=1}^{O} \frac{dL}{dz_k} \frac{dz_k}{db_j} \frac{db_j}{dy_j} \frac{dy_j}{du_{ij}}
= \frac{db_j}{dy_j} \frac{dy_j}{du_{ij}} \sum_k \frac{dL}{dz_k} \frac{dz_k}{db_j}
= b_j (1 - b_j)\, A_i \sum_k (c_k - t_k)\, v_{jk}
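This input-to-hidden gradient can be checked against finite differences on a tiny network (a sketch assuming NumPy; all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
I, H, O = 4, 3, 2                    # sizes are illustrative
A = rng.standard_normal(I)           # input to the fully connected net
t = np.array([1.0, 0.0])             # one-hot target
u = rng.standard_normal((I, H))      # input-to-hidden weights u_ij
v = rng.standard_normal((H, O))      # hidden-to-output weights v_jk

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

def loss(u):
    b = sigmoid(A @ u)               # hidden outputs b_j
    c = softmax(b @ v)               # network outputs c_k
    return -np.sum(t * np.log(c))

b = sigmoid(A @ u)
c = softmax(b @ v)
# dL/du_ij = A_i * b_j (1 - b_j) * sum_k (c_k - t_k) v_jk
analytic = np.outer(A, b * (1 - b) * (v @ (c - t)))

h = 1e-6
numeric = np.zeros_like(u)
for i in range(I):
    for j in range(H):
        d = np.zeros_like(u)
        d[i, j] = h
        numeric[i, j] = (loss(u + d) - loss(u - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```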
19. Backpropagation for the kernels
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
20. Backpropagation for kernel G
We need to find \frac{dL}{dG} and \frac{dL}{dK}.

\frac{dL}{dG} = \frac{dL}{dA} \frac{dA}{dS_2} \frac{dS_2}{dC_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG}

As A is S_2 flattened,

\frac{dL}{dG} = \frac{dL}{dS_2} \frac{dS_2}{dC_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG}

S_2 is a max pooled version of C_2. Operation wise, it effectively selects the max value out of a set of numbers. A change in a non-max value in the input set does not affect the output, while a small change in the max value changes the output linearly. So the maximum of a set is locally linear, and \frac{dS_2}{dC_2} = 1.
21. Backpropagation for kernel G
\frac{dL}{dG} = \frac{dL}{dS_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG} = \frac{dL}{dJ_2} \frac{dJ_2}{dG}

\frac{dL}{dS_2} is \frac{dL}{dA} written in square form.

\frac{dL}{dA_i} = \sum_j \sum_k \frac{dL}{dz_k} \frac{dz_k}{db_j} \frac{db_j}{dy_j} \frac{dy_j}{dA_i}
= \sum_j \sum_k (c_k - t_k)\, v_{jk}\, b_j (1 - b_j)\, u_{ij}

\frac{dL}{dA} = \left[ \frac{dL}{dA_1}, \frac{dL}{dA_2}, \ldots, \frac{dL}{dA_I} \right]
22. Backpropagation for kernel G
\frac{dL}{dS_2} is \frac{dL}{dA} reshaped into square form:

\frac{dL}{dS_2} = \begin{pmatrix} \frac{dL}{dA_1} & \cdots & \\ \vdots & \ddots & \vdots \\ & \cdots & \frac{dL}{dA_I} \end{pmatrix}

\frac{dC_2}{dJ_2} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}

\frac{dL}{dG} = \frac{dL}{dJ_2} \frac{dJ_2}{dG} = \mathrm{conv}\!\left(\frac{dL}{dJ_2},\, S_1\right)
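The kernel gradient as a convolution of S1 with the upstream gradient can be verified numerically (a sketch assuming NumPy; conv is implemented as cross-correlation, and the upstream gradient is a stand-in random matrix):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' convolution as cross-correlation, matching the slides' conv."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

rng = np.random.default_rng(2)
S1 = rng.standard_normal((3, 3))
G = rng.standard_normal((2, 2))
dL_dJ2 = rng.standard_normal((2, 2))   # stand-in upstream gradient

# Loss chosen so that dL/dJ2 is exactly the random matrix above
loss = lambda G: np.sum(dL_dJ2 * conv2d_valid(S1, G))

analytic = conv2d_valid(S1, dL_dJ2)    # dL/dG = conv(dL/dJ2, S1)

h = 1e-6
numeric = np.zeros_like(G)
for i in range(2):
    for j in range(2):
        d = np.zeros_like(G)
        d[i, j] = h
        numeric[i, j] = (loss(G + d) - loss(G - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```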
23. Backpropagation for the kernels
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
24. Backpropagation for kernel K
\frac{dL}{dK} = \mathrm{conv}\!\left(X,\, \frac{dL}{dJ_1}\right)

We need to find \frac{dL}{dJ_1}:

\frac{dL}{dJ_1} = \frac{dL}{dS_1} \frac{dS_1}{dC_1} \frac{dC_1}{dJ_1}

\frac{dC_1}{dJ_1} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}

\frac{dS_1}{dC_1} = 1
25. Backpropagation for kernel K
Since J_2 = \mathrm{conv}(S_1, G),

\frac{dL}{dS_1} = \frac{dL}{dJ_2} \frac{dJ_2}{dS_1}

Define G' as G rotated by 180°. Then

\frac{dL}{dS_1} = \mathrm{fullconv}\!\left(G',\, \frac{dL}{dJ_2}\right)
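The full-convolution form of the input gradient can be verified numerically (a sketch assuming NumPy; conv is cross-correlation, and `full_conv` zero-pads before sliding the rotated kernel):

```python
import numpy as np

def conv2d_valid(x, k):
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def full_conv(g, d):
    """Full convolution: zero-pad d by (kernel size - 1) on every side,
    then slide g over the padded array."""
    h, w = g.shape
    padded = np.pad(d, ((h - 1, h - 1), (w - 1, w - 1)))
    return conv2d_valid(padded, g)

rng = np.random.default_rng(3)
S1 = rng.standard_normal((3, 3))
G = rng.standard_normal((2, 2))
dL_dJ2 = rng.standard_normal((2, 2))     # stand-in upstream gradient

loss = lambda S1: np.sum(dL_dJ2 * conv2d_valid(S1, G))

G_rot = G[::-1, ::-1]                    # G rotated by 180 degrees
analytic = full_conv(G_rot, dL_dJ2)      # dL/dS1 = fullconv(G', dL/dJ2)

h = 1e-6
numeric = np.zeros_like(S1)
for i in range(3):
    for j in range(3):
        d = np.zeros_like(S1)
        d[i, j] = h
        numeric[i, j] = (loss(S1 + d) - loss(S1 - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```

The corner entries of `analytic` match the hand-computed expressions on the next slides, e.g. dL/dS1_31 = G_21 dL/dJ2_21.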
26. Full convolution explained
G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, rotated by 180° to G' = \begin{pmatrix} G_{22} & G_{21} \\ G_{12} & G_{11} \end{pmatrix}

The convolution output is given as

J_2 = \begin{pmatrix} J2_{11} & J2_{12} \\ J2_{21} & J2_{22} \end{pmatrix}

The input matrix can be written as

S_1 = \begin{pmatrix} S1_{11} & S1_{12} & S1_{13} \\ S1_{21} & S1_{22} & S1_{23} \\ S1_{31} & S1_{32} & S1_{33} \end{pmatrix}
28. Full convolution continued
\frac{dL}{dS1_{31}} = G_{21} \frac{dL}{dJ2_{21}}

\frac{dL}{dS1_{32}} = G_{22} \frac{dL}{dJ2_{21}} + G_{21} \frac{dL}{dJ2_{22}}

\frac{dL}{dS1_{33}} = G_{22} \frac{dL}{dJ2_{22}}
29. Order of update
1 Update hidden to output weights
2 Update input to hidden weights
3 Update kernel G
4 Update kernel K
30. Queries?
Reach me at punnoose07@gmail.com