1. Backpropagation Derivation in Convolutional Neural Networks
Punnoose A K
punnoose07@gmail.com
August 10, 2020
Punnoose A K (punnoose07@gmail.com), Backpropagation Derivation in Convolutional Neural Networks, August 10, 2020, 1 / 30
2. Contents
1 A simple CNN
2 Differential for various functions
3 Update equations for various CNN parameters
4 General order of update
3. A simple 2 Layered, Depth 1 CNN
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
4. Notations
1 X : Input feature. Assuming only 1 feature stream
2 K : Kernel at the first layer. Depth 1
3 f : Rectified Linear Unit
4 J1 : First convolution output, before ReLU
5 C1 : First convolution output, after ReLU
6 S1 : Max pooled output, first layer
7 G : Kernel at the second layer
8 J2 : Second convolution output, before ReLU
9 C2 : Second convolution output, after ReLU
10 S2 : Max pooled output, second layer
11 A : Flattened S2. I is the input size to the fully connected network, A1 to AI
12 uij : weight between input node i and hidden node j
13 vjk : weight between hidden node j and output node k
14 yj : net input to hidden layer node j
15 zk : net input to output layer node k
16 bj : output of hidden layer node j
17 ck : output of output layer node k
5. What is to be learned here?
1 uij for 1 ≤ i ≤ I, 1 ≤ j ≤ H
2 vjk for 1 ≤ j ≤ H, 1 ≤ k ≤ O
3 Kernel G
4 Kernel K
Ignore the bias for the time being.
6. CNN Operations
1 J1 = Convolution(X, K)
2 C1 = relu(J1)
3 S1 = MaxPooling(C1)
4 J2 = Convolution(S1, G)
5 C2 = relu(J2)
6 S2 = MaxPooling(C2)
7 A = flatten(S2)
8 Feed A as the input to the fully connected network
9 Sigmoid function at the hidden layer, softmax function at the output layer, cross entropy as the loss function
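The pipeline above can be sketched numerically. A minimal NumPy sketch (array sizes and helper names are illustrative, not from the slides; conv is implemented as cross-correlation, the convention most CNN libraries use):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution, implemented as cross-correlation."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (assumes even dimensions)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

relu = lambda x: np.maximum(0.0, x)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))   # input feature map (size is illustrative)
K = rng.standard_normal((3, 3))   # first-layer kernel
G = rng.standard_normal((2, 2))   # second-layer kernel

S1 = max_pool2x2(relu(conv2d_valid(X, K)))    # 8x8 -> 6x6 -> 3x3
S2 = max_pool2x2(relu(conv2d_valid(S1, G)))   # 3x3 -> 2x2 -> 1x1
A = S2.flatten()                              # input to the fully connected net
```

A is then fed to the fully connected sigmoid/softmax network described in the next slides.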
7. Differential of Sigmoid Function
Let f(x) = \frac{1}{1 + e^{-x}}. Then

f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
      = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
      = \frac{1}{1 + e^{-x}} \left(1 - \frac{1}{1 + e^{-x}}\right)
      = f(x)\,(1 - f(x))
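The identity can be checked numerically against a central difference (a quick sketch, assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))                # f(x)(1 - f(x))
assert abs(numeric - analytic) < 1e-8
```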
8. Differential of Softmax Function
The softmax function is defined as

f_i = f(i; c_1, c_2, \ldots, c_O) = \frac{e^{c_i}}{\sum_j e^{c_j}}

For finding the differential, there are 2 cases: \frac{df_i}{dc_k} where i = k, and \frac{df_i}{dc_k} where i \neq k.
9. Differential of Softmax Function
For the sake of simplicity consider a 2 class output, i.e., O = 2.

f_1 = f(1; c_1, c_2) = \frac{e^{c_1}}{e^{c_1} + e^{c_2}}

Case i = k:

\frac{df_1}{dc_1} = \frac{(e^{c_1} + e^{c_2})\,e^{c_1} - e^{c_1} e^{c_1}}{(e^{c_1} + e^{c_2})^2}
= \frac{e^{c_1}}{e^{c_1} + e^{c_2}} - \left(\frac{e^{c_1}}{e^{c_1} + e^{c_2}}\right)^2
= \frac{e^{c_1}}{e^{c_1} + e^{c_2}}\left(1 - \frac{e^{c_1}}{e^{c_1} + e^{c_2}}\right)
= f_1(1 - f_1)
10. Differential of Softmax Function
Case i \neq k:

\frac{df_1}{dc_2} = -\frac{e^{c_1} e^{c_2}}{(e^{c_1} + e^{c_2})^2}
= -\frac{e^{c_1}}{e^{c_1} + e^{c_2}} \cdot \frac{e^{c_2}}{e^{c_1} + e^{c_2}}
= -f_1 f_2
11. Differential of Softmax Function
Generalizing,

\frac{df_i}{dc_k} = \begin{cases} f_i(1 - f_i) & i = k \\ -f_i f_k & i \neq k \end{cases}
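This Jacobian can be verified against finite differences (a sketch assuming NumPy; the input values are arbitrary):

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max())          # shift for numerical stability
    return e / e.sum()

c = np.array([0.5, -1.2, 2.0])
f = softmax(c)

# Analytic Jacobian: J[i, k] = f_i (1 - f_i) if i == k, else -f_i f_k
J = np.diag(f) - np.outer(f, f)

# Central-difference check, one input coordinate at a time
h = 1e-6
J_num = np.zeros((3, 3))
for k in range(3):
    d = np.zeros(3)
    d[k] = h
    J_num[:, k] = (softmax(c + d) - softmax(c - d)) / (2 * h)

assert np.allclose(J, J_num, atol=1e-6)
```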
12. Differential of Cross Entropy Loss
Cross entropy loss between the output vector y and the target vector t of size O is given by

L(y, t) = -\sum_{i=1}^{O} t_i \log(y_i)

\frac{dL}{dy_i} = -\frac{t_i}{y_i}
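A quick numerical check of the gradient dL/dy_i = -t_i / y_i (a sketch assuming NumPy; the vectors are illustrative):

```python
import numpy as np

y = np.array([0.7, 0.2, 0.1])    # predicted distribution
t = np.array([1.0, 0.0, 0.0])    # one-hot target

L = lambda y: -np.sum(t * np.log(y))
analytic = -t / y                 # dL/dy_i = -t_i / y_i

h = 1e-7
numeric = np.array([
    (L(y + h * np.eye(3)[i]) - L(y - h * np.eye(3)[i])) / (2 * h)
    for i in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```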
13. Differential of Rectified Linear Unit (ReLU)
ReLU is defined as

f(x) = \max(0, x)

\frac{df}{dx} = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases}

In practice leaky ReLU is used, which, for a small constant 0 < c < 1, is defined as

f(x) = \max(cx, x)

\frac{df}{dx} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}
14. Differential of Max Pooling
f(10, 20, 30, 40) = 40.
If we increase 10 to 10.1, still f(10.1, 20, 30, 40) = 40.
If we increase 30 to 30.1, still f(10.1, 20, 30.1, 40) = 40.
If we increase 40 to 40.1, f(10.1, 20, 30.1, 40.1) = 40.1.
If we increase 40 to 50, f(10.1, 20, 30.1, 50) = 50.
The definition of the differential is f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.
For the function above, the output changes linearly with the maximum input and is unaffected by the others. So the derivative of f(a, b, c, d) with respect to the maximum argument is 1, and 0 with respect to the rest.
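The backward pass of max pooling therefore routes the upstream gradient to the argmax entry only. A minimal sketch, assuming NumPy:

```python
import numpy as np

window = np.array([10.0, 20.0, 30.0, 40.0])
m = window.argmax()              # index of the max (here 3)

# Backward pass: the upstream gradient flows only to the argmax entry;
# every other entry gets zero gradient.
upstream = 1.0
grad = np.zeros_like(window)
grad[m] = upstream
assert grad.tolist() == [0.0, 0.0, 0.0, 1.0]
```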
15. Backpropagation at Hidden to Output Layer
[Figure: fully connected output layer; input X, weighted sums z, softmax outputs c, targets t]
The output layer is divided into 2 sublayers, shown in green and red. z is the weighted sum, which is passed to all the output nodes. The layer in red performs the softmax operation, and the softmax output is c.
17. Derivation Continued
\frac{dL}{dz_k} = -t_k + \sum_{o \neq k} t_o c_k + t_k c_k
= -t_k + \sum_o t_o c_k
= -t_k + c_k \sum_o t_o
= -t_k + c_k \cdot 1

\frac{dL}{dz_k} = c_k - t_k

\frac{dz_k}{dv_{jk}} = b_j

\frac{dL}{dv_{jk}} = (c_k - t_k)\, b_j
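The key result dL/dz_k = c_k - t_k (softmax combined with cross entropy) can be verified numerically. A sketch assuming NumPy, with illustrative values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.3, -0.8, 1.5])   # net inputs z_k (illustrative values)
t = np.array([0.0, 1.0, 0.0])    # one-hot target

loss = lambda z: -np.sum(t * np.log(softmax(z)))
c = softmax(z)
analytic = c - t                  # dL/dz_k = c_k - t_k

h = 1e-6
numeric = np.array([
    (loss(z + h * np.eye(3)[k]) - loss(z - h * np.eye(3)[k])) / (2 * h)
    for k in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-6)
```

The hidden-to-output weight gradient then follows by multiplying each component by b_j.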
18. Backpropagation at Input to Hidden Layer
\frac{dL}{du_{ij}} = \sum_{k=1}^{O} \frac{dL}{dz_k} \frac{dz_k}{db_j} \frac{db_j}{dy_j} \frac{dy_j}{du_{ij}}
= \frac{db_j}{dy_j} \frac{dy_j}{du_{ij}} \sum_k \frac{dL}{dz_k} \frac{dz_k}{db_j}
= b_j (1 - b_j)\, A_i \sum_k (c_k - t_k)\, v_{jk}
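This input-to-hidden gradient can be checked against finite differences on a tiny network (a sketch assuming NumPy; all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
I, H, O = 4, 3, 2                    # sizes are illustrative
A = rng.standard_normal(I)           # input to the fully connected net
t = np.array([1.0, 0.0])             # one-hot target
u = rng.standard_normal((I, H))      # input-to-hidden weights u_ij
v = rng.standard_normal((H, O))      # hidden-to-output weights v_jk

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

def loss(u):
    b = sigmoid(A @ u)               # hidden outputs b_j
    c = softmax(b @ v)               # network outputs c_k
    return -np.sum(t * np.log(c))

b = sigmoid(A @ u)
c = softmax(b @ v)
# dL/du_ij = A_i * b_j (1 - b_j) * sum_k (c_k - t_k) v_jk
analytic = np.outer(A, b * (1 - b) * (v @ (c - t)))

h = 1e-6
numeric = np.zeros_like(u)
for i in range(I):
    for j in range(H):
        d = np.zeros_like(u)
        d[i, j] = h
        numeric[i, j] = (loss(u + d) - loss(u - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```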
19. Backpropagation for the kernels
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
20. Backpropagation for kernel G
We need to find \frac{dL}{dG} and \frac{dL}{dK}.

\frac{dL}{dG} = \frac{dL}{dA} \frac{dA}{dS_2} \frac{dS_2}{dC_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG}

As A is S_2 flattened,

\frac{dL}{dG} = \frac{dL}{dS_2} \frac{dS_2}{dC_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG}

S_2 is a max pooled version of C_2. Operation wise, it effectively selects the max value out of a set of numbers. A change in a non-max value in the input set does not affect the output, while a small change in the max value changes the output linearly. So the maximum of a set is locally linear, and \frac{dS_2}{dC_2} = 1.
21. Backpropagation for kernel G
\frac{dL}{dG} = \frac{dL}{dS_2} \frac{dC_2}{dJ_2} \frac{dJ_2}{dG} = \frac{dL}{dJ_2} \frac{dJ_2}{dG}

\frac{dL}{dS_2} is \frac{dL}{dA} written in square form.

\frac{dL}{dA_i} = \sum_j \sum_k \frac{dL}{dz_k} \frac{dz_k}{db_j} \frac{db_j}{dy_j} \frac{dy_j}{dA_i}
= \sum_j \sum_k (c_k - t_k)\, v_{jk}\, b_j (1 - b_j)\, u_{ij}

\frac{dL}{dA} = \left[ \frac{dL}{dA_1}, \frac{dL}{dA_2}, \ldots, \frac{dL}{dA_I} \right]
22. Backpropagation for kernel G
\frac{dL}{dS_2} is \frac{dL}{dA} reshaped into square form:

\frac{dL}{dS_2} = \begin{pmatrix} \frac{dL}{dA_1} & \cdots & \\ \vdots & \ddots & \vdots \\ & \cdots & \frac{dL}{dA_I} \end{pmatrix}

\frac{dC_2}{dJ_2} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}

\frac{dL}{dG} = \frac{dL}{dJ_2} \frac{dJ_2}{dG} = \mathrm{conv}\!\left(\frac{dL}{dJ_2},\, S_1\right)
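The kernel gradient as a convolution of S1 with the upstream gradient can be verified numerically (a sketch assuming NumPy; conv is implemented as cross-correlation, and the upstream gradient is a stand-in random matrix):

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' convolution as cross-correlation, matching the slides' conv."""
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

rng = np.random.default_rng(2)
S1 = rng.standard_normal((3, 3))
G = rng.standard_normal((2, 2))
dL_dJ2 = rng.standard_normal((2, 2))   # stand-in upstream gradient

# Loss chosen so that dL/dJ2 is exactly the random matrix above
loss = lambda G: np.sum(dL_dJ2 * conv2d_valid(S1, G))

analytic = conv2d_valid(S1, dL_dJ2)    # dL/dG = conv(dL/dJ2, S1)

h = 1e-6
numeric = np.zeros_like(G)
for i in range(2):
    for j in range(2):
        d = np.zeros_like(G)
        d[i, j] = h
        numeric[i, j] = (loss(G + d) - loss(G - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```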
23. Backpropagation for the kernels
[Figure: pipeline X → conv(K) → J1 → ReLU f → C1 → max pool → S1 → conv(G) → J2 → ReLU f → C2 → max pool → S2 → flatten → A]
24. Backpropagation for kernel K
\frac{dL}{dK} = \mathrm{conv}\!\left(X,\, \frac{dL}{dJ_1}\right)

We need to find \frac{dL}{dJ_1}:

\frac{dL}{dJ_1} = \frac{dL}{dS_1} \frac{dS_1}{dC_1} \frac{dC_1}{dJ_1}

\frac{dC_1}{dJ_1} = \begin{cases} 1 & x > 0 \\ c & \text{otherwise} \end{cases}

\frac{dS_1}{dC_1} = 1
25. Backpropagation for kernel K
Since J_2 = \mathrm{conv}(S_1, G),

\frac{dL}{dS_1} = \frac{dL}{dJ_2} \frac{dJ_2}{dS_1}

Define G' as G rotated by 180°. Then

\frac{dL}{dS_1} = \mathrm{fullconv}\!\left(G',\, \frac{dL}{dJ_2}\right)
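The full-convolution form of the input gradient can be verified numerically (a sketch assuming NumPy; conv is cross-correlation, and `full_conv` zero-pads before sliding the rotated kernel):

```python
import numpy as np

def conv2d_valid(x, k):
    H, W = x.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def full_conv(g, d):
    """Full convolution: zero-pad d by (kernel size - 1) on every side,
    then slide g over the padded array."""
    h, w = g.shape
    padded = np.pad(d, ((h - 1, h - 1), (w - 1, w - 1)))
    return conv2d_valid(padded, g)

rng = np.random.default_rng(3)
S1 = rng.standard_normal((3, 3))
G = rng.standard_normal((2, 2))
dL_dJ2 = rng.standard_normal((2, 2))     # stand-in upstream gradient

loss = lambda S1: np.sum(dL_dJ2 * conv2d_valid(S1, G))

G_rot = G[::-1, ::-1]                    # G rotated by 180 degrees
analytic = full_conv(G_rot, dL_dJ2)      # dL/dS1 = fullconv(G', dL/dJ2)

h = 1e-6
numeric = np.zeros_like(S1)
for i in range(3):
    for j in range(3):
        d = np.zeros_like(S1)
        d[i, j] = h
        numeric[i, j] = (loss(S1 + d) - loss(S1 - d)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-6)
```

The corner entries of `analytic` match the hand-computed expressions on the next slides, e.g. dL/dS1_31 = G_21 dL/dJ2_21.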
26. Full convolution explained
G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, rotated by 180° to G' = \begin{pmatrix} G_{22} & G_{21} \\ G_{12} & G_{11} \end{pmatrix}

The convolution output is given as

J_2 = \begin{pmatrix} J2_{11} & J2_{12} \\ J2_{21} & J2_{22} \end{pmatrix}

The input matrix can be written as

S_1 = \begin{pmatrix} S1_{11} & S1_{12} & S1_{13} \\ S1_{21} & S1_{22} & S1_{23} \\ S1_{31} & S1_{32} & S1_{33} \end{pmatrix}
28. Full convolution continued
\frac{dL}{dS1_{31}} = G_{21} \frac{dL}{dJ2_{21}}

\frac{dL}{dS1_{32}} = G_{22} \frac{dL}{dJ2_{21}} + G_{21} \frac{dL}{dJ2_{22}}

\frac{dL}{dS1_{33}} = G_{22} \frac{dL}{dJ2_{22}}
29. Order of update
1 Update hidden to output weights
2 Update input to hidden weights
3 Update kernel G
4 Update kernel K
30. Queries?
Reach me at punnoose07@gmail.com