LeNet & GoogLeNet

LeNet: Gradient-based learning applied to document recognition.
Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner
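AT&T Shannon Lab, published in 1998
1. Using a CNN (LeNet), results are produced from learned parameters rather than from hand-extracted features.
2. GTN (Graph Transformer Networks) reduced the reliance on manual parameter tuning, labeling, and experience-based heuristics.
3. More data, more computing power, and better learning algorithms all improve the performance of a recognition system.
Of the several LeNet models, this section looks at LeNet-5.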
Example of LeNet-5 in action, http://yann.lecun.com/exdb/lenet/
[Figure panels: Input, Layer-1, Layer-3, Layer-5]
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

LeNet-5
Reference formulas:
Output size = (((Input_size - Filter_size) + (2 * Padding_size)) / (Stride_size)) + 1
Conv trainable params
= Weight + Bias
= (Filter_height X Filter_width X Before_Num_Filter X Current_Num_Filter) + Current_Num_Filter
Pool trainable params = (Coefficient + Bias) X Filters
FC trainable params = (Input_size X Output_size) + Bias_size
https://engmrk.com/lenet-5-a-classic-cnn-architecture/
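Layer | Kernel size | Feature maps | Stride | Padding | Output size | Trainable parameters | Activation
Conv(C1) | 5 X 5 | 6 | 1 | 0 | (((32 - 5) + (0 * 2)) / 1) + 1 = 28, 28X28 | ((5 X 5) X 1 X 6) + 6 = 156 | tanh
Pool(S2) | 2 X 2 | 6 | 2 | 0 | (((28 - 2) + (0 * 2)) / 2) + 1 = 14, 14X14 | (1 + 1) X 6 = 12 | tanh
Conv(C3) | 5 X 5 | 16 | 1 | 0 | (((14 - 5) + (0 * 2)) / 1) + 1 = 10, 10X10 | ((5 X 5) X 60) + 16 = 1,516 | tanh
Pool(S4) | 2 X 2 | 16 | 2 | 0 | (((10 - 2) + (0 * 2)) / 2) + 1 = 5, 5X5 | (1 + 1) X 16 = 32 | tanh
Conv(C5) | 5 X 5 | 120 | 1 | 0 | (((5 - 5) + (0 * 2)) / 1) + 1 = 1, 1X1 | ((5 X 5) X 16 X 120) + 120 = 48,120 | tanh
FC(F6) | - | - | - | - | 84 | (120 X 84) + 84 = 10,164 | tanh
FC(Output) | - | - | - | - | 10 | (84 X 10) + 10 = 850 | softmax
Note on C3: in the paper C3 is not fully connected. Its connection table links 60 of the 6 X 16 = 96 (input map, output map) pairs, which gives 1,516 parameters. A fully connected 5 X 5 convolution would have ((5 X 5) X 6 X 16) + 16 = 2,416.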
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
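The table can be re-derived with a few lines of Python. This is a minimal sketch, not code from the paper: the helper names `output_size`, `conv_params`, `pool_params`, and `fc_params` are hypothetical and simply restate the reference formulas above. For C3 it assumes the paper's sparse connection table, in which 60 (input map, output map) pairs are connected.

```python
def output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def conv_params(kernel, in_maps, out_maps, connections=None):
    """Weights plus biases. `connections` is the number of connected
    (input map, output map) pairs; it defaults to full connectivity."""
    if connections is None:
        connections = in_maps * out_maps
    return kernel * kernel * connections + out_maps


def pool_params(maps):
    """LeNet-5 subsampling: one trainable coefficient and one bias per map."""
    return (1 + 1) * maps


def fc_params(inputs, outputs):
    """Fully connected layer: weight matrix plus one bias per output."""
    return inputs * outputs + outputs


# Walk the spatial size through the convolution and pooling layers.
size = 32
sizes = {}
for name, kernel, stride in [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1),
                             ("S4", 2, 2), ("C5", 5, 1)]:
    size = output_size(size, kernel, stride=stride)
    sizes[name] = size

params = {
    "C1": conv_params(5, 1, 6),
    "S2": pool_params(6),
    "C3": conv_params(5, 6, 16, connections=60),  # sparse connection table
    "S4": pool_params(16),
    "C5": conv_params(5, 16, 120),
    "F6": fc_params(120, 84),
    "Output": fc_params(84, 10),
}

print(sizes)                 # {'C1': 28, 'S2': 14, 'C3': 10, 'S4': 5, 'C5': 1}
print(params)
print(sum(params.values()))  # 60850
```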

LeNet-5
Conv(C1) Pool(S2)
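from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D, Flatten, Dense
input_shape = (32, 32, 1)  # assumed: 32x32 grayscale input, as in the table
num_labels = 10  # assumed: 10 digit classes
model = Sequential()
model.add(Conv2D(6, kernel_size=(5, 5), strides=(1, 1), padding='valid', activation='tanh', input_shape=input_shape))
model.add(AveragePooling2D((2, 2), strides=(2, 2)))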

LeNet-5
Conv(C3)
model.add(Conv2D(16, kernel_size=(5, 5), strides=(1, 1), padding='valid', activation='tanh'))

LeNet-5
Pool(S4) Conv(C5)
model.add(AveragePooling2D((2, 2), strides=(2, 2)))
# C5: on a 5x5x16 input, a 5x5 convolution with 120 filters is equivalent to Flatten + Dense(120)
model.add(Flatten())
model.add(Dense(120, activation='tanh'))

LeNet-5
FC(F6) FC(Output)
model.add(Dense(84, activation='tanh'))
model.add(Dense(num_labels, activation='softmax'))
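As a sanity check on the Keras version, the sketch below traces tensor shapes and parameter counts in plain Python, without importing Keras. It assumes a 32x32x1 input and `num_labels = 10`; the `layers` list and the loop are hypothetical bookkeeping written for this note, not Keras API.

```python
# Assumed values for illustration.
height, width, channels = 32, 32, 1
num_labels = 10

# (name, kind, units_or_filters, kernel, stride), mirroring the model.add(...) calls.
layers = [
    ("C1", "conv", 6, 5, 1),
    ("S2", "pool", None, 2, 2),
    ("C3", "conv", 16, 5, 1),
    ("S4", "pool", None, 2, 2),
    ("Flatten", "flatten", None, None, None),
    ("C5", "dense", 120, None, None),
    ("F6", "dense", 84, None, None),
    ("Output", "dense", num_labels, None, None),
]

shape = (height, width, channels)
counts = {}
for name, kind, units, kernel, stride in layers:
    if kind == "conv":
        # 'valid' padding; every input channel feeds every filter.
        h, w, c = shape
        counts[name] = kernel * kernel * c * units + units
        shape = ((h - kernel) // stride + 1, (w - kernel) // stride + 1, units)
    elif kind == "pool":
        # Average pooling as used here has no trainable parameters.
        h, w, c = shape
        counts[name] = 0
        shape = ((h - kernel) // stride + 1, (w - kernel) // stride + 1, c)
    elif kind == "flatten":
        h, w, c = shape
        counts[name] = 0
        shape = (h * w * c,)
    else:
        # Dense layer: weight matrix plus one bias per unit.
        counts[name] = shape[0] * units + units
        shape = (units,)
    print(f"{name:8s} output={shape} params={counts[name]}")

total = sum(counts.values())
print("total trainable params:", total)
```

Under these assumptions the total is 61,706 rather than the 60,850 obtained by summing the table rows. `Conv2D` connects every input map to every output map in C3 (2,416 parameters instead of 1,516), and `AveragePooling2D` has no trainable coefficient or bias. `Flatten` followed by `Dense(120)` has the same 48,120 parameters as the 5x5 C5 convolution on a 5x5x16 input.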

References
1. LeCun, Yann, et al. "Object recognition with gradient-based learning." Shape, contour and grouping in computer vision. Springer, Berlin, Heidelberg, 1999. 319-345.
2. LeNet-5 – A Classic CNN Architecture
3. Laon People Machine Learning Academy, [Part V. Best CNN Architecture] 2. LeNet (in Korean)
4. [Paper Summary 3] Gradient-Based Learning Applied to Document Recognition (in Korean)
5. Implementing LeNet-5 in Keras to classify Kuzushiji-MNIST (in Japanese)
6. A story about convolutional neural network layers (in Korean)

GoogLeNet: Going Deeper with Convolutions
Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich
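Google Inc., published in 2014
1. Proposes a deep convolutional neural network architecture, code-named Inception, for the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14)
2. Main features of the architecture
a. Improved utilization of the computing resources inside the network
b. Designed so that the depth and width of the network can be increased while keeping the computing budget constant
3. The model submitted to ILSVRC14 is GoogLeNet, a 22-layer deep network (as an aside, it is a nicely chosen name)
This presentation looks at the Inception model, which made it possible to increase the depth and width of the network within a limited computing budget.
The material covered here is the GoogLeNet published in 2014. The Inception module has continued to be improved since then, but understanding this version should make the later papers much easier to follow.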

GoogLeNet
The paper focuses on an efficient deep neural network architecture for computer vision, code-named Inception.
The name Inception derives from the well-known "We need to go deeper" internet meme and from the Network-in-Network paper by Lin et al.
In the paper, the word "deep" has two meanings:
1. Introducing a new level of organization in the form of the "Inception module"
2. Increasing the depth of the network
By building a deep network out of Inception modules, GoogLeNet arrives at a CNN structure that improves performance while restraining the growth in computation.
The figure on the right compares the AlexNet and GoogLeNet architectures.
GoogLeNet is much deeper (22 layers), yet it has about 1/12 as many trainable parameters, and its total operation count (1.5B) is also lower than that of AlexNet (2B).
Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

GoogLeNet
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
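Network-in-Network
[Figure: a conventional Conv structure vs. the MLP Conv structure proposed by Lin et al. (2013)]
To express nonlinear relationships, an MLP (multilayer perceptron) is inserted between conventional Conv layers.
Idea: a conventional Conv layer slides its filter across the input and produces its output linearly. But what if the data follow a nonlinear distribution that a linear model cannot represent? To handle this, an MLP is placed between Conv layers so that nonlinear relationships can be expressed.
Why an MLP?
According to the paper, because it can be trained by backpropagation just like a Conv filter, and because it can itself be a deep structure.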

GoogLeNet
Network-in-Network
The network in the paper stacks three MLP Conv layers and ends with Global Average Pooling instead of a fully connected (FC) layer.
A key feature is that, to prevent overfitting, the final stage uses average pooling rather than an FC layer.
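A minimal sketch of global average pooling in plain Python, assuming a toy feature map stored as nested lists in channel-first order. The function name `global_average_pool` and all values are illustrative and do not come from the paper.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2-D list) to the mean of its values."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 3 channels (one per class), each 2x2.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
scores = global_average_pool(feature_maps)
print(scores)  # [2.5, 2.0, 5.0]

# Parameter comparison for a hypothetical 6x6x10 final feature map and 10 classes.
fc_weights = 6 * 6 * 10 * 10 + 10  # flatten followed by a fully connected layer
gap_weights = 0                    # global average pooling has nothing to learn
print(fc_weights, gap_weights)     # 3610 0
```

Because the pooling step has no weights, there is nothing in this last stage that can overfit, which is the motivation given above.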
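Conventional Conv computation: [equation not reproduced]
So what is this one, then? [equation not reproduced]
Here, (i, j) is the pixel location in the feature map, k is the k-th channel of the feature map, and n is the n-th layer of the MLP Conv.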
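For reference, the two computations being contrasted can be written as below. This is a reconstruction from Lin et al. (2013), not text recovered from the slide. Here $x_{i,j}$ is the input patch centered at location $(i, j)$, $w$ and $b$ are weights and biases, and the first equation omits the bias as the paper does.

```latex
% Conventional convolution layer with a rectifier
f_{i,j,k} = \max\left( w_k^{T} x_{i,j},\ 0 \right)

% MLP Conv layer with n stacked perceptron layers
f^{1}_{i,j,k_1} = \max\left( {w^{1}_{k_1}}^{T} x_{i,j} + b_{k_1},\ 0 \right)
\quad \vdots
f^{n}_{i,j,k_n} = \max\left( {w^{n}_{k_n}}^{T} f^{n-1}_{i,j} + b_{k_n},\ 0 \right)
```

The second form applies a small multilayer perceptron at every pixel location, which is what the later layers of an MLP Conv block compute.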

GoogLeNet
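Network-in-Network: Cascaded Cross Channel Parametric Pooling (CCCP)
CCCP applies a conventional convolution and then maps the result to a single value at the end.
As a result, the relationship computed by the MLP can also be expressed as a conventional convolution combined with 1x1 Convs.
Consequently, using 1x1 Convs appropriately makes it possible to build better nonlinear functions. That is not their only advantage.
A 1x1 Conv performs pooling across channels. If the number of 1x1 filters is smaller than the number of input channels, the result is dimension reduction.
Multiplying the values of the previous feature map by a weight for each channel can be expressed as a 1X1 Conv.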
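The cross-channel view of a 1x1 Conv can be sketched in plain Python: at each pixel, every output channel is a weighted sum of the input channels followed by ReLU. The function `conv1x1`, the toy input, and the weights are all hypothetical and chosen only for illustration.

```python
def conv1x1(feature_map, weights, biases):
    """Apply a 1x1 convolution with ReLU.

    feature_map: H x W x C_in nested lists
    weights:     C_out x C_in (one row of per-channel weights per filter)
    biases:      C_out values
    Returns an H x W x C_out nested list.
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            out_pixel = []
            for w, b in zip(weights, biases):
                s = sum(wi * xi for wi, xi in zip(w, pixel)) + b
                out_pixel.append(max(s, 0.0))  # ReLU
            out_row.append(out_pixel)
        out.append(out_row)
    return out


# Toy 1x2 feature map with 4 input channels.
x = [[[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.0, 1.0]]]
# Two filters, so the 4 input channels are reduced to 2 output channels.
weights = [[0.25, 0.25, 0.25, 0.25],
           [1.0, -1.0, 0.0, 0.0]]
biases = [0.0, 0.0]

y = conv1x1(x, weights, biases)
print(y)  # [[[2.5, 0.0], [0.0, 1.0]]]
```

The spatial size is unchanged (1x2), while the channel depth drops from 4 to 2, which is the dimension reduction described above.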

GoogLeNet
GoogleNet
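Main ideas of the architecture
1. Find out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by dense components.
If the dense components were implemented simply by stacking layers deeper, the result would be overfitting and increased computation.
The network is therefore built sparsely.
The Inception module uses kernel sizes of 1X1, 3X3, and 5X5, a choice based on convenience rather than necessity.
2. A naive Inception module requires more computation than using a single kernel. This departs from the original goal of building a deep network while reducing computation.
To solve this problem, the paper uses 1X1 Convs to reduce the dimensionality and thereby the computation. Each 1X1 Conv also includes a ReLU (Rectified Linear Unit).
[Figure: Naive Inception Module]
[Figure: Inception module with dimension reduction]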

GoogLeNet
In the CS231n example, the naive Inception module costs 854M operations. To build a deeper network, this computation has to be reduced. The paper does so by adding 1X1 convolution "bottleneck" layers, which reduce the feature depth.
Reference formulas:
Output size = (((Input_size - Filter_size) + (2 * Padding_size)) / (Stride_size)) + 1
Conv trainable params
= Weight + Bias
= (Filter_height X Filter_width X Before_Num_Filter X Current_Num_Filter) + Current_Num_Filter
CS231n: Convolutional Neural Networks for Visual Recognition-Lecture 9: CNN Architectures(AlexNet, VGG, GoogLeNet, ResNet, etc)
Inception Module: Naive VS with dimension reduction
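The 854M figure can be reproduced by counting multiplications per branch. The sketch below assumes the configuration of the CS231n lecture example (a 28x28x256 input with 128 1x1, 192 3x3, and 96 5x5 filters) and, for the reduced module, 64-channel 1x1 bottlenecks. These sizes are assumptions for illustration, and `conv_ops` is a hypothetical helper.

```python
def conv_ops(height, width, kernel, in_channels, out_channels):
    """Multiplications for a stride-1 convolution that preserves spatial size."""
    return height * width * out_channels * kernel * kernel * in_channels


H, W, C_IN = 28, 28, 256  # assumed input size

# Naive module: every branch reads the full 256-channel input.
naive = (
    conv_ops(H, W, 1, C_IN, 128)
    + conv_ops(H, W, 3, C_IN, 192)
    + conv_ops(H, W, 5, C_IN, 96)
)

BOTTLENECK = 64  # assumed width of each 1x1 reduction

reduced = (
    conv_ops(H, W, 1, C_IN, 128)           # 1x1 branch
    + conv_ops(H, W, 1, C_IN, BOTTLENECK)  # 1x1 reduce before the 3x3
    + conv_ops(H, W, 3, BOTTLENECK, 192)
    + conv_ops(H, W, 1, C_IN, BOTTLENECK)  # 1x1 reduce before the 5x5
    + conv_ops(H, W, 5, BOTTLENECK, 96)
    + conv_ops(H, W, 1, C_IN, BOTTLENECK)  # 1x1 projection after pooling
)

print(f"naive:   {naive:,}")    # naive:   854,196,224
print(f"reduced: {reduced:,}")  # reduced: 271,351,808
```

Most of the saving comes from the 3x3 and 5x5 branches, which now read 64 channels instead of 256.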

GoogLeNet
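from keras.layers import Conv2D, ZeroPadding2D, MaxPooling2D, Concatenate
from keras.regularizers import l2
# inception_3a_output is the output tensor of the previous Inception module (not shown here)
inception_3b_1x1 = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_3b/1x1', kernel_regularizer=l2(0.0002))(inception_3a_output)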
inception_3b_3x3_reduce = Conv2D(128, (1,1), padding='same', activation='relu', name='inception_3b/3x3_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
inception_3b_3x3_pad = ZeroPadding2D(padding=(1, 1))(inception_3b_3x3_reduce)
inception_3b_3x3 = Conv2D(192, (3,3), padding='valid', activation='relu', name='inception_3b/3x3', kernel_regularizer=l2(0.0002))(inception_3b_3x3_pad)
inception_3b_5x5_reduce = Conv2D(32, (1,1), padding='same', activation='relu', name='inception_3b/5x5_reduce', kernel_regularizer=l2(0.0002))(inception_3a_output)
inception_3b_5x5_pad = ZeroPadding2D(padding=(2, 2))(inception_3b_5x5_reduce)
inception_3b_5x5 = Conv2D(96, (5,5), padding='valid', activation='relu', name='inception_3b/5x5', kernel_regularizer=l2(0.0002))(inception_3b_5x5_pad)
inception_3b_pool = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same', name='inception_3b/pool')(inception_3a_output)
inception_3b_pool_proj = Conv2D(64, (1,1), padding='same', activation='relu', name='inception_3b/pool_proj', kernel_regularizer=l2(0.0002))(inception_3b_pool)
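# axis=1 is the channel axis only for channels_first data; with the Keras default (channels_last), use axis=-1
inception_3b_output = Concatenate(axis=1, name='inception_3b/output')([inception_3b_1x1,inception_3b_3x3,inception_3b_5x5,inception_3b_pool_proj])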
GoogLeNet in Keras
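To see what the 1x1 reductions buy in this block, the sketch below counts output channels and trainable parameters for `inception_3b`, using the filter counts from the code above. The 256-channel input (the output depth of `inception_3a`) is an assumption taken from the architecture table in the paper, and `conv_params` is a hypothetical helper.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output filter."""
    return kernel * kernel * in_channels * out_channels + out_channels


IN_CHANNELS = 256  # assumed depth of inception_3a_output

branches = {
    "1x1":        conv_params(1, IN_CHANNELS, 128),
    "3x3_reduce": conv_params(1, IN_CHANNELS, 128),
    "3x3":        conv_params(3, 128, 192),
    "5x5_reduce": conv_params(1, IN_CHANNELS, 32),
    "5x5":        conv_params(5, 32, 96),
    "pool_proj":  conv_params(1, IN_CHANNELS, 64),
}
with_reduction = sum(branches.values())

# Naive variant for comparison: 3x3 and 5x5 read the 256-channel input directly,
# and the pooling branch is passed through without a projection (no parameters).
without_reduction = (
    conv_params(1, IN_CHANNELS, 128)
    + conv_params(3, IN_CHANNELS, 192)
    + conv_params(5, IN_CHANNELS, 96)
)

# Depth after concatenating the four branches along the channel axis.
output_channels = 128 + 192 + 96 + 64

print(with_reduction)     # 388736
print(without_reduction)  # 1089952
print(output_channels)    # 480
```

Under these assumptions the reduced block uses roughly a third of the parameters of the naive variant.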

GoogLeNet
Illustrated: 10 CNN Architectures
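Classifier output
Instead of stacking several expensive FC layers after the last convolution layer, the network applies average pooling to the feature maps and uses a single FC layer before classification, which greatly reduces the computation.
Stem
The stem has a conventional Conv layer structure.
Auxiliary Classifier
Added to mitigate the vanishing gradient problem that arises during training as the network is stacked deeper, and to improve convergence.
It is removed once training is finished and is not used at inference time.
Reference: Laon People - GoogLeNet (5)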
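A minimal sketch of how the auxiliary classifiers enter training. In the paper the auxiliary losses are added to the main loss with a weight of 0.3 and discarded at inference time. The function name `total_loss` and the example loss values are hypothetical.

```python
AUX_WEIGHT = 0.3  # weight applied to each auxiliary loss in the paper


def total_loss(main_loss, aux_losses, training=True):
    """Combine the main classifier loss with discounted auxiliary losses.

    At inference time the auxiliary branches are removed, so only the
    main loss is returned.
    """
    if not training:
        return main_loss
    return main_loss + AUX_WEIGHT * sum(aux_losses)


# Illustrative values only.
train_value = total_loss(1.0, [2.0, 3.0], training=True)
eval_value = total_loss(1.0, [2.0, 3.0], training=False)
print(train_value, eval_value)
```

The auxiliary terms inject gradient partway up the network during training without changing what the final classifier outputs.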

References
1. Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
2. Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
3. Trying to understand Network in Network (NIN) (in Japanese)
4. CS231n: Convolutional Neural Networks for Visual Recognition-Lecture 9: CNN Architectures(AlexNet, VGG, GoogLeNet, ResNet, etc)
5. [Paper] GoogLeNet - Inception review: Going Deeper with Convolutions (in Korean)
6. Laon People Machine Learning Academy (12) - GoogLeNet (3) (in Korean)
7. Inception (GoogLeNet) review (in Korean)
8. Illustrated: 10 CNN Architectures
9. Understanding the computational efficiency gains from CNN bottleneck layers (1x1 convolution) (in Japanese)
10. PR-034: Inception and Xception
11. GoogleNet
12. GoogLeNet in Keras
