3. Research Background
Class incremental learning
In many real-world applications, a model must incrementally learn new classes from streaming data; this setting is called class-incremental learning (CIL).
Incremental learning aims to learn efficient machine models from data that arrive gradually over a sequence of training phases. Closely related topics are continual learning and lifelong learning.
Xiao, Tianjun, et al. "Error-driven incremental learning in deep convolutional neural network for large-scale image classification." Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
4. Research Background
Motivation: the catastrophic forgetting issue
• A model trained at a new incremental phase easily forgets the old classes.
5. Research Background
(Recap from PR-339) Previous works - Replay-based methods
Replay-based methods keep an additional memory of old data and use it to prevent catastrophic forgetting.
IL2M (ICCV 2019)
Figure from: Belouadah, Eden, and Adrian Popescu. "IL2M: Class incremental learning with dual memory." ICCV 2019.
6. Research Background
(Recap from PR-339) Previous works - Replay-based methods
Replay-based methods keep an additional memory of old data and use it to prevent catastrophic forgetting.
iCaRL (CVPR 2017)
• The first attempt to address catastrophic forgetting using old exemplars.
• Nearest Class Mean (NCM): mitigates class imbalance by classifying with the average feature vectors of the old exemplars.

IL2M (ICCV 2019)
• Incremental Learning with Dual Memory.
• Dual memory: old images & class statistics of the past models.
• A probability-calibration method.

BiC (CVPR 2019)
• Corrects the model output after training using a bias-correction layer.

[iCaRL] Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." CVPR 2017.
[IL2M] Belouadah, Eden, and Adrian Popescu. "IL2M: Class incremental learning with dual memory." ICCV 2019.
[BiC] Wu, Yue, et al. "Large scale incremental learning." CVPR 2019.
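As a quick illustration of the NCM classifier, here is a minimal sketch; the function name, shapes, and the use of torch.cdist are our own illustration, not the paper's code:

```python
import torch

def ncm_classify(features, class_means):
    """Nearest-Class-Mean (NCM) classification as used in iCaRL: each class
    is represented by the mean feature vector of its exemplars, and a test
    feature is assigned to the class with the nearest mean.

    features:    (batch, d) L2-normalized feature vectors
    class_means: (num_classes, d) L2-normalized per-class exemplar means
    """
    dists = torch.cdist(features, class_means)  # (batch, num_classes) Euclidean distances
    return dists.argmin(dim=1)                  # index of the nearest class mean
```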
7. Research Background
The main problem: the stability-plasticity dilemma
The key issue of CIL is that the models trained at new phases easily "forget" old classes.
Phase-wise training pipeline (figure):
• 0-th phase (n classes): train the 0-th model Θ0 on the data D0, then sample the exemplar set ℰ0 from D0 (D0 ∋ ℰ0).
• 1st phase (n + k classes): train the 1st model Θ1 on the new data D1 plus the old exemplars ℰ0.
• i-th phase (n + ik classes): train the i-th model Θi on the new data Di plus all old exemplars ℰ0:i−1 = {ℰ0, . . . , ℰi−1}.
The limited old-exemplar memory causes class imbalance.
• Higher plasticity -> forgetting of old classes.
• Higher stability -> weakens the model from learning the data of new classes.
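A minimal sketch of this phase-wise data pipeline, showing why the imbalance arises; random sampling stands in for iCaRL-style herding, and all names are illustrative:

```python
import random

def sample_exemplars(phase_data, per_class):
    """Keep only `per_class` exemplars per class. iCaRL uses herding (samples
    closest to the class mean); random sampling is a simpler stand-in here.
    `phase_data` is a list of (x, label) pairs."""
    by_class = {}
    for x, y in phase_data:
        by_class.setdefault(y, []).append((x, y))
    return [s for samples in by_class.values()
            for s in random.sample(samples, min(per_class, len(samples)))]

def build_phase_training_set(new_data, old_exemplar_sets):
    """Phase-i training set: all of the new data D_i plus the small stored
    exemplar sets E_0, ..., E_{i-1}. Old classes end up with far fewer
    samples than new ones, i.e., class imbalance."""
    return new_data + [s for E in old_exemplar_sets for s in E]
```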
9. Methods
Approach: Stable and Plastic Blocks
Plastic blocks: fully adapted to new-class data
Stable blocks: partially fixed
Plasticity (learning new classes) <-> Stability (not forgetting old-class knowledge)
More learnable parameters <-> Fewer learnable parameters
• We address the stability-plasticity dilemma by introducing a novel network architecture called
Adaptive Aggregation Networks (AANets).
10. Methods
AANets: Stable and Plastic Blocks (Taking ResNet as the baseline architecture)
Architecture (figure): the image passes through a pair of branches at each level; the plastic feature and the stable feature are combined with learnable aggregation weights, and the aggregated output features are fed to the classifier.
• Plastic branch η: more learnable parameters; contains all the convolutional weights.
• Stable branch ϕ: fewer learnable parameters; contains only neuron-level scaling weights.
• Contribution 1: a novel and generic network architecture called AANets.
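A minimal PyTorch sketch of one such level, under our reading of the slide: single conv layers stand in for full residual blocks, and softmax normalization of the aggregation weights is our simplification of the constraint αη + αϕ = 1:

```python
import torch
import torch.nn as nn

class AANetsLevel(nn.Module):
    """One level of AANets: a plastic branch (all conv weights learnable)
    and a stable branch (conv weights frozen, only per-channel scaling
    weights learnable), mixed by learnable aggregation weights."""

    def __init__(self, channels=16):
        super().__init__()
        self.plastic = nn.Conv2d(channels, channels, 3, padding=1)  # eta: fully learnable
        self.stable = nn.Conv2d(channels, channels, 3, padding=1)   # base weights kept fixed
        for p in self.stable.parameters():
            p.requires_grad_(False)
        # phi: neuron-level scaling weights on top of the frozen stable branch
        self.scale = nn.Parameter(torch.ones(channels, 1, 1))
        # Aggregation weights (alpha_eta, alpha_phi), learned at the upper level
        self.alpha = nn.Parameter(torch.tensor([0.5, 0.5]))

    def forward(self, x):
        a = torch.softmax(self.alpha, dim=0)  # keeps alpha_eta + alpha_phi = 1
        return a[0] * self.plastic(x) + a[1] * (self.scale * self.stable(x))
```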
11. Methods
Bilevel optimization program (BOP)
• Lower-level problem: train the network weights (the plastic and stable blocks) on the new data Di plus the old exemplars ℰ0:i−1 = {ℰ0, . . . , ℰi−1}; the new exemplar set ℰi is sampled from Di. Learning rate: 0.1.
12. Methods
Bilevel optimization program (BOP)
• Upper-level problem: train the aggregation weights on a class-balanced exemplar subset, i.e., the old exemplars ℰ0:i−1 = {ℰ0, . . . , ℰi−1} together with the new-class exemplars ℰi. Learning rate: 1 × 10−8.
• Contribution 2: a BOP-based formulation and an end-to-end training solution for optimizing AANets
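A first-order sketch of this alternating scheme; the learning rates follow the slides, while the loss, the data loaders, and the "alpha" naming convention are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_one_phase(model, phase_loader, balanced_loader, epochs):
    # Split parameters: aggregation weights vs. everything else.
    alphas = [p for n, p in model.named_parameters() if "alpha" in n]
    net_params = [p for n, p in model.named_parameters() if "alpha" not in n]
    opt_net = torch.optim.SGD(net_params, lr=0.1, momentum=0.9)  # lower level
    opt_alpha = torch.optim.SGD(alphas, lr=1e-8)                 # upper level
    for _ in range(epochs):
        # Lower-level problem: fit the blocks on D_i plus the old exemplars.
        for x, y in phase_loader:
            opt_net.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt_net.step()
        # Upper-level problem: tune the aggregation weights on the
        # class-balanced exemplar subset only.
        for x, y in balanced_loader:
            opt_alpha.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt_alpha.step()
    return model
```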
13. Methods
Previous works - Parameter-isolation-based methods in incremental learning
• Conditional Channel Gated Networks (Abati et al., CVPR 2020): places a task-specific gating module after each conv layer; when a new task is learned, this module decides which filters of the conv layer to apply.
• Random path selection for incremental learning (Rajasegaran et al., NeurIPS 2019): organizes the network into modules (conv-bn-relu-conv-bn) and selects an optimal path for each new task.
• Reinforced Continual Learning (Xu et al., NeurIPS 2018): uses reinforcement learning to search for the architecture best suited to the incoming task.
(Figures: Conditional Channel Gated Networks; Random path selection for incremental learning)
14. Methods
AANets: Stable and Plastic Blocks (Taking ResNet as the baseline architecture)
•AANets architecture:
•1 initial convolution layer -> 3 residual blocks (in a single branch) -> average pooling -> fully connected layer. Each block (level) consists of 10 conv layers (3 × 3 kernels); the number of filters starts at 16 and doubles in each subsequent block.
•Hyperparameters:
•Aggregation weights constraint: αη + αϕ = 1.
•For CIFAR-100 (ImageNet), the model is trained for 160 (90) epochs in each phase, and the learning rates are divided by 10 after 80 (30) and then after 120 (60) epochs.
•SGD optimizer with momentum 0.9 and batch size 128 is used to train the models in all settings.
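The reported schedule maps directly onto a standard PyTorch setup; a sketch for the CIFAR-100 numbers, where the model stand-in and the loop body are placeholders:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual AANets model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Divide the learning rate by 10 after epochs 80 and 120 (160 epochs total);
# for ImageNet the analogous milestones are 30 and 60 over 90 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    pass  # ... one epoch over D_i plus the exemplar memory, batch size 128 ...
    scheduler.step()
```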
16. Experimental Results
Experimental settings
•Data:
•CIFAR-100: 60,000 samples (32 × 32 color images, 100 classes); 500 training and 100 test samples per class (600 samples × 100 classes).
•ImageNet: 1.3 million samples (224 × 224 color images, 1,000 classes); approximately 1,300 training and 50 test samples per class. The 100-class data for ImageNet-Subset are sampled from ImageNet.
•CIL settings example for CIFAR-100 & ImageNet-Subset:
•N = 5: the 0-th phase covers 50 classes; each of the 1st-5th phases adds 10 classes.
•N = 10: the 0-th phase covers 50 classes; each of the 1st-10th phases adds 5 classes.
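These splits follow the common protocol of giving half the classes to the 0-th phase and dividing the rest evenly over N phases; a small helper to make the arithmetic explicit:

```python
def cil_phase_sizes(total_classes=100, base_classes=50, n_phases=5):
    """Number of classes introduced at each phase: the 0-th phase gets
    `base_classes`, the remaining classes are split evenly over N phases."""
    step = (total_classes - base_classes) // n_phases
    return [base_classes] + [step] * n_phases

print(cil_phase_sizes(100, 50, 5))   # [50, 10, 10, 10, 10, 10]
print(cil_phase_sizes(100, 50, 10))  # [50, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```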
17. Experimental Results
Verifying the performance of the “stable block” + “plastic block” approach.
“All” + “frozen” means that all weights of the plastic block are trained while the stable block is frozen.
18. Experimental Results
Ablation study
1) Learning the aggregation weights α on a balanced subset has a significant effect on performance.
2) Reducing the memory overhead by 26% and 14.5% caused only a small accuracy drop (0.3% on CIFAR-100, N = 5).
Rows 6-8 use the “all” + “frozen” setting:
• Row 6: the baseline (“all” + “frozen”).
• Row 7: without adapted α.
• Row 8: reduced memory overhead (20 images/class -> 13 and 16 images/class).
20. Experimental Results
Examining the roles of the stable/plastic blocks via class activation maps.
•The experiment uses the final model obtained after 5-phase class-incremental learning on ImageNet-Subset.
•We inspect how the model highlights images of classes that were introduced in earlier phases.
1) The stable block retains knowledge of classes learned in earlier phases, while the plastic block captures recently learned classes.
2) The learned aggregation weights coordinate the knowledge of the two blocks, yielding high accuracy.
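The slides do not say which CAM variant is used; a minimal plain-CAM computation (Zhou et al., 2016) over one branch's features would look like this, with shapes and names purely illustrative:

```python
import torch

def class_activation_map(feature_maps, fc_weight, class_idx):
    """Plain CAM: weight the final conv feature maps by the classifier
    weights of the target class, then ReLU and normalize for display.

    feature_maps: (C, H, W) last-level features of one image (e.g., from
                  the stable or the plastic branch)
    fc_weight:    (num_classes, C) weights of the final fully connected layer
    """
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], feature_maps)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)  # scale to [0, 1]
```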
21. Experimental Results
Examining how the aggregation weights change during CIL (αη: aggregation weight of the plastic block; αϕ: aggregation weight of the stable block).
• The authors explain that Level 1 encodes low-level features that can be shared across all classes, so the plastic-block feature receives a higher weight as new classes are added, and that Level 3, being close to the classifier, also gives a high weight to the plastic-block feature that has learned the new classes.
23. Conclusions
• This paper proposes AANets, a novel architecture for mitigating catastrophic forgetting in class-incremental learning.
• Two types of residual blocks are trained separately: plastic blocks, which provide plasticity, and stable blocks, which provide stability.
• The optimal aggregation weights for combining the two features are themselves learnable parameters.
• On class-incremental learning tasks for ImageNet-1000, ImageNet-100, and CIFAR-100, the proposed method outperforms existing CIL methods.
• AANets is a generic approach that can be applied on top of existing CIL methods to improve their performance.
Thank you.