FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
김 성 철
Contents
1. Terminology
2. Semi-supervised learning
3. FixMatch
4. Related work
5. Experiments
2
Terminology
3
• Supervised Learning & Unsupervised Learning
• Weakly-supervised Learning
• Predict strong labels (e.g. segmentation masks) from weak labels (e.g. dog, cat, …)!
• FickleNet, … (not following this line of work closely…)
• Semi-supervised Learning
• Improve performance with the help of unlabeled data when labeled data is scarce!
• UDA, MixMatch, ReMixMatch, FixMatch, …
• Self-supervised Learning
• Use unlabeled data to learn pretrained weights that capture the data's intrinsic structure!
• Jigsaw puzzle solving, CPC, BERT, GPT, MoCo, SimCLR, BYOL, …
Semi-supervised learning
4
• Goal
• Improve performance with the help of unlabeled data when labeled data is scarce!
https://steemit.com/kr/@jiwoopapa/realistic-evaluation-of-semi-supervised-learning-algorithms
Semi-supervised learning
5
[Timeline figure, 2017–2020, of SSL approach families and representative methods: Entropy Minimization (2005), Pseudo Labeling (2013), Consistency Regularization, Generic Regularization (MixUp), Self-Training; methods shown include Π-Model, Mean Teacher, VAT, MixMatch, UDA, ReMixMatch, Noisy Student, and FixMatch]
Semi-supervised learning
6
• Entropy Minimization
• Used to increase the confidence of the predicted (softmax) distribution
• Typically implemented with a softmax temperature
• Setting the temperature below 1 lowers the entropy of the prediction (see the sketch below)
https://papers.nips.cc/paper/2740-semi-supervised-learning-by-entropy-minimization.pdf
http://dsba.korea.ac.kr/seminar/?mod=document&uid=68
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
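The effect of the temperature can be checked in a few lines. A minimal sketch, assuming PyTorch and a toy logit vector (both are illustrative assumptions, not from the slides):

```python
import torch
import torch.nn.functional as F

def sharpen(logits: torch.Tensor, T: float) -> torch.Tensor:
    # Softmax with temperature T; T < 1 sharpens the distribution.
    return F.softmax(logits / T, dim=-1)

def entropy(p: torch.Tensor) -> torch.Tensor:
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

logits = torch.tensor([[2.0, 1.0, 0.5]])  # toy prediction for a 3-class problem
for T in (1.0, 0.5, 0.1):
    print(T, entropy(sharpen(logits, T)).item())  # entropy shrinks as T goes below 1
```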
Semi-supervised learning
7
• Consistency Regularization
1. Use the model to predict the class distribution of the unlabeled data
2. Add noise to the unlabeled data (data augmentation)
3. Train the model using the predicted distribution as the target label for the augmented data (see the sketch below)
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
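A minimal sketch of steps 1–3, assuming PyTorch; `model` returns class logits and `augment` is any noise/augmentation function (both are placeholders), and the KL divergence used here is one common choice of consistency loss, not the only one:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, u: torch.Tensor, augment) -> torch.Tensor:
    with torch.no_grad():
        target = F.softmax(model(u), dim=-1)        # 1. predict the distribution on unlabeled data
    noisy = augment(u)                              # 2. add noise / data augmentation
    log_pred = F.log_softmax(model(noisy), dim=-1)  # 3. pull the augmented view toward that distribution
    return F.kl_div(log_pred, target, reduction="batchmean")
```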
Semi-supervised learning
8
• Unsupervised Data Augmentation (UDA, 2019)
https://arxiv.org/abs/1904.12848
Semi-supervised learning
9
• MixMatch (NeurIPS 2019)
https://arxiv.org/abs/1905.02249
http://dsba.korea.ac.kr/seminar/?mod=document&uid=68
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
Semi-supervised learning
10
• ReMixMatch (ICLR 2020)
https://arxiv.org/abs/1911.09785
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
Semi-supervised learning
11
• FixMatch (NeurIPS 2020)
https://arxiv.org/abs/2001.07685
12
• Background
• Consistency regularization
• Pseudo-labeling
• The model itself is used to produce artificial labels for unlabeled data
• The prediction is converted to a hard label via argmax and kept only when it is sufficiently confident
$$\frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbf{1}\left[\max(q_b) \ge \tau\right] \mathrm{H}\left(\hat{q}_b,\, q_b\right)$$
• $q_b = p_m(y \mid u_b)$, $\hat{q}_b = \arg\max(q_b)$
• $\tau$ : threshold hyperparameter
• Producing hard labels is closely related to entropy minimization (a thresholded pseudo-label loss is sketched below)
FixMatch
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
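A minimal sketch of the thresholded pseudo-label loss above, assuming PyTorch; `logits_u` are the model's logits on the unlabeled batch, and detaching the target side is an illustrative choice:

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(logits_u: torch.Tensor, tau: float = 0.95) -> torch.Tensor:
    q = F.softmax(logits_u.detach(), dim=-1)                  # q_b = p_m(y | u_b)
    max_p, q_hat = q.max(dim=-1)                              # hat{q}_b = argmax(q_b)
    mask = (max_p >= tau).float()                             # 1[max(q_b) >= tau]
    ce = F.cross_entropy(logits_u, q_hat, reduction="none")   # H(hat{q}_b, q_b)
    return (mask * ce).mean()                                 # averaged over the mu*B unlabeled examples
```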
13
• FixMatch
• Notation
• $\mathcal{X} = \{(x_b, p_b) : b \in (1, \dots, B)\}$ : a batch of $B$ labeled examples
• $x_b$, $p_b$ : the training examples and their one-hot labels
• $\mathcal{U} = \{u_b : b \in (1, \dots, \mu B)\}$ : a batch of $\mu B$ unlabeled examples
• $\mu$ : a hyperparameter that determines the relative sizes of $\mathcal{X}$ and $\mathcal{U}$
• $p_m(y \mid x)$ : the predicted class distribution produced by the model for input $x$
• $\mathcal{A}(\cdot)$, $\alpha(\cdot)$ : strong and weak augmentation, respectively
FixMatch
14
• FixMatch
• Two cross-entropy loss terms
• Supervised loss $\ell_s$ applied to labeled data
$$\ell_s = \frac{1}{B}\sum_{b=1}^{B} \mathrm{H}\left(p_b,\, p_m(y \mid \alpha(x_b))\right)$$
• Unsupervised loss $\ell_u$
• An artificial label is produced for each unlabeled example and used in a standard cross-entropy loss
• $q_b = p_m(y \mid \alpha(u_b))$ : the model's predicted class distribution for the weakly-augmented unlabeled image
• $\hat{q}_b = \arg\max(q_b)$ : used as the pseudo-label; the cross-entropy loss is taken against the prediction on the strongly-augmented image (a sketch of both losses follows the equation below)
$$\ell_u = \frac{1}{\mu B}\sum_{b=1}^{\mu B} \mathbf{1}\left[\max(q_b) \ge \tau\right] \mathrm{H}\left(\hat{q}_b,\, p_m(y \mid \mathcal{A}(u_b))\right)$$
FixMatch
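Both loss terms fit in a few lines. A minimal sketch, assuming PyTorch; `model` returns logits, and `weak`/`strong` stand in for $\alpha(\cdot)$ and $\mathcal{A}(\cdot)$ (all three are placeholders):

```python
import torch
import torch.nn.functional as F

def fixmatch_losses(model, x, p, u, weak, strong, tau: float = 0.95):
    # Supervised loss l_s on weakly-augmented labeled data
    # (p holds class indices; one-hot/soft targets also work in recent PyTorch)
    l_s = F.cross_entropy(model(weak(x)), p)

    # Pseudo-labels from the weakly-augmented unlabeled data (no gradient through the targets)
    with torch.no_grad():
        q = F.softmax(model(weak(u)), dim=-1)   # q_b = p_m(y | alpha(u_b))
        max_p, q_hat = q.max(dim=-1)            # hat{q}_b = argmax(q_b)
        mask = (max_p >= tau).float()           # 1[max(q_b) >= tau]

    # Unsupervised loss l_u on the strongly-augmented view
    ce = F.cross_entropy(model(strong(u)), q_hat, reduction="none")
    l_u = (mask * ce).mean()
    return l_s, l_u
```

The overall objective on the next slide is then simply `l_s + lambda_u * l_u`, with $\lambda_u = 1$ in the paper.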
15
• FixMatch
• Minimize $\ell_s + \lambda_u \ell_u$!
• Ramping up $\lambda_u$ over training is a trend in recent SSL algorithms, but it is unnecessary in FixMatch
• Early in training, $\max(q_b)$ is usually below $\tau$; as training progresses the model becomes more confident and $\max(q_b) > \tau$ occurs more often, so the unlabeled loss is phased in naturally
FixMatch
http://dsba.korea.ac.kr/seminar/?mod=document&uid=248
16
• Augmentation
• Two kinds of augmentation
• Weak
• Standard flip-and-shift augmentation
• Random horizontal flip with probability 50%
• Random translation of up to 12.5% vertically and horizontally
• Strong
• AutoAugment
• RandAugment
• CTAugment (Control Theory Augment, in ReMixMatch)
+ Cutout (both pipelines are sketched below)
FixMatch
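A minimal sketch of the two pipelines for 32x32 inputs, assuming torchvision; RandAugment is one of the three strong options listed, RandomErasing is used here only as a Cutout-like stand-in, and the exact parameters are illustrative assumptions:

```python
import torchvision.transforms as T

weak = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                         # flip with 50% probability
    T.RandomAffine(degrees=0, translate=(0.125, 0.125)),   # translate up to 12.5% in x and y
    T.ToTensor(),
])

strong = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=0, translate=(0.125, 0.125)),
    T.RandAugment(),                                       # strong augmentation (torchvision >= 0.11)
    T.ToTensor(),
    T.RandomErasing(p=1.0, scale=(0.25, 0.25)),            # Cutout-like: erase ~25% of the image area
])
```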
17
• Additional important factors
• Semi-supervised learning performance can depend heavily on factors other than the algorithm itself, such as the amount of regularization (especially when there is little labeled data!)
• Regularization : simple weight decay
• Optimizer : standard SGD with momentum
• Learning rate scheduler : cosine learning rate decay, $\eta \cos\left(\frac{7\pi k}{16K}\right)$, where $k$ is the current training step and $K$ is the total number of steps
• Exponential moving average of model parameters
FixMatch
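A minimal sketch of these settings, assuming PyTorch; the concrete numbers (lr 0.03, weight decay 5e-4, EMA decay 0.999, K = 2^20 steps) are the paper's CIFAR-scale values, and `model` is a placeholder:

```python
import math
import torch

model = torch.nn.Linear(32 * 32 * 3, 10)   # placeholder model
K = 2 ** 20                                # total number of training steps

# Simple weight decay + standard SGD with momentum
opt = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9, weight_decay=5e-4)

# Cosine learning rate decay: multiply the base lr by cos(7*pi*k / (16*K)) at step k
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda k: math.cos(7 * math.pi * k / (16 * K)))

# Exponential moving average of the model parameters, used for evaluation
ema = {name: param.detach().clone() for name, param in model.state_dict().items()}

@torch.no_grad()
def update_ema(decay: float = 0.999) -> None:
    for name, param in model.state_dict().items():
        ema[name].mul_(decay).add_(param, alpha=1.0 - decay)
```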
18
Related work
Experiments
• Dataset
• CIFAR-10
• CIFAR-100
• SVHN
• STL-10
• 96x96 pixels, 10 classes
• 5,000 labeled images
• 100,000 unlabeled images
• ImageNet
19
CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html
SVHN (Street View House Numbers): http://ufldl.stanford.edu/housenumbers/
STL-10: https://cs.stanford.edu/~acoates/stl10/
Experiments
• CIFAR-10, CIFAR-100, SVHN
• Wide ResNet-28-2
• 5 different “folds” of labeled data
• On CIFAR-100, FixMatch performs worse than ReMixMatch
• Distribution Alignment in ReMixMatch appears to account for most of the gap
→ FixMatch + Distribution Alignment : 40.14% error rate with 400 labeled examples!
20
Experiments
• STL-10
• Wide ResNet-37-2
• Five of the predefined folds of 1,000 labeled images each
21
Experiments
• ImageNet
• ResNet-50 & RandAugment
• As in UDA, 10% of the training data is labeled and the remainder is used as unlabeled data
• Top-1 error rate
• FixMatch : 28.54±0.52% (2.68% lower than UDA)
• Top-5 error rate : 10.87±0.28%
• S4L : supervised → pseudo-label re-training (semi) → supervised fine-tuning (self)
→ good performance at the pseudo-label re-training stage
→ after passing through all of its stages, nearly comparable performance was obtained
22
https://arxiv.org/abs/1905.03670
Experiments
• Barely Supervised Learning
• Trained on CIFAR-10 with only one labeled example per class
• Four such datasets were created, and the model was trained four times on each
• 48.58–85.32% test accuracy (median 64.28%)
• All four models trained on the first dataset reached 61–67% accuracy
• All four models trained on the second dataset reached 68–75% accuracy
• Could picking low-quality examples make it harder for the model to learn certain classes?
• Eight training datasets were built based on prototypicality
• Examples were sorted by how representative they are, using several CIFAR-10 models
• The sorted data was split into eight buckets, and a one-labeled-example-per-class dataset was built from each bucket
→ the most representative examples fall in the first bucket, outliers in the last
• The most prototypical examples give 78% accuracy, the middle of the distribution 65%, and the outliers only 10%
23
Experiments
• Barely Supervised Learning
24
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Sharpening and Thresholding
• Experiments sweep the temperature $T$ and the confidence threshold $\tau$
• Confidence threshold $\tau = 0.95$ gives the best result
• Once the threshold $\tau$ is applied, the temperature $T$ has little additional effect
25
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Augmentation Strategy
• Replacing weak with strong augmentation when generating the label guess (pseudo-label)
• Training diverges early on → pseudo-labels must be generated from weakly-augmented data
• Applying weak augmentation to the image used for the model's prediction (the strongly-augmented branch)
• Reached 45% accuracy, but training was unstable and gradually collapsed to 12%
→ strong data augmentation is needed on the prediction branch
• The same effect has also been observed in supervised learning
26
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Ratio of Unlabeled Data
• Performance as a function of the ratio of unlabeled data
• Using more unlabeled data improves performance → same trend as UDA
• Scaling the learning rate $\eta$ linearly with the batch size helps, especially when $\eta$ is small (see the short sketch below)
27
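A short sketch of the linear scaling rule mentioned above; the reference batch size of 64 matches the labeled batch size B used in the paper, but the exact form of the rule here is an assumption:

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch_size: int = 64) -> float:
    # Scale the learning rate linearly with the total batch size.
    return base_lr * batch_size / reference_batch_size
```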
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Optimizer and Learning Rate Schedule
• Optimizers and their hyperparameters have received little attention in previous SSL studies
• SGD with momentum 0.9 works best (4.84% error)
• Without momentum: 5.19% error
• The Nesterov variant of momentum is not required to reach an error below 5%
• Adam performs noticeably worse
• Cosine learning rate decay is the common choice, but linear learning rate decay also works reasonably well for FixMatch
• With cosine decay, choosing an appropriate decay rate is important
• Using no decay at all costs 0.86% accuracy
28
Experiments
• Ablation Study
• 250 label split from CIFAR-10 + CTAugment
• Weight Decay
• Weight decay is particularly important when labeled data is scarce
• Choosing a value larger or smaller than the optimum can change performance by more than 10%
29
Thank you
30