2nd February 2020
PR12 Paper Review
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab KR
2019 CVPR
PR-222: Revisiting Self-Supervised Visual Representation Learning
Contents
• Introduction
• Self-Supervised Study Setup
• Architectures of CNN models
• Self-supervised techniques in this study
• Evaluation
• Datasets
• Experiments and Results
• Conclusion
Before We Start
[PR-208] Unsupervised Visual Representation Learning Overview: Toward Self-Supervision
• Video Link: https://youtu.be/eDDHsbMgOJQ
• I highly recommend watching the video above (PR-208) before listening to this presentation!
Introduction
“Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR
• Many pretext tasks for self-supervised learning have been studied
• But performance is still lower than in the supervised setting
• Other important aspects, such as the CNN architecture, have not received equal attention
• So the paper revisits previously proposed self-supervised models and conducts a large-scale study
Self-Supervised Study Setup
3.1. Architectures of CNN models
• Most self-supervised techniques for visual representation learning use AlexNet. Why use an old-fashioned architecture?!
• Employ modern network architectures, each scaled by a widening factor k, k ∈ {4, 8, 12, 16}
• ResNet50, pre-logits of size 512*k
• RevNet (The Reversible ResNet), but do not use G like real NVP paper
• VGG with batch-normalization, initial conv layer has 8*k channels, fc layer has 512*k channels
[Figure: ResNet vs. RevNet residual units]
reference: The Reversible Residual Network: Backpropagation Without Storing Activations, 2017 NIPS
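As a rough illustration of the widening-factor bullets above, the sketch below tabulates the layer sizes they imply. The function names are invented for this example; only the formulas 512*k and 8*k and the set of k values come from the slide. Note that k = 4 gives 2048, the pre-logit size of the standard ResNet50, and k = 8 gives 64 and 4096, the standard VGG sizes.

```python
from typing import Tuple

# Widening factors listed on the slide.
WIDENING_FACTORS = (4, 8, 12, 16)


def resnet50_prelogits(k: int) -> int:
    """Pre-logit size of the widened ResNet50: 512*k, per the slide."""
    return 512 * k


def vgg_bn_channels(k: int) -> Tuple[int, int]:
    """(initial conv channels, fc channels) for VGG-BN: 8*k and 512*k, per the slide."""
    return 8 * k, 512 * k


for k in WIDENING_FACTORS:
    conv, fc = vgg_bn_channels(k)
    print(f"k={k:2d}  ResNet50 pre-logits={resnet50_prelogits(k):5d}  "
          f"VGG-BN conv={conv:3d}  fc={fc:5d}")
```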
3.2. Self-supervised techniques in this study
• Use 4 self-supervised techniques for experiments
• Rotation
• Exemplar
• Jigsaw
• Relative Patch Location
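To make the idea of a pretext task concrete, here is a minimal sketch of how training examples for the Rotation task could be generated. It assumes the usual four-way formulation (0°, 90°, 180°, 270°), which the slide does not spell out, and uses a toy nested-list image; the helper names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy single-channel image as nested lists


def rotate90(img: Image) -> Image:
    """Rotate a 2-D grid by 90 degrees counter-clockwise."""
    # Transpose, then reverse the row order.
    return [list(row) for row in zip(*img)][::-1]


def rotation_examples(img: Image) -> List[Tuple[Image, int]]:
    """Return (rotated image, label) pairs; label i means i * 90 degrees."""
    examples = []
    current = [list(row) for row in img]
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples


for rotated, label in rotation_examples([[1, 2], [3, 4]]):
    print(label, rotated)
```

The network is then trained to predict the label from the rotated image alone, so no human annotation is needed.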
3.3. Evaluation
• Follow the common protocol: train a linear logistic regression model to solve a multi-class classification task
• Extract the representation from the frozen network at the pre-logit level
• Train the logistic regression using L-BFGS, except in Table 2
• For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
Table 2
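The evaluation protocol above can be sketched as follows: the features are treated as fixed inputs and only a linear softmax classifier is trained on top of them. This is a toy stand-in, not the paper's setup: it uses full-batch gradient descent with momentum instead of L-BFGS or minibatch SGD, and the feature vectors, learning rate, and function names are all made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(w, b, x) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w[c], x)) + b[c] for c in range(len(b))]


def train_linear_probe(features, labels, num_classes,
                       lr=0.1, momentum=0.9, epochs=100) -> Tuple[list, list]:
    """Fit multinomial logistic regression on fixed (frozen) features."""
    dim, n = len(features[0]), len(features)
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    vw = [[0.0] * dim for _ in range(num_classes)]
    vb = [0.0] * num_classes
    for _ in range(epochs):
        gw = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(scores(w, b, x))
            for c in range(num_classes):
                err = (probs[c] - (1.0 if c == y else 0.0)) / n
                gb[c] += err
                for j in range(dim):
                    gw[c][j] += err * x[j]
        # Momentum update; the features themselves are never modified.
        for c in range(num_classes):
            vb[c] = momentum * vb[c] - lr * gb[c]
            b[c] += vb[c]
            for j in range(dim):
                vw[c][j] = momentum * vw[c][j] - lr * gw[c][j]
                w[c][j] += vw[c][j]
    return w, b


def predict(w, b, x) -> int:
    s = scores(w, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Hypothetical "pre-logit" features for three classes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -0.8]]
labels = [0, 0, 1, 1, 2, 2]
w, b = train_linear_probe(feats, labels, num_classes=3)
print([predict(w, b, x) for x in feats])
```

Because only the linear layer is trained, the resulting accuracy reflects how linearly separable the frozen representation is.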
3.4. Datasets
• ImageNet (Train + Validation)
• To avoid overfitting, use a custom validation split (50,000 random images from the training split) for all studies except Table 2
• All self-supervised models are trained on ImageNet (without labels)
• Places205 (Validation only)
• Qualitatively different from ImageNet → a good candidate for evaluating how well the learned representations generalize to new, unseen data of a different nature
• Same procedure as for ImageNet regarding validation splits (random splitting)
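The random hold-out split described above could look like the sketch below. The function name, the seed, and the file names are assumptions for illustration; the slide only states that 50,000 random training images are held out.

```python
import random
from typing import List, Sequence, Tuple


def hold_out_split(items: Sequence[str], holdout_size: int,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly move `holdout_size` items from the training set into a validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(items)))
    rng.shuffle(indices)
    held = set(indices[:holdout_size])
    train = [x for i, x in enumerate(items) if i not in held]
    val = [x for i, x in enumerate(items) if i in held]
    return train, val


# Toy usage with 1,000 hypothetical file names; the study holds out 50,000 images.
files = [f"img_{i:04d}.jpg" for i in range(1000)]
train_files, val_files = hold_out_split(files, holdout_size=50)
print(len(train_files), len(val_files))
```

Keeping the official validation set untouched during model selection is what makes the Table 2 numbers comparable with prior work.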
Experiments and Results
4.1. Evaluation on ImageNet and Places205
• Measure the representation quality produced by six different CNN architectures with various widening factors
• Increasing the number of channels improves the performance of self-supervised models
[Table annotations: widening factor; random initialization; without ReLU before the GAP layer]
• The ranking of architectures is not consistent across methods, nor is the ranking of methods consistent across architectures
• The ranking on Places205 is consistent with that on ImageNet → the representations generalize to a new dataset
• VGG19-BN consistently shows the worst performance, even though it achieves performance similar to ResNet50 on standard vision benchmarks (fully supervised setting)
Best architecture per method:
• Rotation → RevNet50
• Exemplar → ResNet50 v1
• Rel. Patch Loc. → ResNet50 v1
• Jigsaw → ResNet50 v1
• VGG19-BN → worst performance in all cases
4.2. Comparison to prior work
• For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
• Selecting the right architecture significantly outperforms previously reported results
[Table 2 annotation: Prev. Result]
4.3. A linear model is adequate for evaluation
• Consider an alternative evaluation scenario: use an MLP to solve the evaluation task
• Add a single hidden layer with 1000 channels, ReLU, and Dropout to make the model non-linear
• The MLP provides only a marginal improvement over the linear evaluation
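For reference, the non-linear evaluation head described above can be sketched as a forward pass in plain Python. Only the hidden width of 1000, the ReLU, and the use of Dropout come from the slide; the class name, the weight initialization, and the dropout probability of 0.5 are assumptions made for this example.

```python
import random
from typing import List, Sequence


class MLPHead:
    """One hidden layer (ReLU + dropout) followed by a linear output layer."""

    def __init__(self, in_dim: int, num_classes: int, hidden: int = 1000,
                 drop_prob: float = 0.5, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        s1, s2 = in_dim ** -0.5, hidden ** -0.5
        self.w1 = [[self.rng.uniform(-s1, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[self.rng.uniform(-s2, s2) for _ in range(hidden)] for _ in range(num_classes)]
        self.b2 = [0.0] * num_classes
        self.drop_prob = drop_prob  # assumed value, not given on the slide

    def forward(self, x: Sequence[float], training: bool = False) -> List[float]:
        # Hidden layer with ReLU.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        if training:
            # Inverted dropout: zero units at random and rescale the survivors.
            keep = 1.0 - self.drop_prob
            h = [hi / keep if self.rng.random() < keep else 0.0 for hi in h]
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


head = MLPHead(in_dim=8, num_classes=5)
logits = head.forward([0.1] * 8)  # evaluation mode: dropout is disabled
print(len(logits))
```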
4.4. Better performance on the pretext task does not always translate to better representations
• Performance on the pretext task is a good proxy for representation quality, but not always
4.5. Skip-connections prevent degradation of representation quality towards the end of CNNs
• Representations from VGG-BN get worse towards the end of the network, but those from ResNet and RevNet do not
• Models specialize to the pretext task and discard more general semantic features in the later layers
• Skip-connections preserve information learned in intermediate layers
4.6. Model width and representation size strongly influence the representation quality
• Check whether the increase in performance is due to increased network capacity, the use of higher-dimensional representations, or the interplay of both
• Disentangle the network width from the representation size (pre-logits channels)
• Increasing the widening factor consistently boosts performance in both the full and low-data regimes
4.7. SGD training of the linear model takes a long time to converge
• Previous works use short training times
• Investigate the importance of the SGD optimization schedule for training logistic regression
• The timing of the first learning-rate decay has a large influence on the final accuracy
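To illustrate what "the first decay" refers to, the sketch below implements a generic step-decay learning-rate schedule. The slide does not give the actual schedule, so the base rate, decay epochs, and decay factor are placeholders, and the function name is hypothetical.

```python
from typing import Sequence


def step_decay_lr(epoch: int, base_lr: float, decay_epochs: Sequence[int],
                  factor: float = 0.1) -> float:
    """Piecewise-constant schedule: multiply the rate by `factor` at each boundary."""
    num_decays = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return base_lr * factor ** num_decays


# Two hypothetical schedules that differ only in when the first decay happens.
early = [step_decay_lr(e, 0.1, decay_epochs=[10, 60]) for e in range(0, 80, 10)]
late = [step_decay_lr(e, 0.1, decay_epochs=[30, 60]) for e in range(0, 80, 10)]
print(early)
print(late)
```

Comparing schedules like these is one way to test how sensitive the linear evaluation is to the point at which the learning rate is first reduced.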
Conclusion
Revisit previously proposed self-supervised models and conduct a large-scale study
• Architecture design choices from the fully supervised setting do not necessarily translate to the self-supervised setting (VGG19-BN)
• Using skip-connections can achieve consistently good results, in contrast to AlexNet
• The widening factor of CNNs has a drastic effect on the performance of self-supervised techniques
• SGD training of linear logistic regression requires a very long time to converge
• The ranking of architectures is not consistent across methods, and the ranking of methods is not consistent across architectures

Mais conteúdo relacionado

Mais procurados

YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewLEE HOSEONG
 
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Sujit Pal
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
Improving neural question generation using answer separation
Improving neural question generation using answer separationImproving neural question generation using answer separation
Improving neural question generation using answer separationNAVER Engineering
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchSujit Pal
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniquesJaeJun Yoo
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
 
Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Sergey Karayev
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksSungchul Kim
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkSigOpt
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsJaeJun Yoo
 
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursionMaking neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursionKaty Lee
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization太一郎 遠藤
 
Plotcon 2016 Visualization Talk by Alexandra Johnson
Plotcon 2016 Visualization Talk  by Alexandra JohnsonPlotcon 2016 Visualization Talk  by Alexandra Johnson
Plotcon 2016 Visualization Talk by Alexandra JohnsonSigOpt
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IWanjin Yu
 

Mais procurados (20)

YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas...
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Improving neural question generation using answer separation
Improving neural question generation using answer separationImproving neural question generation using answer separation
Improving neural question generation using answer separation
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.Attentional Object Detection - introductory slides.
Attentional Object Detection - introductory slides.
 
Revisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural NetworksRevisiting the Calibration of Modern Neural Networks
Revisiting the Calibration of Modern Neural Networks
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
A beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trendsA beginner's guide to Style Transfer and recent trends
A beginner's guide to Style Transfer and recent trends
 
Making neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursionMaking neural programming architectures generalize via recursion
Making neural programming architectures generalize via recursion
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 
Plotcon 2016 Visualization Talk by Alexandra Johnson
Plotcon 2016 Visualization Talk  by Alexandra JohnsonPlotcon 2016 Visualization Talk  by Alexandra Johnson
Plotcon 2016 Visualization Talk by Alexandra Johnson
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 

Semelhante a Modern CNN Architectures and Training Methods Boost Self-Supervised Visual Representation Learning

PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.Sunghoon Joo
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksSeunghyun Hwang
 
An Automation Framework That Really Works
An Automation Framework That Really WorksAn Automation Framework That Really Works
An Automation Framework That Really WorksBasivi Reddy Junna
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfBoahKim2
 
The Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityBalázs Hidasi
 
Student feedback system
Student feedback systemStudent feedback system
Student feedback systemmsandbhor
 
Refactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test AutomationRefactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test AutomationStephen Fuqua
 
18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven DevelopmentESEM 2014
 
addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceSoheila Dehghanzadeh
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...Dongmin Choi
 
MSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for ADMSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for ADMayank Gupta
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Modelsandyr91
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process ModelsAtul Karmyal
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxssuser2624f71
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisSeunghyun Hwang
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...LDBC council
 

Semelhante a Modern CNN Architectures and Training Methods Boost Self-Supervised Visual Representation Learning (20)

PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
Vue.js Use Cases
Vue.js Use CasesVue.js Use Cases
Vue.js Use Cases
 
An Automation Framework That Really Works
An Automation Framework That Really WorksAn Automation Framework That Really Works
An Automation Framework That Really Works
 
TIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdfTIP_TAViT_presentation.pdf
TIP_TAViT_presentation.pdf
 
The Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on Reproducibility
 
Student feedback system
Student feedback systemStudent feedback system
Student feedback system
 
Refactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test AutomationRefactoring Legacy Web Forms for Test Automation
Refactoring Legacy Web Forms for Test Automation
 
18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development18 - Impact of Process Conformance on the Effects of Test-driven Development
18 - Impact of Process Conformance on the Effects of Test-driven Development
 
Sudhakar Resume
Sudhakar ResumeSudhakar Resume
Sudhakar Resume
 
addressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenanceaddressing tim/quality trade-off in view maintenance
addressing tim/quality trade-off in view maintenance
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
tip oopt pse-summit2017
tip oopt pse-summit2017tip oopt pse-summit2017
tip oopt pse-summit2017
 
MSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for ADMSCV Capstone Spring 2020 Presentation - RL for AD
MSCV Capstone Spring 2020 Presentation - RL for AD
 
ABC of developer test
ABC of developer testABC of developer test
ABC of developer test
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptx
 
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image SynthesisLarge Scale GAN Training for High Fidelity Natural Image Synthesis
Large Scale GAN Training for High Fidelity Natural Image Synthesis
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 

Mais de LEE HOSEONG

Unsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationUnsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationLEE HOSEONG
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to ZLEE HOSEONG
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationLEE HOSEONG
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewLEE HOSEONG
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution OverviewLEE HOSEONG
 
2019 ICLR Best Paper Review
2019 ICLR Best Paper Review2019 ICLR Best Paper Review
2019 ICLR Best Paper ReviewLEE HOSEONG
 
"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper ReviewLEE HOSEONG
 
"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper ReviewLEE HOSEONG
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...LEE HOSEONG
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper ReviewLEE HOSEONG
 
"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper ReviewLEE HOSEONG
 
"From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ..."From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ...LEE HOSEONG
 
"simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r..."simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r...LEE HOSEONG
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper ReviewLEE HOSEONG
 

Mais de LEE HOSEONG (14)

Unsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationUnsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillation
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to Z
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classification
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 Review
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
2019 ICLR Best Paper Review
2019 ICLR Best Paper Review2019 ICLR Best Paper Review
2019 ICLR Best Paper Review
 
"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review
 
"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
 
"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review
 
"From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ..."From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ...
 
"simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r..."simple does it weakly supervised instance and semantic segmentation" Paper r...
"simple does it weakly supervised instance and semantic segmentation" Paper r...
 
"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review"How does batch normalization help optimization" Paper Review
"How does batch normalization help optimization" Paper Review
 


Modern CNN Architectures and Training Methods Boost Self-Supervised Visual Representation Learning

  • 1. PR-222: Revisiting Self-Supervised Visual Representation Learning (2019 CVPR)
      PR12 Paper Review, 2nd February 2020
      Ho Seong Lee (hoya012), Cognex Deep Learning Lab KR
  • 2. Contents
      • Introduction
      • Self-Supervised Study Setup
      • Architectures of CNN models
      • Self-supervised techniques in this study
      • Evaluation
      • Datasets
      • Experiments and Results
      • Conclusion
  • 3. Before We Start
      • [PR-208] Unsupervised Visual Representation Learning Overview: Toward Self-Supervision
      • Video Link: https://youtu.be/eDDHsbMgOJQ
      • I highly recommend watching the PR-208 video above before this presentation.
  • 4. Introduction: “Revisiting Self-Supervised Visual Representation Learning”, 2019 CVPR
      • Many pretext tasks for self-supervised learning have been studied
      • But performance is still lower than in the supervised setting
      • Other important aspects, such as the CNN architecture, have not received equal attention
  • 5. Introduction (continued)
      • Other important aspects, such as the CNN architecture, have not received equal attention
      • So the paper revisits previously proposed self-supervised models and conducts a large-scale study
  • 6. Self-Supervised Study Setup: 3.1. Architectures of CNN models
      • Most self-supervised techniques for visual representation learning use AlexNet. Why use an old-fashioned architecture?!
      • Employ modern network architectures instead, each scaled by a widening factor k, k ∈ {4, 8, 12, 16}
      • ResNet50: pre-logits of size 512*k
      • RevNet (the reversible ResNet), but does not use G as in the Real NVP paper
      • VGG with batch normalization: initial conv layer has 8*k channels, fc layer has 512*k channels
      • Reference: The Reversible Residual Network: Backpropagation Without Storing Activations, 2017 NIPS
      [Figure: ResNet and RevNet units]
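As a rough illustration of the widening factor on slide 6, the layer widths can be tabulated directly from the formulas stated there. The helper names below are hypothetical and exist only for this sketch; they are not taken from the paper's code.

```python
# Widening factors listed on the slide.
WIDENING_FACTORS = (4, 8, 12, 16)


def resnet50_prelogits(k: int) -> int:
    """Pre-logits size of ResNet50 as stated on the slide: 512 * k."""
    return 512 * k


def vgg_bn_widths(k: int) -> dict:
    """VGG-BN widths as stated on the slide: 8 * k initial conv channels, 512 * k fc channels."""
    return {"initial_conv": 8 * k, "fc": 512 * k}


# Print one row per widening factor.
for k in WIDENING_FACTORS:
    print(k, resnet50_prelogits(k), vgg_bn_widths(k))
```

Under these formulas, k = 4 gives the 2048-dimensional pre-logits of a standard ResNet50, and k = 8 gives the 64 initial channels and 4096-unit fc layers of a standard VGG.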
  • 7. Self-Supervised Study Setup: 3.2. Self-supervised techniques in this study
      • Use 4 self-supervised techniques for the experiments
      • Rotation
      • Exemplar
      • Jigsaw
      • Relative Patch Location
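To make one of these pretext tasks concrete, here is a minimal sketch of how Rotation training examples could be generated: each image is rotated by 0, 90, 180 and 270 degrees, and the network is asked to predict which rotation was applied. The image is a plain nested list, the clockwise direction and the label encoding (label r means r * 90 degrees) are assumptions of this sketch, and the function names are hypothetical.

```python
def rotate90_clockwise(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_examples(image):
    """Return [(rotated_image, label)] for labels 0..3, i.e. 0/90/180/270 degrees."""
    examples = []
    current = [list(row) for row in image]
    for label in range(4):
        examples.append((current, label))
        current = rotate90_clockwise(current)
    return examples


image = [[1, 2],
         [3, 4]]
examples = rotation_examples(image)
for rotated, label in examples:
    print(label, rotated)
```

No human annotation is needed: the label comes for free from the transformation, which is what makes the task self-supervised.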
  • 8. Self-Supervised Study Setup: 3.3. Evaluation
      • Follow common practice: train a linear logistic regression model to solve a multi-class classification task
      • Extract the representation from the frozen network at the pre-logit level
      • Train the logistic regression using L-BFGS, except in Table 2
      • For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
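The linear evaluation protocol can be sketched in a few lines: the features are treated as fixed inputs (the network that produced them is frozen) and only a softmax classifier on top is trained. This toy version uses plain full-batch gradient descent as a stand-in for L-BFGS, and the 2-D "features" are made-up numbers rather than real pre-logits; all names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities of the linear model for one feature vector."""
    logits = [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)


def train_linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Fit multinomial logistic regression with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, biases, x)
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                grad_b[c] += err / n
                for d in range(dim):
                    grad_w[c][d] += err * x[d] / n
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d]
    return weights, biases


def accuracy(weights, biases, features, labels):
    """Fraction of examples whose most probable class matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        probs = predict_proba(weights, biases, x)
        correct += int(probs.index(max(probs)) == y)
    return correct / len(labels)


# Made-up frozen features: class 0 lies along the first axis, class 1 along the second.
features = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
labels = [0, 0, 0, 1, 1, 1]
weights, biases = train_linear_probe(features, labels, num_classes=2)
acc = accuracy(weights, biases, features, labels)
print(acc)
```

Because only the linear layer is trained, the resulting accuracy reflects how linearly separable the frozen representation already is.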
  • 9. Self-Supervised Study Setup: 3.4. Datasets
      • ImageNet (Train + Validation)
      • To avoid overfitting, use a custom validation split (50,000 random images from the training split) for all studies except Table 2
      • All self-supervised models are trained on ImageNet (without labels)
      • Places205 (Validation only)
      • Qualitatively different from ImageNet → a good candidate for evaluating how well the learned representations generalize to unseen data of a different nature
      • Same procedure as for ImageNet regarding validation splits (random splitting)
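Holding out a random part of the training split can be sketched as follows. The function name, the seed, and the use of integer ids in place of image files are assumptions of this sketch; only the idea of a 50,000-image random hold-out comes from the slide.

```python
import random


def split_train_val(image_ids, val_size, seed=0):
    """Randomly hold out `val_size` ids for validation; the rest stay in training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return shuffled[val_size:], shuffled[:val_size]


# Small stand-in for the real dataset; the slide holds out 50,000 ImageNet training images.
train_ids, val_ids = split_train_val(range(1000), val_size=50)
print(len(train_ids), len(val_ids))
```

Model selection then happens on the held-out ids, so the official validation set is not used for tuning.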
  • 10. Experiments and Results: 4.1. Evaluation on ImageNet and Places205
      • Measure the representation quality produced by 6 different CNNs with various widening factors
      • Increasing the number of channels improves the performance of self-supervised models
      [Table annotations: Widening factor; Random Initialize; Without ReLU before GAP layer]
  • 11. Experiments and Results: 4.1. Evaluation on ImageNet and Places205 (continued)
      • Neither the ranking of architectures is consistent across methods, nor the ranking of methods across architectures
      • The ranking on Places205 is consistent with that on ImageNet → the representations generalize to a new dataset
      • VGG19-BN consistently shows the worst performance, even though it achieves performance similar to ResNet50 on standard vision benchmarks (fully supervised setting)
      • Best architecture per method: Rotation → RevNet50; Exemplar → ResNet50 v1; Rel. Patch Loc. → ResNet50 v1; Jigsaw → ResNet50 v1
      • VGG19-BN → worst performance in all cases
  • 12. Experiments and Results: 4.2. Comparison to prior work
      • For consistency and fair evaluation, Table 2 uses SGD with momentum and data augmentation
      • As a result of selecting the right architecture, the models significantly outperform previously reported results
      [Table annotation: Prev. Result]
  • 13. Experiments and Results: 4.3. A linear model is adequate for evaluation
      • Consider an alternative evaluation scenario: use an MLP to solve the evaluation task
      • Add a single hidden layer with 1000 channels, ReLU, and dropout to make the model non-linear
      • The MLP provides only a marginal improvement over the linear evaluation
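For a sense of scale, the two evaluation heads can be compared by parameter count. The 2048-dimensional input (ResNet50 with k = 4) and the 1000 output classes are assumptions chosen for this example, and the function names are hypothetical; dropout adds no parameters.

```python
def linear_head_params(feature_dim: int, num_classes: int) -> int:
    """Weights plus biases of a single linear layer."""
    return feature_dim * num_classes + num_classes


def mlp_head_params(feature_dim: int, num_classes: int, hidden: int = 1000) -> int:
    """One hidden layer of `hidden` units followed by the output layer."""
    hidden_layer = feature_dim * hidden + hidden
    output_layer = hidden * num_classes + num_classes
    return hidden_layer + output_layer


print(linear_head_params(2048, 1000))  # linear evaluation head
print(mlp_head_params(2048, 1000))     # MLP evaluation head
```

In this setting the MLP head has roughly one and a half times as many parameters, yet the slide reports only a marginal accuracy gain, which supports using the simpler linear probe.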
  • 14. Experiments and Results: 4.4. Better performance on the pretext task does not always translate to better representations
      • Performance on the pretext task is a good proxy, but not always
  • 15. Experiments and Results: 4.5. Skip-connections prevent degradation of representation quality towards the end of CNNs
      • VGG-BN representations get worse towards the end of the network, but those of ResNet and RevNet do not
      • Models specialize to the pretext task and discard more general semantic features in the later layers
      • Skip-connections preserve the information learned in intermediate layers
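A toy sketch of why a skip-connection can preserve information: the block output is the input plus the branch output, so even a branch that outputs nothing useful leaves the input intact. The zero-output branch is a deliberately extreme, made-up case, and all names are hypothetical.

```python
def plain_block(x, branch):
    """Block without a skip-connection: only the branch output survives."""
    return branch(x)


def residual_block(x, branch):
    """Block with a skip-connection: output = input + branch(input)."""
    return [x_i + f_i for x_i, f_i in zip(x, branch(x))]


def collapsed_branch(x):
    """Extreme case: the branch has discarded everything and outputs zeros."""
    return [0.0 for _ in x]


features = [1.0, 2.0, 3.0]
print(plain_block(features, collapsed_branch))     # information is lost
print(residual_block(features, collapsed_branch))  # input passes through unchanged
```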
  • 16. Experiments and Results: 4.6. Model width and representation size strongly influence the representation quality
      • Check whether the increase in performance is due to increased network capacity, to the use of higher-dimensional representations, or to the interplay of both
      • Disentangle the network width from the representation size (pre-logits channels)
      • Increasing the widening factor consistently boosts performance in both the full and low-data regimes
  • 17. Experiments and Results: 4.7. SGD for training the linear model takes a long time to converge
      • Previous works use short training times
      • Investigate the importance of the SGD optimization schedule for training the logistic regression
      • The first learning-rate decay has a large influence on the final accuracy
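The kind of schedule under discussion can be sketched as a piecewise-constant learning rate that is divided by a fixed factor at chosen epochs. The base rate, decay epochs, and factor below are hypothetical placeholders rather than the values used in the paper.

```python
def step_decay_lr(epoch, base_lr=0.1, decay_epochs=(30, 60), factor=10.0):
    """Learning rate at `epoch`: base_lr divided by `factor` once per boundary passed."""
    num_decays = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return base_lr / (factor ** num_decays)


# Moving the first boundary later keeps the high learning rate for longer.
for epoch in (0, 29, 30, 59, 60):
    print(epoch, step_decay_lr(epoch))
```

Changing the first entry of `decay_epochs` is the knob the slide refers to: decaying too early freezes the linear model before it has converged.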
  • 18. Conclusion
      • Revisit previously proposed self-supervised models and conduct a large-scale study
      • Architecture designs from the fully supervised setting do not necessarily translate to the self-supervised setting (VGG19-BN)
      • Using skip-connections can achieve consistently good results, in contrast to AlexNet
      • The widening factor of CNNs has a drastic effect on the performance of self-supervised techniques
      • SGD training of the linear logistic regression requires a very long time to converge
      • The ranking of architectures does not carry over across methods, and the ranking of methods does not carry over across architectures