CNN Popular Architectures
and Transfer Learning
Palacode Narayana Iyer Anantharaman
16th Oct 2018
Motivation: Why study these?
Understanding popular architectures helps us in many ways:
• We develop a better understanding of the subject by studying high-performing architectures
• It helps us carry out our own research on newer models
• Learning their design philosophy helps us design our own models more effectively
• We can use them as backbones for transfer learning, selecting the right architecture for the task (a minimal example follows below)
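A minimal sketch of the transfer-learning idea in Keras. The choice of ResNet50 as the backbone, the 10 target classes, and the fully frozen backbone are illustrative assumptions, not prescriptions from these slides:

```python
# Transfer-learning sketch: reuse a pre-trained backbone, train a new head.
# Assumptions: ResNet50 backbone, 10 target classes, backbone fully frozen.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Load the backbone with ImageNet weights, dropping its classification head
backbone = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False  # freeze: reuse learned features, train only the head

# Attach a small task-specific head on top of the pooled features
model = models.Sequential([
    backbone,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # 10 classes assumed for illustration
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```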
References
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/
Current trend: Deeper Models work better
• CNNs consistently outperform other approaches for the core tasks of CV
• Deeper models work better
• Increasing the number of parameters in the layers of a CNN without increasing its depth is not effective at increasing test set performance
• Shallow models overfit at around 20 million parameters while deep ones can benefit from having over 60 million
• Key insight: a model performs better when it is architected as a composition of simpler functions rather than a single complex function. This may also be explained by viewing the computation as a chain of dependencies
VGG Net
ResNet
Recent Results (Credits: CS231n Stanford)
ResNet Motivation
ResNet Approach
• Generally: the deeper, the better (see the figure)
• Issue: it is hard to train deeper networks effectively; validation error goes up not just due to overfitting but due to an increase in training error
• Solution: use skip connections to propagate the activations and reduce this effect
• Rationale: consider how hard it is for a deep stack of layers to learn an accurate identity mapping; skip connections make the identity the easy default (a residual block sketch follows below)
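A minimal Keras sketch of the skip-connection idea. The filter count, kernel size, and use of batch normalization are illustrative assumptions; published ResNets also use bottleneck variants of this block:

```python
# Minimal residual (skip-connection) block sketch.
# Assumes the input already has `filters` channels; otherwise the shortcut
# would need a 1x1 convolution to match shapes.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x  # identity path: activations skip the conv stack

    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    # F(x) + x: the block only needs to learn the residual F(x);
    # an identity output is recovered by driving F(x) toward zero.
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)
```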
ResNet Hypothesis: Rationale
• Deeper models should perform at least as well as shallower models
• More parameters, more degrees of freedom to get the training error down
• More depth, more ability to model abstractions
• A solution by construction: copy the learned layers of a shallower model and set the additional layers to identity mappings
ResNet
• Very deep network: uses 152 layers
• Shallower versions (e.g. ResNet50) are available
• The "go-to" backbone network for many applications such as Faster R-CNN
• Pre-trained weights are available for Keras and TensorFlow
ResNet Details
• Default input size: (224, 224, 3)
• Each stage reduces the width and height dimensions by a factor of 2 (verified in the snippet below)
• This property is leveraged in later implementations such as pyramidal networks
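A small sketch that loads the pre-trained ResNet50 from Keras and lists the spatial sizes appearing through the network, showing the halving per stage. Exact layer names and intermediate padded sizes depend on the Keras release, so the snippet only inspects shapes:

```python
# Load pre-trained ResNet50 and list the spatial sizes through the network.
from tensorflow.keras.applications import ResNet50

model = ResNet50(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))

sizes = []
for layer in model.layers:
    shape = layer.output.shape  # (batch, height, width, channels)
    if len(shape) == 4 and (not sizes or int(shape[1]) != sizes[-1]):
        sizes.append(int(shape[1]))

# Main progression: 224 -> 112 -> 56 -> 28 -> 14 -> 7 (zero-padding layers may
# show up as intermediate sizes such as 230 or 114, depending on the version)
print(sizes)
```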
ImageNet Results summary table
GoogLeNet
• Design choices of filter sizes: (3, 3), (5, 5) and so on – which should we choose for each convolution layer?
• Why not try all of them and choose the best?
• Trying each possible value on every layer and experimenting manually is not a solution
• An Inception layer applies multiple filter sizes in parallel and learns their contributions through its parameters automatically
Inception Layer Naïve Architecture (Figure: Udacity Deep Learning)
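A minimal Keras sketch of the naïve Inception idea: parallel 1 x 1, 3 x 3 and 5 x 5 convolutions plus a pooling branch, concatenated along the channel axis. The filter counts per branch are illustrative assumptions:

```python
# Naive Inception module sketch: parallel filter sizes, learned jointly.
from tensorflow.keras import layers

def naive_inception(x, f1=64, f3=128, f5=32):
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    pool    = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    # Concatenate along the channel axis; training decides how much each
    # branch contributes instead of us picking one filter size by hand.
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5, pool])
```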
Inception Architecture with bottleneck layer
Bottleneck Layer
Example
• Consider an input volume (28, 28, 192) and an output volume (28, 28, 32)
• How many computations are needed if we use a 5 x 5 filter?
• Each filter will be 5 x 5 x 192; we move it over a 28 x 28 surface and we have 32 of them
• 28 x 28 x 32 x 5 x 5 x 192 ≈ 120M multiplications (checked in the snippet below)
• If we need multiple such filter sizes, we add up the corresponding computations for each of them
• On a very deep network this many computations are prohibitively expensive even on powerful hardware
• By reducing the dimensionality of the input before the final convolutions, we get a manageable number of computations
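A quick check of the multiplication count for the direct 5 x 5 convolution (counting only multiplications, ignoring additions and biases):

```python
# Multiplications for a direct 5x5 convolution from (28, 28, 192) to (28, 28, 32)
out_h, out_w, out_c = 28, 28, 32
k_h, k_w, in_c = 5, 5, 192
direct = out_h * out_w * out_c * k_h * k_w * in_c
print(f"{direct:,}")  # 120,422,400  (~120M)
```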
Example with bottleneck layer
• In our example, we can transform the (28, 28, 192) volume into a surface of the same spatial size but much reduced depth (say 16) using 1 x 1 convolutions
• The bottleneck layer has the shape (28, 28, 16)
• Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate the final output volume
• Computations: #computations from input to bottleneck layer + #computations from bottleneck to output. In our example this is about 12.4M, versus roughly 120M without the bottleneck (see the sketch below)
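A sketch of the same 5 x 5 branch with a 1 x 1 bottleneck, with the corresponding multiplication counts worked out in the comments. The bottleneck depth of 16 follows the example above; the layer choices are otherwise illustrative:

```python
# 5x5 branch with a 1x1 bottleneck, as in the example above.
from tensorflow.keras import layers

def bottleneck_5x5_branch(x, bottleneck_depth=16, out_filters=32):
    # 1x1 reduction: 28*28*16 * (1*1*192) ≈ 2.4M multiplications
    y = layers.Conv2D(bottleneck_depth, 1, padding="same", activation="relu")(x)
    # 5x5 convolution on the thin volume: 28*28*32 * (5*5*16) ≈ 10.0M
    y = layers.Conv2D(out_filters, 5, padding="same", activation="relu")(y)
    return y  # total ≈ 12.4M vs. ≈ 120M for the direct 5x5 convolution
```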
GoogLeNet architecture with Inception layers
State of the art: SENet
• Winner of ImageNet 2017 in multiple categories
• A novel technique to weight the contributions of the channels of a convolutional layer
SENet: Squeeze-and-Excitation Network
• SENet is the winning architecture of ImageNet 2017 in multiple categories
• Top-5 error rate on image classification: 2.251%
• Key idea:
• In the authors' words: "Improve the representational power of the network by explicitly modelling interdependencies between channels of its convolutional features"
• Simple explanation: add parameters to each channel of a convolutional block so that the network can adaptively adjust the weighting of each feature map
SENet Rationale
• Deep CNNs learn increasing levels of abstraction from lower to higher layers. Lower layers have higher resolution and extract basic elements of information
• Higher layers can detect faces, generate text, etc., and deal with abstract information
• All of this works by fusing the spatial and channel information of an image
• A conventional network weights each of its channels equally when creating the output feature maps
• SENets change this by adding a content-aware mechanism to weight each channel adaptively. In its most basic form this could mean adding a single parameter per channel: a linear scalar indicating how relevant that channel is
SENet Architecture
• Get a global understanding of each channel by squeezing its feature map to a single numeric value. This results in a vector of size n, where n is the number of convolutional channels
• This vector is then fed through a two-layer neural network, which outputs a vector of the same size. These n values are used as weights on the original feature maps, scaling each channel based on its importance
Code Illustration of the key idea
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
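Below is a minimal Keras sketch of the squeeze-and-excitation block described above. The reduction ratio of 16 is the default used in the SENet paper; everything else is an illustrative assumption, not the referenced article's exact code:

```python
# Squeeze-and-Excitation block sketch (reduction ratio r=16 assumed).
# Assumes the channel dimension of x is known at graph-construction time.
from tensorflow.keras import layers

def se_block(x, ratio=16):
    channels = x.shape[-1]
    # Squeeze: one number per channel via global average pooling
    s = layers.GlobalAveragePooling2D()(x)               # (batch, channels)
    # Excitation: two-layer bottleneck MLP producing per-channel weights in [0, 1]
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Scale: reweight each feature map by its learned importance
    return layers.Multiply()([x, s])
```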
High level steps
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
Adding the squeeze-and-excitation technique to ResNet
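A sketch of how the SE block can be inserted into a residual block, reusing the hypothetical `se_block` helper from the sketch above. Applying SE to the residual branch before the addition follows the SE-ResNet design; the filter count is an illustrative assumption:

```python
# SE-ResNet block sketch: channel re-weighting on the residual branch,
# identity shortcut unchanged. Reuses se_block() from the earlier sketch.
from tensorflow.keras import layers

def se_residual_block(x, filters=64, ratio=16):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = se_block(y, ratio)           # squeeze-and-excitation on the residual branch
    y = layers.Add()([shortcut, y])  # add the identity shortcut as usual
    return layers.Activation("relu")(y)
```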
