PR-144: SqueezeNext: Hardware-Aware Neural Network Design

Papers covered:
Amir Gholami, et al., “SqueezeNext: Hardware-Aware Neural Network Design”, CVPR 2018
Forrest N. Iandola, et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”, ICLR 2017
Alexander Wong, et al., “NetScore: Towards Universal Metrics for Large-scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage”, arXiv:1806.05512

24th February, 2019
PR12 Paper Review
Jinwon Lee, Samsung Electronics
Related Papers in PR12
• MobileNet
 ▪ PR-044: https://youtu.be/7UoOFKcyIvM
• MobileNetV2
 ▪ PR-108: https://youtu.be/mT5Y-Zumbbw
• ShuffleNet
 ▪ PR-054: https://youtu.be/pNuBdj53Hbc
• ShuffleNetV2
 ▪ PR-120: https://youtu.be/lrU6uXiJ_9Y
CNN Benchmark from “NetScore”
NetScore
Introduction
• Much of the focus in the design of deep neural networks has been on
improving accuracy, leading to more powerful yet highly complex
network architectures.
• But these networks are difficult to deploy in practical scenarios,
particularly on edge devices such as mobile phones and other consumer devices.
• Designing deep neural networks that strike a balance between accuracy and
complexity has therefore become a very active area of research.
Information Density
• Information density is one of the most widely cited metrics in the research
literature for assessing the performance of DNNs that accounts for both
accuracy and architectural complexity:
D(N) = a(N) / p(N)
 ▪ D(N) : information density
 ▪ a(N) : accuracy
 ▪ p(N) : the number of parameters
• The information density metric does not account for the fact that
architectural complexity does not necessarily reflect the computational
requirements of performing network inference.
• NetScore is designed specifically to provide a quantitative assessment of
the balance between accuracy, computational complexity, and network
architecture complexity of a DNN:
Ω(N) = 20 log( a(N)^α / (p(N)^β · m(N)^γ) )
 ▪ Ω(N) : NetScore
 ▪ a(N) : accuracy (Top-1 accuracy on the ILSVRC 2012 dataset)
 ▪ p(N) : the number of parameters in the network
 ▪ m(N) : the number of multiply-accumulate (MAC) operations during inference
 ▪ α = 2, β = 0.5, γ = 0.5
• Architectural and computational complexity are both very important factors,
but the most important metric remains accuracy: networks with unreasonably low
model accuracy are not useful in practical scenarios regardless of size and speed.
• The 20·log scaling accounts for the large dynamic range of the score and is
inspired by the decibel scale of signal processing.
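As a concrete reference, here is a minimal sketch of the metric as defined above, assuming (as in the NetScore paper) accuracy in percent and parameters/MACs in millions; the AlexNet figures below are approximate:

```python
import math

def netscore(accuracy, params_m, macs_m, alpha=2.0, beta=0.5, gamma=0.5):
    """Omega(N) = 20 * log10(a(N)^alpha / (p(N)^beta * m(N)^gamma)).

    accuracy: Top-1 accuracy in percent; params_m / macs_m in millions.
    """
    return 20.0 * math.log10(
        accuracy ** alpha / (params_m ** beta * macs_m ** gamma))

# Approximate AlexNet figures: ~57% top-1, ~61M parameters, ~724M MACs
print(netscore(57.2, 61.0, 724.0))  # higher is better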
SqueezeNet
• Architectural Design Strategies
1. Replace 3x3 filters with 1x1 filters
2. Decrease the number of input channels to 3x3 filters
The total number of parameters in a 3x3 conv layer is (number of input
channels) x (number of filters) x (3x3)
3. Downsample late in the network so that convolution layers have large
activation maps
Large activation maps (due to delayed downsampling) can lead to higher
classification accuracy
• Strategies 1 and 2 are about judiciously decreasing the quantity of
parameters in a CNN while attempting to preserve accuracy.
• Strategy 3 is about maximizing accuracy on a limited budget of
parameters.
The Fire Module
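A minimal PyTorch sketch of the Fire module as described in the paper: a 1x1 "squeeze" layer (strategy 2) feeding parallel 1x1 and 3x3 "expand" layers (strategy 1). The channel sizes in the usage example correspond to fire2 of SqueezeNet.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        # Squeeze: 1x1 convs cut the number of input channels to the 3x3 filters.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand: a mix of cheap 1x1 filters and a few 3x3 filters.
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# fire2 of SqueezeNet: 96 input channels, squeeze to 16, expand to 64 + 64
out = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(out.shape)  # torch.Size([1, 128, 55, 55])
```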
Macroarchitectural View
SqueezeNet Architecture
CNN Microarchitecture Metaparameters
CNN Macroarchitecture Design Space Exploration
Best performance
Results
Network Pruning & Deep Compression
PR-072: Deep Compression by Taeoh Kim
https://youtu.be/9mFZmpIbMDs
The Impact of SqueezeNet
• SqueezeDet & SqueezeSeg
SqueezeNext
Motivation
• A general trend of neural network design has been to find larger and
deeper models to get better accuracy without considering the
memory or power budget.
• However, the increase in transistor speed due to semiconductor process
improvements has slowed dramatically, and it seems unlikely that mobile
processors will meet these computational requirements on a limited power
budget.
Contributions
• Use a more aggressive channel reduction by incorporating a two-
stage squeeze module.
• Use separable 3x3 convolutions to further reduce the model size, and
remove the additional 1x1 branch after the squeeze module.
• Use an element-wise addition skip connection similar to that of the ResNet
architecture.
• Optimize the baseline SqueezeNext architecture by simulating its
performance on a multi-processor embedded system.
Design – Low Rank Filters
• Decompose the K x K convolutions into two separable convolutions of size
1 x K and K x 1, as sketched below.
• This effectively reduces the number of parameters from K² to 2K, and also
increases the depth of the network.
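A quick parameter-count check of the decomposition; the kernel size and channel count below are chosen only for illustration:

```python
import torch.nn as nn

K, C = 3, 64  # kernel size and channel count (illustrative)

dense = nn.Conv2d(C, C, kernel_size=K, padding=K // 2)
separable = nn.Sequential(  # 1xK followed by Kx1
    nn.Conv2d(C, C, kernel_size=(1, K), padding=(0, K // 2)),
    nn.Conv2d(C, C, kernel_size=(K, 1), padding=(K // 2, 0)))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(separable))  # 36928 vs 24704: K^2 -> 2K weights per filter pair
```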
Design – Bottleneck Module
• Use a variation of the bottleneck approach with a two-stage squeeze layer.
• Use two bottleneck modules, each reducing the channel size by a factor of 2,
followed by two separable convolutions.
• Also incorporate a final 1 x 1 expansion module, which further reduces the
number of output channels for the separable convolutions (see the block sketch
below).
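Putting the pieces together, here is a PyTorch sketch of the resulting block. The channel ratios follow the description above; batch-norm/ReLU placement and stride handling are simplified, so treat this as an illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, kernel, pad=0):
    # Conv + BN + ReLU; normalization/activation placement is simplified here.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, padding=pad, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

class SqNxtBlock(nn.Module):
    """SqueezeNext block (sketch): two-stage 1x1 squeeze, separable 3x1/1x3
    convolutions, a 1x1 expansion, and a ResNet-style additive skip."""

    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            conv(ch, ch // 2, 1),                        # squeeze stage 1: C -> C/2
            conv(ch // 2, ch // 4, 1),                   # squeeze stage 2: C/2 -> C/4
            conv(ch // 4, ch // 2, (3, 1), pad=(1, 0)),  # separable K x 1
            conv(ch // 2, ch // 2, (1, 3), pad=(0, 1)),  # separable 1 x K
            conv(ch // 2, ch, 1))                        # 1x1 expansion back to C

    def forward(self, x):
        return self.body(x) + x  # element-wise addition skip connection

y = SqNxtBlock(64)(torch.randn(1, 64, 28, 28))
print(y.shape)  # torch.Size([1, 64, 28, 28])
```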
Design – Fully Connected Layers
• In the case of AlexNet, the majority of the network parameters are in
Fully Connected layers, accounting for 96% of the total model size.
• SqueezeNext incorporates a final bottleneck layer to reduce the
input channel size to the last fully connected layer, which
considerably reduces the total number of model parameters.
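A back-of-envelope check of the 96% figure, using AlexNet's standard layer shapes (weights only, biases ignored):

```python
fc6 = 256 * 6 * 6 * 4096   # 37,748,736 weights
fc7 = 4096 * 4096          # 16,777,216 weights
fc8 = 4096 * 1000          #  4,096,000 weights
total = 61_000_000         # ~61M parameters in AlexNet overall
print((fc6 + fc7 + fc8) / total)  # ~0.96
```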
Comparison of Building Blocks
SqueezeNext Block
Block Arrangement in 1.0-SqNxt-23
Breakdown of the 1.0-SqNxt-23 Architecture
Hardware Platform
• Weight Stationary (WS) & Output Stationary (OS) data flows
• The x and y loops form the innermost loops in the WS data flow, whereas the
c, i, and j loops form the innermost loops in the OS data flow.
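A schematic (and deliberately naive) rendering of the two loop orders, using the loop names from the slide: k is the output channel, c the input channel, (y, x) the output pixel, and (i, j) the kernel offset. Both compute the same convolution; only what each PE holds fixed differs.

```python
import numpy as np

K_OUT, C_IN, R, S, H, W = 2, 2, 3, 3, 4, 4  # toy sizes
inp = np.random.rand(C_IN, H + R - 1, W + S - 1)
w = np.random.rand(K_OUT, C_IN, R, S)

def weight_stationary(inp, w):
    out = np.zeros((K_OUT, H, W))
    for k in range(K_OUT):
        for c in range(C_IN):
            for i in range(R):
                for j in range(S):              # w[k, c, i, j] is held in the PE...
                    for y in range(H):
                        for x in range(W):      # ...x and y form the innermost loops
                            out[k, y, x] += w[k, c, i, j] * inp[c, y + i, x + j]
    return out

def output_stationary(inp, w):
    out = np.zeros((K_OUT, H, W))
    for k in range(K_OUT):
        for y in range(H):
            for x in range(W):                  # out[k, y, x] is held in the PE...
                for c in range(C_IN):
                    for i in range(R):
                        for j in range(S):      # ...c, i, j form the innermost loops
                            out[k, y, x] += w[k, c, i, j] * inp[c, y + i, x + j]
    return out

assert np.allclose(weight_stationary(inp, w), output_stationary(inp, w))
```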
Hardware Simulation Setup
• 16x16 or 8x8 array of PEs.
• A 128KB or 32KB global buffer and a
DMA controller to transfer data between
DRAM and the buffer.
• A PE has a 16-bit integer multiply-and-accumulate (MAC) unit and a local
register file.
• The performance estimator computes
the number of clock cycles required to
process each layer and sums all the
results.
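As a rough mental model only (the paper's estimator also accounts for the memory hierarchy and data-flow-dependent stalls), per-layer cycles can be approximated as MACs divided by the number of PEs; the stem layer shape below is a hypothetical example:

```python
def layer_macs(c_in, c_out, kh, kw, h_out, w_out):
    # One MAC per (input channel x kernel tap x output pixel x output channel)
    return c_in * c_out * kh * kw * h_out * w_out

def estimate_cycles(layers, num_pes=16 * 16):
    # Idealized: one MAC per PE per cycle, no memory stalls.
    return sum(layer_macs(*layer) for layer in layers) / num_pes

# Hypothetical stem layer: 3 -> 64 channels, 7x7 kernel, 112x112 output map
print(estimate_cycles([(3, 64, 7, 7, 112, 112)]))
```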
Classification Performance Results
• The 23-module architecture exceeds AlexNet's accuracy by a 2% margin with
84x fewer parameters.
• The version with twice the width and 44 modules (2.0-SqNxt-44) matches
VGG-19's performance with 31x fewer parameters.
Hardware Performance Results
SqueezeNext v2~v5
• In the 1.0-SqNxt-23, the first 7 x 7 convolutional layer accounts for 26% of
the total inference time.
• Therefore, the first optimization is replacing this 7 x 7 layer with a 5 x 5
convolution, constructing the 1.0-SqNxt-23-v2 model.
• Note the significant drop in efficiency for the layers in the first module.
The reason for this drop is that the initial layers have a very small number
of channels that must be applied to a large input activation map.
• In the v3/v4 variations, the authors reduce the number of blocks in the
first module by 2/4 and instead add them to the second module, respectively.
In the v5 variation, they reduce the blocks of the first two modules and
instead increase the blocks in the third module.
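The MAC saving from the stem swap is easy to check; the figures below assume a 224x224 input, 3 input channels, 64 output channels, and stride 2 (a 112x112 output map):

```python
macs = lambda k: 3 * 64 * k * k * 112 * 112  # kernel taps x channels x output pixels
print(macs(7) / macs(5))  # 49 / 25 = 1.96: the 7x7 stem costs ~2x the MACs of the 5x5
```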
Results
1.0-SqNxt-23v5
Results
Further Discussion
• What are we trying to gain by reducing the number of computations and the
number of parameters?
• In many cases, the goal is speed or low energy consumption.
• Then, can a small number of computations and fewer parameters actually
guarantee speed or low energy?
Speed and the Number of Computations
From ShuffleNetV2
Energy/Power Efficiency and the Number of
Parameters
Slide Credit: “How to Estimate the Energy Consumption of DNNs”
by Tien-Ju Yang (MIT)
Slide Credit: Movidius @ Hot Chips 2016
Key Insights of Energy Consumption
Slide Credit: “How to Estimate the Energy Consumption of DNNs” by Tien-Ju Yang (MIT)
Thank you