EfficientDet:
Scalable and Efficient Object Detection
Mingxing Tan, et al., “EfficientDet: Scalable and Efficient Object Detection”
5th January, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
References
• Hoya012’s Research Blog
  - https://hoya012.github.io/blog/EfficientDet-Review/
• PR12 – EfficientNet (PR-169)
  - https://youtu.be/Vhz0quyvR7I
• PR12 – NAS-FPN (PR-166)
  - https://youtu.be/FAAt0jejWOA
EfficientDet
Intro.
• State-of-the-art object detectors are becoming increasingly expensive.
  - The AmoebaNet-based NAS-FPN detector requires 167M parameters and 3045B FLOPs (30x more than RetinaNet).
• Given real-world resource constraints such as robotics and self-
driving cars, model efficiency becomes increasingly important for
object detection.
• Although previous works tend to achieve better efficiency, they
usually sacrifice accuracy and they only focus on a specific or a small
range of resource requirements.
A Natural Question
Is it possible to build a scalable detection architecture with
both higher accuracy and better efficiency across a wide
spectrum of resource constraints?
Two Challenges
1. Efficient multi-scale feature fusion
  - FPN has been widely used for multi-scale feature fusion.
  - PANet, NAS-FPN, and other studies have developed more network structures for cross-scale feature fusion.
  - Most previous works simply sum the input features without distinguishing between them.
  - However, input features at different resolutions usually contribute unequally to the fused output feature.
2. Model scaling
  - Inspired by EfficientNet, the authors propose a compound scaling method for object detectors that jointly scales the resolution, depth, and width of the backbone, the feature network, and the box/class prediction networks.
• Combining EfficientNet backbones with BiFPN and compound scaling → EfficientDet
Contributions
• Propose BiFPN, a weighted bidirectional feature network for easy
and fast multi-scale feature fusion.
• Propose a new compound scaling method, which jointly scales up
backbone, feature network, box/class network, and resolution, in a
principled way.
• Develop EfficientDet, a new family of one-stage detectors with
significantly better accuracy and efficiency across a wide spectrum of
resource constraints.
BiFPN – Problem Formulation
• Formally, given a list of multi-scale features $\vec{P}^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, \dots)$, where $P^{in}_{l_i}$ represents the feature at level $l_i$, the goal is to find a transformation $f$ that can effectively aggregate different features and output a list of new features: $\vec{P}^{out} = f(\vec{P}^{in})$
BiFPN – Problem Formulation
• FPN takes level 3-7 input features $\vec{P}^{in} = (P^{in}_3, \dots, P^{in}_7)$, where $P^{in}_i$ represents a feature level with a resolution of $1/2^i$ of the input image.
• For instance, if the input resolution is 640x640, then $P^{in}_3$ represents feature level 3 ($640/2^3 = 80$) with resolution 80x80, while $P^{in}_7$ represents feature level 7 ($640/2^7 = 5$) with resolution 5x5.
• The conventional FPN aggregates multi-scale features in a top-down manner:
$P^{out}_7 = \mathrm{Conv}(P^{in}_7)$
$P^{out}_6 = \mathrm{Conv}(P^{in}_6 + \mathrm{Resize}(P^{out}_7))$
$\dots$
$P^{out}_3 = \mathrm{Conv}(P^{in}_3 + \mathrm{Resize}(P^{out}_4))$
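The top-down aggregation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: `Conv` is replaced by an identity function, `Resize` is assumed to be nearest-neighbour 2x upsampling, and the channel count of 8 is an arbitrary choice made only for this example.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map (assumed Resize)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_top_down(p_in, conv=lambda x: x):
    """p_in maps level -> (H, W, C) array; returns level -> fused output.

    conv defaults to the identity, standing in for a real convolution.
    """
    levels = sorted(p_in, reverse=True)            # e.g. [7, 6, 5, 4, 3]
    p_out = {levels[0]: conv(p_in[levels[0]])}     # P7_out = Conv(P7_in)
    for lvl in levels[1:]:
        # P_l_out = Conv(P_l_in + Resize(P_{l+1}_out))
        p_out[lvl] = conv(p_in[lvl] + upsample2x(p_out[lvl + 1]))
    return p_out

input_size = 640   # input resolution from the example above
channels = 8       # hypothetical channel count
p_in = {l: np.ones((input_size // 2**l, input_size // 2**l, channels))
        for l in range(3, 8)}
p_out = fpn_top_down(p_in)
shapes = {l: p_out[l].shape[:2] for l in p_out}
```

With all-ones inputs, each finer level accumulates one more contribution from the levels above it, which makes the one-directional information flow of FPN easy to see.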
BiFPN – Cross-Scale Connections
• PANet adds an extra bottom-up path aggregation network.
• NAS-FPN employs neural architecture search to search for better
cross-scale feature network topology, but it requires thousands of
GPU hours during search and the found network is irregular and
difficult to interpret or modify.
PANet
• Augmenting a top-down path propagates semantically strong
features and enhances all features with reasonable classification
capability in FPN
• Augmenting a bottom-up path of low-level patterns based on the
fact that high response to edges or instance parts is a strong
indicator to accurately localize instances.
NAS-FPN
• Adopt Neural Architecture Search and discover a
new feature pyramid architecture in a novel
scalable search space covering all cross-scale
connections.
• The discovered architecture, named NAS-FPN,
consists of a combination of top-down and
bottom-up connections to fuse features across
scales.
BiFPN – Cross-Scale Connections
• PANet achieves better accuracy than FPN and NAS-FPN, but at the cost of more parameters and computation.
• First, remove the nodes that have only one input edge. A node with a single input edge performs no feature fusion, so it contributes less to a feature network that aims at fusing different features.
BiFPN – Cross-Scale Connections
• Second, add an extra edge from the original input to the output node when they are at the same level, in order to fuse more features without adding much cost.
• Third, unlike PANet, which has only one top-down and one bottom-up path, each bidirectional (top-down & bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion.
BiFPN – Weighted Feature Fusion
• When fusing multiple input features with
different resolutions, a common way is to first
resize them to the same resolution and then
sum them up.
• Pyramid attention network introduces global
self-attention upsampling to recover pixel
localization.
• Since different input features are at different resolutions, they usually contribute unequally to the output feature. BiFPN therefore adds a learnable weight for each input during feature fusion, letting the network learn the importance of each input feature.
Unbounded Fusion
$O = \sum_i w_i \cdot I_i$
• where $w_i$ is a learnable weight that can be a scalar (per-feature), a vector (per-channel), or a multi-dimensional tensor (per-pixel).
• The authors find that a scalar weight achieves accuracy comparable to the other approaches at minimal computational cost. However, since the scalar weight is unbounded, it could potentially cause training instability.
Softmax-Based Fusion
$O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i$
• An intuitive idea is to apply softmax to each weight, so that all weights are normalized to a probability in the range 0 to 1, representing the importance of each input.
• However, the extra softmax leads to a significant slowdown on GPU hardware.
Fast Normalized Fusion
$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$
• where $w_i \geq 0$ is ensured by applying a ReLU after each $w_i$, and $\epsilon = 0.0001$ is a small value to avoid numerical instability.
• The ablation study shows that this fast fusion approach has very similar learning behavior and accuracy to softmax-based fusion, but runs up to 30% faster on GPUs.
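The three fusion rules can be compared side by side in a short NumPy sketch. It assumes scalar (per-feature) weights and inputs that are already resized to the same shape; the function names and the sample weights are hypothetical, and the max-subtraction inside the softmax is a standard numerical-stability detail rather than something taken from the slides.

```python
import numpy as np

def unbounded_fusion(inputs, w):
    """O = sum_i w_i * I_i, with no constraint on the weights."""
    return sum(wi * x for wi, x in zip(w, inputs))

def softmax_fusion(inputs, w):
    """O = sum_i softmax(w)_i * I_i."""
    e = np.exp(w - np.max(w))          # subtract max for numerical stability
    norm = e / e.sum()
    return sum(ni * x for ni, x in zip(norm, inputs))

def fast_normalized_fusion(inputs, w, eps=1e-4):
    """O = sum_i w_i / (eps + sum_j w_j) * I_i, with w_i >= 0 via ReLU."""
    w = np.maximum(w, 0.0)             # ReLU keeps every weight non-negative
    norm = w / (eps + w.sum())
    return sum(ni * x for ni, x in zip(norm, inputs))

# Two same-shaped inputs and hypothetical learned weights.
inputs = [np.full((4, 4), 1.0), np.full((4, 4), 3.0)]
w = np.array([1.0, 3.0])

out_unbounded = unbounded_fusion(inputs, w)
out_softmax = softmax_fusion(inputs, w)
out_fast = fast_normalized_fusion(inputs, w)
```

Both normalized variants keep the output inside the range spanned by the inputs, while the unbounded sum can grow with the weights. The fast variant only needs a ReLU, a sum, and a division, with no exponentials.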
BiFPN
• BiFPN integrates both the bidirectional cross-scale connections and the fast normalized fusion.
$P^{td}_6 = \mathrm{Conv}\left(\frac{w_1 \cdot P^{in}_6 + w_2 \cdot \mathrm{Resize}(P^{in}_7)}{w_1 + w_2 + \epsilon}\right)$
$P^{out}_6 = \mathrm{Conv}\left(\frac{w'_1 \cdot P^{in}_6 + w'_2 \cdot P^{td}_6 + w'_3 \cdot \mathrm{Resize}(P^{out}_5)}{w'_1 + w'_2 + w'_3 + \epsilon}\right)$
where $P^{td}_6$ is the intermediate feature at level 6 on the top-down pathway, and $P^{out}_6$ is the output feature at level 6 on the bottom-up pathway.
• Depthwise separable convolution is used for feature fusion.
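As a rough illustration of the level-6 node, the sketch below computes the top-down intermediate feature and the bottom-up output feature with fast normalized fusion. Several things here are assumptions rather than details from the slides: `Conv` is an identity placeholder instead of a depthwise separable convolution, `Resize` is nearest-neighbour upsampling when going to a finer level and 2x2 max pooling when going to a coarser one, and all weights and shapes are made up for the example.

```python
import numpy as np

EPS = 1e-4

def fuse(inputs, w):
    """Fast normalized fusion with ReLU-clamped scalar weights."""
    w = np.maximum(np.asarray(w, dtype=float), 0.0)
    return sum(wi * x for wi, x in zip(w, inputs)) / (w.sum() + EPS)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) map (assumed Resize)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    """2x2 max pooling of an (H, W, C) map with even H and W (assumed Resize)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def bifpn_level6(p6_in, p7_in, p5_out, w_td, w_out, conv=lambda x: x):
    """Return (P6_td, P6_out); conv is an identity stand-in for the real Conv."""
    p6_td = conv(fuse([p6_in, upsample2x(p7_in)], w_td))
    p6_out = conv(fuse([p6_in, p6_td, downsample2x(p5_out)], w_out))
    return p6_td, p6_out

# Hypothetical feature maps for a 640x640 input with 4 channels.
p6_in = np.full((10, 10, 4), 1.0)
p7_in = np.full((5, 5, 4), 2.0)
p5_out = np.full((20, 20, 4), 3.0)
p6_td, p6_out = bifpn_level6(p6_in, p7_in, p5_out,
                             w_td=[1.0, 1.0], w_out=[1.0, 1.0, 1.0])
```

Note that the output node fuses three inputs, including the extra same-level edge from the original input, so its normalizer sums three weights.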
EfficientDet Architecture
• ImageNet-pretrained EfficientNet is employed as the backbone.
• The class and box network weights are shared across all levels of features.
EfficientNet – Compound Scaling
EfficientNet – Compound Scaling Method
• $\alpha$, $\beta$, $\gamma$ are constants (the depth, width, and resolution scaling bases) that can be determined by a small grid search.
• Intuitively, $\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling.
EfficientNet – Compound Scaling Method
• Notably, the FLOPs of a regular convolution op are proportional to $d$, $w^2$, $r^2$.
  - Doubling network depth will double FLOPs, but doubling network width or resolution will increase FLOPs by four times. Since convolution ops usually dominate the computation cost in ConvNets, scaling a ConvNet with the above equation will approximately increase total FLOPs by $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$.
• In the EfficientNet paper, $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ is imposed as a constraint, so total FLOPs approximately increase by $2^\phi$.
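The FLOPs argument can be checked numerically. The sketch below assumes the constants reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15); these values do not appear in the slides and are used here only to show that the FLOPs multiplier grows roughly like $2^\phi$. The function name is hypothetical.

```python
# Assumed constants from the EfficientNet paper (not stated in the slides).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs of a regular convolution scale with d * w^2 * r^2.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops

flops_phi1 = compound_scale(1)[3]
flops_phi2 = compound_scale(2)[3]
```

With these constants, one step of φ multiplies FLOPs by about 1.92, close to the target of 2, and two steps multiply FLOPs by the square of that factor.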
Compound Scaling
• Inspired by EfficientNet, the authors propose a new compound scaling method that uses a simple compound coefficient $\phi$ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and resolution.
• Unfortunately, object detectors have many more scaling dimensions than image classification models, so a heuristic-based scaling approach is used.
Compound Scaling
• Backbone network
  - Same width/depth scaling coefficients as EfficientNet-B0 to B6
• BiFPN network
  - Exponentially grow the BiFPN width $W_{bifpn}$ (#channels), but linearly increase the depth $D_{bifpn}$ (#layers), since depth needs to be rounded to small integers.
  - $W_{bifpn} = 64 \cdot (1.35^{\phi})$, $D_{bifpn} = 2 + \phi$
• Box/class prediction network
  - Fix their width to be always the same as BiFPN (i.e., $W_{pred} = W_{bifpn}$)
  - But linearly increase the depth (#layers) using the equation:
  - $D_{box} = D_{class} = 3 + \lfloor \phi / 3 \rfloor$
• Input image resolution
  - Since feature levels 3-7 are used in BiFPN, the input resolution must be divisible by $2^7 = 128$
  - $R_{input} = 512 + \phi \cdot 128$
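The scaling rules above are simple enough to tabulate directly. The following sketch evaluates them for φ = 0 to 6; the function and key names are hypothetical. The BiFPN width is left as the raw value of the formula, since the slides do not say how it is rounded to an actual channel count.

```python
def efficientdet_config(phi):
    """Evaluate the compound scaling equations for an integer coefficient phi."""
    return {
        "input_resolution": 512 + phi * 128,   # R_input, always a multiple of 128
        "bifpn_width": 64 * 1.35 ** phi,       # W_bifpn (raw, unrounded)
        "bifpn_depth": 2 + phi,                # D_bifpn
        "head_depth": 3 + phi // 3,            # D_box = D_class = 3 + floor(phi / 3)
        "head_width": 64 * 1.35 ** phi,        # W_pred = W_bifpn
    }

configs = {phi: efficientdet_config(phi) for phi in range(7)}
```

For example, `configs[0]` gives a 512x512 input with a 2-layer, 64-channel BiFPN and 3-layer heads, while `configs[6]` gives a 1280x1280 input with an 8-layer BiFPN and 5-layer heads.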
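Compound Scaling
• Input resolution: $R_{input} = 512 + \phi \cdot 128$
• Backbone network: EfficientNet-B0 ~ B6
• BiFPN: #layers $D_{bifpn} = 2 + \phi$, #channels $W_{bifpn} = 64 \cdot (1.35^{\phi})$
• Box/class prediction network: #layers $D_{box} = D_{class} = 3 + \lfloor \phi / 3 \rfloor$, #channels $W_{pred} = W_{bifpn}$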
Experiments
Titan-V Single-thread Xeon
Experiments
Ablation Study
• Disentangling Backbone and BiFPN
• BiFPN Cross-Scale Connections
FPN and PANet only have one top-down or bottom-up flow so they were repeated 5 times
Ablation Study
• Softmax vs Fast Normalized Fusion
Ablation Study
• Compound Scaling
Conclusion
• A weighted bidirectional feature network (BiFPN) and a customized compound scaling method are proposed.
• Both aim to improve accuracy and efficiency together.
• EfficientDet-D7 achieves a state-of-the-art 51.0 mAP on the COCO dataset with 52M parameters and 326B FLOPs, being 4x smaller and using 9.3x fewer FLOPs, yet still more accurate (+0.3% mAP) than the best previous detector.
• EfficientDet is also up to 3.2x faster on GPUs and 8.1x faster on CPUs.

Mais conteúdo relacionado

Mais procurados

Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentationOwin Will
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning Asma-AH
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearningAbhishek Sharma
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksJinwon Lee
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnSumeraHangi
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...Edge AI and Vision Alliance
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...Simplilearn
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 

Mais procurados (20)

Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentation
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning
 
CNN Machine learning DeepLearning
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
MobileNet V3
MobileNet V3MobileNet V3
MobileNet V3
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Activation function
Activation functionActivation function
Activation function
 
Resnet
ResnetResnet
Resnet
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 

Semelhante a PR-217: EfficientDet: Scalable and Efficient Object Detection

Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Pelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewPelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewLEE HOSEONG
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution Mohammed Ashour
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxssuser2624f71
 
artificial neural network
artificial neural networkartificial neural network
artificial neural networkPallavi Yadav
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxPoonam60376
 
Waste Classification System using Convolutional Neural Networks.pptx
Waste Classification System using Convolutional Neural Networks.pptxWaste Classification System using Convolutional Neural Networks.pptx
Waste Classification System using Convolutional Neural Networks.pptxJohnPrasad14
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxfahmi324663
 
Ocr using tensor flow
Ocr using tensor flowOcr using tensor flow
Ocr using tensor flowNaresh Kumar
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksParrotAI
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
 
Scalable image recognition model with deep embedding
Scalable image recognition model with deep embeddingScalable image recognition model with deep embedding
Scalable image recognition model with deep embedding捷恩 蔡
 

Semelhante a PR-217: EfficientDet: Scalable and Efficient Object Detection (20)

Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Pelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewPelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper Review
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
artificial neural network
artificial neural networkartificial neural network
artificial neural network
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptx
 
Waste Classification System using Convolutional Neural Networks.pptx
Waste Classification System using Convolutional Neural Networks.pptxWaste Classification System using Convolutional Neural Networks.pptx
Waste Classification System using Convolutional Neural Networks.pptx
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Development of Deep Learning Architecture
Development of Deep Learning ArchitectureDevelopment of Deep Learning Architecture
Development of Deep Learning Architecture
 
SPPNet
SPPNetSPPNet
SPPNet
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 
Ocr using tensor flow
Ocr using tensor flowOcr using tensor flow
Ocr using tensor flow
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
cnn.pdf
cnn.pdfcnn.pdf
cnn.pdf
 
Cnn
CnnCnn
Cnn
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
Scalable image recognition model with deep embedding
Scalable image recognition model with deep embeddingScalable image recognition model with deep embedding
Scalable image recognition model with deep embedding
 

Mais de Jinwon Lee

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorJinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...Jinwon Lee
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...Jinwon Lee
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksJinwon Lee
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 

Mais de Jinwon Lee (20)

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...



PR-217: EfficientDet: Scalable and Efficient Object Detection

  • 1. EfficientDet: Scalable and Efficient Object Detection. Mingxing Tan, et al., “EfficientDet: Scalable and Efficient Object Detection”. 5th January, 2020. PR12 Paper Review. JinWon Lee, Samsung Electronics
  • 2. References • Hoya012’s Research Blog: https://hoya012.github.io/blog/EfficientDet-Review/ • PR12 – EfficientNet (PR-169): https://youtu.be/Vhz0quyvR7I • PR12 – NAS-FPN (PR-166): https://youtu.be/FAAt0jejWOA
  • 4. Intro. • State-of-the-art object detectors are becoming increasingly expensive. For example, the AmoebaNet-based NAS-FPN detector requires 167M parameters and 3045B FLOPS (30x more than RetinaNet). • Given real-world resource constraints in applications such as robotics and self-driving cars, model efficiency is increasingly important for object detection. • Although previous works achieve better efficiency, they usually sacrifice accuracy, and they focus only on a specific or narrow range of resource requirements.
  • 5. A Natural Question Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints?
  • 6. Two Challenges 1. Efficient multi-scale feature fusion: FPN has been widely used for multi-scale feature fusion. PANet, NAS-FPN and other studies have developed more network structures for cross-scale feature fusion. Most previous works simply sum the input features without distinction. However, input features at different resolutions usually contribute to the fused output feature unequally. 2. Model scaling: Inspired by EfficientNet, the authors propose a compound scaling method for object detectors, which jointly scales up the resolution/depth/width of the backbone, feature network, and box/class prediction networks. • Combining EfficientNet backbones with BiFPN and compound scaling → EfficientDet
  • 7. Contributions • Propose BiFPN, a weighted bidirectional feature network for easy and fast multi-scale feature fusion. • Propose a new compound scaling method, which jointly scales up backbone, feature network, box/class network, and resolution, in a principled way. • Develop EfficientDet, a new family of one-stage detectors with significantly better accuracy and efficiency across a wide spectrum of resource constraints.
  • 8. BiFPN – Problem Formulation • Formally, given a list of multi-scale features $\vec{P}^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, \dots)$, where $P^{in}_{l_i}$ represents the feature at level $l_i$, the goal is to find a transformation $f$ that can effectively aggregate different features and output a list of new features: $\vec{P}^{out} = f(\vec{P}^{in})$
  • 9. BiFPN – Problem Formulation • FPN takes level 3-7 input features $\vec{P}^{in} = (P^{in}_3, \dots, P^{in}_7)$, where $P^{in}_i$ represents a feature level with resolution $1/2^i$ of the input images. • For instance, if the input resolution is 640x640, then $P^{in}_3$ represents feature level 3 ($640/2^3 = 80$) with resolution 80x80, while $P^{in}_7$ represents feature level 7 ($640/2^7 = 5$) with resolution 5x5. • The conventional FPN aggregates multi-scale features in a top-down manner: $P^{out}_7 = Conv(P^{in}_7)$, $P^{out}_6 = Conv(P^{in}_6 + Resize(P^{out}_7))$, …, $P^{out}_3 = Conv(P^{in}_3 + Resize(P^{out}_4))$
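The level arithmetic and the top-down recurrence on slide 9 can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: features are 1D lists instead of 2D maps, `Conv` is replaced by the identity, and `Resize` is a nearest-neighbour 2x upsampling. The helper names (`level_resolution`, `upsample2x`, `fpn_top_down`) are hypothetical and do not come from the paper or any library.

```python
def level_resolution(input_size, level):
    """Spatial size of feature level `level`: input_size / 2**level."""
    stride = 2 ** level
    if input_size % stride != 0:
        raise ValueError("input size must be divisible by 2**level")
    return input_size // stride


def upsample2x(feature):
    """Assumed stand-in for Resize: repeat every element twice (nearest neighbour)."""
    return [value for value in feature for _ in range(2)]


def fpn_top_down(inputs):
    """Top-down FPN pass over {level: 1D feature}; Conv is the identity here (assumption)."""
    levels = sorted(inputs, reverse=True)          # e.g. [7, 6, 5]
    outputs = {levels[0]: list(inputs[levels[0]])}  # P7_out = Conv(P7_in)
    for upper, lower in zip(levels, levels[1:]):
        resized = upsample2x(outputs[upper])        # Resize(P_upper_out)
        # P_lower_out = Conv(P_lower_in + Resize(P_upper_out))
        outputs[lower] = [a + b for a, b in zip(inputs[lower], resized)]
    return outputs


# Resolutions of levels 3..7 for a 640x640 input, as in the slide's example.
resolutions = {f"P{level}": level_resolution(640, level) for level in range(3, 8)}
print(resolutions)

# Toy three-level pyramid: information from level 7 flows down to level 5.
toy = fpn_top_down({7: [1.0], 6: [1.0, 2.0], 5: [0.0, 0.0, 0.0, 0.0]})
print(toy)
```

Note that information only flows from coarse levels to fine levels in this pass, which is the one-way limitation that the following slides address.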
  • 10. BiFPN – Cross-Scale Connections • PANet adds an extra bottom-up path aggregation network. • NAS-FPN employs neural architecture search to search for better cross-scale feature network topology, but it requires thousands of GPU hours during search and the found network is irregular and difficult to interpret or modify.
  • 11. PANet • FPN augments the backbone with a top-down path that propagates semantically strong features and enhances all features with reasonable classification capability. • PANet further augments it with a bottom-up path that propagates low-level patterns, based on the fact that a high response to edges or instance parts is a strong indicator for accurately localizing instances.
  • 12. NAS-FPN • Adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. • The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales.
  • 13. BiFPN – Cross-Scale Connections • PANet achieves better accuracy than FPN and NAS-FPN, but at the cost of more parameters and computations. • BiFPN applies several optimizations to the cross-scale connections. First, nodes that have only one input edge are removed: a node with a single input edge performs no feature fusion, so it contributes less to a feature network that aims at fusing different features.
  • 14. BiFPN – Cross-Scale Connections • Second, an extra edge is added from the original input to the output node if they are at the same level, in order to fuse more features without adding much cost. • Third, unlike PANet, which has only one top-down and one bottom-up path, each bidirectional (top-down & bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion.
  • 15. BiFPN – Weighted Feature Fusion • When fusing multiple input features with different resolutions, a common way is to first resize them to the same resolution and then sum them up. • Pyramid attention network introduces global self-attention upsampling to recover pixel localization. • Since different input features are at different resolutions, they usually contribute to the output feature unequally. BiFPN therefore adds an additional weight for each input during feature fusion and lets the network learn the importance of each input feature.
  • 16. Unbounded Fusion $O = \sum_i w_i \cdot I_i$ • where $w_i$ is a learnable weight that can be a scalar (per-feature), a vector (per-channel), or a multi-dimensional tensor (per-pixel). • The authors find that a scalar can achieve comparable accuracy to the other approaches with minimal computational cost. However, since the scalar weight is unbounded, it could potentially cause training instability.
  • 17. Softmax-Based Fusion $O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i$ • An intuitive idea is to apply softmax to each weight, such that all weights are normalized to be a probability with value range from 0 to 1, representing the importance of each input. • However, the extra softmax leads to significant slowdown on GPU hardware.
  • 18. Fast Normalized Fusion $O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$ • where $w_i \geq 0$ is ensured by applying a ReLU after each $w_i$, and $\epsilon = 0.0001$ is a small value to avoid numerical instability. • The ablation study shows this fast fusion approach has very similar learning behavior and accuracy to the softmax-based fusion, but runs up to 30% faster on GPUs.
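The three fusion rules on slides 16 to 18 differ only in how the weights are normalized, which a small side-by-side sketch may make concrete. The following is a minimal illustration, not the authors' implementation: each input $I_i$ is assumed to be a single float rather than a feature map, the weights are plain numbers rather than learned parameters, and the function names are hypothetical.

```python
import math


def unbounded_fusion(weights, inputs):
    """O = sum_i w_i * I_i, with no constraint on the weights."""
    return sum(w * x for w, x in zip(weights, inputs))


def softmax_fusion(weights, inputs):
    """O = sum_i softmax(w)_i * I_i; normalized weights lie in (0, 1) and sum to 1."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return sum(e / total * x for e, x in zip(exps, inputs))


def fast_normalized_fusion(weights, inputs, eps=1e-4):
    """O = sum_i w_i / (eps + sum_j w_j) * I_i, with w_i >= 0 enforced by a ReLU."""
    relu = [max(w, 0.0) for w in weights]
    denom = eps + sum(relu)
    return sum(w / denom * x for w, x in zip(relu, inputs))


weights = [1.0, 1.0]
inputs = [2.0, 4.0]
print(unbounded_fusion(weights, inputs))        # plain weighted sum
print(softmax_fusion(weights, inputs))          # equal weights give the mean
print(fast_normalized_fusion(weights, inputs))  # close to the mean, slightly below due to eps
```

With equal weights, the two normalized variants both land at (or very near) the average of the inputs, while the unbounded variant scales with the magnitude of the weights. The fast variant avoids the exponentials, which is the source of the speedup the slide mentions.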
  • 19. BiFPN • BiFPN integrates both the bidirectional cross-scale connections and the fast normalized fusion. For example, at level 6: $P_6^{td} = Conv\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot Resize(P_7^{in})}{w_1 + w_2 + \epsilon}\right)$ and $P_6^{out} = Conv\left(\frac{w_1' \cdot P_6^{in} + w_2' \cdot P_6^{td} + w_3' \cdot Resize(P_5^{out})}{w_1' + w_2' + w_3' + \epsilon}\right)$, where $P_6^{td}$ is the intermediate feature at level 6 on the top-down pathway, and $P_6^{out}$ is the output feature at level 6 on the bottom-up pathway. • Depthwise separable convolution is used for feature fusion.
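The level-6 node on slide 19 can be traced with a toy computation. The sketch below makes several simplifying assumptions that are not in the slides: features are 1D lists, `Conv` is the identity (the slides use depthwise separable convolution), upsampling repeats elements, and downsampling takes the max over pairs. All function names are hypothetical.

```python
def fuse(weights, features, eps=1e-4):
    """Fast normalized fusion applied element-wise to equally sized 1D features."""
    w = [max(x, 0.0) for x in weights]  # ReLU keeps every weight non-negative
    denom = sum(w) + eps
    size = len(features[0])
    return [sum(wi * f[k] for wi, f in zip(w, features)) / denom for k in range(size)]


def upsample2x(feature):
    """Assumed Resize for the top-down path: repeat each element twice."""
    return [value for value in feature for _ in range(2)]


def downsample2x(feature):
    """Assumed Resize for the bottom-up path: max over non-overlapping pairs."""
    return [max(feature[i], feature[i + 1]) for i in range(0, len(feature), 2)]


def bifpn_level6(p6_in, p7_in, p5_out, w_td, w_out):
    """Compute P6_td and P6_out; Conv is omitted (identity) in this sketch."""
    p6_td = fuse(w_td, [p6_in, upsample2x(p7_in)])
    # The extra same-level edge feeds P6_in into the output node alongside P6_td.
    p6_out = fuse(w_out, [p6_in, p6_td, downsample2x(p5_out)])
    return p6_td, p6_out


p6_td, p6_out = bifpn_level6(
    p6_in=[1.0, 3.0],
    p7_in=[2.0],
    p5_out=[0.0, 4.0, 2.0, 2.0],
    w_td=[1.0, 1.0],
    w_out=[1.0, 1.0, 1.0],
)
print(p6_td, p6_out)
```

The output node divides by the sum of all three of its weights, matching the general fast normalized fusion rule.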
  • 20. EfficientDet Architecture • ImageNet-pretrained EfficientNet is employed as the backbone. • The class and box network weights are shared across all levels of features.
  • 22. EfficientNet – Compound Scaling Method • $\alpha$, $\beta$, $\gamma$ are constants that can be determined by a small grid search. • Intuitively, $\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling.
  • 23. EfficientNet – Compound Scaling Method • Notably, the FLOPS of a regular convolution op is proportional to $d$, $w^2$, $r^2$: doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times. Since convolution ops usually dominate the computation cost in ConvNets, scaling a ConvNet with the above equation will approximately increase total FLOPS by $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$. • In the EfficientNet paper, $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ is imposed, so total FLOPS approximately increase by $2^\phi$.
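The FLOPS argument on slide 23 is easy to check numerically. In the sketch below, depth, width, and resolution are assumed to scale as $\alpha^\phi$, $\beta^\phi$, $\gamma^\phi$, so FLOPS grow by $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$. The constants 1.2, 1.1, 1.15 are the grid-search values reported for EfficientNet-B0 as far as I recall; they do not appear in these slides, so treat them as an assumption. The function name is hypothetical.

```python
def flops_multiplier(alpha, beta, gamma, phi):
    """Approximate FLOPS growth when depth, width, and resolution scale as
    alpha**phi, beta**phi, gamma**phi: FLOPS ~ d * w**2 * r**2."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi


# Assumed constants (reported for EfficientNet; not stated in these slides).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

per_step = flops_multiplier(ALPHA, BETA, GAMMA, 1)
print(per_step)  # roughly 1.92, i.e. close to the target of 2

for phi in range(4):
    # Compare the actual multiplier with the idealized 2**phi.
    print(phi, flops_multiplier(ALPHA, BETA, GAMMA, phi), 2 ** phi)
```

Because the product is slightly below 2 under these constants, the actual multiplier drifts below $2^\phi$ as $\phi$ grows, which is why the slide says "approximately".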
  • 24. Compound Scaling • Inspired by EfficientNet, the authors propose a new compound scaling method, which uses a simple compound coefficient $\phi$ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and resolution. • Unfortunately, object detectors have many more scaling dimensions than image classification models, so a heuristic-based scaling approach is used.
  • 25. Compound Scaling • Backbone network: reuses the same width/depth scaling coefficients as EfficientNet-B0 to B6. • BiFPN network: exponentially grow the BiFPN width $W_{bifpn}$ (#channels), but linearly increase the depth $D_{bifpn}$ (#layers), since depth needs to be rounded to small integers: $W_{bifpn} = 64 \cdot (1.35^\phi)$, $D_{bifpn} = 2 + \phi$. • Box/class prediction network: fix the width to always be the same as BiFPN (i.e., $W_{pred} = W_{bifpn}$), but linearly increase the depth (#layers) using $D_{box} = D_{class} = 3 + \lfloor \phi/3 \rfloor$. • Input image resolution: since feature levels 3-7 are used in BiFPN, the input resolution must be divisible by $2^7 = 128$: $R_{input} = 512 + \phi \cdot 128$
  • 26. Compound Scaling (summary) • Input resolution: $R_{input} = 512 + \phi \cdot 128$ • Backbone network: EfficientNet-B0 ~ B6 • BiFPN: #layers $D_{bifpn} = 2 + \phi$, #channels $W_{bifpn} = 64 \cdot (1.35^\phi)$ • Box/class prediction network: #layers $D_{box} = D_{class} = 3 + \lfloor \phi/3 \rfloor$, #channels $W_{pred} = W_{bifpn}$
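The scaling rules on slides 25 and 26 can be evaluated directly for each compound coefficient. The sketch below applies the formulas exactly as written on the slides; the function name and dictionary keys are hypothetical, and the BiFPN width is left as a raw, unrounded value (any rounding to hardware-friendly channel counts is outside what the slides state).

```python
def efficientdet_scaling(phi):
    """Evaluate the slide's compound scaling formulas for a coefficient phi >= 0."""
    if phi < 0:
        raise ValueError("phi must be non-negative")
    bifpn_width = 64 * (1.35 ** phi)           # W_bifpn, raw (unrounded) channel count
    return {
        "backbone": f"EfficientNet-B{phi}",    # backbone grows with phi (B0 ~ B6)
        "input_resolution": 512 + phi * 128,   # R_input, always divisible by 128
        "bifpn_width": bifpn_width,
        "bifpn_depth": 2 + phi,                # D_bifpn
        "head_width": bifpn_width,             # W_pred = W_bifpn
        "head_depth": 3 + phi // 3,            # D_box = D_class = 3 + floor(phi / 3)
    }


for phi in range(7):
    print(phi, efficientdet_scaling(phi))
```

Only the BiFPN width grows exponentially. Resolution and the depths grow linearly, with the prediction-head depth stepping up once every three values of $\phi$.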
  • 29. Ablation Study • Disentangling Backbone and BiFPN • BiFPN Cross-Scale Connections: FPN and PANet only have one top-down or bottom-up flow, so they were repeated 5 times.
  • 30. Ablation Study • Softmax vs Fast Normalized Fusion
  • 32. Conclusion • A weighted bidirectional feature network (BiFPN) and a customized compound scaling method are proposed. • Both aim at improving accuracy and efficiency together. • EfficientDet-D7 achieves state-of-the-art 51.0 mAP on the COCO dataset with 52M parameters and 326B FLOPS, being 4x smaller and using 9.3x fewer FLOPS, yet still more accurate (+0.3% mAP) than the best previous detector. • EfficientDet is also up to 3.2x faster on GPUs and 8.1x faster on CPUs.