SlideShare a Scribd company logo
1 of 22
DEEP LEARNING JP
[DL Seminar]
EfficientDet: Scalable and Efficient Object Detection
Hiromi Nakagawa ACES, Inc.
https://deeplearning.jp
• Mingxing Tan, Ruoming Pang, Quoc V. Le(Google Research, Brain Team)
– EfficientNet の著者チーム
– Submitted to arXiv on 2019/11/20
• 物体検出でEfficientNetする
– Weighted Bi-directional Feature Pyramid Network (BiFPN):
マルチスケールの特徴を効率的に抽出
– Compound Scaling:
resolution, depth, widthを一つの変数でスケール
• COCOで精度/サイズ/速度などでSoTAを更新
– #Params: 4x smaller
– FLOPs: 9.3x fewer
2
Overview
Introduction
• 近年のObject Detectionのモデルは巨大化しがち
– AmoebaNet-based NAS-FPN:167M parameters, 3045B FLOPs(30x more than RetinaNet)
– ロボティクスや自動運転といったReal-worldへのdeployの妨げに
– モデルをEfficientにすることの重要性が高まっている
• 軽量化の傾向もあるが、精度が犠牲になっている
– One-stage, Anchor-free, Compression
• 特定のリソースに最適化するだけでもダメ。いろんなリソース制約に対応できるモデルがほしい
– 3B FLOPs ~ 300B FLOPs ?
4
Introduction
• 高精度と高効率を両立することはできるか?Detectorの設計について体系的に調査
• Challenge 1: Efficient Multi-Scale Feature Fusion
– マルチスケールの特徴を簡潔かつ効果的に抽出する Bidirectional Feature Pyramid Network (BiFPN) を提案
• Challenge 2: Model Scaling
– 入力画像の解像度に加えてネットワークの幅や深さなどをまとめてスケーリングするCompound Scalingを提案
• そもそも強いEfficientNetもBackboneに使う
5
Introduction
Proposed Method
• Multi-scale fusion => aggregate features at different resolutions:𝑃 𝑖𝑛
= (𝑃𝑙1
𝑖𝑛
, … , 𝑃𝑙 𝑛
𝑖𝑛
)
7
BiFPN: Bi-directional Feature Pyramid Network
[Lin+CVPR’17] Feature Pyramid Networks
ex. Faster-RCNN,YOLO
上層の解像度が低くなる
ex. SSD
下層の特徴抽出が不十分
下層も大域特徴(コンテキスト)を
利用でき、解像度も高い
Ref. https://www.slideshare.net/ren4yu/single-shot
• (a) Conventional top-down FPN
– Limited by the one-way information flow
8
BiFPN: Bi-directional Feature Pyramid Network
• (b) PANet
– Adds extra bottom-up path aggregation
network
9
BiFPN: Bi-directional Feature Pyramid Network
• (c) NAS-FPN
– Neural architecture search
– Requires thousands of GPU hours for search
– Irregular network, difficult to interpret or modify
• (e) Simplified PANet
– PANet: Accurate but needs more parameters
and computations
– Remove the nodes whit only 1 input edge
10
BiFPN: Bi-directional Feature Pyramid Network
• (f) BiFPN
– Extra edges from input to output at the same level
– Repeat feature network layer (=bidirectional path)
• Weighted feature fusion:How to fuse multi-scale features?
– Equally sum? → x
– Introduce additional weights, let the network to learn the importance of each input feature
– Unbound fusion:
• 𝑤𝑖:scalar(per-feature), vector(per-channel), tensor(per-pixel)
• scalar is enough but needs bounding for stable training
– Soft-max fusion:
• Slowdown on GPU
– Fast normalized fusion:
• Efficient
11
BiFPN: Bi-directional Feature Pyramid Network
• Backbone: ImageNet pretrained EfficientNet
• Repeat BiFPN Layer
• Class & Box prediction networks share weights across all level of features
12
EfficientDet Architecture
• Use compound coefficient 𝝓 to jointly scale up all dimensions
– Object detection model has much more scaling dimensions than image classification models
13
Compound Scaling
Input size
𝑅𝑖𝑛𝑝𝑢𝑡
#channels
𝑊𝑏𝑖𝑓𝑝𝑛
#layers
𝐷 𝑏𝑖𝑓𝑝𝑛
#layers
𝐷𝑐𝑙𝑎𝑠𝑠
Backbone Network
𝐵0, … , 𝐵6 = 64 ∙ (1.35 𝜙
) = 3 + 𝜙/3
= 2 + 𝜙
= 512 + 𝜙 ∙ 128
Experiments
• Trained with batch size 128 on 32 TPUv3 chips
• COCO2017で精度/パラメータ数/速度などでSoTAを達成
15
Experiments
• Trained with batch size 128 on 32 TPUv3 chips
• COCO2017で精度/パラメータ数/速度などでSoTAを達成
16
Experiments
• Real-world latency:Run 10 times with batch size 1
• GPU( Titan-V ): Up to 3.2x faster
• CPU( Single-thread Xeon ):Up to 8.1x faster
17
Experiments
• Ablation Study
18
Experiments
 EfficientNet BackboneにするだけでもRetinaNetから改善
 FPNをBiFPNにすると更に改善
 BiFPNは他のfeature networksに比べて
高精度かつ少パラメータ/低FLOPs
• Ablation Study
19
Experiments
 Feature fusionをSoftmaxからFast Fusionにすると
ほとんど精度低下せずに30%ほど高速化できる
 Compound Scalingによって個別にスケールを最適化
するより優れたmAP/FLOPsのモデルが得られる
Softmax Fusion Fast Fusion
Conclusion
• 高速・高精度・省計算な物体検出モデルであるEfficientDetを提案
– EfficientNetをBackboneに
– マルチスケールの特徴を効率的に抽出するBiFPNモジュールを提案、複数積み重ねて高次の特徴も抽出
– 共通の変数で解像度/幅/深さを複合的にスケーリングするCompound Scalingによる効率的なパラメータ探索
• COCOデータでSoTAの精度/速度を達成
– 4x smaller and 9.3x fewer FLOPs
– Latency:3.2x faster @GPU、8.1x faster@CPU
21
まとめ
• シンプルな工夫/拡張で精度/速度を改善。そりゃ良くなるよな、という感じ
– NAS-FPNみたいな魔改造感がない
• YOLOv3(arXiv18.04)の某グラフと比べると進展の速さを感じる
• その他
– Efficientだし精度もSoTAを更新した。
より精度を上げるためにEfficientさを捨てるとしたらどの方向?
– 最小解像度が512からの比較。それより小さくなると?
– 他の評価指標(mAPxx)やデータセットでのパフォーマンスは?
– Compound Scalingにおけるヒューリスティック、どれくらいセンシティブ?
– Keypointベースのアプローチと組み合わせるとどんな感じになる?
22
感想
ここらへん?

More Related Content

What's hot

深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -tmtm otm
 
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Yusuke Uchida
 
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without SupervisionDeep Learning JP
 
PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説弘毅 露崎
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Preferred Networks
 
[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object DetectionDeep Learning JP
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object DetectionDeep Learning JP
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and EditingDeep Learning JP
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!TransformerArithmer Inc.
 
[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習Deep Learning JP
 
畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化Yusuke Uchida
 
R-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れR-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れKazuki Motohashi
 
CV分野におけるサーベイ方法
CV分野におけるサーベイ方法CV分野におけるサーベイ方法
CV分野におけるサーベイ方法Hirokatsu Kataoka
 
画像処理ライブラリ OpenCV で 出来ること・出来ないこと
画像処理ライブラリ OpenCV で 出来ること・出来ないこと画像処理ライブラリ OpenCV で 出来ること・出来ないこと
画像処理ライブラリ OpenCV で 出来ること・出来ないことNorishige Fukushima
 
[DL輪読会]End-to-End Object Detection with Transformers
[DL輪読会]End-to-End Object Detection with Transformers[DL輪読会]End-to-End Object Detection with Transformers
[DL輪読会]End-to-End Object Detection with TransformersDeep Learning JP
 
Go-ICP: グローバル最適(Globally optimal) なICPの解説
Go-ICP: グローバル最適(Globally optimal) なICPの解説Go-ICP: グローバル最適(Globally optimal) なICPの解説
Go-ICP: グローバル最適(Globally optimal) なICPの解説Yusuke Sekikawa
 
[DL輪読会]ICLR2020の分布外検知速報
[DL輪読会]ICLR2020の分布外検知速報[DL輪読会]ICLR2020の分布外検知速報
[DL輪読会]ICLR2020の分布外検知速報Deep Learning JP
 
Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)Yamato OKAMOTO
 
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time SeriesDeep Learning JP
 
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...Deep Learning JP
 

What's hot (20)

深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -深層学習の不確実性 - Uncertainty in Deep Neural Networks -
深層学習の不確実性 - Uncertainty in Deep Neural Networks -
 
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
 
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
【DL輪読会】DINOv2: Learning Robust Visual Features without Supervision
 
PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説PCAの最終形態GPLVMの解説
PCAの最終形態GPLVMの解説
 
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
 
[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!Transformer
 
[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習
 
畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化
 
R-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れR-CNNの原理とここ数年の流れ
R-CNNの原理とここ数年の流れ
 
CV分野におけるサーベイ方法
CV分野におけるサーベイ方法CV分野におけるサーベイ方法
CV分野におけるサーベイ方法
 
画像処理ライブラリ OpenCV で 出来ること・出来ないこと
画像処理ライブラリ OpenCV で 出来ること・出来ないこと画像処理ライブラリ OpenCV で 出来ること・出来ないこと
画像処理ライブラリ OpenCV で 出来ること・出来ないこと
 
[DL輪読会]End-to-End Object Detection with Transformers
[DL輪読会]End-to-End Object Detection with Transformers[DL輪読会]End-to-End Object Detection with Transformers
[DL輪読会]End-to-End Object Detection with Transformers
 
Go-ICP: グローバル最適(Globally optimal) なICPの解説
Go-ICP: グローバル最適(Globally optimal) なICPの解説Go-ICP: グローバル最適(Globally optimal) なICPの解説
Go-ICP: グローバル最適(Globally optimal) なICPの解説
 
[DL輪読会]ICLR2020の分布外検知速報
[DL輪読会]ICLR2020の分布外検知速報[DL輪読会]ICLR2020の分布外検知速報
[DL輪読会]ICLR2020の分布外検知速報
 
Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)
 
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
[DL輪読会]SOM-VAE: Interpretable Discrete Representation Learning on Time Series
 
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...
[DL輪読会]Vision Transformer with Deformable Attention (Deformable Attention Tra...
 

Similar to [DL輪読会]EfficientDet: Scalable and Efficient Object Detection

CVPR 2011 ImageNet Challenge 文献紹介
CVPR 2011 ImageNet Challenge 文献紹介CVPR 2011 ImageNet Challenge 文献紹介
CVPR 2011 ImageNet Challenge 文献紹介Narihira Takuya
 
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Satoshi Kato
 
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksDeep Learning JP
 
効率的学習 / Efficient Training(メタサーベイ)
効率的学習 / Efficient Training(メタサーベイ)効率的学習 / Efficient Training(メタサーベイ)
効率的学習 / Efficient Training(メタサーベイ)cvpaper. challenge
 
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hare
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hareDAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hare
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也harePreferred Networks
 
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGA
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGAAn Introduction of DNN Compression Technology and Hardware Acceleration on FPGA
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGALeapMind Inc
 
semantic segmentation サーベイ
semantic segmentation サーベイsemantic segmentation サーベイ
semantic segmentation サーベイyohei okawa
 
ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)Hideki Okada
 
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)Preferred Networks
 
CNNの構造最適化手法(第3回3D勉強会)
CNNの構造最適化手法(第3回3D勉強会)CNNの構造最適化手法(第3回3D勉強会)
CNNの構造最適化手法(第3回3D勉強会)MasanoriSuganuma
 
2012-03-08 MSS研究会
2012-03-08 MSS研究会2012-03-08 MSS研究会
2012-03-08 MSS研究会Kimikazu Kato
 
[DL輪読会]Deep Face Recognition: A Survey
[DL輪読会]Deep Face Recognition: A Survey[DL輪読会]Deep Face Recognition: A Survey
[DL輪読会]Deep Face Recognition: A SurveyDeep Learning JP
 
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networkscvpaper. challenge
 
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose TrackingDeep Learning JP
 
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
FastDepth: Fast Monocular Depth Estimation on Embedded SystemsFastDepth: Fast Monocular Depth Estimation on Embedded Systems
FastDepth: Fast Monocular Depth Estimation on Embedded Systemsharmonylab
 
2値化CNN on FPGAでGPUとガチンコバトル(公開版)
2値化CNN on FPGAでGPUとガチンコバトル(公開版)2値化CNN on FPGAでGPUとガチンコバトル(公開版)
2値化CNN on FPGAでGPUとガチンコバトル(公開版)Hiroki Nakahara
 
MemoryPlus Workshop
MemoryPlus WorkshopMemoryPlus Workshop
MemoryPlus WorkshopHitoshi Sato
 
Image net classification with Deep Convolutional Neural Networks
Image net classification with Deep Convolutional Neural NetworksImage net classification with Deep Convolutional Neural Networks
Image net classification with Deep Convolutional Neural NetworksShingo Horiuchi
 

Similar to [DL輪読会]EfficientDet: Scalable and Efficient Object Detection (20)

CVPR 2011 ImageNet Challenge 文献紹介
CVPR 2011 ImageNet Challenge 文献紹介CVPR 2011 ImageNet Challenge 文献紹介
CVPR 2011 ImageNet Challenge 文献紹介
 
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages. Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
Dimensionality reduction with t-SNE(Rtsne) and UMAP(uwot) using R packages.
 
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
効率的学習 / Efficient Training(メタサーベイ)
効率的学習 / Efficient Training(メタサーベイ)効率的学習 / Efficient Training(メタサーベイ)
効率的学習 / Efficient Training(メタサーベイ)
 
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hare
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hareDAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hare
DAシンポジウム2019招待講演「深層学習モデルの高速なTraining/InferenceのためのHW/SW技術」 金子紘也hare
 
研究を加速するChainerファミリー
研究を加速するChainerファミリー研究を加速するChainerファミリー
研究を加速するChainerファミリー
 
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGA
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGAAn Introduction of DNN Compression Technology and Hardware Acceleration on FPGA
An Introduction of DNN Compression Technology and Hardware Acceleration on FPGA
 
semantic segmentation サーベイ
semantic segmentation サーベイsemantic segmentation サーベイ
semantic segmentation サーベイ
 
ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)
 
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)
Deep learningの発展と化学反応への応用 - 日本化学会第101春季大会(2021)
 
CNNの構造最適化手法(第3回3D勉強会)
CNNの構造最適化手法(第3回3D勉強会)CNNの構造最適化手法(第3回3D勉強会)
CNNの構造最適化手法(第3回3D勉強会)
 
2012-03-08 MSS研究会
2012-03-08 MSS研究会2012-03-08 MSS研究会
2012-03-08 MSS研究会
 
[DL輪読会]Deep Face Recognition: A Survey
[DL輪読会]Deep Face Recognition: A Survey[DL輪読会]Deep Face Recognition: A Survey
[DL輪読会]Deep Face Recognition: A Survey
 
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks
【CVPR 2020 メタサーベイ】Efficient Training and Inference Methods for Networks
 
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
[DL輪読会]LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
 
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
FastDepth: Fast Monocular Depth Estimation on Embedded SystemsFastDepth: Fast Monocular Depth Estimation on Embedded Systems
FastDepth: Fast Monocular Depth Estimation on Embedded Systems
 
IEEE/ACM SC2013報告
IEEE/ACM SC2013報告IEEE/ACM SC2013報告
IEEE/ACM SC2013報告
 
2値化CNN on FPGAでGPUとガチンコバトル(公開版)
2値化CNN on FPGAでGPUとガチンコバトル(公開版)2値化CNN on FPGAでGPUとガチンコバトル(公開版)
2値化CNN on FPGAでGPUとガチンコバトル(公開版)
 
MemoryPlus Workshop
MemoryPlus WorkshopMemoryPlus Workshop
MemoryPlus Workshop
 
Image net classification with Deep Convolutional Neural Networks
Image net classification with Deep Convolutional Neural NetworksImage net classification with Deep Convolutional Neural Networks
Image net classification with Deep Convolutional Neural Networks
 

More from Deep Learning JP

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving PlannersDeep Learning JP
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについてDeep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...Deep Learning JP
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-ResolutionDeep Learning JP
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxivDeep Learning JP
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLMDeep Learning JP
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place RecognitionDeep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究についてDeep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat ModelsDeep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...Deep Learning JP
 

More from Deep Learning JP (20)

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
 

[DL輪読会]EfficientDet: Scalable and Efficient Object Detection

  • 1. DEEP LEARNING JP [DL Seminar] EfficientDet: Scalable and Efficient Object Detection Hiromi Nakagawa ACES, Inc. https://deeplearning.jp
  • 2. • Mingxing Tan, Ruoming Pang, Quoc V. Le(Google Research, Brain Team) – EfficientNet の著者チーム – Submitted to arXiv on 2019/11/20 • 物体検出でEfficientNetする – Weighted Bi-directional Feature Pyramid Network (BiFPN): マルチスケールの特徴を効率的に抽出 – Compound Scaling: resolution, depth, widthを一つの変数でスケール • COCOで精度/サイズ/速度などでSoTAを更新 – #Params: 4x smaller – FLOPs: 9.3x fewer 2 Overview
  • 4. • 近年のObject Detectionのモデルは巨大化しがち – AmoebaNet-based NAS-FPN:167M parameters, 3045B FLOPs(30x more than RetinaNet) – ロボティクスや自動運転といったReal-worldへのdeployの妨げに – モデルをEfficientにすることの重要性が高まっている • 軽量化の傾向もあるが、精度が犠牲になっている – One-stage, Anchor-free, Compression • 特定のリソースに最適化するだけでもダメ。いろんなリソース制約に対応できるモデルがほしい – 3B FLOPs ~ 300B FLOPs ? 4 Introduction
  • 5. • 高精度と高効率を両立することはできるか?Detectorの設計について体系的に調査 • Challenge 1: Efficient Multi-Scale Feature Fusion – マルチスケールの特徴を簡潔かつ効果的に抽出する Bidirectional Feature Pyramid Network (BiFPN) を提案 • Challenge 2: Model Scaling – 入力画像の解像度に加えてネットワークの幅や深さなどをまとめてスケーリングするCompound Scalingを提案 • そもそも強いEfficientNetもBackboneに使う 5 Introduction
  • 7. • Multi-scale fusion => aggregate features at different resolutions:𝑃 𝑖𝑛 = (𝑃𝑙1 𝑖𝑛 , … , 𝑃𝑙 𝑛 𝑖𝑛 ) 7 BiFPN: Bi-directional Feature Pyramid Network [Lin+CVPR’17] Feature Pyramid Networks ex. Faster-RCNN,YOLO 上層の解像度が低くなる ex. SSD 下層の特徴抽出が不十分 下層も大域特徴(コンテキスト)を 利用でき、解像度も高い Ref. https://www.slideshare.net/ren4yu/single-shot
  • 8. • (a) Conventional top-down FPN – Limited by the one-way information flow 8 BiFPN: Bi-directional Feature Pyramid Network
  • 9. • (b) PANet – Adds extra bottom-up path aggregation network 9 BiFPN: Bi-directional Feature Pyramid Network • (c) NAS-FPN – Neural architecture search – Requires thousands of GPU hours for search – Irregular network, difficult to interpret or modify
  • 10. • (e) Simplified PANet – PANet: Accurate but needs more parameters and computations – Remove the nodes whit only 1 input edge 10 BiFPN: Bi-directional Feature Pyramid Network • (f) BiFPN – Extra edges from input to output at the same level – Repeat feature network layer (=bidirectional path)
  • 11. • Weighted feature fusion:How to fuse multi-scale features? – Equally sum? → x – Introduce additional weights, let the network to learn the importance of each input feature – Unbound fusion: • 𝑤𝑖:scalar(per-feature), vector(per-channel), tensor(per-pixel) • scalar is enough but needs bounding for stable training – Soft-max fusion: • Slowdown on GPU – Fast normalized fusion: • Efficient 11 BiFPN: Bi-directional Feature Pyramid Network
  • 12. • Backbone: ImageNet pretrained EfficientNet • Repeat BiFPN Layer • Class & Box prediction networks share weights across all level of features 12 EfficientDet Architecture
  • 13. • Use compound coefficient 𝝓 to jointly scale up all dimensions – Object detection model has much more scaling dimensions than image classification models 13 Compound Scaling Input size 𝑅𝑖𝑛𝑝𝑢𝑡 #channels 𝑊𝑏𝑖𝑓𝑝𝑛 #layers 𝐷 𝑏𝑖𝑓𝑝𝑛 #layers 𝐷𝑐𝑙𝑎𝑠𝑠 Backbone Network 𝐵0, … , 𝐵6 = 64 ∙ (1.35 𝜙 ) = 3 + 𝜙/3 = 2 + 𝜙 = 512 + 𝜙 ∙ 128
  • 15. • Trained with batch size 128 on 32 TPUv3 chips • COCO2017で精度/パラメータ数/速度などでSoTAを達成 15 Experiments
  • 16. • Trained with batch size 128 on 32 TPUv3 chips • COCO2017で精度/パラメータ数/速度などでSoTAを達成 16 Experiments
  • 17. • Real-world latency:Run 10 times with batch size 1 • GPU( Titan-V ): Up to 3.2x faster • CPU( Single-thread Xeon ):Up to 8.1x faster 17 Experiments
  • 18. • Ablation Study 18 Experiments  EfficientNet BackboneにするだけでもRetinaNetから改善  FPNをBiFPNにすると更に改善  BiFPNは他のfeature networksに比べて 高精度かつ少パラメータ/低FLOPs
  • 19. • Ablation Study 19 Experiments  Feature fusionをSoftmaxからFast Fusionにすると ほとんど精度低下せずに30%ほど高速化できる  Compound Scalingによって個別にスケールを最適化 するより優れたmAP/FLOPsのモデルが得られる Softmax Fusion Fast Fusion
  • 21. • 高速・高精度・省計算な物体検出モデルであるEfficientDetを提案 – EfficientNetをBackboneに – マルチスケールの特徴を効率的に抽出するBiFPNモジュールを提案、複数積み重ねて高次の特徴も抽出 – 共通の変数で解像度/幅/深さを複合的にスケーリングするCompound Scalingによる効率的なパラメータ探索 • COCOデータでSoTAの精度/速度を達成 – 4x smaller and 9.3x fewer FLOPs – Latency:3.2x faster @GPU、8.1x faster@CPU 21 まとめ
  • 22. • シンプルな工夫/拡張で精度/速度を改善。そりゃ良くなるよな、という感じ – NAS-FPNみたいな魔改造感がない • YOLOv3(arXiv18.04)の某グラフと比べると進展の速さを感じる • その他 – Efficientだし精度もSoTAを更新した。 より精度を上げるためにEfficientさを捨てるとしたらどの方向? – 最小解像度が512からの比較。それより小さくなると? – 他の評価指標(mAPxx)やデータセットでのパフォーマンスは? – Compound Scalingにおけるヒューリスティック、どれくらいセンシティブ? – Keypointベースのアプローチと組み合わせるとどんな感じになる? 22 感想 ここらへん?