A Survey of Deep Learning-Based Object Detection
Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu
IEEE Access, 2019
2022/06/17
◼
• two-stage
• one-stage
• 2019
VisDrone2018
[Shindai+, ICRA 2019]
[Chen+, CVPR2018]
two-stage and one-stage detectors
◼two-stage
• Faster R-CNN [Ren+, NeurIPS2015]
◼one-stage
• YOLO [Redmon+, CVPR2016]
• SSD [Liu+, ECCV2016]
◼two-stage
BBox
two-stage one-stage
end-to-end
two-stage: R-CNN and Fast R-CNN
◼R-CNN [Girshick+, CVPR2014]
• CNN
• SVM
• CNN
◼Fast R-CNN [Girshick, ICCV2015]
• RoI (region of interest) pooling
• region proposal
R-CNN
◼Faster R-CNN [Ren+, NeurIPS2015]
• RPN (region proposal network) with multi-scale anchors, combined with the Fast R-CNN detector
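A minimal Python sketch of how multi-scale anchors can be laid out over a feature map. It assumes the configuration reported for Faster R-CNN (3 scales of 128, 256 and 512 pixels × 3 aspect ratios, feature stride 16); the function name `make_anchors` and the cell-centre convention are illustrative choices, not the reference implementation.

```python
import math

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate anchors (x1, y1, x2, y2) in image pixels.

    One anchor per (scale, ratio) pair is centred on every feature-map cell;
    ratio is height / width, and each anchor keeps an area of scale ** 2.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # assumed cell centre
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 cells * 9 anchors = 54
```

With k = 9 anchors per position, the RPN's classification layer outputs 2k scores and its regression layer 4k box offsets for each feature-map position.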
◼Mask R-CNN [He+, ICCV2017]
• ResNet [He+, CVPR2016]-FPN [Lin+, CVPR2017] backbone
• RoI pooling replaced by RoIAlign
• 1
◼Cascade R-CNN [Cai and Vasconcelos, CVPR2018]
• IoU threshold increased at each stage
RoIAlign
one-stage: SSD (Single Shot MultiBox Detector)
◼DBox (default boxes)
• BBox NMS
Localization, confidence
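[Figure: SSD architecture for a 300×300 input. Successive conv layers produce 38×38, 19×19, 19×19, 10×10, 5×5, 3×3 and 1×1 feature maps; detections from the maps are merged by non-maximum suppression.]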
[Liu+, ECCV2016]
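The default boxes (DBox) of SSD300 can be sketched as follows, assuming the linear scale rule from the SSD paper (s_min = 0.2, s_max = 0.9), aspect ratios {1, 2, 1/2} (plus {3, 1/3} on three of the maps) and one extra square box per location. The helper names and the value s_{m+1} = 1.0 used for the last map's extra box are assumptions; implementations differ in these details.

```python
import itertools
import math

def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Linear scale rule: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def default_boxes(fmap_sizes, ratio_sets, s_min=0.2, s_max=0.9):
    """Default boxes as (cx, cy, w, h), normalised to the [0, 1] image frame."""
    scales = ssd_scales(len(fmap_sizes), s_min, s_max)
    scales.append(1.0)  # assumed s_{m+1}, only used for the last map's extra box
    boxes = []
    for k, (f, ratios) in enumerate(zip(fmap_sizes, ratio_sets)):
        s_k, s_next = scales[k], scales[k + 1]
        for i, j in itertools.product(range(f), repeat=2):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f  # centre of cell (i, j)
            for a in ratios:  # a = width / height
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            extra = math.sqrt(s_k * s_next)  # extra square box for ratio 1
            boxes.append((cx, cy, extra, extra))
    return boxes

# SSD300: six feature maps, 4 or 6 boxes per location.
fmaps = [38, 19, 10, 5, 3, 1]
r4 = [1.0, 2.0, 0.5]
r6 = [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0]
boxes = default_boxes(fmaps, [r4, r6, r6, r6, r4, r4])
print(len(boxes))  # 8732
```

The total of 8732 matches the number of default boxes reported for SSD300. Each box receives localization offsets and class confidences, so many overlapping detections remain until NMS.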
NMS (Non-maximum Suppression)
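◼Removing duplicate BBoxes
• Keep the BBox with the highest confidence score
• Discard BBoxes whose IoU with it exceeds a threshold, then repeat on the remaining BBoxes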
[Figure: BBoxes with confidence scores before and after non-maximum suppression]
[Liu+, ECCV2016]
$\mathrm{IoU} = \dfrac{\text{Area of Overlap}}{\text{Area of Union}}$
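IoU and greedy NMS as described above, in a short Python sketch. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 IoU threshold is an illustrative default rather than a value from the slides.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - overlap
    return overlap / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still left
        keep.append(best)
        # drop every remaining box that overlaps it too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 has IoU 81/119 with box 0
```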
one-stage
◼Feature Pyramid Networks
• RetinaNet [Lin+, ICCV2017]
• Focal Loss
• M2Det [Zhao+, AAAI2019]
• Multi-Level Feature Pyramid Network (MLFPN)
◼RefineDet [Zhang+, CVPR2018]
• one-stage detector with two-stage-style anchor refinement
[Figure: RetinaNet and RefineDet architectures]
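The focal loss used by RetinaNet down-weights well-classified examples by the factor (1 - p_t)^γ, so the many easy background samples no longer dominate training. A minimal binary version, assuming α = 0.25 and γ = 2 (the defaults reported in the RetinaNet paper); the function names are illustrative.

```python
import math

def cross_entropy(p, y):
    """Binary cross entropy; p = predicted probability of the positive class."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy negative (p = 0.01) is suppressed far more than a hard one (p = 0.9).
for p in (0.01, 0.9):
    print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the ordinary cross entropy.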
2019
Relation Networks [Hu+, CVPR2018]
◼SSD NMS BBox
◼object relation module
• end-to-end: duplicate BBoxes are removed by the object relation module instead of NMS
DCNv2 [Zhu+, CVPR2019]
◼DCN [Dai+, ICCV2017]
• learned sampling offsets let the receptive field adapt to object shape
◼Modulated deformable convolution
• Modulated deformable RoI pooling
[Figure: sampling locations of a 3×3 standard convolution vs. a deformable convolution]
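A sketch of DCNv2's modulated deformable convolution for a single channel and a single output location: each of the 9 kernel taps samples the input at its regular grid position plus a learned offset (via bilinear interpolation), and the sample is scaled by a modulation scalar. In the actual layer the offsets and masks are predicted by an extra conv layer (masks through a sigmoid); here they are passed in directly, and the function names are hypothetical.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample the 2-D array x at fractional (py, px); zero outside."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return value

def modulated_deform_conv_at(x, weight, p, offsets, masks):
    """One output value of a 3x3 modulated deformable convolution at p = (y, x).

    weight:  (3, 3) kernel
    offsets: (9, 2) learned offsets (dy, dx), one per kernel tap
    masks:   (9,) modulation scalars in [0, 1]
    """
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # regular grid p_k
    out = 0.0
    for k, (dy, dx) in enumerate(taps):
        sy = p[0] + dy + offsets[k, 0]
        sx = p[1] + dx + offsets[k, 1]
        out += weight.flat[k] * bilinear(x, sy, sx) * masks[k]
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
print(modulated_deform_conv_at(x, w, (2, 2), np.zeros((9, 2)), np.ones(9)))  # 108.0
```

With zero offsets and all masks equal to 1 this reduces to an ordinary 3×3 convolution; keeping the offsets but fixing the masks to 1 corresponds to the original DCN.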
NAS-FPN [Ghiasi+, CVPR2019]
◼NAS (Neural Architecture Search) applied to the FPN architecture
• RNN Controller
[Figure: (b)-(f) FPN architectures found by NAS-FPN and their proxy-task AP]
1
DCN
DCNv2
one-stage
2
one-stage
◼PASCAL VOC
◼COCO
• COCO mAP
◼ImageNet
◼VisDrone 2018
◼Open Images
◼Pedestrian detection datasets
• Caltech
• KITTI
• CityPersons
• TDC
• EuroCity Persons
Evaluation metrics: AP, mAP, COCO mAP
◼Precision and Recall (IoU threshold 0.5)
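• $\text{Precision} = \dfrac{\#\,\text{BBox}(\mathrm{IoU} \ge 0.5)}{\#\,\text{BBox (all)}}$
• $\text{Recall} = \dfrac{\#\,\text{BBox}(\mathrm{IoU} \ge 0.5)}{\#\,\text{Gt BBox (all)}}$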
◼AP (Average Precision)
• $\mathrm{AP} = \int_0^1 p(r)\,dr$
• AP is the area under the recall vs. precision curve
◼mAP
• mean of AP over all classes
• COCO: mAP averaged over IoU thresholds [0.5, 0.55, …, 0.95]
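The precision, recall and AP definitions above can be sketched in Python. This assumes the detections of one class are already sorted by confidence and matched to Gt BBoxes at a fixed IoU threshold. It uses all-point interpolation of the precision-recall curve (as in PASCAL VOC since 2010); the official COCO evaluation samples 101 recall points instead, so values can differ slightly. Function names are illustrative.

```python
def precision_recall(tp_flags, num_gt):
    """Cumulative precision/recall after each detection.

    tp_flags: detections sorted by descending confidence; True if the detection
    matched a not-yet-matched Gt BBox with IoU >= the threshold (e.g. 0.5).
    """
    precisions, recalls = [], []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    return precisions, recalls

def average_precision(tp_flags, num_gt):
    """Area under the PR curve with precision made non-increasing in recall."""
    prec, rec = precision_recall(tp_flags, num_gt)
    prec = [0.0] + prec + [0.0]
    rec = [0.0] + rec + [1.0]
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])  # precision envelope
    return sum((rec[i + 1] - rec[i]) * prec[i + 1] for i in range(len(rec) - 1))

def mean_ap(ap_values):
    """mAP: mean of AP values (over classes, or over IoU thresholds for COCO)."""
    return sum(ap_values) / len(ap_values)

coco_iou_thresholds = [0.5 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

ap = average_precision([True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```

For COCO mAP, matching and AP are recomputed at each threshold in `coco_iou_thresholds` and the results averaged with `mean_ap` (and likewise over classes).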
◼FPN
• Mask R-CNN, NAS-FPN, FCOS [Tian+, ICCV2019]
◼SSD
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
◼
• RefineDet, R-DAD [Bae, AAAI2019]
◼
• Attention mechanism [Zhang & Kim, CVPR2019]
• SSD [Kong+, ECCV2018]
◼
• DCN DCNv2 15
loss
◼IoU loss
• UnitBox [Yu+, ACM MM 2016]
◼ BBox regression loss
• Bounding box regression with uncertainty [He+, CVPR2019]
• Softer-NMS [He+, arXiv2019]
◼
• Axially Localized Detection [Cabriel+, Nature Communications 2019]
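An IoU loss in the spirit of UnitBox, which defines the loss as -ln(IoU). For simplicity this sketch uses corner-format boxes (x1, y1, x2, y2) instead of UnitBox's per-pixel distance parametrization, and clamps IoU with a small hypothetical `eps` to avoid log(0). Comparing it with an L2 loss on the coordinates shows the scale invariance that motivates IoU-based losses.

```python
import math

def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - overlap
    return overlap / union

def iou_loss(pred, gt, eps=1e-7):
    """-ln(IoU); eps (an assumption) keeps the log finite for disjoint boxes."""
    return -math.log(max(iou(pred, gt), eps))

def l2_loss(pred, gt):
    """Sum of squared coordinate differences, for comparison."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt))

def scale(box, k):
    return tuple(k * v for v in box)

pred, gt = (0, 0, 10, 10), (2, 2, 12, 12)
for k in (1, 10):
    # the IoU loss stays the same while the L2 loss grows by k ** 2
    print(k, iou_loss(scale(pred, k), scale(gt, k)), l2_loss(scale(pred, k), scale(gt, k)))
```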
◼one-stage
• Hard negative mining [Bucher+, arXiv2016]
◼ Hard mining
• IoU-balanced sampling [Pang+, CVPR2019]
◼loss
• RetinaNet
• AP-loss [Chen+, CVPR2019]
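A minimal sketch of hard negative mining as done in SSD-style one-stage detectors: all positive samples are kept, and only the negatives with the highest loss are used, at most 3 negatives per positive. The 3:1 ratio comes from SSD and is an assumption here, not necessarily the setting of the works cited above.

```python
def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """Select samples for the classification loss.

    Keeps every positive and only the highest-loss ("hardest") negatives,
    at most neg_pos_ratio negatives per positive. Returns sorted indices.
    """
    pos = [i for i, p in enumerate(is_positive) if p]
    neg = [i for i, p in enumerate(is_positive) if not p]
    neg.sort(key=lambda i: losses[i], reverse=True)  # hardest negatives first
    num_neg = min(len(neg), neg_pos_ratio * len(pos))
    return sorted(pos + neg[:num_neg])

losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9]
is_positive = [True, False, False, False, False, False, False]
print(hard_negative_mining(losses, is_positive))  # [0, 1, 4, 6]
```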
NMS
◼NMS
• Relation Networks 14
◼Predicting the IoU between a BBox and its Gt BBox
• IoU-Net learning [Jiang+, ECCV2018]
◼IoU Confidence score
• Fitness NMS [Tychsen-Smith & Petersson, CVPR2018]
◼NMS
• Softer-NMS [He+, arXiv2019]
1
◼SSD
• [Jeong+, arXiv2017]
• Context-Aware SSD [Xiang+, arXiv2018]
◼GAN [Goodfellow+, NeurIPS2014]
• Perceptual GAN [Li+, CVPR2017]
◼
• Face Attention Network [Wang+, arXiv2017]
◼
• Repulsion loss [Wang+, IEEE Access 2018]
• Occlusion-aware R-CNN [Zhang+, ECCV2018]
2
• anchor BBox
Faster R-CNN
SSD
anchor-free
◼anchor
• anchor
◼anchor-free
• CornerNet [Law and Deng, ECCV2018]
• FCOS [Tian+, ICCV2019]
• CenterNet [Duan+, ICCV2019]
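Anchor-free detectors such as FCOS regress, for every location inside a Gt BBox, the distances (l, t, r, b) to the four box sides, and down-weight off-centre locations with a centre-ness score. A sketch, assuming (x1, y1, x2, y2) boxes; FCOS's assignment of boxes to FPN levels by object size is omitted, and the function names are illustrative.

```python
import math

def fcos_targets(x, y, box):
    """Regression targets (l, t, r, b) of location (x, y) w.r.t. a Gt BBox
    (x1, y1, x2, y2); returns None if the location lies outside the box."""
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None  # not a positive sample for this box
    return l, t, r, b

def centerness(l, t, r, b):
    """Centre-ness: 1 at the box centre, decaying toward the box edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

box = (0, 0, 100, 50)
print(fcos_targets(50, 25, box))               # (50, 25, 50, 25)
print(centerness(*fcos_targets(50, 25, box)))  # 1.0
print(fcos_targets(150, 25, box))              # None
```

Because no anchor boxes are involved, the anchor sizes and aspect ratios that anchor-based detectors must tune disappear from the design.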
◼
• YOLO, YOLO9000 [Redmon & Farhadi, CVPR2017]
• WeaveNet [Chen+, arXiv2017] ESSD [Zheng+, arXiv2018]
• Pelee [Wang+, NeurIPS2018]
◼
• RetinaNet 12
• RFBNet [Liu+, ECCV2018]
• pRF (population receptive field)
[Figure: RFBNet and the RFB module]
◼
• ScratchDet [Zhu+, CVPR2019]
◼
• DetNet [Li+, ECCV2018]
• Light-Head R-CNN [Li+, arXiv2017]
• two-stage
◼
[Hu+, CVPR2017]
◼
[Braun, arXiv2018]
1
2
3
4
ISPRS dataset [Audebert+, MDPI 2017]
[Figure legend: true positive, false positive, ground truth]
◼
[Li+, arXiv2017]
◼
[Li+, arXiv2019]
1
2
3 CAM [Zhou+, arXiv2015]
4 ablation study
◼
1 3 RetinaNet
2 3
SKU-110K [Goldman+, CVPR2019]
RetinaNet
◼
• NMS
• confidence
Paper review: A Survey of Deep Learning-Based Object Detection