Recent Object Detection Development
& Person Detection Survey
kv
Outline
- Review Object Detection
- Research Trends: Anchor-free detector
- Person Detection
Review Object Detection
Object Detection
● A deep-learning-dominated domain
● Modularized, reusable design
○ Components
○ Pipeline
○ Feature scaling design
Object Detection in 20 Years: A Survey
RCNN
General Object Detector Arch.
Backbone → Neck → Dense Head → Head
One-Stage
● YOLO
● SSD
● RetinaNet
Two-Stage
● Faster-RCNN
● ThunderNet
Component in Object Detection Pipeline
Backbone (feature extractor)
- ResNet50, ResNeXt, MobileNet
- Hourglass, DLA
Neck (in-net preprocessor)
- FPN, BPN, HRPN
Dense Head
- RPN
Head (task)
- AnchorHead
- retina, ssd
- fcos, ctdet
- BoxHead
Loss function
- CE, BCE
- Focal loss
- L1, Smooth L1
Computation Module
- Deformable Conv (v1, v2)
- GN (Group Normalization)
- SyncBN
- NMS, SoftNMS
- GA (Guided Anchoring)
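A minimal sketch of two of the computation modules listed above, NMS and Soft-NMS, in plain Python. The box format `(x1, y1, x2, y2)`, the function names, and the threshold defaults are assumptions made for illustration; this is not the implementation of any particular framework.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best box, drop every box overlapping it above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


def soft_nms_linear(boxes: List[Box], scores: List[float],
                    iou_thr: float = 0.5, score_thr: float = 0.001):
    """Linear Soft-NMS: decay the score of overlapping boxes instead of dropping them."""
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    kept = []  # list of (index, final score)
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_thr:
                scores[i] *= 1.0 - overlap  # linear decay
        remaining = [i for i in remaining if scores[i] >= score_thr]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
hard = nms(boxes, scores)
soft = soft_nms_linear(boxes, scores)
```

With hard NMS the second box is removed outright; with Soft-NMS it survives with a reduced score, which is the behaviour that helps when two real objects overlap heavily.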
Two-Stage: Faster RCNN
Per-image computation: ResNet, RPN
Per-ROI computation: RoIPool, MLP, Softmax, BoxReg
Scale in Object Detection
Scale in Object Detection
Backbone
● Without scale
○ ConvNet
● With scale
○ DLA
○ Hourglass
○ Modified-ResNet
Backbone Parameters
Backbone name   Top-1 error (%)   # of parameters   FLOPs/2
ResNet-50       22.28             25,557,032        3,877.95M
DLA-34          25.36             15,742,104        3,071.37M
ResNet-101      21.90             44,549,160        7,597.95M
Hourglass       -                 -                 -
reference: https://github.com/osmr/imgclsmob/blob/master/pytorch/README.md
Person Detection
Object Detection & Person Detection
Person detection ≈ class-agnostic object detection plus the crowdedness problem
Object Detection & Person Detection
● Crowdedness & Occlusion
● Scale & fine-grained
● Unusual pose
● Non-person, distractor
● Night scene
● Background distribution (domain shift)
Datasets
COCOPerson
CrowdHuman
Caltech pedestrian
WiderPerson
WiderPerson19
CUHK Person
Dataset                  # of images      # of persons       Density
COCO Person              64,115           257,252            4.01
CrowdHuman               15,000           339,565            22.64
WiderPerson              9,000            399,786            39.87
CUHK Person              18,184           99,809             5.48
WiderPerson19 (sur/ad)   8,240 / 88,260   58,190 / 248,993   7.05 / 2.82
Caltech pedestrian       72,782           13,674             0.32
CityPerson               2,975            19,654             6.61
train, test, benchmark
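A small worked check of the density column, assuming density means persons per image (the slide does not define it). Only three rows are used here because they reproduce the table under this reading; not every row does, so the other rows may rely on a different image count or definition.

```python
# (number of images, number of persons) copied from the table above
datasets = {
    "COCO Person": (64_115, 257_252),
    "CrowdHuman": (15_000, 339_565),
    "CityPerson": (2_975, 19_654),
}


def density(num_images: int, num_persons: int) -> float:
    """Assumed definition: average number of annotated persons per image."""
    return num_persons / num_images


densities = {name: round(density(*counts), 2) for name, counts in datasets.items()}
for name, value in densities.items():
    print(f"{name}: {value} persons per image")
```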
Dataset: COCOPerson
Dataset: CrowdHuman
Annotations
● Full box
● Visible box
● Head box
Features
● Targets the crowdedness issue
Dataset: CrowdHuman
Dataset: WiderPerson
TMM2019 http://www.cbsr.ia.ac.cn/users/jwan/papers/TMM2019-WiderPerson.pdf
Features
● Questionable annotation quality
● Limited scene distribution (by observation)
Annotations
● Full box
● class, tag
Dataset: WiderPerson
TMM2019 http://www.cbsr.ia.ac.cn/users/jwan/papers/TMM2019-WiderPerson.pdf
Features
● More balanced location distribution
Dataset: WiderPerson
Dataset: WiderPerson2019
https://wider-challenge.org/2019.html
Features
● vehicle & surveillance
● low quality but high resolution images
Observations
COCOPerson
CrowdHuman
Caltech pedestrian
WiderPerson
General Image
Vehicle
Surveillance
CUHK Person
Market1501
WiderPerson19
Observations
● Models trained on COCOPerson may not perform well in real scenarios (not confirmed)
● COCOPerson contains some unreasonable annotations
● The WiderPerson dataset is too noisy to use directly
● Full box is hard; visible box may cause a higher false-positive rate
● CrowdHuman is hard, but it aims to conquer the crowdedness problem
Crowdedness Problem: Repulsion Loss
Attraction
RepGT (Repulsion Term)
RepBox (Repulsion Term)
Crowdedness Problem: Repulsion Loss
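A rough sketch of the RepGT term named above, assuming the usual formulation: the predicted box is penalized by a smoothed log of its IoG (intersection over ground-truth area) with the nearest non-target ground truth. The helper names, the `(x1, y1, x2, y2)` box format, and the default `sigma` and loss weights are illustrative assumptions.

```python
import math


def iog(box, gt):
    """Intersection area divided by the ground-truth area."""
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return iw * ih / gt_area if gt_area > 0 else 0.0


def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) up to sigma, then a linear continuation (0 <= sigma < 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)


def rep_gt(pairs, sigma=0.5):
    """Mean penalty over (predicted box, non-target ground truth) pairs."""
    if not pairs:
        return 0.0
    return sum(smooth_ln(iog(pred, gt), sigma) for pred, gt in pairs) / len(pairs)


def repulsion_loss(attraction, rep_gt_term, rep_box_term, alpha=0.5, beta=0.5):
    """Total loss: attraction plus the two weighted repulsion terms (weights assumed)."""
    return attraction + alpha * rep_gt_term + beta * rep_box_term


# A prediction that covers half of a neighbouring person's ground-truth box.
penalty = rep_gt([((0, 0, 10, 10), (5, 0, 15, 10))])
```

The more a prediction drifts onto a neighbouring ground truth, the larger the penalty, which pushes boxes apart in crowds.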
Crowdedness Problem: Adaptive-NMS
Adaptive-NMS
● Dynamic suppression according to target density
● Subnetwork to learn density scores
Crowdedness Problem: Adaptive-NMS
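A sketch of the dynamic-suppression idea, assuming the threshold used for each kept box is the larger of a base threshold and that box's predicted density. In the real method the densities come from the learned subnetwork; here they are plain input numbers, and all names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def adaptive_nms(boxes, scores, densities, base_thr=0.5):
    """Greedy NMS whose IoU threshold grows with the density of the kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        thr = max(base_thr, densities[best])  # crowded region: suppress less
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11)]  # IoU of about 0.68
scores = [0.9, 0.8]
sparse = adaptive_nms(boxes, scores, densities=[0.0, 0.0])
crowded = adaptive_nms(boxes, scores, densities=[0.8, 0.8])
```

In the sparse case the second box is suppressed as a duplicate; in the crowded case the raised threshold lets it survive as a possible second person.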
Drawbacks of anchor box
● Large # of anchors (SSD 40k, RetinaNet 100k)
○ Faster R-CNN still performs well with few proposals
● Introduces extra hyperparameters
● May fail in multi-scale scenarios
● Imbalance between positive & negative anchors
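To make the anchor-count point concrete, a back-of-the-envelope sketch. The 800x800 input, the five pyramid strides, and the 9 anchors per location are assumptions chosen to resemble a RetinaNet-style setup, not figures from the slides.

```python
import math


def count_anchors(height, width, strides, anchors_per_location):
    """Total anchors over all feature levels for one image."""
    total = 0
    for stride in strides:
        rows = math.ceil(height / stride)
        cols = math.ceil(width / stride)
        total += rows * cols * anchors_per_location
    return total


# Assumed setup: strides 8 to 128, 3 scales x 3 aspect ratios per location.
retinanet_like = count_anchors(800, 800, strides=(8, 16, 32, 64, 128),
                               anchors_per_location=9)
print(retinanet_like)
```

Under these assumptions the count lands on the order of 100k, almost all of which are negatives for a typical image.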
Recent Trend in Object Detection
Era of anchor-free detector
One-Stage: Fast, Simple
Two-Stage: High Precision (Recall)
Anchor-Free: Hybrid of both methods
2018
- 8/3 CornerNet (pair)
2019
- 1/23 ExtremeNet (4 pts)
- 4/2 FCOS
- 4/8 FoveaBox
- 4/18 CornerNet-Lite
- 4/19 CenterNet (triplet)
- 4/23 Center and Scale Prediction (CSP)
- 4/25 Objects as Points (CenterNet)
- 10/21 CSID (CSP+ID)
Algo Relations
Anchor-Free
- Single point: FCOS, CenterNet, CSP, CSID
- Multiple points: CornerNet, Triplet, ExtremeNet
CornerNet
Object as paired keypoints
CornerNet
Object as a pair of keypoints (top-left & bottom-right)
Find Corner
Associative
Embedding
Grouping
CornerNet
Corner Pooling
Top-Left
Bottom-Right
Backbone matters:
Hourglass provides 8 AP more than FPN
CornerNet
Corner Pooling
Top-Left
Bottom-Right
● One dimensional embedding
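A plain-Python sketch of top-left corner pooling as commonly described: for each location, take the maximum of one feature map over everything below it and the maximum of a second feature map over everything to its right, then add the two. Single-channel nested lists and the function name are simplifications for illustration.

```python
def top_left_corner_pool(f_top, f_left):
    """Top-left corner pooling on two single-channel maps of equal size."""
    h, w = len(f_top), len(f_top[0])

    # Running max scanning from the bottom row upwards (looks downwards).
    down = [row[:] for row in f_top]
    for i in range(h - 2, -1, -1):
        for j in range(w):
            down[i][j] = max(down[i][j], down[i + 1][j])

    # Running max scanning from the rightmost column leftwards (looks right).
    right = [row[:] for row in f_left]
    for i in range(h):
        for j in range(w - 2, -1, -1):
            right[i][j] = max(right[i][j], right[i][j + 1])

    return [[down[i][j] + right[i][j] for j in range(w)] for i in range(h)]


feature = [
    [0, 0, 0],
    [0, 0, 2],
    [0, 3, 0],
]
pooled = top_left_corner_pool(feature, feature)
```

Bottom-right pooling is the mirror image: scan upwards and leftwards instead.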
CornerNet: Loss function
● Pixel-wise regression on heatmap with focal loss
● Smooth L1 on offset map
Heatmap, Offset, Grouping
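A sketch of the heatmap focal loss, assuming the penalty-reduced variant used for keypoint heatmaps: positives are the cells where the target equals 1, and negatives near a keypoint are down-weighted by the Gaussian-shaped target value. The defaults `alpha=2` and `beta=4` and the nested-list input are assumptions.

```python
import math


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over a 2D heatmap, normalized by # of positives."""
    loss = 0.0
    num_pos = 0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:
                num_pos += 1
                loss -= (1.0 - p) ** alpha * math.log(p)
            else:
                # (1 - y)^beta shrinks the penalty close to a true keypoint
                loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)


uncertain = heatmap_focal_loss([[0.5, 0.5]], [[1.0, 0.0]])
confident = heatmap_focal_loss([[0.99, 0.01]], [[1.0, 0.0]])
```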
CornerNet
CenterNet: Keypoint Triplets
Problem of CornerNet
● Sensitive to edges (top 100)
● High false positive rate
Improvement
● Correct predictions by checking the central parts
Object as a keypoint triplet
CenterNet: Keypoint Triplets
Corner Pool
Associative
Embedding
Grouping
Center Pool
CenterNet: Keypoint Triplets
CenterNet: Keypoint Triplets
FCOS
Object as a point + 4d vector
● Balance between positive & negative samples
● Ambiguous case ~ 1.4% in COCO
● Hint for center
FCOS
Backbone + FPN + Head (classical arch)
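A sketch of the "point + 4d vector" encoding: each location inside a ground-truth box regresses its distances to the left, top, right, and bottom sides. Treating points on or outside the border as negatives is a simplification assumed here, as are the function names.

```python
def fcos_targets(x, y, box):
    """Distances (l, t, r, b) from point (x, y) to the sides of box, or None."""
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None  # assumed: border and outside points are negative samples
    return (l, t, r, b)


def fcos_decode(x, y, ltrb):
    """Inverse mapping: recover the box from a point and its 4d vector."""
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


target = fcos_targets(3, 4, (1, 2, 11, 8))
restored = fcos_decode(3, 4, target)
```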
FCOS: Centerness
Important Feature
● Center-ness eliminates ambiguous samples
● Class score times center-ness score @NMS
FCOS: Centerness
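A sketch of the center-ness score, assuming the standard definition from the four side distances (l, t, r, b): the square root of the product of the min/max ratios in each direction. The ranking score multiplies it with the class score before NMS, as noted above. Function names are illustrative.

```python
import math


def centerness(l, t, r, b):
    """1.0 at the box center, falling towards 0 near the box border."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def ranking_score(class_score, center_score):
    """Score used to rank detections at NMS time."""
    return class_score * center_score


at_center = centerness(5, 5, 5, 5)
off_center = centerness(2, 2, 8, 4)
score = ranking_score(0.9, off_center)
```

A location far from the center keeps its box prediction but is ranked lower, so low-quality boxes tend to lose against central ones during NMS.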
FCOS: Improvements
● 1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
● center means center sampling is used in training.
● liou means the model uses the linear IoU loss function (1 - IoU).
● giou means the model uses the GIoU loss function (1 - GIoU).
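The two box losses above can be sketched as follows, assuming `(x1, y1, x2, y2)` boxes with positive area and the usual GIoU definition (IoU minus the share of the smallest enclosing box not covered by the union). Names are illustrative.

```python
def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes with positive area."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (enclose - union) / enclose
    return iou, giou


def linear_iou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[0]


def giou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[1]


same = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
apart_liou = linear_iou_loss((0, 0, 1, 1), (2, 0, 3, 1))
apart_giou = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
```

For disjoint boxes the linear IoU loss saturates at 1, while the GIoU loss keeps growing with the gap, so it can still rank such predictions.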
Objects as Points (+2 vals)
● Simple method
○ One feature map that represents all scales
○ No bounding box matching
○ No non maximum suppression
● Better speed-accuracy trade-off
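A sketch of how detections can be read off a center heatmap without NMS: keep every cell that is at least as large as all of its 8 neighbours. The score threshold and the nested-list heatmap are assumptions for illustration.

```python
def heatmap_peaks(heatmap, score_thr=0.3):
    """Return (row, col, score) for local maxima in a 3x3 window, best first."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            value = heatmap[i][j]
            if value < score_thr:
                continue
            window = [
                heatmap[a][b]
                for a in range(max(0, i - 1), min(h, i + 2))
                for b in range(max(0, j - 1), min(w, j + 2))
            ]
            if value >= max(window):  # the window includes the cell itself
                peaks.append((i, j, value))
    return sorted(peaks, key=lambda p: p[2], reverse=True)


heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
]
peaks = heatmap_peaks(heatmap)
```

On a GPU the same test is usually written as a 3x3 max pooling followed by an equality check against the input.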
Objects as Points: “The true CenterNet”
Hourglass
● Use DCNv2 instead of Conv
● Heatmap supports 2D, 3D, pose estimation
Objects as Points: “The true CenterNet”
● Pixel-wise regression with focal loss
● Scale map is not normalized
● Size reg. constant 0.1
● L1 loss (rather than Smooth L1) for the offset
● Training longer performs better (140 to 230 epochs)
CSP: Center & Scale Prediction
Prediction
● Center (Heatmap)
● Scale (Height)
Fixed aspect ratio @0.41 (according to dataset)
Object as a point + 1 scalar
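A sketch of how a "point + 1 scalar" prediction becomes a box, assuming the height is already expressed in pixels and the width follows from the fixed 0.41 aspect ratio. The function name is illustrative, and any log-scale or stride handling done by the actual model is left out.

```python
def csp_box(cx, cy, height, aspect_ratio=0.41):
    """Box (x1, y1, x2, y2) from a center, a height, and a fixed width/height ratio."""
    width = aspect_ratio * height
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


x1, y1, x2, y2 = csp_box(cx=100.0, cy=200.0, height=100.0)
print(x2 - x1, y2 - y1)  # width and height of the decoded box
```

Predicting only the height works for upright pedestrians because their width is largely determined by it; that would not hold for general object classes.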
CSP: Center & Scale Prediction
CSP: Center & Scale Prediction
Why Choose Height?
Why Predict Center?
CSID: Center, Scale, Identity and Density aware
ID-Map learns two measures simultaneously
● Density of predicted center
● Identity of predicted center
CSID: Center, Scale, Identity and Density aware
ID-NMS
Algo Relations
Anchor-Free
- Single point: FCOS, CenterNet, CSP, CSID
- Multiple points: CornerNet, Triplet, ExtremeNet
How are points grouped?
● Pooling
● Associative embeddings
How is the center located?
● Centerness reg.
● Center target
● Domain constraints
Comparison
Algorithm         CornerNet           Triplet                          FCOS         CenterNet   CSP                   CSID
# of points       2                   3                                1            1           1                     1, 1
Scale             Backbone            Backbone                         FPN          Backbone    FPN                   Backbone
Grouping method   Corner Pool, Loss   Center Pool, Corner Pool, Loss   -            -           -                     ID Loss, Density loss
Key feature       Pool, Embedding     Pool                             Centerness   Simple      Const. aspect ratio   ID Map
Post-processing   NMS                 Soft-NMS                         NMS          -           NMS                   ID-NMS
Benchmarks: COCO
Algorithm           Backbone                AP     AP@0.50   AP@0.75   APs    APm    APl    Inference time
YOLOv3              DarkNet-53              33     57        34.4      18.3   25.4   41.9   20 fps
RetinaNet           ResNeXt-101-FPN         40.8   61.1      44.1      24.1   44.2   51.2   5.4 fps
CornerNet           Hourglass-104           40.5   56.5      43.1      19.4   42.7   53.9   4.1 fps
FCOS                ResNet-101-FPN          41.5   60.7      45        24.4   44.8   51.6   -
FCOS + imp          ResNeXt-64x4d-101-FPN   44.7   64.1      48.4      27.6   47.5   55.6   -
CenterNet           DLA-34                  39.2   57.1      42.8      19.9   43     51.4   28 fps
CenterNet           Hourglass-104           42.1   61.1      45.9      24.1   45.5   52.8   7.8 fps
CenterNet-Triplet   Hourglass-52            41.6   59.4      44.2      22.5   43.1   54.1   3.7 fps
CenterNet-Triplet   Hourglass-104           44.9   62.4      48.1      25.6   47.4   57.4   2.9 fps
Benchmarks: CityPerson
Algorithm      Backbone    Reasonable   Heavy   Partial   Bare   Inference time
FRCNN          VGG-16      15.4         -       -         -      -
OR-CNN         VGG-16      12.8         55.7    15.3      6.7    -
RepLoss        ResNet-50   13.2         56.9    16.8      7.6    -
CSP            ResNet-50   11           49.3    10.4      7.3    3 fps
Adaptive-NMS   ResNet-50   10.8         54      11.4      6.2    -
CSID           DLA-34      8.8          46.6    8.3       5.8    6.25 fps
Training Frameworks
● Tensorflow Object Detection API
● mmdetection (CUHK)
● simpledet (TuSimple)
● Detectron, Detectron2
Conclusions
● Crowdedness is the major obstacle in person detection
● Anchor-free detectors seem flexible & extensible across object tasks
● Center-based method + post-processing + specialized loss
○ CSID
○ CenterNet + A-NMS + RepLoss
● Trade-off between backbone & scaling level
○ ConvNet + FPN
○ DLA
● Still a challenging topic
Paper Lists: Person Detection
● CityPersons: A Diverse Dataset for Pedestrian Detection
● WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild
● CrowdHuman: A Benchmark for Detecting Human in a Crowd
● CenterNet: Keypoint Triplets for Object Detection
● Objects as Points
● FoveaBox: Beyond Anchor-based Object Detector
● Feature Selective Anchor-Free Module for Single-Shot Object Detection
● FCOS: Fully Convolutional One-Stage Object Detection
● Center and Scale Prediction: A Box-free Approach for Object Detection
● Bottom-up Object Detection by Grouping Extreme and Center Points
● CSID: Center, Scale, Identity and Density-aware Pedestrian Detection in a Crowd
● Repulsion Loss: Detecting Pedestrians in a Crowd
● Adaptive NMS: Refining Pedestrian Detection in a Crowd
● Discriminative Feature Transformation for Occluded Pedestrian Detection
● PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
● Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd
● Double Anchor R-CNN for Human Detection in a Crowd

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 

Recent Object Detection Research & Person Detection

  • 1. Recent Object Detection Development & Person Detection Survey kv
  • 2. Outline - Review Object Detection - Research Trends: Anchor-free detector - Person Detection
  • 4. Object Detection ● A deep-learning-dominated domain ● Modularized, reusable design ○ Components ○ Pipeline ○ Feature scaling design Object Detection in 20 Years: A Survey RCNN
  • 5. General Object Detector Arch. Backbone Neck Head Backbone Neck Head Dense Head One-Stage ● YOLO ● SSD ● RetinaNet Two-Stage ● Faster-RCNN ● ThunderNet
  • 6. Component in Object Detection Pipeline Backbone (feature extractor) - ResNet50, ResNeXt, MobileNet - Hourglass, DLA Neck (in-net preprocessor) - FPN, BPN, HRPN Dense Head - RPN Head (task) - AnchorHead - retina, ssd - fcos, ctdet - BoxHead Loss function - CE, BCE - Focal loss - L1, Smooth L1 Computation Module - Deformable Conv (v1, v2) - GN (Group Normalization) - SyncBN - NMS, SoftNMS - GA (Guided Anchoring)
  • 7. Two-Stage: Faster RCNN per ROI computation per image computation ResNet RPN Softmax RoIPool BoxReg MLP
  • 8. Scale in Object Detection
  • 9. Scale in Object Detection Backbone ● Without scale ○ ConvNet ● With scale ○ DLA ○ Hourglass ○ Modified-ResNet
  • 10. Backbone Parameters
      Backbone name | Top-1 error (%) | # of parameters | FLOPs/2
      ResNet-50 | 22.28 | 25,557,032 | 3,877.95M
      DLA-34 | 25.36 | 15,742,104 | 3,071.37M
      ResNet-101 | 21.90 | 44,549,160 | 7,597.95M
      Hourglass | - | - | -
      reference: https://github.com/osmr/imgclsmob/blob/master/pytorch/README.md
  • 12. Object Detection & Person Detection Person detection ≈ class-agnostic object detection with crowdedness prob.
  • 13. Object Detection & Person Detection ● Crowdedness & Occlusion ● Scale & fine-grained ● Unusual pose ● Non-person, distractor ● Night scene ● Background distribution (domain shift)
  • 14. Datasets
      COCOPerson, CrowdHuman, Caltech pedestrian, WiderPerson, WiderPerson19, CUHK Person
      dataset | # of img | # of person | density
      COCO Person | 64,115 | 257,252 | 4.01
      CrowdHuman | 15,000 | 339,565 | 22.64
      WiderPerson | 9,000 | 399,786 | 39.87
      CUHK Person | 18,184 | 99,809 | 5.48
      WiderPerson19 sur/ad | 8,240 / 88,260 | 58,190 / 248,993 | 7.05 / 2.82
      Caltech pedestrian | 72,782 | 13,674 | 0.32
      CityPerson | 2,975 | 19,654 | 6.61
      train, test, benchmark
  • 16. Dataset: CrowdHuman Annotations ● Full box ● Visible box ● Head box Features ● Aims at the crowdedness issue
  • 18. Dataset: WiderPerson TMM2019 http://www.cbsr.ia.ac.cn/users/jwan/papers/TMM2019-WiderPerson.pdf Features ● Questionable annotation quality ● Limited scene distribution (by observation) Annotations ● Full box ● class, tag
  • 21. Dataset: WiderPerson2019 https://wider-challenge.org/2019.html Features ● vehicle & surveillance ● low quality but high resolution images
  • 23. Observations ● Models trained on COCOPerson may not perform well in real scenarios (not confirmed) ● COCOPerson contains some unreasonable annotations ● The WiderPerson dataset is too noisy to use directly ● Full box is hard; visible box may cause a higher false-positive rate ● CrowdHuman is hard, but it aims to address the crowdedness problem
  • 24. Crowdedness Problem: Repulsion Loss Attraction RepGT (Repulsion Term) RepBox (Repulsion Term)
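The RepGT term listed on this slide penalizes a predicted box for overlapping a ground-truth box other than its own target. The following is a minimal sketch based on the definitions in the Repulsion Loss paper (IoG and Smooth_ln), not the authors' code. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the names `rep_gt_loss`, `assigned_gt_idx`, and the default `sigma` are illustrative assumptions.

```python
import math

def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def intersection(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def iou(a, b):
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def iog(box, gt):
    """Intersection over the ground-truth area."""
    g = area(gt)
    return intersection(box, gt) / g if g > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """Log-shaped penalty that grows as the overlap x approaches 1."""
    x = min(x, 1.0 - 1e-6)  # keep log(1 - x) finite
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_gt_loss(pred_boxes, proposals, assigned_gt_idx, gt_boxes, sigma=0.5):
    """Mean Smooth_ln(IoG) between each prediction and its repulsion target.

    The repulsion target of a proposal is the non-assigned ground truth
    that has the largest IoU with that proposal.
    """
    total, count = 0.0, 0
    for pred, prop, tgt in zip(pred_boxes, proposals, assigned_gt_idx):
        others = [g for i, g in enumerate(gt_boxes) if i != tgt]
        if not others:
            continue
        rep_gt = max(others, key=lambda g: iou(prop, g))
        total += smooth_ln(iog(pred, rep_gt), sigma)
        count += 1
    return total / count if count else 0.0
```

In the paper the full loss combines the attraction term with weighted RepGT and RepBox terms; only RepGT is sketched here.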
  • 26. Crowdedness Problem: Adaptive-NMS Adaptive-NMS ● Dynamic suppression according to target density ● Subnetwork to learn density scores
  • 28. Drawbacks of anchor box ● Large # of anchors (SSD 40k, RetinaNet 100k) ○ Faster R-CNN still performs well with few proposals ● Introduces extra hyperparameters ● May fail in multi-scale scenarios ● Imbalance between positive & negative anchors
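The dynamic suppression idea can be sketched as greedy NMS whose IoU threshold is raised wherever the predicted density is high. This is a minimal illustration, not the paper's implementation: it assumes the density scores are already available (in the paper they come from a learned subnetwork), and the function name, box layout `(x1, y1, x2, y2)`, and `base_thresh` default are assumptions.

```python
def iou(a, b):
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def adaptive_nms(boxes, scores, densities, base_thresh=0.5):
    """Greedy NMS with a per-box threshold max(base_thresh, density).

    densities[i] is the predicted crowd density around box i, assumed
    to lie in [0, 1]. Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)          # highest-scoring remaining box
        keep.append(m)
        thresh = max(base_thresh, densities[m])
        # In dense regions the threshold is higher, so fewer neighbours are removed.
        order = [i for i in order if iou(boxes[m], boxes[i]) < thresh]
    return keep
```

With all densities at zero this reduces to ordinary greedy NMS at `base_thresh`.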
  • 29. Recent Trend in Object Detection
  • 30. Era of anchor-free detector
      One-Stage: Fast, Simple
      Two-Stage: High Precision (Recall)
      Anchor-Free: Hybrid of both methods
      2018
      - 8/3 CornerNet (pair)
      2019
      - 1/23 ExtremeNet (4 pts)
      - 4/2 FCOS
      - 4/8 FoveaBox
      - 4/18 CornerNet-Lite
      - 4/19 CenterNet (triplet)
      - 4/23 Center and Scale Prediction (CSP)
      - 4/25 Objects as Points (CenterNet)
      - 10/21 CSID (CSP+ID)
  • 33. CornerNet Object as a pair of keypoints (top-left & bottom-right) Find Corner Associative Embedding Grouping
  • 36. CornerNet: Loss function ● Pixel-wise regression on heatmap with focal loss ● Smooth L1 on offset map Heatmap Offset Grouping
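The heatmap term can be sketched as the penalty-reduced focal loss described in the CornerNet paper, where negatives close to a keypoint are down-weighted by the Gaussian-smoothed target. This is a minimal pure-Python illustration for a single-channel heatmap. The function name and the `eps` clamp are assumptions; `alpha=2` and `beta=4` follow the values reported in the paper.

```python
import math

def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss over a 2D heatmap.

    pred and target are lists of rows with values in [0, 1]. target is
    exactly 1.0 at keypoints and decays like a Gaussian around them.
    """
    loss, num_pos = 0.0, 0
    for p_row, y_row in zip(pred, target):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep the logs finite
            if y == 1.0:
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                # (1 - y)^beta reduces the penalty near a keypoint
                loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_pos, 1)  # normalize by the number of keypoints
```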
  • 38. CenterNet: Keypoint Triplets Problem of CornerNet ● Sensitive due to edge (top 100) ● High false positive rate Improvement ● Correct prediction by checking the central parts Object as a keypoint triplet
  • 39. CenterNet: Keypoint Triplets Corner Pool Associative Embedding Grouping Center Pool
  • 42. FCOS Object as a point + 4d vector ● Balance between positive & negative samples ● Ambiguous case ~ 1.4% in COCO ● Hint for center
  • 43. FCOS Backbone + FPN + Head (classical arch)
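The "point + 4d vector" representation can be sketched as follows: each location inside a ground-truth box regresses its distances to the four box sides, and a location covered by several boxes (the ambiguous case) takes the box with the smallest area, as described in the FCOS paper. The function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def fcos_target(px, py, gt_boxes):
    """Return (l, t, r, b) for location (px, py), or None for background.

    l, t, r, b are the distances from the point to the left, top,
    right and bottom sides of its assigned ground-truth box.
    """
    containing = [b for b in gt_boxes
                  if b[0] < px < b[2] and b[1] < py < b[3]]
    if not containing:
        return None
    # Ambiguous case: several boxes contain the point, the smallest wins.
    x1, y1, x2, y2 = min(containing, key=box_area)
    return (px - x1, py - y1, x2 - px, y2 - py)
```

For example, with boxes `(0, 0, 100, 100)` and `(40, 40, 60, 60)`, the point `(50, 50)` lies in both and is assigned to the smaller one.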
  • 44. FCOS: Centerness Important Feature ● Center-ness eliminates ambiguous samples ● Class score times center-ness score @NMS
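The center-ness score follows directly from the four regression distances, using the formula given in the FCOS paper. The sketch below also shows the score re-weighting used before NMS; the function names are illustrative assumptions.

```python
import math

def centerness(l, t, r, b):
    """Center-ness in (0, 1]: 1.0 at the box center, near 0 at the border."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def ranking_score(class_score, ctr):
    """Score used to rank detections in NMS: class score times center-ness."""
    return class_score * ctr
```

A location at the exact center has `l == r` and `t == b`, so its class score is unchanged, while off-center locations are down-weighted and tend to be suppressed first.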
  • 46. FCOS: Improvements ● 1x and 2x mean the model is trained for 90K and 180K iterations, respectively. ● center means center sampling is used in training. ● liou means the model uses a linear IoU loss function (1 - IoU). ● giou means the model uses a GIoU loss function (1 - GIoU).
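The two box losses named above differ only in the overlap measure. A minimal sketch, assuming axis-aligned `(x1, y1, x2, y2)` boxes with positive area; the function names are illustrative.

```python
def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    # Area of the smallest box enclosing both a and b.
    enclose = ((max(a[2], b[2]) - min(a[0], b[0]))
               * (max(a[3], b[3]) - min(a[1], b[1])))
    giou = iou - (enclose - union) / enclose
    return iou, giou

def linear_iou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[0]

def giou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[1]
```

Unlike the linear IoU loss, which is constant at 1 for any pair of disjoint boxes, the GIoU loss keeps growing as the boxes move apart, so it still provides a training signal when there is no overlap.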
  • 47. Objects as Points (+2 vals) ● Simple method ○ One feature map that represents all scales ○ No bounding box matching ○ No non maximum suppression ● Better speed-accuracy trade-off
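Dropping NMS works because detections are read directly from peaks of the center heatmap. The sketch below shows that step for one class channel: a cell is kept if it is at least as large as its 8 neighbours, and the top `k` peaks are returned. The function name and the `threshold` default are assumptions. The paper describes this step with a 3x3 max pooling and keeps the top 100 peaks.

```python
def extract_peaks(heatmap, k=100, threshold=0.1):
    """Return up to k (score, row, col) local maxima of a 2D heatmap."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            # 3x3 window around (r, c), clipped at the map border.
            window = [heatmap[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            if v >= max(window):
                peaks.append((v, r, c))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:k]
```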
  • 48. Objects as Points: “The true CenterNet” Hourglass ● Use DCNv2 instead of Conv ● Heatmap supports 2D, 3D, pose estimation
  • 49. Objects as Points: “The true CenterNet” ● Pixel-wise regression with focal loss ● Scale (size) map is not normalized ● Size regression loss weight: 0.1 ● L1 loss (rather than Smooth L1) on the offset ● Training longer performs better (140 to 230)
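For reference, the training objective these bullets describe can be written as below, following the notation of the Objects as Points paper as far as recalled here: $L_k$ is the focal loss on the center heatmap, $L_{size}$ and $L_{off}$ are L1 losses on the unnormalized size map and the offset map, and $N$ is the number of object centers. The value $\lambda_{off} = 1$ is taken from the paper, not from the slide.

```latex
L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off},
\qquad \lambda_{size} = 0.1,\quad \lambda_{off} = 1

L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|
```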
  • 50. CSP: Center & Scale Prediction Prediction ● Center (Heatmap) ● Scale (Height) Fix aspect ratio @0.41 (according to dataset) Object as a point + 1 scalar
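With a fixed aspect ratio, a box is fully determined by the predicted center and height. A minimal sketch of that decoding step, assuming the center and height are already expressed in image pixels; the function name is illustrative, and the 0.41 width-to-height ratio is the value from the slide.

```python
ASPECT_RATIO = 0.41  # width / height, fixed according to the dataset

def csp_box(cx, cy, height, aspect_ratio=ASPECT_RATIO):
    """Build an (x1, y1, x2, y2) box from a center point and a height."""
    width = aspect_ratio * height
    return (cx - width / 2.0, cy - height / 2.0,
            cx + width / 2.0, cy + height / 2.0)
```

For example, a pedestrian centered at `(100, 200)` with height 100 px gets a box 41 px wide.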
  • 51. CSP: Center & Scale Prediction
  • 52. CSP: Center & Scale Prediction Why Choose Height? Why Predict Center?
  • 53. CSID: Center, Scale, Identity and Density aware ID-Map learns two measures simultaneously ● Density of predicted center ● Identity of predicted center
  • 54. CSID: Center, Scale, Identity and Density aware ID-NMS
  • 55. Algo Relations
      Anchor-Free
      - Single point: FCOS, CenterNet, CSP, CSID
      - Multiple points: CornerNet, Triplet, ExtremeNet
      How are points grouped? ● Pooling ● Associative embeddings
      How is the center located? ● Centerness reg. ● Center target ● Domain constraints
  • 56. Comparison
      Algorithm | CornerNet | Triplet | FCOS | CenterNet | CSP | CSID
      # of points | 2 | 3 | 1 | 1 | 1 | 1, 1
      Scale | Backbone | Backbone | FPN | Backbone | FPN | Backbone
      Grouping method | Corner Pool, Loss | Center Pool, Corner Pool, Loss | - | - | - | ID Loss, Density loss
      Key feature | Pool, Embedding | Pool | Centerness | Simple | Const. aspect ratio | ID Map
      Post-processing | NMS | Soft-NMS | NMS | - | NMS | ID-NMS
  • 57. Benchmarks: COCO
      Algorithm | Backbone | AP | AP@0.50 | AP@0.75 | APs | APm | APl | inference time
      YOLOv3 | DarkNet-53 | 33 | 57 | 34.4 | 18.3 | 25.4 | 41.9 | 20 fps
      RetinaNet | ResNeXt-101-FPN | 40.8 | 61.1 | 44.1 | 24.1 | 44.2 | 51.2 | 5.4 fps
      CornerNet | Hourglass-104 | 40.5 | 56.5 | 43.1 | 19.4 | 42.7 | 53.9 | 4.1 fps
      FCOS | ResNet-101-FPN | 41.5 | 60.7 | 45 | 24.4 | 44.8 | 51.6 | -
      FCOS + imp | ResNeXt-64x4d-101-FPN | 44.7 | 64.1 | 48.4 | 27.6 | 47.5 | 55.6 | -
      CenterNet | DLA-34 | 39.2 | 57.1 | 42.8 | 19.9 | 43 | 51.4 | 28 fps
      CenterNet | Hourglass-104 | 42.1 | 61.1 | 45.9 | 24.1 | 45.5 | 52.8 | 7.8 fps
      CenterNet-Triplet | Hourglass-52 | 41.6 | 59.4 | 44.2 | 22.5 | 43.1 | 54.1 | 3.7 fps
      CenterNet-Triplet | Hourglass-104 | 44.9 | 62.4 | 48.1 | 25.6 | 47.4 | 57.4 | 2.9 fps
  • 60. Benchmarks: CityPerson (log-average miss rate in %, lower is better)
      Algorithm Name | Backbone | Reasonable | Heavy | Partial | Bare | inference time
      FRCNN | VGG-16 | 15.4 | - | - | - | -
      OR-CNN | VGG-16 | 12.8 | 55.7 | 15.3 | 6.7 | -
      RepLoss | ResNet-50 | 13.2 | 56.9 | 16.8 | 7.6 | -
      CSP | ResNet-50 | 11 | 49.3 | 10.4 | 7.3 | 3 fps
      Adaptive-NMS | ResNet-50 | 10.8 | 54 | 11.4 | 6.2 | -
      CSID | DLA-34 | 8.8 | 46.6 | 8.3 | 5.8 | 6.25 fps
  • 61. Training Frameworks ● Tensorflow Object Detection API ● mmdetection (CUHK) ● simpledet (TuSimple) ● Detectron, Detectron2
  • 62. Conclusions ● Crowdedness is the major obstacle in person detection ● Anchor-free detectors seem flexible & extensible to object tasks ● Center-based method + post-processing + specialized loss ○ CSID ○ CenterNet + A-NMS + RepLoss ● Trade-off between backbone & scaling level ○ ConvNet + FPN ○ DLA ● Still a challenging topic
  • 63. Paper Lists: Person Detection
      ● CityPersons: A Diverse Dataset for Pedestrian Detection
      ● WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild
      ● CrowdHuman: A Benchmark for Detecting Human in a Crowd
      ● CenterNet: Keypoint Triplets for Object Detection
      ● Objects as Points
      ● FoveaBox: Beyond Anchor-based Object Detector
      ● Feature Selective Anchor-Free Module for Single-Shot Object Detection
      ● FCOS: Fully Convolutional One-Stage Object Detection
      ● Center and Scale Prediction: A Box-free Approach for Object Detection
      ● Bottom-up Object Detection by Grouping Extreme and Center Points
      ● CSID: Center, Scale, Identity and Density-aware Pedestrian Detection in a Crowd
      ● Repulsion Loss: Detecting Pedestrians in a Crowd
      ● Adaptive NMS: Refining Pedestrian Detection in a Crowd
      ● Discriminative Feature Transformation for Occluded Pedestrian Detection
      ● PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
      ● Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd
      ● Double Anchor R-CNN for Human Detection in a Crowd