23. Observations
● Models trained on COCOPerson may not perform well in real scenarios (not confirmed)
● COCOPerson contains some unreasonable annotations
● WiderPerson dataset is too noisy to use directly
● Full-body boxes are hard to predict; visible-region boxes may cause a higher false-positive rate
● CrowdHuman is hard, but it targets the crowdedness problem
28. Drawbacks of anchor box
● Large number of anchors (SSD 40k, RetinaNet 100k)
○ Faster R-CNN still performs well with few proposals
● Introduce extra hyperparameters
● May fail in multi-scale scenarios
● Imbalance between positive & negative anchors
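To give a feel for where the large anchor counts come from, here is a minimal sketch that counts anchors over a feature pyramid. The strides (8 to 128), the 9 anchors per location, and the input sizes are assumptions chosen for illustration, and `count_anchors` is a hypothetical helper, not code from any detector.

```python
import math

def count_anchors(height, width, strides=(8, 16, 32, 64, 128), anchors_per_cell=9):
    """Count anchors over all pyramid levels (hypothetical helper).

    Assumes one feature map per stride and a fixed number of anchors
    at every feature-map location.
    """
    total = 0
    for stride in strides:
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        total += feat_h * feat_w * anchors_per_cell
    return total

print(count_anchors(640, 640))  # 76725
```

Most of these anchors cover background, which is one way to see the positive/negative imbalance listed above.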
38. CenterNet: Keypoint Triplets
Problem of CornerNet
● Sensitive to object edges (top 100)
● High false positive rate
Improvement
● Correct predictions by checking the central parts
Object as a keypoint triplet
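The central-part check can be sketched as follows: a box proposed from a corner pair is kept only if some predicted center keypoint falls inside its central region. The region formula and the scale `n` follow my reading of the keypoint-triplet idea and should be treated as assumptions; `central_region` and `keep_box` are hypothetical names, and class matching is omitted.

```python
def central_region(box, n=3):
    """Return the central region of a box (x1, y1, x2, y2).

    With n=3 the region is the middle third of the box in each axis.
    The scale n is an assumed parameter.
    """
    tlx, tly, brx, bry = box
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return (ctlx, ctly, cbrx, cbry)

def keep_box(box, center_keypoints, n=3):
    """Keep the box only if a center keypoint lies in its central region."""
    x1, y1, x2, y2 = central_region(box, n)
    return any(x1 <= cx <= x2 and y1 <= cy <= y2 for cx, cy in center_keypoints)
```

For example, `keep_box((0, 0, 30, 30), [(15, 15)])` keeps the box, while a lone center keypoint at `(5, 5)` would reject it.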
46. FCOS: Improvements
● 1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
● center means center sampling is used in training.
● liou means the model uses the linear IoU loss function (1 - IoU).
● giou means the model uses the GIoU loss function (1 - GIoU).
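The two box losses named above, 1 - IoU and 1 - GIoU, can be sketched for a single pair of boxes as below. This is a plain-Python illustration under the assumption that boxes are given as (x1, y1, x2, y2); the function names are hypothetical and this is not the FCOS implementation.

```python
def box_area(b):
    """Area of a box (x1, y1, x2, y2); degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both inputs
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / enclose if enclose > 0 else iou
    return iou, giou

def linear_iou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[0]

def giou_loss(a, b):
    return 1.0 - iou_and_giou(a, b)[1]
```

For disjoint boxes the linear IoU loss saturates at 1, while the GIoU loss keeps growing with the gap between the boxes, so it still gives a training signal when the boxes do not overlap.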
47. Objects as Points (+2 vals)
● Simple method
○ One feature map that represents all scales
○ No bounding box matching
○ No non maximum suppression
● Better speed-accuracy trade-off
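Dropping non-maximum suppression works because detections are read directly as local peaks of the center heatmap. A minimal pure-Python sketch, assuming a single-class heatmap stored as a list of rows; `extract_peaks`, `top_k`, and `score_threshold` are illustrative names and defaults, not the paper's code:

```python
def extract_peaks(heatmap, top_k=100, score_threshold=0.0):
    """Return up to top_k local maxima as (score, x, y), best first.

    A cell is a peak if no 8-connected neighbour has a larger value
    and its score is at least score_threshold.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < score_threshold:
                continue
            is_peak = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and heatmap[ny][nx] > v:
                        is_peak = False
            if is_peak:
                peaks.append((v, x, y))
    peaks.sort(reverse=True)
    return peaks[:top_k]
```

On a GPU the same test is usually expressed as a 3x3 max pooling followed by an equality check against the original heatmap.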
48. Objects as Points: “The true CenterNet”
Hourglass
● Use DCNv2 instead of standard Conv
● Heatmap supports 2D detection, 3D detection, and pose estimation
49. Objects as Points: “The true CenterNet”
● Pixel-wise regression with focal loss
● Scale map is not normalized
● Size regression weight is a constant 0.1
● L1 loss (rather than Smooth L1) for the offset loss
● Training longer performs better (140 to 230 epochs)
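The loss terms above can be combined as in the sketch below. The 0.1 size weight comes from the slide; the focal exponents `alpha=2`, `beta=4` and the offset weight of 1 are assumptions based on commonly reported settings, and the L1 terms are simplified to a plain mean over the given values. All function names are hypothetical.

```python
import math

def penalty_reduced_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Pixel-wise focal loss on a heatmap, normalised by the number of peaks.

    gt is 1.0 at object centers and in [0, 1) elsewhere (Gaussian splat).
    """
    total, num_pos = 0.0, 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            if g == 1.0:
                total += (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                total += (1 - g) ** beta * p ** alpha * math.log(1 - p)
    return -total / max(num_pos, 1)

def l1_loss(pred, target):
    """Plain L1 (not Smooth L1), averaged over the given values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / max(len(pred), 1)

def detection_loss(hm_pred, hm_gt, size_pred, size_gt, off_pred, off_gt,
                   size_weight=0.1, offset_weight=1.0):
    """Heatmap focal loss + weighted size and offset L1 terms."""
    return (penalty_reduced_focal_loss(hm_pred, hm_gt)
            + size_weight * l1_loss(size_pred, size_gt)
            + offset_weight * l1_loss(off_pred, off_gt))
```

Because sizes are regressed in raw, unnormalized units, the small size weight keeps that term from dominating the heatmap loss.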
50. CSP: Center & Scale Prediction
Prediction
● Center (Heatmap)
● Scale (Height)
Fixed aspect ratio of 0.41 (chosen according to the dataset)
Object as a point + 1 scalar
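With the aspect ratio fixed, a box follows from a center and one scalar. A minimal sketch, assuming the predicted height is already in image pixels (any log-scaling or stride handling is left out) and that the ratio is width / height; `decode_box` is a hypothetical helper.

```python
def decode_box(cx, cy, height, aspect_ratio=0.41):
    """Turn a center point and a height into (x1, y1, x2, y2).

    The width is derived from the fixed aspect ratio (width / height).
    """
    width = aspect_ratio * height
    return (cx - width / 2, cy - height / 2,
            cx + width / 2, cy + height / 2)
```

A center at (100, 200) with height 100 gives a box about 41 pixels wide spanning y = 150 to y = 250.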
53. CSID: Center, Scale, Identity and Density aware
ID-Map learns two measures simultaneously
● Density of predicted center
● Identity of predicted center
56. Comparison
| Algorithm | CornerNet | Triplet | FCOS | CenterNet | CSP | CSID |
|---|---|---|---|---|---|---|
| # of points | 2 | 3 | 1 | 1 | 1 | 1, 1 |
| Scale | Backbone | Backbone | FPN | Backbone | FPN | Backbone |
| Grouping method | Corner Pool, Loss | Center Pool, Corner Pool, Loss | - | - | - | ID Loss, Density loss |
| Key feature | Pool, Embedding | Pool | Centerness | Simple | Const. aspect ratio | ID Map |
| Post-processing | NMS | Soft-NMS | NMS | - | NMS | ID-NMS |
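For reference, the plain NMS entry in the post-processing row can be sketched as a greedy loop: keep the highest-scoring box, then drop any remaining box that overlaps a kept one too much. The 0.5 threshold and the function names are assumptions for illustration; Soft-NMS and ID-NMS modify this step and are not shown.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In crowded scenes a fixed threshold can suppress true neighbouring people, which is the motivation for the crowd-specific variants in the table.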
62. Conclusions
● Crowdedness is the major obstacle in person detection
● Anchor-free detectors seem flexible & extensible to other object tasks
● Center-based method + post-processing + specialized loss
○ CSID
○ CenterNet + A-NMS + RepLoss
● Trade-off between backbone & scaling level
○ ConvNet + FPN
○ DLA
● Still a challenging topic
63. Paper Lists: Person Detection
● CityPersons: A Diverse Dataset for Pedestrian Detection
● WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild
● CrowdHuman: A Benchmark for Detecting Human in a Crowd
● CenterNet: Keypoint Triplets for Object Detection
● Objects as Points
● FoveaBox: Beyond Anchor-based Object Detector
● Feature Selective Anchor-Free Module for Single-Shot Object Detection
● FCOS: Fully Convolutional One-Stage Object Detection
● Center and Scale Prediction: A Box-free Approach for Object Detection
● Bottom-up Object Detection by Grouping Extreme and Center Points
● CSID: Center, Scale, Identity and Density-aware Pedestrian Detection in a Crowd
● Repulsion Loss: Detecting Pedestrians in a Crowd
● Adaptive NMS: Refining Pedestrian Detection in a Crowd
● Discriminative Feature Transformation for Occluded Pedestrian Detection
● PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
● Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd
● Double Anchor R-CNN for Human Detection in a Crowd