[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
1. You Only Look Once:
Unified, Real-Time Object Detection (2016)
Taegyun Jeon
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection."
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
2. Slide courtesy of DeepSystem.io “YOLO: You only look once Review”
Evaluation on VOC2007
3. Main Concept
• Object Detection
• Regression problem
• YOLO
• Only one feed-forward pass
• Global context
• Unified (Real-time detection)
• YOLO: 45 FPS
• Fast YOLO: 155 FPS
• General representation
• Robust to various backgrounds
• Other domains
(Real-time system: https://cs.stackexchange.com/questions/56569/what-is-real-time-in-a-computer-vision-context)
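The frame rates above can be read as per-frame time budgets. A minimal sketch of that arithmetic (the helper name `frame_budget_ms` is hypothetical, introduced only for illustration):

```python
def frame_budget_ms(fps):
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted on the slide.
for name, fps in [("YOLO", 45), ("Fast YOLO", 155)]:
    print(f"{name}: {fps} FPS is about {frame_budget_ms(fps):.1f} ms per frame")
```

At 45 FPS the whole detection pipeline has roughly 22 ms per image; at 155 FPS it has roughly 6.5 ms.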
4. Object Detection as Regression Problem
• Previous: Repurpose classifiers to perform detection
• Deformable Parts Models (DPM)
• Sliding window
• R-CNN based methods
• 1) Generate potential bounding boxes
• 2) Run classifiers on these proposed boxes
• 3) Post-processing (refinement, elimination, rescoring)
5. Object Detection as Regression Problem
• YOLO: Single Regression Problem
• Image → bounding box coordinates and class probabilities
• Extremely Fast
• Global reasoning
• Generalizable representation
6. Unified Detection
• All BBoxes, all classes
1) Image → S × S grid
2) Each grid cell
→ B: BBoxes, each with a confidence score
→ C: class probabilities, one per class
Each BBox: x, y, w, h, confidence
Appendix: Intersection over Union (IOU)
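Intersection over Union can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`; that layout and the function name `iou` are choices made here, not taken from the slides, which describe boxes as x, y, w, h.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this corner format is an
    assumption made for the example.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Degenerate boxes with zero total area get an IOU of 0.
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Identical boxes give 1.0, disjoint boxes give 0.0, and partial overlaps fall in between.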
7. Unified Detection
• Predict one set of class probabilities
per grid cell, regardless of the number
of boxes B.
• At test time,
the class probabilities are multiplied by each
individual box confidence prediction, giving
a class-specific confidence score per box
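In the YOLO paper, the test-time score for a box is the product of the cell's class probabilities and that box's confidence, and the network output is an S × S × (B·5 + C) tensor. A minimal sketch of both, using plain lists; the function names and the sample numbers are hypothetical, chosen only for illustration:

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: S x S cells, each with B boxes
    (x, y, w, h, confidence) plus one shared set of C class probabilities."""
    return (S, S, B * 5 + C)


def class_specific_scores(class_probs, box_confidences):
    """Scores for one grid cell.

    class_probs: C class probabilities, shared by every box in the cell.
    box_confidences: B confidence values, one per box.
    Returns a B x C nested list where entry [b][c] is
    box_confidences[b] * class_probs[c].
    """
    return [[conf * p for p in class_probs] for conf in box_confidences]


# Configuration reported in the paper for PASCAL VOC: S=7, B=2, C=20.
print(output_shape(7, 2, 20))

# Made-up values for one cell with two boxes and two classes.
print(class_specific_scores([0.5, 0.25], [0.8, 0.4]))
```

Because the class probabilities are predicted once per cell, every box in that cell shares them and only the confidence term differs between boxes.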
94. Appendix | GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
96. Appendix | Networks on Convolutional Feature Maps
Previous: FC
Proposed: Conv + FC
Ren, Shaoqing, et al. "Object detection networks on convolutional feature maps." IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
97. Appendix | Sum-squared error (SSE)
The sum of squared errors of prediction (SSE), also called the residual
sum of squares (RSS), is the sum of the squares of the residuals (the
deviations of predicted values from the actual empirical values of the
data). It is a measure of the discrepancy between the data and an
estimation model. A small SSE indicates a tight fit of the model to the
data. It is used as an optimality criterion in parameter selection and
model selection.
https://en.wikipedia.org/wiki/Residual_sum_of_squares
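The definition above can be sketched directly. This is a minimal illustration of the generic SSE computation, not of YOLO's full loss; the function name and sample values are hypothetical:

```python
def sum_squared_error(predicted, actual):
    """Sum of squared residuals between predictions and observed values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Residuals are 0, 0 and -2, so the result is 0 + 0 + 4.
print(sum_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A perfect fit gives 0, and larger residuals are penalized quadratically.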