4. What Is Covered
• YOLOv1 review
• YOLOv2 Improvements Over YOLOv1 (Better)
• YOLOv2 Using Darknet-19 (Faster)
• YOLO9000 via WordTree (Stronger)
• Deep Learning on Event Detection for River Surveillance in Taiwan
15. YOLOv2: 1. Batch Normalization
What: replace dropout with batch normalization
How: add batch normalization to all of the convolutional layers
Why: removes the need for dropout without overfitting, speeds up training, and helps avoid vanishing gradients
Value: +2% mAP
[Figure: the sigmoid curve and its derivative, which is near zero in the saturated tails]
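The per-channel normalization can be sketched in NumPy. This is a minimal illustration of the idea, not the Darknet implementation; the function name `batch_norm` and its arguments are invented for this sketch:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a conv feature map x of shape (N, C, H, W) per channel,
    then scale and shift with learnable gamma/beta of shape (C,)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

# Badly scaled activations get pulled back to ~zero mean / unit variance,
# which keeps pre-activations away from the saturated (near-zero-derivative)
# tails of squashing nonlinearities such as the sigmoid.
x = np.random.randn(4, 3, 8, 8) * 5 + 10
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```

Because the normalization already regularizes each mini-batch, the dropout layers can be removed without overfitting.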
18. YOLOv2: 2. Convolution with Anchor Boxes
What: input resolution 448 × 448 → 416 × 416
How: shrink the network to operate on 416 × 416 input images, giving a 13 × 13 feature map with an odd spatial dimension
Why: large objects tend to occupy the center of the image, so it is good to have a single location right at the center to predict these objects.
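The choice of 416 is simple arithmetic: Darknet-19 downsamples its input by a total factor of 32, so the input resolution fixes the output grid size, and only 416 yields an odd grid with a single center cell:

```python
# Total downsampling factor of the backbone is 32, so the
# input resolution determines the output grid directly.
stride = 32
grids = {size: size // stride for size in (448, 416)}
for size, g in grids.items():
    print(f"{size} -> {g}x{g} grid ({'odd' if g % 2 else 'even'})")
```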
22. YOLOv2: 4. Dimension Clusters
[Figure: hand-picked anchor boxes vs. k-means clustering of ground-truth box dimensions]
This indicates that using k-means to generate our bounding box priors starts the model off with a better representation and makes the task easier to learn.
The cluster centroids are significantly different from hand-picked anchor boxes: there are fewer short, wide boxes and more tall, thin boxes.
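The clustering can be sketched as below, using the paper's distance d(box, centroid) = 1 − IoU(box, centroid), where (w, h) pairs are compared as if anchored at the same corner. The function names and toy data are invented for illustration:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) boxes and centroids, both anchored at the origin.
    boxes: (N, 2), centroids: (K, 2) -> (N, K)."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Minimizing d = 1 - IoU is the same as maximizing IoU.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Toy ground-truth (w, h) pairs in grid units.
boxes = np.array([[1, 2], [1.2, 2.1], [6, 3], [5.5, 3.2], [9, 9], [8.5, 9.5]])
anchors = kmeans_anchors(boxes, k=3)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering error.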
23. YOLOv2: 5. Direct location prediction
unconstrained → constrained
In region proposal networks the network predicts values t_x and t_y and the (x, y) center coordinates are calculated as:

x = (t_x × w_a) + x_a
y = (t_y × h_a) + y_a

For example, a prediction of t_x = 1 would shift the box to the right by the width of the anchor box, and a prediction of t_x = −1 would shift it to the left by the same amount. This formulation is unconstrained, so any anchor box can end up at any point in the image, regardless of which location predicted the box. With random initialization the model takes a long time to stabilize to predicting sensible offsets.
YOLOv2 instead predicts location coordinates relative to the location of the grid cell, so the offsets are bounded between 0 and 1.
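The constrained parameterization from the paper (b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}) can be written out directly; `decode_box` is a hypothetical helper name for this sketch:

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """YOLOv2 constrained decoding, all values in grid units:
       bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy  -> center stays in cell (cx, cy)
       bw = pw * exp(tw),     bh = ph * exp(th)      -> size scales the prior (pw, ph)"""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    return sig(tx) + cx, sig(ty) + cy, pw * math.exp(tw), ph * math.exp(th)

# However extreme tx/ty are, the predicted center cannot leave cell (5, 5),
# which is what stabilizes training compared to the RPN formulation.
bx, by, bw, bh = decode_box(tx=100.0, ty=-100.0, tw=0.0, th=0.0,
                            cx=5, cy=5, pw=1.3221, ph=1.73145)
```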
24. YOLOv2: 5. Direct location prediction
In YOLOv2, anchors (width, height) are sizes of objects relative to the final feature map.
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434,
7.88282, 3.52778, 9.77052, 9.16828]
25. YOLOv2: 5. Direct location prediction
• The final values are expressed not in pixel coordinates but in grid units.
• YOLO default set:
anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
this means the width and height of the first anchor are slightly over one grid cell [1.3221, 1.73145] and the last anchor almost covers the whole image [11.2364, 10.0071], considering the image is a 13 × 13 grid.
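Since the anchors are in grid units and the downsampling stride is 32, multiplying by 32 gives each prior's size in input-image pixels. A quick check, assuming a 416 × 416 input and the default anchor set above:

```python
# Default YOLOv2 anchors in grid units (width, height).
anchors = [(1.3221, 1.73145), (3.19275, 4.00944), (5.05587, 8.09892),
           (9.47112, 4.84053), (11.2364, 10.0071)]
stride = 32  # feature-map stride: 416 input -> 13x13 grid

# Convert grid units to pixels on the 416x416 input image.
pixel_anchors = [(w * stride, h * stride) for w, h in anchors]
```

The first prior comes out around 42 × 55 pixels, while the last spans most of the 416-pixel image, matching the slide's reading of the anchor list.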
26. YOLOv2: 5. Direct location prediction
What: unconstrained → constrained
How: predict location coordinates relative to the location of the grid cell
Why: using anchor boxes directly causes model instability, especially during early iterations, mostly from predicting the (x, y) location of the box
Value: +5% mAP
27. YOLOv2: 6. Fine-Grained Features
Add a passthrough layer
• Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions.
• YOLOv2 instead adds a pass-through layer that brings features from an earlier layer at 26 × 26 resolution.
28. YOLOv2: 6. Fine-Grained Features
The passthrough layer concatenates the higher resolution features with the
low resolution features by stacking adjacent features into different channels
instead of spatial locations, similar to the identity mappings in ResNet.
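Stacking adjacent spatial features into channels is a space-to-depth reshape. A minimal NumPy sketch of the idea (the function name `passthrough` is ours; in Darknet this is the reorg layer):

```python
import numpy as np

def passthrough(x, stride=2):
    """Reorg: stack each stride x stride spatial block into channels.
    (N, C, H, W) -> (N, C*stride*stride, H//stride, W//stride)."""
    n, c, h, w = x.shape
    x = x.reshape(n, c, h // stride, stride, w // stride, stride)
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(n, c * stride * stride, h // stride, w // stride)

# The 26x26x512 earlier feature map becomes 13x13x2048, matching the final
# 13x13 grid so it can be concatenated channel-wise with the coarse features.
hi_res = np.zeros((1, 512, 26, 26))
out = passthrough(hi_res)
```

No information is lost in the reshape; high-resolution detail is simply moved into extra channels at the coarse resolution.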
31. YOLOv2: 7. Multi-Scale Training
[Figure: the same network run at multiple input resolutions, e.g. 288, 352, 416, 480, 544]
What: fixed input image size → multi-scale training
How: every 10 batches the network randomly chooses a new image dimension from the multiples of 32: {320, 352, …, 608}
Why: be robust to running on images of different sizes.
[Since the model only uses convolutional and pooling layers, it can be resized on the fly]
Value: +1% mAP
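The size-sampling rule above can be sketched in a few lines; `pick_input_size` is an invented name for this illustration:

```python
import random

# Multiples of 32 from 320 to 608: the output grid stays an integer
# (10x10 up to 19x19) at every sampled scale.
SIZES = range(320, 608 + 1, 32)

def pick_input_size(rng=random):
    """Sample a new square input resolution; the paper does this every 10 batches."""
    return rng.choice(SIZES)

sizes_seen = {pick_input_size() for _ in range(1000)}
```

Only the input tensor is resized; the convolutional and pooling weights are shared across all scales, which is why the network can switch sizes on the fly.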