Architecture Design for Deep Neural Networks II
• ImageNet, GPU
2
3
Deeper - more layers
Go deeper to boost the capacity
deeper
8
4
19
5
22
6
[Figure: gated (highway-style) units stacked across layers; each unit outputs T · H(x) + (1 − T) · x]
100+
7
152
8
9
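The gating sketched in the figure above can be written as a small layer; the following is a minimal PyTorch sketch of such a highway-style gated unit (illustrative only; the channel count and the choice of 3 × 3 convolutions for H and T are assumptions).

import torch
from torch import nn

class GatedUnit(nn.Module):
    """Highway-style gated skip connection: y = T(x) * H(x) + (1 - T(x)) * x."""
    def __init__(self, channels=64):
        super().__init__()
        self.H = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.T = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        t = self.T(x)                        # transform gate in (0, 1)
        return t * self.H(x) + (1 - t) * x   # gated mix of transformed and identity paths

y = GatedUnit()(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 64, 16, 16])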
Session 1: Interleaved Group Convolutions
for efficient and wider CNNs
11
Wider - more channels
Deeper - more layers
Go wider by eliminating redundancy in convolutions
deeper → wider
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
12
14
15
16
17
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
18
• Integer
• Quantization
19
• Binarization
• Quantization
20
• Binarization
• Integer
21
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
22
• Filter pruning
• Channel pruning
23
• Filter pruning
• Channel pruning
24
• Filter pruning
• Channel pruning
25
• Filter pruning
• Channel pruning
26
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
27
28
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
29
• Non-structured sparse
• Structured sparse
30
• Non-structured sparse
• Structured sparse
31
• Convolution operations
• Low-precision kernels
• Low-rank kernels
• Composition from low-rank kernels
• Sparse kernels
• Composition from sparse kernels
32
[1] Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang: Interleaved Group Convolutions. ICCV 2017: 4383-4392
[2] Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, and Guo-Jun Qi. IGCV2: Interleaved Structured Sparse Convolutional Neural Networks. CVPR 2018.
[3] Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks. BMVC 2018. 33
IGCV1: Interleaved group convolutions for large models
Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang: Interleaved Group Convolutions, ICCV 2017.
34
[Figure: a regular convolution mixing all channels]
35
[Figure: group convolution: split the channels into groups, convolve each group, concatenate the outputs]
36
[Figure: interleaved group convolution block]
Primary group convolution: split → group-wise convolutions → concatenation
Secondary group 1 × 1 convolution: split → group-wise 1 × 1 convolutions → concatenation
L = 2 primary partitions, 3 channels in each partition
M = 3 secondary partitions, 2 channels in each partition
37
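To make the block concrete, here is a minimal PyTorch sketch of an interleaved group convolution block matching the example above (L = 2 primary partitions of 3 channels, M = 3 secondary partitions of 2 channels). Class and variable names are illustrative, not the released implementation.

import torch
from torch import nn

class IGCBlock(nn.Module):
    """Interleaved group convolution block (IGCV1-style sketch): a primary group
    3x3 convolution, a channel permutation that interleaves the partitions, a
    secondary group 1x1 convolution, and the inverse permutation."""
    def __init__(self, L=2, M=3):
        super().__init__()
        channels = L * M                                # 2 x 3 = 6 channels, as in the example above
        self.primary = nn.Conv2d(channels, channels, 3, padding=1, groups=L, bias=False)   # L partitions of M channels
        self.secondary = nn.Conv2d(channels, channels, 1, groups=M, bias=False)            # M partitions of L channels
        perm = torch.arange(channels).reshape(L, M).t().reshape(-1)
        self.register_buffer("perm", perm)              # gathers one channel from each primary partition
        self.register_buffer("inv_perm", torch.argsort(perm))

    def forward(self, x):
        x = self.primary(x)
        x = x[:, self.perm]          # interleave channels across partitions
        x = self.secondary(x)
        return x[:, self.inv_perm]   # restore the original channel order

y = IGCBlock(L=2, M=3)(torch.randn(1, 6, 32, 32))
print(y.shape)  # torch.Size([1, 6, 32, 32])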
The path connecting each input channel with each output channel
[Figure: interleaved group convolution block]
38
[Figure: interleaved group convolution block annotated with its matrix form]
x′ = P W_d Pᵀ W_p x
(input x → primary group convolution W_p → permutation Pᵀ → secondary group convolution W_d → permutation P → output x′)
39
x′ = P W_d Pᵀ W_p x: the composed matrix is dense
40
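The density of the composed matrix can be checked numerically. The sketch below builds the block-diagonal matrices of the two group convolutions and the interleaving permutation for the L = 2, M = 3 example (treating both as 1 × 1 convolutions, so each W is a plain channel-mixing matrix) and verifies that P W_d Pᵀ W_p has no zero entries. The block_diag helper and the random values are for illustration only.

import numpy as np

L, M = 2, 3
C = L * M                                   # 6 channels in total
rng = np.random.default_rng(0)

def block_diag(blocks):
    """Block-diagonal matrix of a group convolution (each block mixes one group)."""
    out, i = np.zeros((C, C)), 0
    for b in blocks:
        out[i:i + len(b), i:i + len(b)] = b
        i += len(b)
    return out

W_p = block_diag([rng.standard_normal((M, M)) for _ in range(L)])  # primary: L groups of M channels
W_d = block_diag([rng.standard_normal((L, L)) for _ in range(M)])  # secondary: M groups of L channels

perm = np.arange(C).reshape(L, M).T.reshape(-1)   # interleaving permutation [0, 3, 1, 4, 2, 5]
P = np.eye(C)[perm].T                             # P^T interleaves the channels, P restores the order

W = P @ W_d @ P.T @ W_p                           # composed channel-mixing matrix
print(np.all(np.abs(W) > 0))                      # True: the composed matrix is dense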
Advantages: Wider than regular convolutions
Thus, under the same #parameters, our IGC block is wider than a regular convolution except when 𝐿 = 1
41
Accuracy (%):
depth   RegConv-18      IGC
20      92.55 ± 0.14    92.84 ± 0.26
38      91.57 ± 0.09    92.24 ± 0.62
62      88.60 ± 0.49    90.03 ± 0.85

#Params and FLOPs (the two complexity tables on the slide):
depth   RegConv-18   IGC          depth   RegConv-18   IGC
20      0.34         0.15         20      0.51         0.29
38      0.71         0.31         38      1.1          0.57
62      1.20         0.52         62      1.7          0.95

Accuracy gain at depth 62: +1.43
42
Accuracy (%):
depth   RegConv-18      IGC
20      68.71 ± 0.32    70.54 ± 0.26
38      65.00 ± 0.57    69.56 ± 0.76
62      58.52 ± 2.31    65.84 ± 0.75

#Params and FLOPs (the two complexity tables on the slide):
depth   RegConv-18   IGC          depth   RegConv-18   IGC
20      0.34         0.15         20      0.51         0.29
38      0.71         0.31         38      1.1          0.57
62      1.20         0.52         62      1.7          0.95

Accuracy gain at depth 62: +7.32
43
                      #params   FLOPS   Training error (Top-1 / Top-5)   Validation error (Top-1 / Top-5)
ResNet (Reg. Conv.)   1.133     2.1     21.43 / 5.96                     30.58 / 10.77
Our approach          0.861     1.3     13.93 / 2.75                     26.95 / 8.92

Gain on validation error: +3.63 (Top-1), +1.85 (Top-5)
44
• Primary group convolution
• Secondary group convolution
Regular convolutions are interleaved group convolutions
[Figure: the dense kernel matrix W expressed through blocks W11, W12, W21, W22 of the primary and secondary group convolutions]
45
[Figure: interleaved group convolution block]
47
[Plot: width under different partition settings; widths 23 32 39 50 54 72 84 80 64 44 and 62 88 124 154 184 196 205 192 170 128]
M=2: Higher accuracy than Xception
Best accuracy
48
                 Xception        Our approach
testing error    36.93 ± 0.54    36.11 ± 0.62
#params          3.62 × 10⁴      3.80 × 10⁴
FLOPS            3.05 × 10⁷      3.07 × 10⁷

                 Xception        Our approach
testing error    32.87 ± 0.67    31.87 ± 0.58
#params          1.21 × 10⁵      1.26 × 10⁵
FLOPS            1.11 × 10⁸      1.12 × 10⁸

Testing error reduced by 0.82 and 1.00, respectively
49
[Figure: interleaved group convolution blocks]
50
51
IGCV2: Interleaved Structured Sparse
Convolutional Neural Networks
Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, and Guo-Jun Qi. IGCV2: Interleaved Structured Sparse
Convolutional Neural Networks. CVPR 2018.
53
small
54
[Figure: IGCV1 block]
55
[Figure: block decomposition; sub-blocks labeled IGCV1]
56
[Figure: one group convolution further decomposed; sub-block labeled IGCV1]
57
[Figure: the fully decomposed block, a composition of group convolutions]
58
[Figure: IGCV2 block: 𝐿 group convolutions interleaved with permutations]
𝐿 (e.g., 3 or more) group convolutions: x′ = P3W3P2W2P1W1x
Dense due to the complementary condition
59
Design: how to design the 𝐿 group convolutions
IGC block: 1 channel-wise 3 × 3 convolution + (𝐿 − 1) group 1 × 1 convolutions
Balance condition (minimum total #parameters): three conditions hold:
1) #parameters of the (𝐿 − 1) group convolutions are the same
2) #branches of the (𝐿 − 1) group convolutions are the same
3) #channels (width) in each branch are the same: 𝐾2 = ⋯ = 𝐾𝐿 = 𝐶^(1/(𝐿−1)) = 𝐾
61
Meet complementary and balance conditions
62
#parameters 𝑄 when the balance condition holds: one channel-wise 3 × 3 convolution + (𝐿 − 1) group 1 × 1 convolutions
How many group convolutions? 𝐿 = log 𝐶 + 1 so that 𝑄 is minimum
Benefit: further redundancy reduction by composing more (𝐿 > 2) sparse matrices
𝑄(𝐿 = 3) < 𝑄(𝐿 = 2) if 𝐶 > 4
64
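As a quick check of the two claims above, the snippet below evaluates an assumed per-position parameter count of the block, Q(𝐿) ≈ 9𝐶 + (𝐿 − 1) · 𝐶 · 𝐶^(1/(𝐿−1)) (one channel-wise 3 × 3 convolution plus (𝐿 − 1) balanced group 1 × 1 convolutions). This form is an assumption consistent with the slide's statements; see the IGCV2 paper for the exact expression.

import math

def igcv2_params(C, L):
    """Assumed #parameters of an IGC block with C channels and L group convolutions
    under the balance condition K = C**(1/(L-1)): a channel-wise 3x3 convolution
    (9*C) plus (L-1) group 1x1 convolutions, each costing C*K parameters."""
    K = C ** (1.0 / (L - 1))
    return 9 * C + (L - 1) * C * K

C = 64
for L in range(2, 8):
    print(L, round(igcv2_params(C, L)))      # minimum near L = log C + 1 ≈ 5.2 for C = 64

assert igcv2_params(8, 3) < igcv2_params(8, 2)   # Q(L=3) < Q(L=2) whenever C > 4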
65
Accuracy (%):
Depth   IGCV1           IGCV2
8       66.52 ± 0.12    67.65 ± 0.29
20      70.87 ± 0.39    72.63 ± 0.07
26      71.82 ± 0.40    73.49 ± 0.34

The two complexity tables on the slide:
Depth   IGCV1   IGCV2        Depth   IGCV1   IGCV2
8       0.046   0.046        8       1.00    1.18
20      0.151   0.144        20      2.89    3.20
26      0.203   0.193        26      3.83    4.21

Accuracy gain at depth 26: +1.67
66
Accuracy (%):
Depth   IGCV1           IGCV2
8       48.57 ± 0.53    51.49 ± 0.33
20      54.83 ± 0.27    56.40 ± 0.17
26      56.64 ± 0.15    57.12 ± 0.09

The two complexity tables on the slide:
Depth   IGCV1   IGCV2        Depth   IGCV1   IGCV2
8       0.046   0.046        8       1.00    1.18
20      0.151   0.144        20      2.89    3.20
26      0.203   0.193        26      3.83    4.21

Accuracy gain at depth 26: +0.48
67
• Complementary condition
• Strict → loose
small
69
Strict: one and only one path
70
Strict: there is one and only one path between each input channel and each output channel
Loose: there are more paths, giving richer connections
71
Strict: channels lie in different branches and super-channels
Loose
72
small
IGCV2
73
Model #Params. (M) FLOPS (M) Accuracy (%)
MobileNetV1-1.0 4.2 569 70.6
IGCV2-1.0 4.1 564 70.7
MobileNetV1-0.5 1.3 149 63.7
IGCV2-0.5 1.3 156 65.5
MobileNetV1-0.25 0.5 41 50.6
IGCV2-0.25 0.5 46 54.9
75
IGCV3: Interleaved Low-Rank Group Convolutions
Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural
Networks. BMVC 2018.
76
• Low-rank structured sparse matrix composition
• Structured sparse + low-rank
small
77
[Figure: IGCV3 block: 1 × 1 group conv. → 3 × 3 channel-wise conv. → 1 × 1 group conv., interleaved across the channel dimension]
78
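A rough PyTorch sketch of the block pattern in the figure follows. The group count, expansion ratio, and the position of the channel permutation are assumptions for illustration, not the exact IGCV3 configuration.

import torch
from torch import nn

class IGCV3Block(nn.Module):
    """Sketch of the pattern: 1x1 group conv. -> 3x3 channel-wise conv. -> 1x1 group conv.,
    with a channel permutation so that the two low-rank group convolutions interleave."""
    def __init__(self, channels=24, expand=6, groups=2):
        super().__init__()
        hidden = channels * expand
        self.expand = nn.Conv2d(channels, hidden, 1, groups=groups, bias=False)              # low-rank "up" 1x1 group conv
        self.depthwise = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False)  # 3x3 channel-wise conv
        self.reduce = nn.Conv2d(hidden, channels, 1, groups=groups, bias=False)              # low-rank "down" 1x1 group conv
        perm = torch.arange(hidden).reshape(groups, hidden // groups).t().reshape(-1)
        self.register_buffer("perm", perm)   # interleaves channels between the group convolutions

    def forward(self, x):
        h = self.depthwise(self.expand(x))
        h = h[:, self.perm]                  # permutation keeps the composition dense (complementary condition)
        return self.reduce(h)

y = IGCV3Block()(torch.randn(1, 24, 32, 32))
print(y.shape)  # torch.Size([1, 24, 32, 32])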
Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural
Networks. BMVC 2018.
small
IGCV3
79
81
82
83
• Small
• Fast
• High
Summary
84
87
            MobileNet                        IGC                             ShuffleNet
            v1                               v1/v2                           ≈ v1
Block       Depthwise + pointwise            2/2+ group conv.                2 group conv.
Relation    Complementary                    Complementary                   Shuffled
Experiment  Strong                           Weak                            Strong
Insight     Weak                             Strong                          Weak

            v2                               v3                              v2
Block       Depthwise + low-rank pointwise   2 group + low-rank pointwise
Relation    Complementary                    Complementary
Experiment  Strong                           Strong
Insight     Weak                             Strong
• Ting Zhang
88
IGC Code: https://github.com/homles11/IGCV3
89
Session 2: High-Resolution Representation Learning
Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR 2019
Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang: High-Resolution
Representation Learning for labeling pixels and regions (Journal extension).
92
Deeper - more layers
Wider - more channels
Finer - higher resolution
New dimension: go finer towards high-resolution representation learning
deeper → wider → finer
93
[Figure: feature-map sizes in a classic classification network: 32 × 32 input → 28 × 28 → 14 × 14 → 10 × 10 → 5 × 5, about 1/6 of the input resolution]
Series: high-resolution conv. → medium-resolution conv. → low-resolution conv.
Low-resolution output; the same holds for other classification networks: AlexNet, VGGNet, GoogleNet, ResNet, DenseNet, ……
94
Image recog.: low resolution is enough (global)
Region-level recog. and pixel-level recog.: the high-resolution representation is needed (position-sensitive)
97
❑ Recover high resolution from low-resolution classification networks:
Hourglass, U-Net, Encoder-decoder, DeconvNet, SimpleBaseline, etc.
98
U-Net, SegNet, DeconvNet, Hourglass: look different, essentially the same
99
❑ Recover high resolution from low-resolution classification networks (Hourglass, U-Net, Encoder-decoder, DeconvNet, SimpleBaseline, etc.), at the cost of a location-sensitivity loss
102
High-resolution: learn high-resolution representations through high-resolution maintenance rather than recovery
103
series
104
parallel
with repeated fusions
105
parallel
repeated fusions
106
107
Series: recover from low-resolution representations
Parallel: maintain high resolution through the whole process; repeat fusions across resolutions to strengthen both high- and low-resolution representations
HRNet can learn strong high-resolution representations
108
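The parallel-streams-with-fusion idea can be sketched in a few lines of PyTorch. The fusion unit below keeps a high-resolution and a low-resolution stream and exchanges information between them; the channel widths, the single strided convolution for downsampling, and the bilinear upsampling are illustrative simplifications of HRNet's actual fusion design.

import torch
from torch import nn
import torch.nn.functional as F

class FusionUnit(nn.Module):
    """Sketch of a multi-resolution fusion: each branch receives the sum of its own
    features and the other branch's features, resampled to its resolution."""
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        self.down = nn.Conv2d(c_high, c_low, 3, stride=2, padding=1)  # high -> low resolution
        self.up = nn.Conv2d(c_low, c_high, 1)                         # low -> high (followed by upsampling)

    def forward(self, x_high, x_low):
        up = F.interpolate(self.up(x_low), size=x_high.shape[-2:],
                           mode="bilinear", align_corners=False)
        return x_high + up, x_low + self.down(x_high)

# two parallel streams, fused repeatedly while the high-resolution stream is kept end to end
x_high, x_low = torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32)
fuse = FusionUnit()
for _ in range(3):                      # repeated fusions
    x_high, x_low = fuse(x_high, x_low)
print(x_high.shape, x_low.shape)        # torch.Size([1, 32, 64, 64]) torch.Size([1, 64, 32, 32])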
#blocks = 1 #blocks = 4 #blocks = 3
109
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
111
112
113
Datasets training validation testing Evaluation
COCO 2017 57K 5000 images 20K AP@OKS
MPII 13K 12k PCKh
PoseTrack 292 videos 50 208 mAP/MOTA
COCO: http://cocodataset.org/#keypoints-eval
MPII http://human-pose.mpi-inf.mpg.de/
PoseTrack https://posetrack.net/
114
115
Method Backbone Pretrain Input size #Params GFLOPs AP AP50 AP75 APM APL AR
8-stage Hourglass [38] 8-stage Hourglass N 256×192 25.1M 14.3 66.9 - - - - -
CPN [11] ResNet-50 Y 256×192 27.0M 6.2 68.6 - - - - -
CPN+OHKM [11] ResNet-50 Y 256×192 27.0M 6.2 69.4 - - - - -
SimpleBaseline [66] ResNet-50 Y 256×192 24.0M 8.9 70.4 88.6 78.3 67.1 77.2 76.3
SimpleBaseline [66] ResNet-101 Y 256×192 50.3M 12.4 71.4 89.3 79.3 68.1 78.1 77.1
HRNet-W32 HRNet-W32 N 256×192 28.5M 7.1 73.4 89.5 80.7 70.2 80.1 78.9
HRNet-W32 HRNet-W32 Y 256×192 28.5M 7.1 74.4 90.5 81.9 70.8 81.0 79.8
SimpleBaseline [66] ResNet-152 Y 256×192 68.6M 15.7 72.0 89.3 79.8 68.7 78.9 77.8
HRNet-W48 HRNet-W48 Y 256×192 63.6M 14.6 75.1 90.6 82.2 71.5 81.8 80.4
SimpleBaseline [66] ResNet-152 Y 384×288 68.6M 35.6 74.3 89.6 81.1 70.5 79.7 79.7
HRNet-W32 HRNet-W32 Y 384×288 28.5M 16.0 75.8 90.6 82.7 71.9 82.8 81.0
HRNet-W48 HRNet-W48 Y 384×288 63.6M 32.9 76.3 90.8 82.9 72.3 83.4 81.2
116
method Backbone Input size #Params GFLOPs AP AP50 AP75 APM APL AR
Bottom-up: keypoint detection and grouping
OpenPose [6], CMU - - - - 61.8 84.9 67.5 57.1 68.2 66.5
Associative Embedding [39] - - - - 65.5 86.8 72.3 60.6 72.6 70.2
PersonLab [46], Google - - - - 68.7 89.0 75.4 64.1 75.5 75.4
MultiPoseNet [33] - - - - 69.6 86.3 76.6 65.0 76.3 73.5
Top-down: human detection and single-person keypoint detection
Mask-RCNN [21], Facebook ResNet-50-FPN - - - 63.1 87.3 68.7 57.8 71.4 -
G-RMI [47] ResNet-101 353×257 42.0M 57.0 64.9 85.5 71.3 62.3 70.0 69.7
Integral Pose Regression [60] ResNet-101 256×256 45.0M 11.0 67.8 88.2 74.8 63.9 74.0 -
G-RMI + extra data [47] ResNet-101 353×257 42.6M 57.0 68.5 87.1 75.5 65.8 73.3 73.3
CPN [11] , Face++ ResNet-Inception 384×288 - - 72.1 91.4 80.0 68.7 77.2 78.5
RMPE [17] PyraNet [77] 320×256 28.1M 26.7 72.3 89.2 79.1 68.0 78.6 -
CFN [25] , - - - - 72.6 86.1 69.7 78.3 64.1 -
CPN (ensemble) [11], Face++ ResNet-Inception 384×288 - - 73.0 91.7 80.9 69.5 78.1 79.0
SimpleBaseline [72], Microsoft ResNet-152 384×288 68.6M 35.6 73.7 91.9 81.1 70.3 80.0 79.0
HRNet-W32 HRNet-W32 384×288 28.5M 16.0 74.9 92.5 82.8 71.3 80.9 80.1
HRNet-W48 HRNet-W48 384×288 63.6M 32.9 75.5 92.5 83.3 71.9 81.5 80.5
HRNet-W48 + extra data HRNet-W48 384×288 63.6M 32.9 77.0 92.7 84.5 73.4 83.1 82.0
117
121
PoseTrack Leaderboard (https://posetrack.net/leaderboard.php, as of Feb. 28, 2019)
Multi-Frame Person Pose Estimation; Multi-Person Pose Tracking
125
COCO, train from scratch
Method Final exchange Int. exchange across Int. exchange within AP
(a) ✓ 70.8
(b) ✓ ✓ 71.9
(c) ✓ ✓ ✓ 73.4
126
COCO, train from scratch
128
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
130
131
Regular convolution vs. multi-resolution convolution (across-resolution fusion)
132
Datasets training testing #landmarks Evaluation (NME)
WFLW 7500 2500 98 inter-ocular
AFLW 20000 full 4386, frontal 1314 19 box
COFW 1345 507 29 inter-ocular
300W 3148 full 689, challenging 135, test 600 68 inter-ocular
133
backbone Test Pose Expr Illu Mu Occu Blur
ESR [10] - 11.13 25.88 11.47 10.49 11.05 13.75 12.20
SDM [105], CMU - 10.29 24.10 11.45 9.32 9.38 13.03 11.28
CFSS [132] - 9.07 21.36 10.09 8.30 8.74 11.76 9.96
DVLN [100] VGG-16 6.08 11.54 6.78 5.73 5.98 7.33 6.88
Our approach HRNetV2-W18 4.60 7.94 4.85 4.55 4.29 5.44 5.42
135
backbone full frontal
RCN [39] - 5.60 5.36
CDM [113] - 5.43 3.77
ERT [43] - 4.35 2.75
LBF [79] - 4.25 2.74
SDM [105], CMU - 4.05 2.94
CFSS [131] - 3.92 2.68
RCPR [8] - 3.73 2.87
CCL [132] - 2.72 2.17
DAC-CSR [29] - 2.27 1.81
TSR [68] VGG-S 2.17 -
CPM + SBR [25] CPM 2.14 -
SAN [24] ResNet-152 1.91 1.85
DSRN [69] - 1.86 -
LAB (w/o B) [99], SenseTime Hourglass 1.85 1.62
Our approach HRNetV2-W18 1.57 1.46
137
backbone NME FR0.1
Human - 5.60 -
ESR [10] - 11.20 36.00
RCPR [8] - 8.50 20.00
HPM [32] 7.50 13.00
CCR [27] - 7.03 10.90
DRDA [115] - 6.46 6.00
RAR [102] - 6.03 4.14
DAC-CSR [29] - 6.03 4.73
LAB (w/o B) [99], SenseTime Hourglass 5.58 2.76
Our approach HRNetV2-W18 3.45 0.19
139
backbone common challenging full
RCN [39] - 4.67 8.44 5.41
DSRN [69] - 4.12 9.68 5.21
PCD-CNN [50] - 3.67 7.62 4.44
CPM + SBR [25] CPM 3.28 7.58 4.10
SAN [24] ResNet-152 3.34 6.60 3.98
DAN [49] - 3.19 5.24 3.59
Our approach HRNetV2-W18 2.87 5.15 3.32
141
backbone NME AUC0.08 AUC0.1 FR0.08 FR0.1
Balt. et al. [3] - - 19.55 - 38.83 -
ESR [10] - 8.47 26.09 - 30.50 -
ERT [43] - 8.41 27.01 - 28.83 -
LBF [79] - 8.57 25.27 - 33.67 -
Face++ [127] - - 32.81 - 13.00 -
SDM [105], CMU - 5.83 36.27 - 13.00 -
CFAN [116] - 5.78 34.78 - 14.00 -
Yan et al. [108] - 5.74 34.97 - 12.67 -
CFSS [131] - 4.78 36.58 - 12.33 -
MDM [92] - 4.30 45.32 - 6.80 -
DAN [49] - 3.96 47.00 - 2.67 -
Chen et al. [17] Hourglass - 53.64 - 2.50 -
Deng et al. [21] - - - 47.52 - 5.50
Fan et al. [26] - - - 48.02 - 14.83
Dreg + MDM [35] ResNet101 - - 52.19 - 3.67
JMFA [22] Hourglass - - 54.85 - 1.00
Our approach HRNetV2-W18 3.85 52.09 61.55 1.00 0.33
143
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
145
146
Datasets training validation testing #classes Evaluation
Cityscapes 2975 500 1525 19+1 mIoU
PASCAL context 4998 5105 59+1 mIoU
LIP 30462 10000 19+1 mIoU
147
backbone #Params. GFLOPs mIoU
U-Net++ [130] ResNet-101 59.5M 748.5 75.5
DeepLabv3 [14], Google Dilated-resNet-101 58.0M 1778.7 78.5
DeepLabv3+ [16], Google Dilated-Xception-71 43.5M 1444.6 79.6
PSPNet [123], SenseTime Dilated-ResNet-101 65.9M 2017.6 79.7
Our approach HRNetV2-W40 45.2M 493.2 80.2
Our approach HRNetV2-W48 65.9M 747.3 81.1
148
backbone mIoU iIoU cat. IoU cat. iIoU cat.
Model learned on the train+valid set
GridNet [130] - 69.5 44.1 87.9 71.1
LRR-4x [33] - 69.7 48.0 88.2 74.7
DeepLab [13], Google Dilated-ResNet-101 70.4 42.6 86.4 67.7
LC [54] - 71.1 - - -
Piecewise [60] VGG-16 71.6 51.7 87.3 74.1
FRRN [77] - 71.8 45.5 88.9 75.1
RefineNet [59] ResNet-101 73.6 47.2 87.9 70.6
PEARL [42] Dilated-ResNet-101 75.4 51.6 89.2 75.1
DSSPN [58] Dilated-ResNet-101 76.6 56.2 89.6 77.8
LKM [75] ResNet-152 76.9 - - -
DUC-HDC [97] - 77.6 53.6 90.1 75.2
SAC [117] Dilated-ResNet-101 78.1 - - -
DepthSeg [46] Dilated-ResNet-101 78.2 - - -
ResNet38 [101] WResNet-38 78.4 59.1 90.9 78.1
BiSeNet [111] ResNet-101 78.9 - - -
DFN [112] ResNet-101 79.3 - - -
PSANet [125], SenseTime Dilated-ResNet-101 80.1 - - -
PADNet [106] Dilated-ResNet-101 80.3 58.8 90.8 78.5
DenseASPP [124] WDenseNet-161 80.6 59.1 90.9 78.1
Our approach HRNetV2-w48 81.6 61.8 92.1 82.2
149
backbone mIoU (59 classes) mIoU (60 classes)
FCN-8s [86] VGG-16 - 35.1
BoxSup [20] - - 40.5
HO_CRF [1] - - 41.3
Piecewise [60] VGG-16 - 43.3
DeepLabv2 [13], Google Dilated-ResNet-101 - 45.7
RefineNet [59] ResNet-152 - 47.3
U-Net++ [130] ResNet-101 47.7 -
PSPNet [123], SenseTime Dilated-ResNet-101 47.8 -
Ding et al. [23] ResNet-101 51.6 -
EncNet [114] Dilated-ResNet-101 52.6 -
Our approach HRNetV2-W48 54.0 48.3
150
backbone extra pixel acc. avg. acc. mIoU
Attention+SSL [34] VGG-16 Pose 84.36 54.94 44.73
DeepLabv2 [16], Google Dilated-ResNet-101 - 84.09 55.62 44.80
MMAN[67] Dilated-ResNet-101 - - - 46.81
SS-NAN [125] ResNet-101 Pose 87.59 56.03 47.92
MuLA [72] Hourglass Pose 88.50 60.50 49.30
JPPNet [57] Dilated-ResNet-101 Pose 86.39 62.32 51.37
CE2P [65] Dilated-ResNet-101 Edge 87.37 63.20 53.10
Our approach HRNetV2-W48 N 88.21 67.43 55.90
151
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
153
156
backbone Size LS AP AP50 AP75 APS APM APL
DFPR [47] ResNet-101 512 1 × 34.6 54.3 37.3 - - -
PFPNet [45] VGG16 512 - 35.2 57.6 37.9 18.7 38.6 45.9
RefineDet [118] ResNet-101-FPN 512 - 36.4 57.5 39.5 16.6 39.9 51.4
RelationNet [40] ResNet-101 600 - 39.0 58.6 42.9 - - -
C-FRCNN [18] ResNet-101 800 1 × 39.0 59.7 42.8 19.4 42.4 53.0
RetinaNet [62] ResNet-101-FPN 800 1.5 × 39.1 59.1 42.3 21.8 42.7 50.2
Deep Regionlets [107] ResNet-101 800 1.5 × 39.3 59.8 - 21.7 43.7 50.9
FitnessNMS [94] ResNet-101 768 39.5 58.0 42.6 18.9 43.5 54.1
DetNet [56] DetNet-59-FPN 800 2 × 40.3 62.1 43.8 23.6 42.6 50.0
CornerNet [51] Hourglass-104 511 40.5 56.5 43.1 19.4 42.7 53.9
M2Det [126] VGG16 800 ∼ 10 × 41.0 59.7 45.0 22.1 46.5 53.8
Faster R-CNN [61] ResNet-101-FPN 800 1 × 39.3 61.3 42.7 22.1 42.1 49.7
Faster R-CNN HRNetV2p-W32 800 1 × 39.5 61.2 43.0 23.3 41.7 49.1
Faster R-CNN [61] ResNet-101-FPN 800 2 × 40.3 61.8 43.9 22.6 43.1 51.0
Faster R-CNN HRNetV2p-W32 800 2 × 41.1 62.3 44.9 24.0 43.1 51.4
Faster R-CNN [61] ResNet-152-FPN 800 2 × 40.6 62.1 44.3 22.6 43.4 52.0
Faster R-CNN HRNetV2p-W40 800 2 × 42.1 63.2 46.1 24.6 44.5 52.6
Faster R-CNN [11] ResNeXt-101-64x4d-FPN 800 2 × 41.1 62.8 44.8 23.5 44.1 52.3
Faster R-CNN HRNetV2p-W48 800 2 × 42.4 63.6 46.4 24.9 44.6 53.0
Cascade R-CNN [9]* ResNet-101-FPN 800 ∼ 1.6 × 42.8 62.1 46.3 23.7 45.5 55.2
Cascade R-CNN ResNet-101-FPN 800 ∼ 1.6 × 43.1 61.7 46.7 24.1 45.9 55.0
Cascade R-CNN HRNetV2p-W32 800 ∼ 1.6 × 43.7 62.0 47.4 25.5 46.0 55.3
158
backbone LS mask (AP APS APM APL) bbox (AP APS APM APL)
ResNet-50-FPN 1 × 34.2 15.7 36.8 50.2 37.8 22.1 40.9 49.3
HRNetV2p-W18 1 × 33.8 15.6 35.6 49.8 37.1 21.9 39.5 47.9
ResNet-50-FPN 2 × 35.0 16.0 37.5 52.0 38.6 21.7 41.6 50.9
HRNetV2p-W18 2 × 35.3 16.9 37.5 51.8 39.2 23.7 41.7 51.0
ResNet-101-FPN 1 × 36.1 16.2 39.0 53.0 40.0 22.6 43.4 52.3
HRNetV2p-W32 1 × 36.7 17.3 39.0 53.0 40.9 24.5 43.9 52.2
ResNet-101-FPN 2 × 36.7 17.0 39.5 54.8 41.0 23.4 44.4 53.9
HRNetV2p-W32 2 × 37.6 17.8 40.0 55.0 42.3 25.0 45.4 54.9
159
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
160
161
#Params. GFLOPs Top-1 err. Top-5 err.
Residual branch formed by two 3 × 3 convolutions
ResNet-38 28.3M 3.80 24.6% 7.4%
HRNet-W18 21.3M 3.99 23.1% 6.5%
ResNet-71 48.4M 7.46 23.3% 6.7%
HRNet-W30 37.7M 7.55 21.9% 5.9%
ResNet-105 64.9M 11.1 22.7% 6.4%
HRNet-W40 57.6M 11.8 21.1% 5.6%
Residual branch formed by a bottleneck
ResNet-50 25.6M 3.82 23.3% 6.6%
HRNet-W44 21.9M 3.90 23.0% 6.5%
ResNet-101 44.6M 7.30 21.6% 5.8%
HRNet-W76 40.8M 7.30 21.5% 5.8%
ResNet-152 60.2M 10.7 21.2% 5.7%
HRNet-W96 57.5M 10.2 21.0% 5.7%
Surprisingly, HRNet performs slightly better than ResNet
163
164
Cityscapes and PASCAL context; COCO detection
165
166
image-level recog., region-level recog., pixel-level recog.
Recover from low resolution (ResNet, VGGNet) vs. maintain high resolution (our HRNet) ✓
167
168
Convolutional neural fabrics
GridNet: generalized U-Net
Interlinked CNN
Multi-scale DenseNet
169
by Google
Related to HRNet, but no high-resolution maintenance
171
Image classification
Semantic segmentation
Object detection
Face alignment
Pose estimation
and …
172
173
174
Super-resolution from LapSRN Optical flow
Depth estimation Edge detection
175
176
Used in many challenges in CVPR 2019
177
, CVPRW 2019
Meitu (美图) adopted the HRNet
178
NTIRE 2019 Image Dehazing Challenge Report, CVPRW 2019
Meitu (美图) adopted the HRNet
179
180
181
182
Cityscapes leaderboard: Rank 1
183
semantic segmentation, object detection, facial landmark detection, human pose estimation
Replace classification networks (e.g., ResNet) for computer vision tasks
184
Thanks!
Q&A
185
https://jingdongwang2017.github.io/Projects/HRNet/

Mais conteúdo relacionado

Mais procurados

Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooJaeJun Yoo
 
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)NAVER Engineering
 
Batch normalization paper review
Batch normalization paper reviewBatch normalization paper review
Batch normalization paper reviewMinho Heo
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution Mohammed Ashour
 
Super resolution-review
Super resolution-reviewSuper resolution-review
Super resolution-reviewWoojin Jeong
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationTaeoh Kim
 
Scaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution AlgorithmsScaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution AlgorithmsXiaoyong Zhu
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesKen Chatfield
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsSungchul Kim
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization太一郎 遠藤
 
Discrete cosine transform
Discrete cosine transform   Discrete cosine transform
Discrete cosine transform Rashmi Karkra
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
A Deep Journey into Super-resolution
A Deep Journey into Super-resolutionA Deep Journey into Super-resolution
A Deep Journey into Super-resolutionRonak Mehta
 
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...mlaij
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksGreeshma M.S.R
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural networkAbdullah Khan Zehady
 
Medical image fusion using curvelet transform 2-3-4-5
Medical image fusion using curvelet transform 2-3-4-5Medical image fusion using curvelet transform 2-3-4-5
Medical image fusion using curvelet transform 2-3-4-5IAEME Publication
 

Mais procurados (20)

Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
캡슐 네트워크를 이용한 엔드투엔드 음성 단어 인식, 배재성(KAIST 석사과정)
 
Batch normalization paper review
Batch normalization paper reviewBatch normalization paper review
Batch normalization paper review
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
Super resolution-review
Super resolution-reviewSuper resolution-review
Super resolution-review
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
 
Super resolution from a single image
Super resolution from a single imageSuper resolution from a single image
Super resolution from a single image
 
Scaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution AlgorithmsScaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution Algorithms
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet Features
 
Score based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential EquationsScore based Generative Modeling through Stochastic Differential Equations
Score based Generative Modeling through Stochastic Differential Equations
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 
Discrete cosine transform
Discrete cosine transform   Discrete cosine transform
Discrete cosine transform
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
A Deep Journey into Super-resolution
A Deep Journey into Super-resolutionA Deep Journey into Super-resolution
A Deep Journey into Super-resolution
 
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...
AN ENHANCEMENT FOR THE CONSISTENT DEPTH ESTIMATION OF MONOCULAR VIDEOS USING ...
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional NetworksSingle Image Super Resolution using Fuzzy Deep Convolutional Networks
Single Image Super Resolution using Fuzzy Deep Convolutional Networks
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural network
 
Medical image fusion using curvelet transform 2-3-4-5
Medical image fusion using curvelet transform 2-3-4-5Medical image fusion using curvelet transform 2-3-4-5
Medical image fusion using curvelet transform 2-3-4-5
 

Semelhante a Architecture Design for Deep Neural Networks II

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用NVIDIA Taiwan
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningVahid Mirjalili
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationQualcomm Research
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningTapas Majumdar
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
 
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021Sergey Karayev
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012Ted Dunning
 
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...taeseon ryu
 
[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnnNAVER D2
 
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Mara Graziani
 

Semelhante a Architecture Design for Deep Neural Networks II (20)

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
GTC Taiwan 2017 GPU 平台上導入深度學習於半導體產業之 EDA 應用
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep LearningIntroduction to Neural Networks and Deep Learning
Introduction to Neural Networks and Deep Learning
 
Resnet
ResnetResnet
Resnet
 
Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through Quantization
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021
Lecture 2.A: Convolutional Networks - Full Stack Deep Learning - Spring 2021
 
Fast Algorithms for Quantized Convolutional Neural Networks
Fast Algorithms for Quantized Convolutional Neural NetworksFast Algorithms for Quantized Convolutional Neural Networks
Fast Algorithms for Quantized Convolutional Neural Networks
 
Oxford 05-oct-2012
Oxford 05-oct-2012Oxford 05-oct-2012
Oxford 05-oct-2012
 
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable C...
 
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
 
Co coon 35min
Co coon 35minCo coon 35min
Co coon 35min
 
[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn[251] implementing deep learning using cu dnn
[251] implementing deep learning using cu dnn
 
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
 
Mincut_Placement_Final_Report
Mincut_Placement_Final_ReportMincut_Placement_Final_Report
Mincut_Placement_Final_Report
 

Mais de Wanjin Yu

Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIWanjin Yu
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia RecommendationWanjin Yu
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IWanjin Yu
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learningWanjin Yu
 
Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportationWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IWanjin Yu
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering IIWanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 

Mais de Wanjin Yu (15)

Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks III
 
Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia Recommendation
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
 
Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet II
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering II
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 

Último

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 

Último (20)

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 

Architecture Design for Deep Neural Networks II

  • 1.
  • 3. 3 Deeper - more layers Go deeper for boosting the capacity deeper
  • 4. 8 4
  • 9. 9
  • 10. Session 1: Interleaved Group Convolutions for efficient and wider CNNs
  • 11. 11 Wider - more channels Deeper - more layers Go wider by eliminating redundancy in convolutions → widerdeeper
  • 12. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 12
  • 13. 14
  • 14. 15
  • 15. 16
  • 16. 17
  • 17. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 18
  • 21. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 22
  • 22. • Filter pruning • Channel pruning 23
  • 23. • Filter pruning • Channel pruning 24
  • 24. • Filter pruning • Channel pruning 25
  • 25. • Filter pruning • Channel pruning 26
  • 26. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 27
  • 27. 28
  • 28. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 29
  • 29. • Non-structured sparse • Structured sparse 30
  • 30. • Non-structured sparse • Structured sparse 31
  • 31. • Convolution operations • Low-precision kernels • Low-rank kernels • Composition from low-rank kernels • Sparse kernels • Composition from sparse kernels 32
  • 32. [1] Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang: Interleaved Group Convolutions. ICCV 2017: 4383-4392 [2] Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, and Guo-JunQi. IGCV2: Interleaved Structured Sparse Convolutional Neural Networks. CVPR 2018. [3] Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks. BMVC 2018. 33
  • 33. IGCV1: Interleaved group convolutions for large models Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang: Interleaved Group Convolutions, ICCV 2017. 34
  • 36. Split Concat. Conv. Conv. Primary group convolution Split Conv. Conv. Conv. Concat. Secondary group 1 × 1 convolution L=2 primary partitions 3 channels in each partition M=3 secondary partitions 2 channels in each partition 37
  • 37. The path connecting each input channel with each output channel Split Concat. Conv. Conv. Split Conv. Conv. Conv. Concat. 38
  • 38. Split Concat. Conv. Conv. Split Conv. Conv. Conv. Concat. x′ = PW 𝑑 P 𝑇 W 𝑝 x x W 𝑝 P 𝑇 W 𝑑 P x′ 39
  • 39. x′ = PW 𝑑 P 𝑇 W 𝑝 x dense 40
  • 40. Advantages: Wider than regular convolutions Thus, our IGC is wider except 𝐿 = 1 under the same #parameters 41
  • 41. depth RegConv-18 IGC 20 92.55 ± 0.14 92.84 ± 0.26 38 91.57 ± 0.09 92.24 ± 0.62 62 88.60 ± 0.49 90.03 ± 0.85 depth RegConv-18 IGC 20 0.34 0.15 38 0.71 0.31 62 1.20 0.52 depth RegConv-18 IGC 20 0.51 0.29 38 1.1 0.57 62 1.7 0.95 +1.43 42
  • 42. depth RegConv-18 IGC 20 68.71 ± 0.32 70.54 ± 0.26 38 65.00 ± 0.57 69.56 ± 0.76 62 58.52 ± 2.31 65.84 ± 0.75 depth RegConv-18 IGC 20 0.34 0.15 38 0.71 0.31 62 1.20 0.52 depth RegConv-18 IGC 20 0.51 0.29 38 1.1 0.57 62 1.7 0.95 +7.32 43
  • 43. #params FLOPS Training error Validation error Top-1 Top-5 Top-1 Top-5 ResNet (Reg. Conv.) 1.133 2.1 21.43 5.96 30.58 10.77 Our approach 0.861 1.3 13.93 2.75 26.95 8.92 +3.63 +1.85 44
  • 44. • Primary group convolution • Secondary group convolution Regular convolutions are interleaved group convolutions Primary Secondary W11 W12 W21 W22 + + W 45
  • 46. 23 32 39 50 54 72 84 80 64 44 62 88 124 154 184 196 205 192 170 128width M=2: Higher accuracy than Xception Best accuracy 48
  • 47. Xception Our approach testing error 36.93 ± 0.54 36.11 ± 0.62 #params 3.62 × 104 3.80 × 104 FLOPS 3.05 × 107 3.07 × 107 Xception Our approach testing error 32.87 ± 0.67 31.87 ± 0.58 #params 1.21 × 105 1.26 × 105 FLOPS 1.11 × 108 1.12 × 108 −0.82 −1.00 49
  • 49. 51
  • 50. IGCV2: Interleaved Structured Sparse Convolutional Neural Networks Guotian Xie, Jingdong Wang, Ting Zhang, Jianhuang Lai, Richang Hong, and Guo-JunQi. IGCV2: Interleaved Structured Sparse Convolutional Neural Networks. CVPR 2018. 53
  • 56. Split Conv. Conv. Conv. Conv. Conv. Conv. Conv. Conv. Conv. Conv. Conv. Conv. ConcatSplitConcat SplitConcat 𝐿 (e.g., 3 or >3) group convolutionsx′ = P3W3P2W2P1W1x W3W1 P1 W2 P2 P3x Dense due to complementary condition 59
  • 57. Design: Balance condition minimum total #parameters, three conditions hold: 1) #parameters for (𝐿 − 1) group convolutions are the same 2) #branches for (𝐿 − 1) group convolutions are the same 3) #channels in each branch are the same How to design 𝐿 group convolutions #channels in each branch: 𝐾2 = ⋯ = 𝐾𝐿 = 𝐶 1 𝐿−1 = 𝐾 width IGC block: 1 channel-wise 3 × 3 convolution + (𝐿 − 1) group 1 × 1 convolutions 61
  • 58. Meet complementary and balance conditions 62
  • 59. #parameters When balance condition holds 3 × 3 1 × 1 How many group convolutions? 𝐿 = log 𝐶 + 1 so that 𝑄 is minimum Benefit: Further redundancy reduction by composing more (𝐿 > 2) sparse matrices 𝑄 𝐿 = 3 < 𝑄 𝐿 = 2 if 𝐶 > 4 64
  • 60. 65
  • 61. Depth IGCV1 IGCV2 8 66.52 ± 0.12 67.65 ± 0.29 20 70.87 ± 0.39 72.63 ± 0.07 26 71.82 ± 0.40 73.49 ± 0.34 Depth IGCV1 IGCV2 8 0.046 0.046 20 0.151 0.144 26 0.203 0.193 Depth IGCV1 IGCV2 8 1.00 1.18 20 2.89 3.20 26 3.83 4.21 +1.67 66
• 62. Accuracy (%):
Depth | IGCV1 | IGCV2
8 | 48.57 ± 0.53 | 51.49 ± 0.33
20 | 54.83 ± 0.27 | 56.40 ± 0.17
26 | 56.64 ± 0.15 | 57.12 ± 0.09 (+0.48)

Depth | IGCV1 | IGCV2
8 | 0.046 | 0.046
20 | 0.151 | 0.144
26 | 0.203 | 0.193

Depth | IGCV1 | IGCV2
8 | 1.00 | 1.18
20 | 2.89 | 3.20
26 | 3.83 | 4.21
• 63. The complementary condition: relax it from strict to loose while keeping the model small.
• 65. Strict: there is one and only one path between each input channel and each output channel. Loose: more paths are allowed, giving richer connectivity. A small path-counting check of the strict case follows.
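A tiny NumPy sketch (my own check, not from the paper) that counts the paths through the composed connectivity matrix P W_d P^T W_p for the L = 2, M = 3 example used earlier; under the strict complementary condition every entry should be exactly 1.

```python
import numpy as np

def group_mask(C, groups):
    """Channel connectivity mask (block diagonal) of a group convolution."""
    g = C // groups
    m = np.zeros((C, C), dtype=int)
    for b in range(groups):
        m[b * g:(b + 1) * g, b * g:(b + 1) * g] = 1
    return m

C, L, M = 6, 2, 3                                  # 2 primary partitions of 3, 3 secondary of 2
W_p = group_mask(C, L)                             # primary group convolution
W_d = group_mask(C, M)                             # secondary group convolution
idx = np.arange(C).reshape(L, M).T.reshape(-1)     # interleaving order [0, 3, 1, 4, 2, 5]
P = np.eye(C, dtype=int)[idx].T                    # permutation matrix
paths = P @ W_d @ P.T @ W_p                        # #paths from each input to each output channel
print(paths)                                       # all entries are 1: one and only one path
```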
• 66. Strict: the channels grouped into one branch all lie in different branches of the other group convolutions. Loose: the same requirement is imposed on super-channels (groups of channels) rather than on individual channels.
• 68. ImageNet comparison with MobileNetV1:
Model | #Params. (M) | FLOPs (M) | Accuracy (%)
MobileNetV1-1.0 | 4.2 | 569 | 70.6
IGCV2-1.0 | 4.1 | 564 | 70.7
MobileNetV1-0.5 | 1.3 | 149 | 63.7
IGCV2-0.5 | 1.3 | 156 | 65.5
MobileNetV1-0.25 | 0.5 | 41 | 50.6
IGCV2-0.25 | 0.5 | 46 | 54.9
  • 69. IGCV3: Interleaved Low-Rank Group Convolutions Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks. BMVC 2018. 76
• 70. IGCV3: composition of low-rank structured sparse matrices — structured sparsity plus low rank keeps the model small.
• 71. IGCV3 block: a low-rank 1 × 1 group convolution, a 3 × 3 channel-wise convolution, and a low-rank 1 × 1 group convolution (Gconv → Cconv → Gconv); a sketch follows.
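A minimal PyTorch-style sketch of such a block (my own reading of the slide, not the official IGCV3 code; the expansion ratio of 6, the use of two groups, and the BN/ReLU placement are assumptions borrowed from inverted-residual designs):

```python
import torch
import torch.nn as nn

class IGCV3Block(nn.Module):
    """IGCV3-style block sketch: group 1x1 expansion -> permutation ->
    channel-wise 3x3 -> permutation -> group 1x1 reduction (+ identity)."""

    def __init__(self, in_ch, out_ch, expand=6, groups=2):
        super().__init__()
        mid = in_ch * expand
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, groups=groups, bias=False),
            nn.BatchNorm2d(mid))
        self.channelwise = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True))
        self.reduce = nn.Sequential(
            nn.Conv2d(mid, out_ch, 1, groups=groups, bias=False),
            nn.BatchNorm2d(out_ch))
        self.groups = groups

    @staticmethod
    def permute(x, groups):
        # Channel reshuffle acting as the interleaving permutation.
        n, c, h, w = x.shape
        return (x.view(n, groups, c // groups, h, w)
                 .transpose(1, 2).contiguous().view(n, c, h, w))

    def forward(self, x):
        out = self.permute(self.expand(x), self.groups)    # low-rank 1x1 Gconv
        out = self.channelwise(out)                        # 3x3 Cconv
        out = self.permute(self.reduce(out), self.groups)  # low-rank 1x1 Gconv
        return out + x if out.shape == x.shape else out    # residual when shapes match

out = IGCV3Block(24, 24)(torch.randn(1, 24, 56, 56))
```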
• 72. [Figure: IGCV3 block] Ke Sun, Mingjie Li, Dong Liu, and Jingdong Wang. IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks. BMVC 2018.
  • 73. 81
  • 74. 82
  • 75. 83
• 76. Summary: small (few parameters), fast (low FLOPs), high accuracy.
• 77. Comparison:
 | MobileNet | IGC | ShuffleNet
Version | v1 | v1/v2 | ≈ v1
Block | Depthwise + pointwise | 2/2+ group conv. | 2 group conv.
Relation | Complementary | Complementary | Shuffled
Experiment | Strong | Weak | Strong
Insight | Weak | Strong | Weak
Version | v2 | v3 | v2
Block | Depthwise + low-rank pointwise | 2 group + low-rank pointwise | -
Relation | Complementary | Complementary | -
Experiment | Strong | Strong | -
Insight | Weak | Strong | -
  • 80. Session 2: High-Resolution Representation Learning Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR 2019 Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang: High-Resolution Representation Learning for labeling pixels and regions (Journal extension).
• 81. Wider - more channels; deeper - more layers; finer - higher resolution. New dimension: beyond going deeper and wider, go finer, towards high-resolution representation learning.
• 82. [Figure: LeNet-style network, resolutions 32 × 32 → 28 × 28 → 14 × 14 → 10 × 10 → 5 × 5] Classification networks connect convolutions in series, from high-resolution through medium-resolution to low-resolution convolutions. AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet, … all end with low-resolution representations.
• 83. For image-level recognition (global), low resolution is enough; for region-level and pixel-level recognition (position-sensitive), a high-resolution representation is needed.
• 84. Common practice: run a high-resolution → low-resolution classification network and then recover the high resolution: Hourglass, U-Net, encoder-decoder, DeconvNet, SimpleBaseline, etc.
• 86. Recovering high resolution from a low-resolution classification network incurs a loss of location sensitivity: Hourglass, U-Net, encoder-decoder, DeconvNet, SimpleBaseline, etc.
• 87. Our approach: learn high-resolution representations by maintaining high resolution through the whole network, rather than recovering it from low resolution.
  • 91. 106
• 92. Series (existing networks): recover high resolution from low-resolution representations. Parallel (HRNet): maintain high resolution through the whole process and repeatedly fuse across resolutions to strengthen both the high- and low-resolution representations. HRNet can therefore learn strong high-resolution representations.
• 93. [Architecture diagram: multi-resolution stages with #blocks = 1, 4, 3]
  • 95. 111
  • 96. 112
• 97. Datasets:
Dataset | training | validation | testing | Evaluation
COCO 2017 | 57K images | 5000 images | 20K images | AP@OKS
MPII | 13K | - | 12K | PCKh
PoseTrack | 292 videos | 50 | 208 | mAP / MOTA
COCO: http://cocodataset.org/#keypoints-eval, MPII: http://human-pose.mpi-inf.mpg.de/, PoseTrack: https://posetrack.net/
  • 98. 114
• 99. COCO keypoint detection:
Method | Backbone | Pretrain | Input size | #Params | GFLOPs | AP | AP50 | AP75 | APM | APL | AR
8-stage Hourglass [38] | 8-stage Hourglass | N | 256×192 | 25.1M | 14.3 | 66.9 | - | - | - | - | -
CPN [11] | ResNet-50 | Y | 256×192 | 27.0M | 6.2 | 68.6 | - | - | - | - | -
CPN+OHKM [11] | ResNet-50 | Y | 256×192 | 27.0M | 6.2 | 69.4 | - | - | - | - | -
SimpleBaseline [66] | ResNet-50 | Y | 256×192 | 24.0M | 8.9 | 70.4 | 88.6 | 78.3 | 67.1 | 77.2 | 76.3
SimpleBaseline [66] | ResNet-101 | Y | 256×192 | 50.3M | 12.4 | 71.4 | 89.3 | 79.3 | 68.1 | 78.1 | 77.1
HRNet-W32 | HRNet-W32 | N | 256×192 | 28.5M | 7.1 | 73.4 | 89.5 | 80.7 | 70.2 | 80.1 | 78.9
HRNet-W32 | HRNet-W32 | Y | 256×192 | 28.5M | 7.1 | 74.4 | 90.5 | 81.9 | 70.8 | 81.0 | 79.8
SimpleBaseline [66] | ResNet-152 | Y | 256×192 | 68.6M | 15.7 | 72.0 | 89.3 | 79.8 | 68.7 | 78.9 | 77.8
HRNet-W48 | HRNet-W48 | Y | 256×192 | 63.6M | 14.6 | 75.1 | 90.6 | 82.2 | 71.5 | 81.8 | 80.4
SimpleBaseline [66] | ResNet-152 | Y | 384×288 | 68.6M | 35.6 | 74.3 | 89.6 | 81.1 | 70.5 | 79.7 | 79.7
HRNet-W32 | HRNet-W32 | Y | 384×288 | 28.5M | 16.0 | 75.8 | 90.6 | 82.7 | 71.9 | 82.8 | 81.0
HRNet-W48 | HRNet-W48 | Y | 384×288 | 63.6M | 32.9 | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2
• 100. COCO keypoint detection:
Method | Backbone | Input size | #Params | GFLOPs | AP | AP50 | AP75 | APM | APL | AR
Bottom-up: keypoint detection and grouping
OpenPose [6], CMU | - | - | - | - | 61.8 | 84.9 | 67.5 | 57.1 | 68.2 | 66.5
Associative Embedding [39] | - | - | - | - | 65.5 | 86.8 | 72.3 | 60.6 | 72.6 | 70.2
PersonLab [46], Google | - | - | - | - | 68.7 | 89.0 | 75.4 | 64.1 | 75.5 | 75.4
MultiPoseNet [33] | - | - | - | - | 69.6 | 86.3 | 76.6 | 65.0 | 76.3 | 73.5
Top-down: human detection and single-person keypoint detection
Mask-RCNN [21], Facebook | ResNet-50-FPN | - | - | - | 63.1 | 87.3 | 68.7 | 57.8 | 71.4 | -
G-RMI [47] | ResNet-101 | 353×257 | 42.0M | 57.0 | 64.9 | 85.5 | 71.3 | 62.3 | 70.0 | 69.7
Integral Pose Regression [60] | ResNet-101 | 256×256 | 45.0M | 11.0 | 67.8 | 88.2 | 74.8 | 63.9 | 74.0 | -
G-RMI + extra data [47] | ResNet-101 | 353×257 | 42.6M | 57.0 | 68.5 | 87.1 | 75.5 | 65.8 | 73.3 | 73.3
CPN [11], Face++ | ResNet-Inception | 384×288 | - | - | 72.1 | 91.4 | 80.0 | 68.7 | 77.2 | 78.5
RMPE [17] | PyraNet [77] | 320×256 | 28.1M | 26.7 | 72.3 | 89.2 | 79.1 | 68.0 | 78.6 | -
CFN [25] | - | - | - | - | 72.6 | 86.1 | 69.7 | 78.3 | 64.1 | -
CPN (ensemble) [11], Face++ | ResNet-Inception | 384×288 | - | - | 73.0 | 91.7 | 80.9 | 69.5 | 78.1 | 79.0
SimpleBaseline [72], Microsoft | ResNet-152 | 384×288 | 68.6M | 35.6 | 73.7 | 91.9 | 81.1 | 70.3 | 80.0 | 79.0
HRNet-W32 | HRNet-W32 | 384×288 | 28.5M | 16.0 | 74.9 | 92.5 | 82.8 | 71.3 | 80.9 | 80.1
HRNet-W48 | HRNet-W48 | 384×288 | 63.6M | 32.9 | 75.5 | 92.5 | 83.3 | 71.9 | 81.5 | 80.5
HRNet-W48 + extra data | HRNet-W48 | 384×288 | 63.6M | 32.9 | 77.0 | 92.7 | 84.5 | 73.4 | 83.1 | 82.0
  • 101. 117
• 102. PoseTrack leaderboard (https://posetrack.net/leaderboard.php, as of Feb. 28, 2019): multi-person pose tracking and multi-frame person pose estimation.
• 103. Ablation on COCO (trained from scratch):
Method | Final exchange | Int. exchange across | Int. exchange within | AP
(a) | ✓ | - | - | 70.8
(b) | ✓ | ✓ | - | 71.9
(c) | ✓ | ✓ | ✓ | 73.4
  • 106. 130
• 107. [Figure] Regular convolution vs. the multi-resolution convolution (across-resolution fusion); a sketch of the fusion unit follows.
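A minimal PyTorch-style sketch of an across-resolution fusion unit (my own illustration, not the released HRNet code; the strided 3 × 3 downsampling, the 1 × 1 convolution plus bilinear upsampling, and the channel widths in the example are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Across-resolution fusion sketch: every output branch aggregates all
    input branches; higher-resolution inputs are downsampled with strided 3x3
    convolutions, lower-resolution inputs get a 1x1 convolution and are
    bilinearly upsampled."""

    def __init__(self, channels):  # channels[i] = width of branch i, highest resolution first
        super().__init__()
        n = len(channels)
        self.paths = nn.ModuleList()
        for i in range(n):                       # output branch i
            row = nn.ModuleList()
            for j in range(n):                   # input branch j
                if j == i:
                    row.append(nn.Identity())
                elif j < i:                      # downsample by a factor of 2**(i - j)
                    ops = []
                    for k in range(i - j):
                        out_c = channels[i] if k == i - j - 1 else channels[j]
                        ops += [nn.Conv2d(channels[j], out_c, 3, stride=2, padding=1, bias=False),
                                nn.BatchNorm2d(out_c)]
                    row.append(nn.Sequential(*ops))
                else:                            # change width here, upsample in forward()
                    row.append(nn.Sequential(
                        nn.Conv2d(channels[j], channels[i], 1, bias=False),
                        nn.BatchNorm2d(channels[i])))
            self.paths.append(row)

    def forward(self, xs):
        outs = []
        for i in range(len(xs)):
            target = xs[i].shape[-2:]
            acc = 0
            for j, x in enumerate(xs):
                y = self.paths[i][j](x)
                if y.shape[-2:] != target:
                    y = F.interpolate(y, size=target, mode='bilinear', align_corners=False)
                acc = acc + y
            outs.append(F.relu(acc))
        return outs

# Two parallel branches maintained through a stage: 1/4 resolution (32 channels)
# and 1/8 resolution (64 channels); the fusion strengthens both.
fuse = MultiResolutionFusion([32, 64])
hi, lo = fuse([torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32)])
```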
• 108. Datasets:
Dataset | training | testing | #landmarks | Evaluation (NME)
WFLW | 7500 | 2500 | 98 | inter-ocular
AFLW | 20000 | full 4386, frontal 1314 | 19 | box
COFW | 1345 | 507 | 29 | inter-ocular
300W | 3148 | full 689, challenging 135, test 600 | 68 | inter-ocular
• 109. WFLW (NME, lower is better):
Method | backbone | Test | Pose | Expr | Illu | Mu | Occu | Blur
ESR [10] | - | 11.13 | 25.88 | 11.47 | 10.49 | 11.05 | 13.75 | 12.20
SDM [105], CMU | - | 10.29 | 24.10 | 11.45 | 9.32 | 9.38 | 13.03 | 11.28
CFSS [132] | - | 9.07 | 21.36 | 10.09 | 8.30 | 8.74 | 11.76 | 9.96
DVLN [100] | VGG-16 | 6.08 | 11.54 | 6.78 | 5.73 | 5.98 | 7.33 | 6.88
Our approach | HRNetV2-W18 | 4.60 | 7.94 | 4.85 | 4.55 | 4.29 | 5.44 | 5.42
• 110. AFLW (NME):
Method | backbone | full | frontal
RCN [39] | - | 5.60 | 5.36
CDM [113] | - | 5.43 | 3.77
ERT [43] | - | 4.35 | 2.75
LBF [79] | - | 4.25 | 2.74
SDM [105], CMU | - | 4.05 | 2.94
CFSS [131] | - | 3.92 | 2.68
RCPR [8] | - | 3.73 | 2.87
CCL [132] | - | 2.72 | 2.17
DAC-CSR [29] | - | 2.27 | 1.81
TSR [68] | VGG-S | 2.17 | -
CPM + SBR [25] | CPM | 2.14 | -
SAN [24] | ResNet-152 | 1.91 | 1.85
DSRN [69] | - | 1.86 | -
LAB (w/o B) [99], SenseTime | Hourglass | 1.85 | 1.62
Our approach | HRNetV2-W18 | 1.57 | 1.46
• 111. COFW:
Method | backbone | NME | FR0.1
Human | - | 5.60 | -
ESR [10] | - | 11.20 | 36.00
RCPR [8] | - | 8.50 | 20.00
HPM [32] | - | 7.50 | 13.00
CCR [27] | - | 7.03 | 10.90
DRDA [115] | - | 6.46 | 6.00
RAR [102] | - | 6.03 | 4.14
DAC-CSR [29] | - | 6.03 | 4.73
LAB (w/o B) [99], SenseTime | Hourglass | 5.58 | 2.76
Our approach | HRNetV2-W18 | 3.45 | 0.19
• 112. 300W (NME):
Method | backbone | common | challenging | full
RCN [39] | - | 4.67 | 8.44 | 5.41
DSRN [69] | - | 4.12 | 9.68 | 5.21
PCD-CNN [50] | - | 3.67 | 7.62 | 4.44
CPM + SBR [25] | CPM | 3.28 | 7.58 | 4.10
SAN [24] | ResNet-152 | 3.34 | 6.60 | 3.98
DAN [49] | - | 3.19 | 5.24 | 3.59
Our approach | HRNetV2-W18 | 2.87 | 5.15 | 3.32
• 113. 300W test set:
Method | backbone | NME | AUC0.08 | AUC0.1 | FR0.08 | FR0.1
Balt. et al. [3] | - | - | 19.55 | - | 38.83 | -
ESR [10] | - | 8.47 | 26.09 | - | 30.50 | -
ERT [43] | - | 8.41 | 27.01 | - | 28.83 | -
LBF [79] | - | 8.57 | 25.27 | - | 33.67 | -
Face++ [127] | - | - | 32.81 | - | 13.00 | -
SDM [105], CMU | - | 5.83 | 36.27 | - | 13.00 | -
CFAN [116] | - | 5.78 | 34.78 | - | 14.00 | -
Yan et al. [108] | - | 5.74 | 34.97 | - | 12.67 | -
CFSS [131] | - | 4.78 | 36.58 | - | 12.33 | -
MDM [92] | - | 4.30 | 45.32 | - | 6.80 | -
DAN [49] | - | 3.96 | 47.00 | - | 2.67 | -
Chen et al. [17] | Hourglass | - | 53.64 | - | 2.50 | -
Deng et al. [21] | - | - | - | 47.52 | - | 5.50
Fan et al. [26] | - | - | - | 48.02 | - | 14.83
Dreg + MDM [35] | ResNet101 | - | - | 52.19 | - | 3.67
JMFA [22] | Hourglass | - | - | 54.85 | - | 1.00
Our approach | HRNetV2-W18 | 3.85 | 52.09 | 61.55 | 1.00 | 0.33
  • 115. 145
• 116. Datasets:
Dataset | training | validation | testing | #classes | Evaluation
Cityscapes | 2975 | 500 | 1525 | 19+1 | mIoU
PASCAL Context | 4998 | - | 5105 | 59+1 | mIoU
LIP | 30462 | 10000 | - | 19+1 | mIoU
• 117. Cityscapes:
Method | backbone | #Params. | GFLOPs | mIoU
U-Net++ [130] | ResNet-101 | 59.5M | 748.5 | 75.5
DeepLabv3 [14], Google | Dilated-ResNet-101 | 58.0M | 1778.7 | 78.5
DeepLabv3+ [16], Google | Dilated-Xception-71 | 43.5M | 1444.6 | 79.6
PSPNet [123], SenseTime | Dilated-ResNet-101 | 65.9M | 2017.6 | 79.7
Our approach | HRNetV2-W40 | 45.2M | 493.2 | 80.2
Our approach | HRNetV2-W48 | 65.9M | 747.3 | 81.1
• 118. Cityscapes (models learned on the train+valid set):
Method | backbone | mIoU | iIoU cla. | IoU cat. | iIoU cat.
GridNet [130] | - | 69.5 | 44.1 | 87.9 | 71.1
LRR-4x [33] | - | 69.7 | 48.0 | 88.2 | 74.7
DeepLab [13], Google | Dilated-ResNet-101 | 70.4 | 42.6 | 86.4 | 67.7
LC [54] | - | 71.1 | - | - | -
Piecewise [60] | VGG-16 | 71.6 | 51.7 | 87.3 | 74.1
FRRN [77] | - | 71.8 | 45.5 | 88.9 | 75.1
RefineNet [59] | ResNet-101 | 73.6 | 47.2 | 87.9 | 70.6
PEARL [42] | Dilated-ResNet-101 | 75.4 | 51.6 | 89.2 | 75.1
DSSPN [58] | Dilated-ResNet-101 | 76.6 | 56.2 | 89.6 | 77.8
LKM [75] | ResNet-152 | 76.9 | - | - | -
DUC-HDC [97] | - | 77.6 | 53.6 | 90.1 | 75.2
SAC [117] | Dilated-ResNet-101 | 78.1 | - | - | -
DepthSeg [46] | Dilated-ResNet-101 | 78.2 | - | - | -
ResNet38 [101] | WResNet-38 | 78.4 | 59.1 | 90.9 | 78.1
BiSeNet [111] | ResNet-101 | 78.9 | - | - | -
DFN [112] | ResNet-101 | 79.3 | - | - | -
PSANet [125], SenseTime | Dilated-ResNet-101 | 80.1 | - | - | -
PADNet [106] | Dilated-ResNet-101 | 80.3 | 58.8 | 90.8 | 78.5
DenseASPP [124] | WDenseNet-161 | 80.6 | 59.1 | 90.9 | 78.1
Our approach | HRNetV2-W48 | 81.6 | 61.8 | 92.1 | 82.2
• 119. PASCAL Context:
Method | backbone | mIoU (59 classes) | mIoU (60 classes)
FCN-8s [86] | VGG-16 | - | 35.1
BoxSup [20] | - | - | 40.5
HO_CRF [1] | - | - | 41.3
Piecewise [60] | VGG-16 | - | 43.3
DeepLabv2 [13], Google | Dilated-ResNet-101 | - | 45.7
RefineNet [59] | ResNet-152 | - | 47.3
U-Net++ [130] | ResNet-101 | 47.7 | -
PSPNet [123], SenseTime | Dilated-ResNet-101 | 47.8 | -
Ding et al. [23] | ResNet-101 | 51.6 | -
EncNet [114] | Dilated-ResNet-101 | 52.6 | -
Our approach | HRNetV2-W48 | 54.0 | 48.3
• 120. LIP:
Method | backbone | extra | pixel acc. | avg. acc. | mIoU
Attention+SSL [34] | VGG-16 | Pose | 84.36 | 54.94 | 44.73
DeepLabv2 [16], Google | Dilated-ResNet-101 | - | 84.09 | 55.62 | 44.80
MMAN [67] | Dilated-ResNet-101 | - | - | - | 46.81
SS-NAN [125] | ResNet-101 | Pose | 87.59 | 56.03 | 47.92
MuLA [72] | Hourglass | Pose | 88.50 | 60.50 | 49.30
JPPNet [57] | Dilated-ResNet-101 | Pose | 86.39 | 62.32 | 51.37
CE2P [65] | Dilated-ResNet-101 | Edge | 87.37 | 63.20 | 53.10
Our approach | HRNetV2-W48 | N | 88.21 | 67.43 | 55.90
  • 122. 153
• 123. COCO object detection:
Method | backbone | Size | LS | AP | AP50 | AP75 | APS | APM | APL
DFPR [47] | ResNet-101 | 512 | 1× | 34.6 | 54.3 | 37.3 | - | - | -
PFPNet [45] | VGG16 | 512 | - | 35.2 | 57.6 | 37.9 | 18.7 | 38.6 | 45.9
RefineDet [118] | ResNet-101-FPN | 512 | - | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4
RelationNet [40] | ResNet-101 | 600 | - | 39.0 | 58.6 | 42.9 | - | - | -
C-FRCNN [18] | ResNet-101 | 800 | 1× | 39.0 | 59.7 | 42.8 | 19.4 | 42.4 | 53.0
RetinaNet [62] | ResNet-101-FPN | 800 | 1.5× | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2
Deep Regionlets [107] | ResNet-101 | 800 | 1.5× | 39.3 | 59.8 | - | 21.7 | 43.7 | 50.9
FitnessNMS [94] | ResNet-101 | 768 | - | 39.5 | 58.0 | 42.6 | 18.9 | 43.5 | 54.1
DetNet [56] | DetNet-59-FPN | 800 | 2× | 40.3 | 62.1 | 43.8 | 23.6 | 42.6 | 50.0
CornerNet [51] | Hourglass-104 | 511 | - | 40.5 | 56.5 | 43.1 | 19.4 | 42.7 | 53.9
M2Det [126] | VGG16 | 800 | ∼10× | 41.0 | 59.7 | 45.0 | 22.1 | 46.5 | 53.8
Faster R-CNN [61] | ResNet-101-FPN | 800 | 1× | 39.3 | 61.3 | 42.7 | 22.1 | 42.1 | 49.7
Faster R-CNN | HRNetV2p-W32 | 800 | 1× | 39.5 | 61.2 | 43.0 | 23.3 | 41.7 | 49.1
Faster R-CNN [61] | ResNet-101-FPN | 800 | 2× | 40.3 | 61.8 | 43.9 | 22.6 | 43.1 | 51.0
Faster R-CNN | HRNetV2p-W32 | 800 | 2× | 41.1 | 62.3 | 44.9 | 24.0 | 43.1 | 51.4
Faster R-CNN [61] | ResNet-152-FPN | 800 | 2× | 40.6 | 62.1 | 44.3 | 22.6 | 43.4 | 52.0
Faster R-CNN | HRNetV2p-W40 | 800 | 2× | 42.1 | 63.2 | 46.1 | 24.6 | 44.5 | 52.6
Faster R-CNN [11] | ResNeXt-101-64x4d-FPN | 800 | 2× | 41.1 | 62.8 | 44.8 | 23.5 | 44.1 | 52.3
Faster R-CNN | HRNetV2p-W48 | 800 | 2× | 42.4 | 63.6 | 46.4 | 24.9 | 44.6 | 53.0
Cascade R-CNN [9]* | ResNet-101-FPN | 800 | ∼1.6× | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2
Cascade R-CNN | ResNet-101-FPN | 800 | ∼1.6× | 43.1 | 61.7 | 46.7 | 24.1 | 45.9 | 55.0
Cascade R-CNN | HRNetV2p-W32 | 800 | ∼1.6× | 43.7 | 62.0 | 47.4 | 25.5 | 46.0 | 55.3
• 124. Instance segmentation (mask) and detection (bbox):
backbone | LS | mask AP | mask APS | mask APM | mask APL | bbox AP | bbox APS | bbox APM | bbox APL
ResNet-50-FPN | 1× | 34.2 | 15.7 | 36.8 | 50.2 | 37.8 | 22.1 | 40.9 | 49.3
HRNetV2p-W18 | 1× | 33.8 | 15.6 | 35.6 | 49.8 | 37.1 | 21.9 | 39.5 | 47.9
ResNet-50-FPN | 2× | 35.0 | 16.0 | 37.5 | 52.0 | 38.6 | 21.7 | 41.6 | 50.9
HRNetV2p-W18 | 2× | 35.3 | 16.9 | 37.5 | 51.8 | 39.2 | 23.7 | 41.7 | 51.0
ResNet-101-FPN | 1× | 36.1 | 16.2 | 39.0 | 53.0 | 40.0 | 22.6 | 43.4 | 52.3
HRNetV2p-W32 | 1× | 36.7 | 17.3 | 39.0 | 53.0 | 40.9 | 24.5 | 43.9 | 52.2
ResNet-101-FPN | 2× | 36.7 | 17.0 | 39.5 | 54.8 | 41.0 | 23.4 | 44.4 | 53.9
HRNetV2p-W32 | 2× | 37.6 | 17.8 | 40.0 | 55.0 | 42.3 | 25.0 | 45.4 | 54.9
  • 126. 160
• 127. ImageNet classification:
Model | #Params. | GFLOPs | Top-1 err. | Top-5 err.
Residual branch formed by two 3 × 3 convolutions:
ResNet-38 | 28.3M | 3.80 | 24.6% | 7.4%
HRNet-W18 | 21.3M | 3.99 | 23.1% | 6.5%
ResNet-71 | 48.4M | 7.46 | 23.3% | 6.7%
HRNet-W30 | 37.7M | 7.55 | 21.9% | 5.9%
ResNet-105 | 64.9M | 11.1 | 22.7% | 6.4%
HRNet-W40 | 57.6M | 11.8 | 21.1% | 5.6%
Residual branch formed by a bottleneck:
ResNet-50 | 25.6M | 3.82 | 23.3% | 6.6%
HRNet-W44 | 21.9M | 3.90 | 23.0% | 6.5%
ResNet-101 | 44.6M | 7.30 | 21.6% | 5.8%
HRNet-W76 | 40.8M | 7.30 | 21.5% | 5.8%
ResNet-152 | 60.2M | 10.7 | 21.2% | 5.7%
HRNet-W96 | 57.5M | 10.2 | 21.0% | 5.7%
Surprisingly, HRNet performs slightly better than ResNet.
  • 128. 163
• 129. [Plots: segmentation on Cityscapes and PASCAL Context; COCO detection]
• 131. Image-level vs. region-level vs. pixel-level recognition; low resolution vs. high resolution: recovering from low resolution (ResNet, VGGNet) vs. maintaining high resolution (our HRNet) ✓.
• 133. Related work: convolutional neural fabrics; GridNet (a generalized U-Net); interlinked CNN; multi-scale DenseNet.
• 134. [Architecture figure] by Google: related to HRNet, but without high-resolution maintenance.
  • 136. 172
  • 137. 173
• 138. Other dense prediction tasks: super-resolution (LapSRN), optical flow, depth estimation, edge detection.
  • 139. 175
  • 140. 176 Used in many challenges in CVPR 2019
• 141. [Challenge report], CVPRW 2019 — Meitu (美图) adopted the HRNet.
  • 142. 178 NTIRE 2019 Image Dehazing Challenge Report, CVPRW 2019 Meitu (美图) adopted the HRNet
  • 143. 179
  • 144. 180
  • 145. 181
• 147. Replace classification networks (e.g., ResNet) as the backbone for computer vision tasks: semantic segmentation, object detection, facial landmark detection, human pose estimation.
  • 148. 184