文献紹介：CutDepth: Edge-aware Data Augmentation in Depth Estimation

!"#$%&#'()*+%,-.-/%0$-#-0
1"+2%3#-#4530430$%&#'0)6#42-#453
!"#$%&'()*#+((, -"."/&#+()!"0"#+(1",)"'2(34546
木全潤（名工大玉木研）
論文紹介4546764758

概要
n単眼深度推定
• 9:;画像と深度の両方が必要
• 大規模データの収集が困難
nデータ拡張
• 高次のタスク<物体推定など=に対して研究
• 低次のタスク<深度推定など=に
対しての研究は少ない
n提案手法
• 単眼深度推定用のデータ拡張>$1?@A1+
• 深度情報を9:;画像に埋め込み
• エッジの特徴を崩さずにデータを拡張
1006 Kadoma, Kadoma City, O
ishii.yasunori@jp.panaso
(a) Input image (b) Depth (c) Pr
(d) CutOut (e) RE (
Figure 1. Examples of data augmen
Abstract
v1
[cs.CV]
16
Jul
2021
Panasonic
1006 Kadoma, Kadoma City, Osaka, Japan
ishii.yasunori@jp.panasonic.com
(a) Input image (b) Depth (c) Proposed method

関連手法
nデータ拡張
• 光学的変換
• 色，輝度の変更
• 幾何学的変換
• 反転，並進，アフィン変換，ランダムクリップ
n画像の一部を置き換えるデータ拡張手法
• >$1&$1)B?@C'(@#D,)"'2(3456EF
• 画像の一部を欠落
• 9"%G&0)H'"#(%I)<9H=)BJ+&%ID,)KKK*4545F
• 画像の一部をランダム値で埋める
• >$1L(M)B!$%D,)*>>C456NF
• 画像の一部を別画像で埋める
1006 Kadoma, Kadoma
ishii.yasunori@jp.p
(a) Input image (b) Depth
(d) CutOut (e) RE
Figure 1. Examples of data
cs.CV]
16
Jul
2021
(a) Input image
(d) CutOut
Figure 1.
It is difficult to c
84v1
[cs.CV]
16
Jul
(a) Input image (b) Depth (c) Proposed me
(d) CutOut (e) RE (f) CutMix
Figure 1. Examples of data augmentation
s.CV]
16
Jul
2021
(a) Input image (b) Depth (c) Proposed method
(d) CutOut (e) RE (f) CutMix
Figure 1. Examples of data augmentation
s.CV]
16
Jul
2021

!"#$%&#'
n9:;画像の一部を深度情報に置き換え
n処理の計算式
• 画像への処理
• マスクの決定
show the quality of the data distribution after data
gmentation in terms of diversity and affinity.
show that the depth estimation performance is im-
ved subjectively and objectively for a real image
ng the proposed data augmentation method.
ated work
ta augmentation
cal transformations and geometric transformations
conducted for data augmentation [8]. The for-
sformations include changing luminance and colors
the latter transformations include image flipping,
on, affine transformation, and random clipping.
e are methods of making changes optically and geo-
ly by replacing a partial area of the image with other
tion [1, 13, 14] (Figure 1). CutOut [1] and Random
.
Figure 2. Data augmentation using CutDepth
replacement region are obtained as
(l, u) = (a ⇥ W, b ⇥ H) (2)
(w, h) = (min((W a ⇥ W) ⇥ c ⇥ p, 1), (3)
nM!,)M",)L： 9:;画像，深度，マスク
n<O,)$=：マスクの左上の座標
n+,)P：マスクの高さ，幅
n",Q,R,G：マスクの大きさを決める乱数
nA：マスクの大きさの上限
𝑥!
" = M ∗ 𝑥" + 1 − 𝑀 ∗ 𝑥#
𝑙, 𝑢 = 𝑎 × 𝑊, 𝑏 × 𝐻
𝑤, ℎ = (min 𝑊 − 𝑎 × 𝑊 × 𝑐 × 𝑝, 1 ,
min( 𝐻 − 𝑏 × 𝐻 × 𝑑 × 𝑝, 1)))

実験設定
n深度推定の評価
• ;-S)BT@@D,)"'2(3456NF
• T"AO"R("%)?@A1+)
BSUV:D,)W->SC-4546F
nオプティマイザ
• KG"0,)KG"0X
n学習率：65#$
• Y&O/%&0("O)G@R"/)<5ZN倍,)5Z[倍=
n使用エンコーダ
• ?@%#@V@166,)9@#V@M1]656
• *0"I@V@1で事前学習
nベースラインのデータ拡張
• 回転，反転，色変換
nデータセット
• V!^)?@A1+)C4)BS+&1@%D,)X>>C4564F
• サイズ
• 学習：416×544
• 評価：480×540
• 枚数
• 学習：4_486
• 評価：[_

実験結果
n他のデータ拡張手法を上回る結果が得られた
nA<マスク領域の最大値=が5Z[,)5ZE[の時に良い性能が得られる傾向
Table 1. Comparison of the depth estimation performances when using different data augmentation methods. Lower Abs Rel, log10,
RMSE and RMSE log indicate higher performance whereas higher d1, d2 and d3 indicate higher performance. The best performances are
presented in bold text.
BTS Laplacian Depth
Method p Abs Rel # log10 # RMSE # RMSE log # d1 " d2 " d3 " Abs Rel # log10 # RMSE # RMSE log # d1 " d2 " d3 "
Baseline 0.1122 0.048 0.406 0.145 0.878 0.979 0.995 0.11 0.047 0.39 0.139 0.884 0.983 0.996
CutOut
0.25 0.1122 0.048 0.405 0.144 0.878 0.98 0.996 0.106 0.046 0.384 0.136 0.891 0.984 0.996
0.50 0.1118 0.048 0.402 0.144 0.879 0.981 0.996 0.109 0.046 0.382 0.137 0.889 0.983 0.997
0.75 0.1146 0.05 0.414 0.148 0.871 0.979 0.996 0.106 0.045 0.382 0.135 0.893 0.985 0.997
1.00 0.1194 0.051 0.427 0.152 0.864 0.977 0.996 0.11 0.047 0.394 0.14 0.884 0.984 0.997
Random
0.25 0.1106 0.048 0.4 0.143 0.88 0.981 0.996 0.109 0.046 0.384 0.137 0.89 0.982 0.996
Erasing
0.50 0.1116 0.048 0.4 0.143 0.881 0.981 0.996 0.106 0.045 0.378 0.134 0.892 0.985 0.997
0.75 0.1132 0.049 0.415 0.147 0.871 0.979 0.996 0.106 0.045 0.379 0.134 0.893 0.985 0.997
1.00 0.1186 0.051 0.429 0.152 0.863 0.977 0.996 0.111 0.047 0.394 0.14 0.884 0.983 0.997
CutMix
0.25 0.1105 0.047 0.397 0.142 0.882 0.981 0.996 0.107 0.046 0.388 0.137 0.889 0.983 0.996
0.50 0.1132 0.049 0.406 0.146 0.874 0.979 0.996 0.107 0.046 0.386 0.136 0.891 0.983 0.996
0.75 0.1231 0.054 0.438 0.158 0.848 0.976 0.996 0.107 0.046 0.386 0.136 0.891 0.983 0.996
1.00 0.1851 0.086 0.674 0.241 0.659 0.918 0.982 0.11 0.047 0.391 0.139 0.886 0.982 0.996
Proposed
0.25 0.1083 0.047 0.398 0.141 0.884 0.981 0.996 0.106 0.045 0.38 0.135 0.895 0.984 0.996
0.50 0.1077 0.046 0.391 0.14 0.884 0.982 0.997 0.104 0.044 0.375 0.132 0.899 0.985 0.997
0.75 0.1074 0.047 0.392 0.14 0.885 0.982 0.996 0.106 0.045 0.379 0.135 0.894 0.984 0.997
1.00 0.1127 0.047 0.392 0.142 0.88 0.981 0.996 0.104 0.045 0.376 0.132 0.898 0.985 0.996
Table 2. Comparison of the depth estimation performances when

結果の可視化
n推定画像
• 青：近距離
• 赤：遠距離
n結果
• 遠距離と輪郭の推定に優れることを確認
BTS
laplacian depth
(a) RGB image (b) Ground Truth (c) Baseline (d) CutMix (e) CutOut (f) RE (g) Proposed
Figure 3. Depth estimation results obtained using different data augmentation methods
laplacian depth
laplacian depth
!"#$%&'("$)*
+,-./0%. 1$)2.3)*

正則化の効果の検証
n潜在空間での9:;画像と深度の距離を比較
• 距離尺度
• 9LSH
• LKH
• コサイン距離
n結果
• 9LSH,)LKHは大きな変化なし
• 特徴マップのスケールが小さいため
差が分かりにくい
• コサイン距離が改善されていた
• スケールの正規化により差が明確に現れる
Table 3. Comparison of the distances between the RGB image and
depth in the latent space
p RMSE # MAE # Cosine "
Baseline 1.094 0.49 0.24
CutOut
0.25 1.12 0.50 0.21
0.50 1.16 0.52 0.17
0.75 1.20 0.52 0.17
1.00 1.39 0.61 0.15
Random
0.25 1.05 0.48 0.22
Erasing
0.50 1.09 0.49 0.20
0.75 1.13 0.50 0.17
1.00 1.17 0.52 0.17
CutMix
0.25 1.03 0.47 0.28
0.50 0.92 0.41 0.22
0.75 0.95 0.43 0.20
1.00 1.35 0.50 0.12
Proposed
0.25 0.92 0.42 0.37
0.50 1.06 0.48 0.37
0.75 0.96 0.44 0.35
1.00 1.07 0.48 0.33

データ拡張の特性の確認
n:&%1(`&]T&A@#らの研究 B"'2(34545F
• データ拡張を２つの尺度から測定
• ?(3@'#(1/
• データ分布の広がりの大きさ
• 大きいほどデータが多様
• Kaa(%(1/
• 元データからの乖離の小ささ
• 大きいほど元のデータ分布に近い
n測定結果
• 両方の尺度でベースラインを超える
• 従来手法より?(3@'#(1/が低い
• エッジ特徴の過度な変化が抑えられているため
Figure 4. Comparison of diversity and affinity between different
data augmentation methods
arXiv:2011.11778, 2020. 1
[4] Raphael Gontijo-Lopes, Sylvia J Smullin, Ekin D Cubuk,
[1
[1
[1
[1
G(3@'#(1/
"aa(%(1/

まとめ
n深度推定用のデータ拡張>$1?@A1+の提案
• 入力画像に深度画像の一部を貼り付けてバリエーションを増やす
nメリット
• 従来手法より優れた推定精度
• データ拡張前後でのエッジの特徴が類似
• 従来手法と比較してデータ分布を過度に拡張しない

実験結果
n元のデータのサイズを変えて実験を行う
• 結果
• 全てのデータサイズで最も良い性能
CutMix
0.25 0.1105 0.047 0.397 0.142 0.882 0.981 0.996 0.107 0.046 0.388 0.137 0.889 0.
0.50 0.1132 0.049 0.406 0.146 0.874 0.979 0.996 0.107 0.046 0.386 0.136 0.891 0.
0.75 0.1231 0.054 0.438 0.158 0.848 0.976 0.996 0.107 0.046 0.386 0.136 0.891 0.
1.00 0.1851 0.086 0.674 0.241 0.659 0.918 0.982 0.11 0.047 0.391 0.139 0.886 0.
Proposed
0.25 0.1083 0.047 0.398 0.141 0.884 0.981 0.996 0.106 0.045 0.38 0.135 0.895 0.
0.50 0.1077 0.046 0.391 0.14 0.884 0.982 0.997 0.104 0.044 0.375 0.132 0.899 0.
0.75 0.1074 0.047 0.392 0.14 0.885 0.982 0.996 0.106 0.045 0.379 0.135 0.894 0.
1.00 0.1127 0.047 0.392 0.142 0.88 0.981 0.996 0.104 0.045 0.376 0.132 0.898 0.
Table 2. Comparison of the depth estimation performances when
using different numbers of data (p = 0.75). Lower Abs Rel, log10,
RMSE and RMSE log indicate higher performance whereas higher
d1, d2 and d3 indicate higher performance.
Scale Method Abs Rel # log10 # RMSE # RMSE log # d1 " d2 " d3 "
25%
Baseline 0.1226 0.052 0.428 0.154 0.859 0.977 0.995
CutOut 0.1242 0.053 0.432 0.156 0.854 0.976 0.996
RE 0.1268 0.054 0.440 0.158 0.848 0.976 0.995
CutMix 0.1467 0.064 0.520 0.188 0.782 0.956 0.993
Proposed 0.1225 0.052 0.424 0.153 0.858 0.978 0.995
50%
Baseline 0.1174 0.050 0.414 0.150 0.867 0.978 0.995
CutOut 0.1168 0.050 0.418 0.150 0.867 0.979 0.996
RE 0.1184 0.051 0.422 0.151 0.862 0.978 0.996
CutMix 0.1307 0.056 0.460 0.168 0.832 0.970 0.994
Proposed 0.1155 0.049 0.411 0.148 0.870 0.981 0.996
75%
Baseline 0.1154 0.049 0.410 0.147 0.871 0.979 0.996
CutOut 0.1148 0.050 0.413 0.147 0.870 0.980 0.997
RE 0.1179 0.051 0.424 0.151 0.863 0.977 0.996
CutMix 0.1353 0.058 0.465 0.172 0.826 0.967 0.993
Proposed 0.1142 0.048 0.401 0.144 0.876 0.981 0.996
compare the distances in the latent space, which
put of the BTS encoder, when the RGB image
are input to the BTS model. The root-mean-s
(RMSE), mean absolute error (MAE), and cosi
are used as distance measures. Table 3 gives th
son results. In terms of the RMSE and MAE, th
of the proposed method and CutMix are compar
ever, the cosine distance is small for the propos
It is difficult to see the difference between the
MAE because of the small scale of the feature m
ever, the difference becomes clear for the cosin
where the scale is normalized.
4.4. Evaluation of the properties of data a
BTS
laplacian depth
(a) RGB image (b) Ground Truth (c) Baseline (d) CutMix (e) CutOut (f) RE (g) Proposed
Figure 3. Depth estimation results obtained using different data augmentation methods
BTS
laplacian depth
(a) RGB image (b) Ground Truth (c) Baseline (d) CutMix (e) CutOut
Figure 3. Depth estimation results obtained using different data augm
%&'
()*+),-)./01*"2

評価指標など
n-+'#+&OG
• 推定深度の精密性と外れ値の量
n @
𝑑
• 深度の推定値
n9LSH
• 平均平方二乗誤差
nLKH
• 平均絶対誤差
n>&#(%@)G(#1"%R@
• ベクトル同士の角度の近さを表現
• 6に近いほど似ている
BTS [Lee+, arXiv2019].

$()%*+(#,と-..(/(#,
n?(3@'#(1/とKaa(%(1/については下図右のイメージ
• ?(3@'#(1/が大きいほど分布が広い
• Kaa(%(1/が大きいほど元分布と近い
n提案手法
• 右図の右下
• 元分布からの距離が近いので
遠距離や輪郭に強い
n計算式
• Kaa(%(1/
• G(3@'#(1/
Gontijo-Lopesらの研究 [arXiv2020]

文献紹介：CutDepth: Edge-aware Data Augmentation in Depth Estimation

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to 文献紹介：CutDepth: Edge-aware Data Augmentation in Depth Estimation

Similar to 文献紹介：CutDepth: Edge-aware Data Augmentation in Depth Estimation (20)

More from Toru Tamaki

More from Toru Tamaki (20)

Recently uploaded

Recently uploaded (10)

文献紹介：CutDepth: Edge-aware Data Augmentation in Depth Estimation