Samuel Dodge and Lina Karam, "Understanding How Image Quality Affects Deep Neural Networks," in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), June 6-8, 2016
https://ieeexplore.ieee.org/document/7498955
https://arxiv.org/abs/1604.04004
2. Overview
■ Image classification
  • Common practice:
    • Networks are trained and tested on high-quality images
  • Problems:
    • Not all input images can be assumed to be high quality
    • Intentionally crafted adversarial examples (Goodfellow+, ICLR 2015)
    • Artifacts arise when images are acquired, transmitted, or stored
■ This paper
  • Investigates how low-quality images affect DNNs in image classification
Understanding How Image Quality Affects Deep Neural Networks
Samuel Dodge and Lina Karam
Arizona State University
sfdodge@asu.edu, karam@asu.edu
Abstract—Image quality is an important practical challenge
that is often overlooked in the design of machine vision systems.
Commonly, machine vision systems are trained and tested on
high quality image datasets, yet in practical applications the
input images cannot be assumed to be of high quality. Recently,
deep neural networks have obtained state-of-the-art performance
on many machine vision tasks. In this paper we provide an
evaluation of four state-of-the-art deep neural network models for
image classification under quality distortions. We consider five
types of quality distortions: blur, noise, contrast, JPEG, and
JPEG2000 compression. We show that the existing networks are
susceptible to these quality distortions, particularly to blur and
noise. These results enable future work in developing deep neural
networks that are more invariant to quality distortions.
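The first three of the five distortions named in the abstract can be approximated in a few lines of numpy. A minimal sketch, with made-up parameter values (the paper's exact distortion levels are in its experiments section; JPEG and JPEG2000 additionally require a codec such as Pillow's and are omitted here):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur on a 2-D grayscale image in [0, 1]."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def add_noise(img, sigma):
    """Additive Gaussian noise, clipped back to [0, 1]."""
    rng = np.random.default_rng(0)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def reduce_contrast(img, alpha):
    """Blend toward mid-gray: alpha = 1 keeps the image, alpha = 0 gives flat gray."""
    return alpha * img + (1.0 - alpha) * 0.5

img = np.random.default_rng(1).random((32, 32))
for name, out in [("blur", blur(img, 2.0)),
                  ("noise", add_noise(img, 0.1)),
                  ("contrast", reduce_contrast(img, 0.3))]:
    print(name, out.shape)
```

Each function preserves the image shape, so distorted copies can be fed to the same classifier as the originals.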
I. INTRODUCTION
Visual quality evaluation has traditionally been focused
on the perception of quality from the perspective of human
subjects. However, with growing applications of computer
vision, it is also important to characterize the effect of image
quality on computer vision systems. These two notions of
image quality may not be directly comparable as the computer
may be fooled by images that are perceived by humans to be
identical [1], or in some cases the computer can recognize
images that are indistinguishable from noise by a human
observer [2]. Thus, it is important to separately consider how
image quality can affect computer vision applications.
In computer vision, recent techniques based on deep neural networks have achieved state-of-the-art performance on many tasks.
[Fig. 3 thumbnails: Original Image and distorted versions, with the predicted class for each — Blur: "Toy Poodle", Noise: "Chow-chow", JPEG: "Persian Cat", Contrast: "Persian Cat", Original Image: "Persian Cat"]
8. Soft-max unit outputs
■ As image quality degrades, the network loses confidence
Soft-max confidence for the correct class at four increasing distortion levels (quality: high → low)

Blur
Caffe      0.991843  0.991679  0.578361   0.0305795
VGG-CNN-S  0.998868  0.950583  0.0315013  0.00160614
GoogleNet  0.999726  0.993224  0.344413   0.0950852
VGG16      0.997842  0.985675  0.917207   0.767373

Noise
Caffe      0.439129  0.496755  0.123831   0.00186453
VGG-CNN-S  0.354262  0.612398  0.444991   0.0499469
GoogleNet  0.546162  0.287545  0.130923   0.0513721
VGG16      0.406895  0.336332  0.48098    0.280146

Contrast
Caffe      0.729299  0.635093  0.374961   0.00598902
VGG-CNN-S  0.921437  0.88517   0.678366   0.0301079
GoogleNet  0.970436  0.96616   0.871698   0.349995
VGG16      0.834489  0.742968  0.551311   0.32043
JPEG
Caffe      0.648897  0.628435  0.195371   0.00520851
VGG-CNN-S  0.661421  0.444696  0.226847   0.000301685
GoogleNet  0.527089  0.199268  0.11795    9.22498e-05
VGG16      0.817807  0.781653  0.728399   0.000846454

JPEG2000
Caffe      0.994832  0.961075  0.891444   0.296613
VGG-CNN-S  0.999976  0.999287  0.988213   0.840988
GoogleNet  0.999978  0.997871  0.995659   0.961735
VGG16      0.999402  0.972143  0.862521   0.357437
Fig. 3: Example distorted images. For each image we also show the output of the soft-max unit from VGG16 for the correct class. This output corresponds to the confidence the network has of the considered class. For all networks and for all distortions, this confidence decreases as the distortion becomes more severe.
(Table columns run from high quality to low quality; for each distortion, the confidence drops to less than half.)
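The confidence values in the table are soft-max outputs for the correct class. A minimal sketch of reading that confidence from a network's raw scores; the logits here are made-up illustrative numbers, not data from the paper:

```python
import numpy as np

def softmax(logits):
    """Numerically stable soft-max over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one image at four increasing distortion levels (rows);
# column 0 is the correct class.
logits = np.array([
    [5.0, 1.0, 0.5],   # high quality: the correct class dominates
    [3.0, 1.5, 1.0],
    [1.5, 1.2, 1.0],
    [0.8, 1.0, 1.1],   # low quality: confidence collapses
])
confidence = softmax(logits)[:, 0]
print(confidence)  # monotonically decreasing, mirroring the rows of the table
```

Note that the logits stay finite while the soft-max confidence can swing from near 1 to near 0, which is why the table's values fall so sharply.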
9. Why are the networks so sensitive to blur and noise?
■ Examined the networks' filter outputs when blur and noise are applied
■ Effect of blur
  • Even the first convolutional layer changes slightly
  • The last layer changes greatly compared with the original
  • The small changes in the first layer propagate and cause larger changes in the upper layers
■ Effect of noise
  • The first convolutional layer changes greatly
  • The last layer shows little noise, but still changes greatly compared with the original
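The comparison above can be made concrete by measuring, layer by layer, how much the activations for a distorted input differ from those for the clean input. A toy sketch with random 3×3 filters standing in for VGG16's weights (an assumption for illustration, not the paper's setup); whether the change grows with depth depends on the filters:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D correlation of a single-channel image with one 3x3 filter."""
    h, wd = x.shape
    out = np.zeros((h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def forward(x, filters):
    """Stack of conv + ReLU layers; returns every layer's activation."""
    acts = []
    for w in filters:
        x = np.maximum(conv2d(x, w), 0.0)  # ReLU
        acts.append(x)
    return acts

def rel_change(a, b):
    """Relative L2 distance between two activation maps."""
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + 1e-12)

rng = np.random.default_rng(0)
filters = [rng.normal(size=(3, 3)) for _ in range(4)]
img = rng.random((24, 24))

box = np.ones((3, 3)) / 9.0
blurred = conv2d(np.pad(img, 1, mode="edge"), box)  # same-size 3x3 box blur

for layer, (a, b) in enumerate(zip(forward(img, filters),
                                   forward(blurred, filters)), 1):
    print(f"layer {layer}: relative change {rel_change(a, b):.3f}")
```

Running the same comparison with `add_noise` instead of the blur would reproduce the slide's second observation: a large change already in layer 1.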
[Fig. 4 visualizations: Conv1-1 and Conv5-3 filter outputs for (a) Original, (b) Blur, (c) Noise, alongside the original image]
Fig. 4: Filter outputs from the VGG16 network on the original, blurred, and noisy images. Blur does not significantly affect the early filter responses, yet the last layer differs greatly from the last layer of the original undistorted image. Noise gives noisy responses in the first layer, but relatively noise-free responses in the last layer.
Training on low quality images should improve testing results on low quality images, but perhaps this may also result in decreased performance on high quality images. Furthermore, training with an increased number of samples leads to longer training times. An investigation of the benefits of training with low quality images is left as future work.
11. Backup slides
■ Adversarial examples
  • Noise crafted deliberately so that the computer misclassifies with high confidence
x ("panda", 57.7% confidence) + 0.007 × sign(∇ₓJ(θ, x, y)) ("nematode", 8.2% confidence) = x + ε·sign(∇ₓJ(θ, x, y)) ("gibbon", 99.3% confidence)
Figure 1: A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet's classification of the image.
(Goodfellow+, ICLR2015)
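The perturbation in Figure 1 is the fast gradient sign method: x_adv = x + ε·sign(∇ₓJ(θ, x, y)). A self-contained sketch on a toy logistic-regression "network" with made-up weights (a stand-in for GoogLeNet, chosen so the gradient with respect to the input has a closed form):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) and its gradient w.r.t. the INPUT x."""
    p = sigmoid(w @ x + b)
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    grad_x = (p - y) * w  # dJ/dx for a logistic model
    return loss, grad_x

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # made-up model weights
b = 0.0
x = rng.normal(size=8)   # "clean" input
y = 1.0                  # true label

loss, grad_x = loss_and_grad(w, b, x, y)
x_adv = x + 0.1 * np.sign(grad_x)  # FGSM step, epsilon = 0.1
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
print(f"clean loss {loss:.4f} -> adversarial loss {adv_loss:.4f}")
```

For a linear model the sign step provably increases the loss, which is why a perturbation of only ε per pixel is enough to flip the prediction while remaining imperceptible.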