Samuel Dodge and Lina Karam, "Understanding How Image Quality Affects Deep Neural Networks," in Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), June 6-8, 2016
https://ieeexplore.ieee.org/document/7498955
https://arxiv.org/abs/1604.04004
2. Overview
■ Image classification
  • Common practice:
    • Networks are trained and tested on high-quality images
  • Problems:
    • Not all input images can be assumed to be high quality
    • Intentionally crafted adversarial examples (Goodfellow+, ICLR 2015)
    • Artifacts arise when images are acquired, transmitted, or stored
■ This paper
  • Investigates how low-quality images affect DNNs in image classification
Understanding How Image Quality Affects Deep Neural Networks
Samuel Dodge and Lina Karam
Arizona State University
sfdodge@asu.edu, karam@asu.edu
Abstract—Image quality is an important practical challenge
that is often overlooked in the design of machine vision systems.
Commonly, machine vision systems are trained and tested on
high quality image datasets, yet in practical applications the
input images cannot be assumed to be of high quality. Recently,
deep neural networks have obtained state-of-the-art performance
on many machine vision tasks. In this paper we provide an
evaluation of four state-of-the-art deep neural network models for
image classification under quality distortions. We consider five
types of quality distortions: blur, noise, contrast, JPEG, and
JPEG2000 compression. We show that the existing networks are
susceptible to these quality distortions, particularly to blur and
noise. These results enable future work in developing deep neural
networks that are more invariant to quality distortions.
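The first three of the five distortions named in the abstract can be approximated in a few lines of numpy. A minimal sketch, with made-up parameter values (the paper's exact distortion levels are in its experiments section; JPEG and JPEG2000 additionally require a codec such as Pillow's and are omitted here):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur on a 2-D grayscale image in [0, 1]."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def add_noise(img, sigma):
    """Additive Gaussian noise, clipped back to [0, 1]."""
    rng = np.random.default_rng(0)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def reduce_contrast(img, alpha):
    """Blend toward mid-gray: alpha = 1 keeps the image, alpha = 0 gives flat gray."""
    return alpha * img + (1.0 - alpha) * 0.5

img = np.random.default_rng(1).random((32, 32))
for name, out in [("blur", blur(img, 2.0)),
                  ("noise", add_noise(img, 0.1)),
                  ("contrast", reduce_contrast(img, 0.3))]:
    print(name, out.shape)
```

Each function preserves the image shape, so distorted copies can be fed to the same classifier as the originals.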
I. INTRODUCTION
Visual quality evaluation has traditionally been focused
on the perception of quality from the perspective of human
subjects. However, with growing applications of computer
vision, it is also important to characterize the effect of image
quality on computer vision systems. These two notions of
image quality may not be directly comparable as the computer
may be fooled by images that are perceived by humans to be
identical [1], or in some cases the computer can recognize
images that are indistinguishable from noise by a human
observer [2]. Thus, it is important to separately consider how
image quality can affect computer vision applications.
In computer vision, recent techniques based on deep neural networks have achieved state-of-the-art performance on many tasks.
[Fig. 3 thumbnails: Original Image and distorted versions, with the predicted class for each — Blur: "Toy Poodle", Noise: "Chow-chow", JPEG: "Persian Cat", Contrast: "Persian Cat", Original Image: "Persian Cat"]
8. Soft-max unit outputs
■ As image quality degrades, the network loses confidence
Soft-max confidence for the correct class at four increasing distortion levels (quality: high → low)

Blur
Caffe      0.991843  0.991679  0.578361   0.0305795
VGG-CNN-S  0.998868  0.950583  0.0315013  0.00160614
GoogleNet  0.999726  0.993224  0.344413   0.0950852
VGG16      0.997842  0.985675  0.917207   0.767373

Noise
Caffe      0.439129  0.496755  0.123831   0.00186453
VGG-CNN-S  0.354262  0.612398  0.444991   0.0499469
GoogleNet  0.546162  0.287545  0.130923   0.0513721
VGG16      0.406895  0.336332  0.48098    0.280146

Contrast
Caffe      0.729299  0.635093  0.374961   0.00598902
VGG-CNN-S  0.921437  0.88517   0.678366   0.0301079
GoogleNet  0.970436  0.96616   0.871698   0.349995
VGG16      0.834489  0.742968  0.551311   0.32043
JPEG
Caffe      0.648897  0.628435  0.195371   0.00520851
VGG-CNN-S  0.661421  0.444696  0.226847   0.000301685
GoogleNet  0.527089  0.199268  0.11795    9.22498e-05
VGG16      0.817807  0.781653  0.728399   0.000846454

JPEG2000
Caffe      0.994832  0.961075  0.891444   0.296613
VGG-CNN-S  0.999976  0.999287  0.988213   0.840988
GoogleNet  0.999978  0.997871  0.995659   0.961735
VGG16      0.999402  0.972143  0.862521   0.357437
Fig. 3: Example distorted images. For each image we also show the output of the soft-max unit from VGG16 for the correct class. This output corresponds to the confidence the network has of the considered class. For all networks and for all distortions, this confidence decreases as the distortion becomes more severe.
(Table columns run from high quality to low quality; for each distortion, the confidence drops to less than half.)
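The confidence values in the table are soft-max outputs for the correct class. A minimal sketch of reading that confidence from a network's raw scores; the logits here are made-up illustrative numbers, not data from the paper:

```python
import numpy as np

def softmax(logits):
    """Numerically stable soft-max over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one image at four increasing distortion levels (rows);
# column 0 is the correct class.
logits = np.array([
    [5.0, 1.0, 0.5],   # high quality: the correct class dominates
    [3.0, 1.5, 1.0],
    [1.5, 1.2, 1.0],
    [0.8, 1.0, 1.1],   # low quality: confidence collapses
])
confidence = softmax(logits)[:, 0]
print(confidence)  # monotonically decreasing, mirroring the rows of the table
```

Note that the logits stay finite while the soft-max confidence can swing from near 1 to near 0, which is why the table's values fall so sharply.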
9. Why are the networks so sensitive to blur and noise?
■ Examined the networks' filter outputs when blur and noise are applied
■ Effect of blur
  • Even the first convolutional layer changes slightly
  • The last layer changes greatly compared with the original
  • The small changes in the first layer propagate and cause larger changes in the upper layers
■ Effect of noise
  • The first convolutional layer changes greatly
  • The last layer shows little noise, but still changes greatly compared with the original
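The comparison above can be made concrete by measuring, layer by layer, how much the activations for a distorted input differ from those for the clean input. A toy sketch with random 3×3 filters standing in for VGG16's weights (an assumption for illustration, not the paper's setup); whether the change grows with depth depends on the filters:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D correlation of a single-channel image with one 3x3 filter."""
    h, wd = x.shape
    out = np.zeros((h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def forward(x, filters):
    """Stack of conv + ReLU layers; returns every layer's activation."""
    acts = []
    for w in filters:
        x = np.maximum(conv2d(x, w), 0.0)  # ReLU
        acts.append(x)
    return acts

def rel_change(a, b):
    """Relative L2 distance between two activation maps."""
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + 1e-12)

rng = np.random.default_rng(0)
filters = [rng.normal(size=(3, 3)) for _ in range(4)]
img = rng.random((24, 24))

box = np.ones((3, 3)) / 9.0
blurred = conv2d(np.pad(img, 1, mode="edge"), box)  # same-size 3x3 box blur

for layer, (a, b) in enumerate(zip(forward(img, filters),
                                   forward(blurred, filters)), 1):
    print(f"layer {layer}: relative change {rel_change(a, b):.3f}")
```

Running the same comparison with `add_noise` instead of the blur would reproduce the slide's second observation: a large change already in layer 1.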
[Fig. 4 visualizations: Conv1-1 and Conv5-3 filter outputs for (a) Original, (b) Blur, (c) Noise, alongside the original image]
Fig. 4: Filter outputs from the VGG16 network on the original, blurred, and noisy images. Blur does not significantly affect the early filter responses, yet the last layer differs greatly from the last layer of the original undistorted image. Noise gives noisy responses in the first layer, but relatively noise-free responses in the last layer.
Training on low quality images should improve testing results on low quality images, but perhaps this may also result in decreased performance on high quality images. Furthermore, training with an increased number of samples leads to longer training times. An investigation of the benefits of training with low quality images is left as future work.
11. Backup slides
■ Adversarial examples
  • Noise crafted deliberately so that the computer misclassifies with high confidence
x ("panda", 57.7% confidence) + 0.007 × sign(∇ₓJ(θ, x, y)) ("nematode", 8.2% confidence) = x + ε·sign(∇ₓJ(θ, x, y)) ("gibbon", 99.3% confidence)
Figure 1: A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet's classification of the image.
(Goodfellow+, ICLR2015)
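The perturbation in Figure 1 is the fast gradient sign method: x_adv = x + ε·sign(∇ₓJ(θ, x, y)). A self-contained sketch on a toy logistic-regression "network" with made-up weights (a stand-in for GoogLeNet, chosen so the gradient with respect to the input has a closed form):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) and its gradient w.r.t. the INPUT x."""
    p = sigmoid(w @ x + b)
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    grad_x = (p - y) * w  # dJ/dx for a logistic model
    return loss, grad_x

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # made-up model weights
b = 0.0
x = rng.normal(size=8)   # "clean" input
y = 1.0                  # true label

loss, grad_x = loss_and_grad(w, b, x, y)
x_adv = x + 0.1 * np.sign(grad_x)  # FGSM step, epsilon = 0.1
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
print(f"clean loss {loss:.4f} -> adversarial loss {adv_loss:.4f}")
```

For a linear model the sign step provably increases the loss, which is why a perturbation of only ε per pixel is enough to flip the prediction while remaining imperceptible.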