SlideShare uma empresa Scribd logo
1 de 48
Baixar para ler offline
Jin-Hwa Kim
BI Lab, Seoul National University
Multimodal Residual
Learning for Visual
Question-Answering
!""!#$% #&"
% "!#$% #&" '
% "!#$% #&" (
BS06+#*,- #$% #&"
+#*,+$ ./% +0"1!, " ) 2.3&45%$ 6% 43 " #&" 7 37"#+$ $ 36%8 +$ 9%034*. " 7 :;%< 9& ł =8 6" &% 7 6#"#%5
" $ 7 .!4*"#%5 7 %0" + >! + ? %0 %< 3$ 6# #$% #&" 87 "##&" " )*! +#*, " 4$3@ ," "#%+ #$% #&" &"+$ 7 "##&" "
! 9%0! # 2 2 6% %# ' 33 8 7 %0)* 2A6',B = 8 45! 3@ "6% 2( " . ł+ > )*! +#*, " 7 7 2 CDEEFGEGH %07 7 7
! 9 36% 7 %0"#3 " " 7 7 : %0%< +0" 7 " 2 27 < & % " 7 7 : %0 7 $ " + > 7 2 FI-JGK %< . 6# : L 43 7 %0 7
N 36% " +
Table of Contents
1. VQA: Visual Question Answering
2. Vision Part
3. Question Part
4. Multimodal Residual Learning
5. Results
6. Discussion
7. Recent Works
8. Q&A
1.
VQA: Visual Question
Answering
1. VQA: Visual Question Answering
VQA is a new dataset containing open-ended questions about images, 

for understanding of vision, language and commonsense knowledge to
answer.
VQA Challenge
Antol et al., ICCV 2015
1. VQA: Visual Question Answering
Examples of Visual-Question, or Only-Question answers by Human
VQA Challenge
Antol et al., ICCV 2015
1. VQA: Visual Question Answering
# images: 204,721 (MS COCO)
# questions: 760K (3 per image)
# answers: 10M (10 per question + ɑ)
Numbers
Images Questions Answers
Training 80K 240K 2.4M
Validation 40K 120K 1.2M
Test 80K 240K
Antol et al., ICCV 2015
1. VQA: Visual Question Answering
80K test images / Four splits of 20K images each
Test-dev (development)
Debugging and Validation - 10/day submission to the evaluation server.
Test-standard (publications)
Used to score entries for the Public Leaderboard.
Test-challenge (competitions)
Used to rank challenge participants.
Test-reserve (check overfitting)
Used to estimate overfitting. Scores on this set are never released.
Test Dataset
Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015
2.
Vision Part
2. Vision Part
ResNet: A Thing among Convolutional Neural Networks
He et al., CVPR 2016
1st place on the ImageNet 2015 classification task
1st place on the ImageNet 2015 detection task
1st place on the ImageNet 2015 localization task
1st place on the COCO object detection task
1st place on the COCO object segmentation task
2. Vision Part
ResNet-152
He et al., CVPR 2016
Conv 1x1, 512
Conv 3x3, 512
Conv 1x1, 2048
AveragePooling
7x7x2048
1x1x2048
Linear
Output Size
Softmax
152-Layered Convolutional
Neural Networks
3x224x224 Input Size
2. Vision Part
ResNet-152
He et al., CVPR 2016
Conv 1x1, 512
Conv 3x3, 512
Conv 1x1, 2048
AveragePooling
7x7x2048
1x1x2048
Linear
Output Size
Softmax
As a visual feature extractor
3x224x224 Input Size
2. Vision Part
ResNet-152
He et al., CVPR 2016
Pre-trained models are available!
For Torch, TensorFlow, Theano, Caffe, Lasagne, Neon, and MatConvNet
https://github.com/KaimingHe/deep-residual-networks
3.
Question Part
3. Question Part
Word-Embedding
What color are her eyes?
what color are her eyes ?
preprocessing
indexing
53 7 44 127 2 6177
w53 | w7 | w44 | w127 | w2 | w6177
lookup
w1T
w2T
⋮
w7T
⋮
w44T
⋮
Lookup Table
{wi} are learning parameters
for back-propagation algorithm.
3. Question Part
Question-Embedding
RNNw53 (what)
h0
h1
RNNw7 (color)
h1
h2
RNN
w6177 (?)
h5
h6
Step 0
Step 1
Step 5 use this
3. Question Part
Choice of RNN: Gated Recurrent Units (GRU)
Cho et al., EMNLP 2014

Chung et al., arXiv 2014
hst-1
st-1
h
z = σ(xtUz + st-1Wz)
r = σ(xtUr + st-1Wr)
h = tanh(xtUh + (st-1⚬r)Wh)
st = (1 - z)⚬h + z⚬st-1
3. Question Part
Skip-Thought Vectors
Pre-trained model for word-embedding and question-embedding
Kiros et al., NIPS 2015
I got back home.
I could see the cat on the steps.
This was strange.
try to reconstruct the
previous sentence and 

next sentence
BookCorpus dataset 

(Zhu et al., ArXiv 2015)
3. Question Part
Skip-Thought Vectors
Pre-trained model for word-embedding and question-embedding
Its encoder as Sent2Vec model
Kiros et al., NIPS 2015
w1T
w2T
w3T
w4T
⋮
Lookup Table
Gated 

Recurrent 

Units
Pre-trained GRU
3. Question Part
Skip-Thought Vectors
Pre-trained model (Theano) and porting code (Torch) are available!
https://github.com/ryankiros/skip-thoughts
https://github.com/HyeonwooNoh/DPPnet/tree/master/
003_skipthoughts_porting
Noh et al., CVPR 2016
Kiros et al., NIPS 2015
4.
Multimodal Residual
Learning
4. Multimodal Residual Learning
Idea 1: Deep Residual Learning
Extend the idea of deep residual learning for multimodal learning
He et al., CVPR 2016
identity
weight layer
weight layer
relu
relu
F(x) + x
x
F(x)
x
Figure 2. Residual learning: a building block.
4. Multimodal Residual Learning
Idea 2: Hadamard product for Joint Residual Mapping
One modality is directly involved in the gradient with respect to the other
modality
https://github.com/VT-vision-lab/VQA_LSTM_CNN
vQ vI
tanh tanh
◉
softmax
∂σ (x)!σ (y)
∂x
= diag( ′σ (x)!σ (y))
Scaling Problem?
Wu et al., NIPS 2016
4. Multimodal Residual Learning
Multimodal Residual Networks
Kim et al., NIPS 2016
Q
V
ARNN
CNN
softmax
Multimodal Residual Networks
What kind of animals
are these ?
sheep
word
embedding
question
shortcuts
Hadamard products
word2vec
(Mikolov et al., 2013)
skip-thought vectors
(Kiros et al., 2015)
ResNet
(He et al., 2015)
4. Multimodal Residual Learning
Multimodal Residual Networks
Kim et al., NIPS 2016
A
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
Q V
H1
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
H2
V
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
H3
V
Linear
Softmax
⊙
⊕
⊙
⊕
⊙
⊕
Softmax
5.
Results
5. Results
Tanh
Linear
Linear
Tanh
Linear
Q V
Hl V
⊙
⊕
(a)
Linear
Tanh
Linear
Tanh
Linear
Tanh
Linear
Q V
Hl
V⊙
⊕
(c)
Linear
Tanh
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
Q V
Hl V
⊙
⊕
(b)
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
Q V
Hl
V
⊙
⊕
(e)
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
Q V
Hl V
⊙
⊕
(d)
if l=1
else
Identity
if l=1
Linear
else
none
Table 1: The results of alternative models
(a)-(e) on the test-dev.
Open-Ended
All Y/N Num. Other
(a) 60.17 81.83 38.32 46.61
(b) 60.53 82.53 38.34 46.78
(c) 60.19 81.91 37.87 46.70
(d) 59.69 81.67 37.23 46.00
(e) 60.20 81.98 38.25 46.57
Table 2: The e ect of the visual features and
# of target answers on the test-dev results.
Vgg for VGG-19, and Res for ResNet-152 fea-
tures described in Section 4.
Open-Ended
All Y/N Num. Other
Vgg, 1k 60.53 82.53 38.34 46.78
Vgg, 2k 60.79 82.13 38.87 47.52
Vgg, 3k 60.68 82.40 38.69 47.10
Res, 1k 61.45 82.36 38.40 48.81
Res, 2k 61.68 82.28 38.82 49.25
Res, 3k 61.47 82.28 39.09 48.76
5 Results
The VQA Challenge, which released the VQA dataset, provides evaluation servers for test-
dev and test-standard test splits. For the test-dev, the evaluation server permits unlimited
submissions for validation, while the test-standard permits limited submissions for the
competition. We report accuracies in percentage.
Alternative Models The test-dev results for the Open-Ended task on the of alternative
models are shown in Table 1. (a) shows a significant improvement over SAN. However, (b) is
marginally better than (a). As compared to (b), (c) deteriorates the performance. An extra
embedding for a question vector may easily cause overfitting leading the overall degradation.
And, identity shortcuts in (d) cause the degradation problem, too. Extra parameters of the
linear mappings may e ectively support to do the task.
(e) shows a reasonable performance, however, the extra shortcut is not essential. The
ive models
Other
46.61
46.78
46.70
46.00
46.57
Table 2: The e ect of the visual features and
# of target answers on the test-dev results.
Vgg for VGG-19, and Res for ResNet-152 fea-
tures described in Section 4.
Open-Ended
All Y/N Num. Other
Vgg, 1k 60.53 82.53 38.34 46.78
Vgg, 2k 60.79 82.13 38.87 47.52
Vgg, 3k 60.68 82.40 38.69 47.10
Res, 1k 61.45 82.36 38.40 48.81
Res, 2k 61.68 82.28 38.82 49.25
Res, 3k 61.47 82.28 39.09 48.76
A Appendix
A.1 VQA test-dev Results
Table 1: The effects of various options for VQA test-dev. Here, the model of Figure 3a is used, since
these experiments are preliminarily conducted. VGG-19 features and 1k target answers are used. s
stands for the usage of Skip-Thought Vectors [6] to initialize the question embedding model of GRU,
b stands for the usage of Bayesian Dropout [3], and c stands for the usage of postprocessing using
image captioning model [5].
Open-Ended Multiple-Choice
All Y/N Num. Other All Y/N Num. Other
baseline 58.97 81.11 37.63 44.90 63.53 81.13 38.91 54.06
s 59.38 80.65 38.30 45.98 63.71 80.68 39.73 54.65
s,b 59.74 81.75 38.13 45.84 64.15 81.77 39.54 54.67
s,b,c 59.91 81.75 38.13 46.19 64.18 81.77 39.51 54.72
Table 2: The results for VQA test-dev. The precision of some accuracies [11, 2, 10] are one less than
5. Results
Table 3: The effects of shortcut connections of MRN for VQA test-dev. ResNet-152 features and 2k
target answers are used. MN stands for Multimodal Networks without residual learning, which does
not have any shortcut connections. Dim. stands for common embedding vector’s dimension. The
number of parameters for word embedding (9.3M) and question embedding (21.8M) is subtracted
from the total number of parameters in this table.
Open-Ended
L Dim. #params All Y/N Num. Other
MN 1 4604 33.9M 60.33 82.50 36.04 46.89
MN 2 2350 33.9M 60.90 81.96 37.16 48.28
MN 3 1559 33.9M 59.87 80.55 37.53 47.25
MRN 1 3355 33.9M 60.09 81.78 37.09 46.78
MRN 2 1766 33.9M 61.05 81.81 38.43 48.43
MRN 3 1200 33.9M 61.68 82.28 38.82 49.25
MRN 4 851 33.9M 61.02 82.06 39.02 48.04
A.2 More Examples
5. Results
Table 3: The VQA test-standard results. The precision of some accuracies [30, 1] are one
less than others, so, zero-filled to match others.
Open-Ended Multiple-Choice
All Y/N Num. Other All Y/N Num. Other
DPPnet [21] 57.36 80.28 36.92 42.24 62.69 80.35 38.79 52.79
D-NMN [1] 58.00 - - - - - - -
Deep Q+I [11] 58.16 80.56 36.53 43.73 63.09 80.59 37.70 53.64
SAN [30] 58.90 - - - - - - -
ACK [27] 59.44 81.07 37.12 45.83 - - - -
FDA [9] 59.54 81.34 35.67 46.10 64.18 81.25 38.30 55.20
DMN+ [28] 60.36 80.43 36.82 48.33 - - - -
MRN 61.84 82.39 38.23 49.41 66.33 82.41 39.57 58.40
Human [2] 83.30 95.77 83.39 72.67 - - - -
5.1 Visualization
6.
Discussions
6. Discussions
A
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
Q V
H1
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
H2
V
Linear
Tanh
Linear
TanhLinear
Tanh
Linear
H3
V
Linear
Softmax
⊙
⊕
⊙
⊕
⊙
⊕
Softmax
pretrained
model
1st visualization
2nd visualization
3rd visualization
Visualization
6. Discussionsexamples examples
What kind of animals are these ? sheep What animal is the picture ? elephant
What is this animal ? zebra What game is this person playing ? tennis
How many cats are here ? 2 What color is the bird ? yellow
What sport is this ? surfing Is the horse jumping ? yes
(a) (b)
(c) (d)
(e) (f)
(g) (h)
6. Discussions
What color is the bird ? yellow
Is the horse jumping ? yes
(f)
(h)
Acknowledgments
This work was supported by Naver Corp.
and partly by the Korea government (IITP-R0126-16-1072-
SW.StarLab, KEIT-10044009-HRI.MESSI, KEIT-10060086-
RISF, ADD-UD130070ID-BMRR).
Recent Works
Recent Works
Low-rank Bilinear
Pooling
Low-rank Bilinear Pooling
Bilinear model
Low-rank Restriction
Kim et al., arXiv 2016
fi = wijk
k=1
M
∑
j=1
N
∑ xj yk + bi = xT
Wiy + bi
fi = xT
Wiy + bi = xT
UiVi
T
y + bi = 1T
(Ui
T
x!Vi
T
y)+ bi
f = PT
(UT
x!VT
y)+ b
N x D D x M
rank(Wi) <= min(N,M)
Low-rank Bilinear Pooling
Bilinear model
Low-rank Restriction
Kim et al., arXiv 2016
fi = wijk
k=1
M
∑
j=1
N
∑ xj yk + bi = xT
Wiy + bi
fi = xT
Wiy + bi = xT
UiVi
T
y + bi = 1T
(Ui
T
x!Vi
T
y)+ bi
f = PT
(UT
x!VT
y)+ b
vQ vI
◉
N x D D x M
rank(Wi) <= min(N,M)
Low-rank Bilinear Pooling
Kim et al., arXiv 2016
x1 x2 … xN
WT
∑wixi ∑wjxj … ∑wkxk
Low-rank Bilinear Pooling
Kim et al., arXiv 2016
x1 x2 … xN
Wx
T
∑wixi ∑wjxj … ∑wkxk
y1 y2 … yN
Wy
T
∑wlyl ∑wmym … ∑wnyn
∑∑wiwlxiyl ∑∑wjwmxjym … ∑∑wkwnxkyn
Hadamard Product (Element-wise Multiplication)
Recent Works
Multimodal Low-rank
Bilinear Attention
Networks (MLB)
MLB Attention Networks
Kim et al., arXiv 2016
A
Tanh
Conv
Tanh
Linear
Replicate
Q V
Softmax
Conv
Tanh
Linear
Tanh
LinearLinear
Softmax
MLB Attention Networks (MLB)
Kim et al., arXiv 2016
Under review as a conference paper at ICLR 2017
Table 2: The VQA test-standard results to compare with state-of-the-art. Notice that these results
are trained by provided VQA train and validation splits, without any data augmentation.
Open-Ended MC
MODEL ALL Y/N NUM ETC ALL
iBOWIMG (Zhou et al., 2015) 55.89 76.76 34.98 42.62 61.97
DPPnet (Noh et al., 2015) 57.36 80.28 36.92 42.24 62.69
Deeper LSTM+Normalized CNN (Lu et al., 2015) 58.16 80.56 36.53 43.73 63.09
SMem (Xu & Saenko, 2016) 58.24 80.80 37.53 43.48 -
Ask Your Neuron (Malinowski et al., 2016) 58.43 78.24 36.27 46.32 -
SAN (Yang et al., 2015) 58.85 79.11 36.41 46.42 -
D-NMN (Andreas et al., 2016) 59.44 80.98 37.48 45.81 -
ACK (Wu et al., 2016b) 59.44 81.07 37.12 45.83 -
FDA (Ilievski et al., 2016) 59.54 81.34 35.67 46.10 64.18
HYBRID (Kafle & Kanan, 2016b) 60.06 80.34 37.82 47.56 -
DMN+ (Xiong et al., 2016) 60.36 80.43 36.82 48.33 -
MRN (Kim et al., 2016b) 61.84 82.39 38.23 49.41 66.33
HieCoAtt (Lu et al., 2016) 62.06 79.95 38.22 51.95 66.07
RAU (Noh & Han, 2016) 63.2 81.7 38.2 52.8 67.3
MLB (ours) 65.07 84.02 37.90 54.77 68.89
The rate of the divided answers is approximately 16.40%, and only 0.23% of questions have more
than two divided answers in VQA dataset. We assume that it eases the difficulty of convergence
without severe degradation of performance.
MLB Attention Networks (MLB)
Kim et al., arXiv 2016
Open-Ended task of VQA. The major improvements are from yes-or-no (Y/N) and others (ETC)-
type answers. In Table 3, we also report the accuracy of our ensemble model to compare with other
ensemble models on VQA test-standard, which won 1st to 5th places in VQA Challenge 20162
. We
beat the previous state-of-the-art with a margin of 0.42%.
Table 3: The VQA test-standard results for ensemble models to compare with state-of-the-art. For
unpublished entries, their team names are used instead of their model names. Some of their figures
are updated after the challenge.
Open-Ended MC
MODEL ALL Y/N NUM ETC ALL
RAU (Noh & Han, 2016) 64.12 83.33 38.02 53.37 67.34
MRN (Kim et al., 2016b) 63.18 83.16 39.14 51.33 67.54
DLAIT (not published) 64.83 83.23 40.80 54.32 68.30
Naver Labs (not published) 64.79 83.31 38.70 54.79 69.26
MCB (Fukui et al., 2016) 66.47 83.24 39.47 58.00 70.10
MLB (ours) 66.89 84.61 39.07 57.79 70.29
Human (Antol et al., 2015) 83.30 95.77 83.39 72.67 91.54
7 RELATED WORKS
7.1 COMPACT BILINEAR POOLING
Recent Works
DEMO
DEMO
Q: 아니 이게 뭐야?
A: 냉장고 입니다.
DEMO
Q&A
Thank You

Mais conteúdo relacionado

Mais procurados

H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLSri Ambati
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Pythonindico data
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務台灣資料科學年會
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical ImagingSanghoon Hong
 
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...NAVER Engineering
 
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)Hansol Kang
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksMark Chang
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceHansol Kang
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GANNAVER Engineering
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksBennoG1
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareGlobalLogic Ukraine
 
H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614Sri Ambati
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기NAVER Engineering
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsMohid Nabil
 
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법NAVER Engineering
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionJaroslaw Szymczak
 
Generative Adversarial Networks 2
Generative Adversarial Networks 2Generative Adversarial Networks 2
Generative Adversarial Networks 2Alireza Shafaei
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 

Mais procurados (20)

H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Python
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical Imaging
 
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...Variational Autoencoded Regression of Visual Data with Generative Adversarial...
Variational Autoencoded Regression of Visual Data with Generative Adversarial...
 
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial Networks
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in Healthcare
 
H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
기계학습을 이용하여 정적 분석기의 안전성을 선별적으로 조절하는 방법
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competition
 
Generative Adversarial Networks 2
Generative Adversarial Networks 2Generative Adversarial Networks 2
Generative Adversarial Networks 2
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 

Destaque

강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introductionTaehoon Kim
 
딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향홍배 김
 
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017Taehoon Kim
 
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016Taehoon Kim
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016Taehoon Kim
 
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017Taehoon Kim
 
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016Taehoon Kim
 

Destaque (7)

강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction
 
딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향
 
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
 
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
딥러닝과 강화 학습으로 나보다 잘하는 쿠키런 AI 구현하기 DEVIEW 2016
 
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
지적 대화를 위한 깊고 넓은 딥러닝 PyCon APAC 2016
 
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017알아두면 쓸데있는 신기한 강화학습 NAVER 2017
알아두면 쓸데있는 신기한 강화학습 NAVER 2017
 
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
 

Semelhante a Multimodal Residual Learning for Visual Question-Answering

論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning
論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning
論文紹介:Masked Vision and Language Modeling for Multi-modal Representation LearningToru Tamaki
 
Introduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfIntroduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfTulasiramKandula1
 
White Box testing by Pankaj Thakur, NITTTR Chandigarh
White Box testing by Pankaj Thakur, NITTTR ChandigarhWhite Box testing by Pankaj Thakur, NITTTR Chandigarh
White Box testing by Pankaj Thakur, NITTTR ChandigarhPankaj Thakur
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAJin-Hwa Kim
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...cvpaper. challenge
 
Human_Activity_Recognition_Predictive_Model
Human_Activity_Recognition_Predictive_ModelHuman_Activity_Recognition_Predictive_Model
Human_Activity_Recognition_Predictive_ModelDavid Ritchie
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...Herman Wu
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Sangmin Park
 
Certified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationCertified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationAsankhaya Sharma
 
Image Classification
Image ClassificationImage Classification
Image ClassificationAnwar Jameel
 
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...University of Saskatchewan
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...UOC Universitat Oberta de Catalunya
 

Semelhante a Multimodal Residual Learning for Visual Question-Answering (20)

論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning
論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning
論文紹介:Masked Vision and Language Modeling for Multi-modal Representation Learning
 
Introduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfIntroduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdf
 
White Box testing by Pankaj Thakur, NITTTR Chandigarh
White Box testing by Pankaj Thakur, NITTTR ChandigarhWhite Box testing by Pankaj Thakur, NITTTR Chandigarh
White Box testing by Pankaj Thakur, NITTTR Chandigarh
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
Sequential Query Expansion using Concept Graph
Sequential Query Expansion using Concept GraphSequential Query Expansion using Concept Graph
Sequential Query Expansion using Concept Graph
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
 
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
教師なし画像特徴表現学習の動向 {Un, Self} supervised representation learning (CVPR 2018 完全読破...
 
Human_Activity_Recognition_Predictive_Model
Human_Activity_Recognition_Predictive_ModelHuman_Activity_Recognition_Predictive_Model
Human_Activity_Recognition_Predictive_Model
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
 
Gate-Cs 1993
Gate-Cs 1993Gate-Cs 1993
Gate-Cs 1993
 
Sbst2018 contest2018
Sbst2018 contest2018Sbst2018 contest2018
Sbst2018 contest2018
 
MNIST 10-class Classifiers
MNIST 10-class ClassifiersMNIST 10-class Classifiers
MNIST 10-class Classifiers
 
Certified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationCertified Reasoning for Automated Verification
Certified Reasoning for Automated Verification
 
Image Classification
Image ClassificationImage Classification
Image Classification
 
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...
Investigating the Quality Aspects of Crowd-Sourced Developer Forum: A Case St...
 
20130523 05 - Cyclomatic complexity
20130523 05 - Cyclomatic complexity20130523 05 - Cyclomatic complexity
20130523 05 - Cyclomatic complexity
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
 

Mais de NAVER D2

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다NAVER D2
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...NAVER D2
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발NAVER D2
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈NAVER D2
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&ANAVER D2
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기NAVER D2
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep LearningNAVER D2
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applicationsNAVER D2
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingNAVER D2
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지NAVER D2
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기NAVER D2
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화NAVER D2
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)NAVER D2
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기NAVER D2
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual SearchNAVER D2
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화NAVER D2
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지NAVER D2
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터NAVER D2
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?NAVER D2
 

Mais de NAVER D2 (20)

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

Último

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Multimodal Residual Learning for Visual Question-Answering

  • 1. Jin-Hwa Kim BI Lab, Seoul National University Multimodal Residual Learning for Visual Question-Answering !""!#$% #&" % "!#$% #&" ' % "!#$% #&" ( BS06+#*,- #$% #&" +#*,+$ ./% +0"1!, " ) 2.3&45%$ 6% 43 " #&" 7 37"#+$ $ 36%8 +$ 9%034*. " 7 :;%< 9& ł =8 6" &% 7 6#"#%5 " $ 7 .!4*"#%5 7 %0" + >! + ? %0 %< 3$ 6# #$% #&" 87 "##&" " )*! +#*, " 4$3@ ," "#%+ #$% #&" &"+$ 7 "##&" " ! 9%0! # 2 2 6% %# ' 33 8 7 %0)* 2A6',B = 8 45! 3@ "6% 2( " . ł+ > )*! +#*, " 7 7 2 CDEEFGEGH %07 7 7 ! 9 36% 7 %0"#3 " " 7 7 : %0%< +0" 7 " 2 27 < & % " 7 7 : %0 7 $ " + > 7 2 FI-JGK %< . 6# : L 43 7 %0 7 N 36% " +
  • 2. Table of Contents 1. VQA: Visual Question Answering 2. Vision Part 3. Question Part 4. Multimodal Residual Learning 5. Results 6. Discussion 7. Recent Works 8. Q&A
  • 4. 1. VQA: Visual Question Answering VQA is a new dataset containing open-ended questions about images, 
 for understanding of vision, language and commonsense knowledge to answer. VQA Challenge Antol et al., ICCV 2015
  • 5. 1. VQA: Visual Question Answering Examples of Visual-Question, or Only-Question answers by Human VQA Challenge Antol et al., ICCV 2015
  • 6. 1. VQA: Visual Question Answering # images: 204,721 (MS COCO) # questions: 760K (3 per image) # answers: 10M (10 per question + ɑ) Numbers Images Questions Answers Training 80K 240K 2.4M Validation 40K 120K 1.2M Test 80K 240K Antol et al., ICCV 2015
  • 7. 1. VQA: Visual Question Answering 80K test images / Four splits of 20K images each Test-dev (development) Debugging and Validation - 10/day submission to the evaluation server. Test-standard (publications) Used to score entries for the Public Leaderboard. Test-challenge (competitions) Used to rank challenge participants. Test-reserve (check overfitting) Used to estimate overfitting. Scores on this set are never released. Test Dataset Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015
  • 9. 2. Vision Part ResNet: A Thing among Convolutional Neural Networks He et al., CVPR 2016 1st place on the ImageNet 2015 classification task 1st place on the ImageNet 2015 detection task 1st place on the ImageNet 2015 localization task 1st place on the COCO object detection task 1st place on the COCO object segmentation task
  • 10. 2. Vision Part ResNet-152 He et al., CVPR 2016 Conv 1x1, 512 Conv 3x3, 512 Conv 1x1, 2048 AveragePooling 7x7x2048 1x1x2048 Linear Output Size Softmax 152-Layered Convolutional Neural Networks 3x224x224 Input Size
  • 11. 2. Vision Part ResNet-152 He et al., CVPR 2016 Conv 1x1, 512 Conv 3x3, 512 Conv 1x1, 2048 AveragePooling 7x7x2048 1x1x2048 Linear Output Size Softmax As a visual feature extractor 3x224x224 Input Size
  • 12. 2. Vision Part ResNet-152 He et al., CVPR 2016 Pre-trained models are available! For Torch, TensorFlow, Theano, Caffe, Lasagne, Neon, and MatConvNet https://github.com/KaimingHe/deep-residual-networks
  • 14. 3. Question Part Word-Embedding What color are her eyes? what color are her eyes ? preprocessing indexing 53 7 44 127 2 6177 w53 | w7 | w44 | w127 | w2 | w6177 lookup w1T w2T ⋮ w7T ⋮ w44T ⋮ Lookup Table {wi} are learning parameters for back-propagation algorithm.
  • 15. 3. Question Part Question-Embedding RNNw53 (what) h0 h1 RNNw7 (color) h1 h2 RNN w6177 (?) h5 h6 Step 0 Step 1 Step 5 use this
  • 16. 3. Question Part Choice of RNN: Gated Recurrent Units (GRU) Cho et al., EMNLP 2014
 Chung et al., arXiv 2014 hst-1 st-1 h z = σ(xtUz + st-1Wz) r = σ(xtUr + st-1Wr) h = tanh(xtUh + (st-1⚬r)Wh) st = (1 - z)⚬h + z⚬st-1
  • 17. 3. Question Part Skip-Thought Vectors Pre-trained model for word-embedding and question-embedding Kiros et al., NIPS 2015 I got back home. I could see the cat on the steps. This was strange. try to reconstruct the previous sentence and 
 next sentence BookCorpus dataset 
 (Zhu et al., ArXiv 2015)
  • 18. 3. Question Part Skip-Thought Vectors Pre-trained model for word-embedding and question-embedding Its encoder as Sent2Vec model Kiros et al., NIPS 2015 w1T w2T w3T w4T ⋮ Lookup Table Gated 
 Recurrent 
 Units Pre-trained GRU
  • 19. 3. Question Part Skip-Thought Vectors Pre-trained model (Theano) and porting code (Torch) are available! https://github.com/ryankiros/skip-thoughts https://github.com/HyeonwooNoh/DPPnet/tree/master/ 003_skipthoughts_porting Noh et al., CVPR 2016 Kiros et al., NIPS 2015
  • 21. 4. Multimodal Residual Learning Idea 1: Deep Residual Learning Extend the idea of deep residual learning for multimodal learning He et al., CVPR 2016 identity weight layer weight layer relu relu F(x) + x x F(x) x Figure 2. Residual learning: a building block.
  • 22. 4. Multimodal Residual Learning Idea 2: Hadamard product for Joint Residual Mapping One modality is directly involved in the gradient with respect to the other modality https://github.com/VT-vision-lab/VQA_LSTM_CNN vQ vI tanh tanh ◉ softmax ∂σ (x)!σ (y) ∂x = diag( ′σ (x)!σ (y)) Scaling Problem? Wu et al., NIPS 2016
  • 23. 4. Multimodal Residual Learning Multimodal Residual Networks Kim et al., NIPS 2016 Q V ARNN CNN softmax Multimodal Residual Networks What kind of animals are these ? sheep word embedding question shortcuts Hadamard products word2vec (Mikolov et al., 2013) skip-thought vectors (Kiros et al., 2015) ResNet (He et al., 2015)
  • 24. 4. Multimodal Residual Learning Multimodal Residual Networks Kim et al., NIPS 2016 A Linear Tanh Linear TanhLinear Tanh Linear Q V H1 Linear Tanh Linear TanhLinear Tanh Linear H2 V Linear Tanh Linear TanhLinear Tanh Linear H3 V Linear Softmax ⊙ ⊕ ⊙ ⊕ ⊙ ⊕ Softmax
  • 26. 5. Results Tanh Linear Linear Tanh Linear Q V Hl V ⊙ ⊕ (a) Linear Tanh Linear Tanh Linear Tanh Linear Q V Hl V⊙ ⊕ (c) Linear Tanh Linear Tanh Linear TanhLinear Tanh Linear Q V Hl V ⊙ ⊕ (b) Linear Tanh Linear TanhLinear Tanh Linear Q V Hl V ⊙ ⊕ (e) Linear Tanh Linear TanhLinear Tanh Linear Q V Hl V ⊙ ⊕ (d) if l=1 else Identity if l=1 Linear else none Table 1: The results of alternative models (a)-(e) on the test-dev. Open-Ended All Y/N Num. Other (a) 60.17 81.83 38.32 46.61 (b) 60.53 82.53 38.34 46.78 (c) 60.19 81.91 37.87 46.70 (d) 59.69 81.67 37.23 46.00 (e) 60.20 81.98 38.25 46.57 Table 2: The e ect of the visual features and # of target answers on the test-dev results. Vgg for VGG-19, and Res for ResNet-152 fea- tures described in Section 4. Open-Ended All Y/N Num. Other Vgg, 1k 60.53 82.53 38.34 46.78 Vgg, 2k 60.79 82.13 38.87 47.52 Vgg, 3k 60.68 82.40 38.69 47.10 Res, 1k 61.45 82.36 38.40 48.81 Res, 2k 61.68 82.28 38.82 49.25 Res, 3k 61.47 82.28 39.09 48.76 5 Results The VQA Challenge, which released the VQA dataset, provides evaluation servers for test- dev and test-standard test splits. For the test-dev, the evaluation server permits unlimited submissions for validation, while the test-standard permits limited submissions for the competition. We report accuracies in percentage. Alternative Models The test-dev results for the Open-Ended task on the of alternative models are shown in Table 1. (a) shows a significant improvement over SAN. However, (b) is marginally better than (a). As compared to (b), (c) deteriorates the performance. An extra embedding for a question vector may easily cause overfitting leading the overall degradation. And, identity shortcuts in (d) cause the degradation problem, too. Extra parameters of the linear mappings may e ectively support to do the task. (e) shows a reasonable performance, however, the extra shortcut is not essential. The ive models Other 46.61 46.78 46.70 46.00 46.57 Table 2: The e ect of the visual features and # of target answers on the test-dev results. Vgg for VGG-19, and Res for ResNet-152 fea- tures described in Section 4. Open-Ended All Y/N Num. Other Vgg, 1k 60.53 82.53 38.34 46.78 Vgg, 2k 60.79 82.13 38.87 47.52 Vgg, 3k 60.68 82.40 38.69 47.10 Res, 1k 61.45 82.36 38.40 48.81 Res, 2k 61.68 82.28 38.82 49.25 Res, 3k 61.47 82.28 39.09 48.76 A Appendix A.1 VQA test-dev Results Table 1: The effects of various options for VQA test-dev. Here, the model of Figure 3a is used, since these experiments are preliminarily conducted. VGG-19 features and 1k target answers are used. s stands for the usage of Skip-Thought Vectors [6] to initialize the question embedding model of GRU, b stands for the usage of Bayesian Dropout [3], and c stands for the usage of postprocessing using image captioning model [5]. Open-Ended Multiple-Choice All Y/N Num. Other All Y/N Num. Other baseline 58.97 81.11 37.63 44.90 63.53 81.13 38.91 54.06 s 59.38 80.65 38.30 45.98 63.71 80.68 39.73 54.65 s,b 59.74 81.75 38.13 45.84 64.15 81.77 39.54 54.67 s,b,c 59.91 81.75 38.13 46.19 64.18 81.77 39.51 54.72 Table 2: The results for VQA test-dev. The precision of some accuracies [11, 2, 10] are one less than
  • 27. 5. Results Table 3: The effects of shortcut connections of MRN for VQA test-dev. ResNet-152 features and 2k target answers are used. MN stands for Multimodal Networks without residual learning, which does not have any shortcut connections. Dim. stands for common embedding vector’s dimension. The number of parameters for word embedding (9.3M) and question embedding (21.8M) is subtracted from the total number of parameters in this table. Open-Ended L Dim. #params All Y/N Num. Other MN 1 4604 33.9M 60.33 82.50 36.04 46.89 MN 2 2350 33.9M 60.90 81.96 37.16 48.28 MN 3 1559 33.9M 59.87 80.55 37.53 47.25 MRN 1 3355 33.9M 60.09 81.78 37.09 46.78 MRN 2 1766 33.9M 61.05 81.81 38.43 48.43 MRN 3 1200 33.9M 61.68 82.28 38.82 49.25 MRN 4 851 33.9M 61.02 82.06 39.02 48.04 A.2 More Examples
  • 28. 5. Results Table 3: The VQA test-standard results. The precision of some accuracies [30, 1] are one less than others, so, zero-filled to match others. Open-Ended Multiple-Choice All Y/N Num. Other All Y/N Num. Other DPPnet [21] 57.36 80.28 36.92 42.24 62.69 80.35 38.79 52.79 D-NMN [1] 58.00 - - - - - - - Deep Q+I [11] 58.16 80.56 36.53 43.73 63.09 80.59 37.70 53.64 SAN [30] 58.90 - - - - - - - ACK [27] 59.44 81.07 37.12 45.83 - - - - FDA [9] 59.54 81.34 35.67 46.10 64.18 81.25 38.30 55.20 DMN+ [28] 60.36 80.43 36.82 48.33 - - - - MRN 61.84 82.39 38.23 49.41 66.33 82.41 39.57 58.40 Human [2] 83.30 95.77 83.39 72.67 - - - - 5.1 Visualization
  • 31. 6. Discussionsexamples examples What kind of animals are these ? sheep What animal is the picture ? elephant What is this animal ? zebra What game is this person playing ? tennis How many cats are here ? 2 What color is the bird ? yellow What sport is this ? surfing Is the horse jumping ? yes (a) (b) (c) (d) (e) (f) (g) (h)
  • 32. 6. Discussions What color is the bird ? yellow Is the horse jumping ? yes (f) (h)
  • 33. Acknowledgments This work was supported by Naver Corp. and partly by the Korea government (IITP-R0126-16-1072- SW.StarLab, KEIT-10044009-HRI.MESSI, KEIT-10060086- RISF, ADD-UD130070ID-BMRR).
  • 36. Low-rank Bilinear Pooling Bilinear model Low-rank Restriction Kim et al., arXiv 2016 fi = wijk k=1 M ∑ j=1 N ∑ xj yk + bi = xT Wiy + bi fi = xT Wiy + bi = xT UiVi T y + bi = 1T (Ui T x!Vi T y)+ bi f = PT (UT x!VT y)+ b N x D D x M rank(Wi) <= min(N,M)
  • 37. Low-rank Bilinear Pooling Bilinear model Low-rank Restriction Kim et al., arXiv 2016 fi = wijk k=1 M ∑ j=1 N ∑ xj yk + bi = xT Wiy + bi fi = xT Wiy + bi = xT UiVi T y + bi = 1T (Ui T x!Vi T y)+ bi f = PT (UT x!VT y)+ b vQ vI ◉ N x D D x M rank(Wi) <= min(N,M)
  • 38. Low-rank Bilinear Pooling Kim et al., arXiv 2016 x1 x2 … xN WT ∑wixi ∑wjxj … ∑wkxk
  • 39. Low-rank Bilinear Pooling Kim et al., arXiv 2016 x1 x2 … xN Wx T ∑wixi ∑wjxj … ∑wkxk y1 y2 … yN Wy T ∑wlyl ∑wmym … ∑wnyn ∑∑wiwlxiyl ∑∑wjwmxjym … ∑∑wkwnxkyn Hadamard Product (Element-wise Multiplication)
  • 40. Recent Works Multimodal Low-rank Bilinear Attention Networks (MLB)
  • 41. MLB Attention Networks Kim et al., arXiv 2016 A Tanh Conv Tanh Linear Replicate Q V Softmax Conv Tanh Linear Tanh LinearLinear Softmax
  • 42. MLB Attention Networks (MLB) Kim et al., arXiv 2016 Under review as a conference paper at ICLR 2017 Table 2: The VQA test-standard results to compare with state-of-the-art. Notice that these results are trained by provided VQA train and validation splits, without any data augmentation. Open-Ended MC MODEL ALL Y/N NUM ETC ALL iBOWIMG (Zhou et al., 2015) 55.89 76.76 34.98 42.62 61.97 DPPnet (Noh et al., 2015) 57.36 80.28 36.92 42.24 62.69 Deeper LSTM+Normalized CNN (Lu et al., 2015) 58.16 80.56 36.53 43.73 63.09 SMem (Xu & Saenko, 2016) 58.24 80.80 37.53 43.48 - Ask Your Neuron (Malinowski et al., 2016) 58.43 78.24 36.27 46.32 - SAN (Yang et al., 2015) 58.85 79.11 36.41 46.42 - D-NMN (Andreas et al., 2016) 59.44 80.98 37.48 45.81 - ACK (Wu et al., 2016b) 59.44 81.07 37.12 45.83 - FDA (Ilievski et al., 2016) 59.54 81.34 35.67 46.10 64.18 HYBRID (Kafle & Kanan, 2016b) 60.06 80.34 37.82 47.56 - DMN+ (Xiong et al., 2016) 60.36 80.43 36.82 48.33 - MRN (Kim et al., 2016b) 61.84 82.39 38.23 49.41 66.33 HieCoAtt (Lu et al., 2016) 62.06 79.95 38.22 51.95 66.07 RAU (Noh & Han, 2016) 63.2 81.7 38.2 52.8 67.3 MLB (ours) 65.07 84.02 37.90 54.77 68.89 The rate of the divided answers is approximately 16.40%, and only 0.23% of questions have more than two divided answers in VQA dataset. We assume that it eases the difficulty of convergence without severe degradation of performance.
  • 43. MLB Attention Networks (MLB) Kim et al., arXiv 2016 Open-Ended task of VQA. The major improvements are from yes-or-no (Y/N) and others (ETC)- type answers. In Table 3, we also report the accuracy of our ensemble model to compare with other ensemble models on VQA test-standard, which won 1st to 5th places in VQA Challenge 20162 . We beat the previous state-of-the-art with a margin of 0.42%. Table 3: The VQA test-standard results for ensemble models to compare with state-of-the-art. For unpublished entries, their team names are used instead of their model names. Some of their figures are updated after the challenge. Open-Ended MC MODEL ALL Y/N NUM ETC ALL RAU (Noh & Han, 2016) 64.12 83.33 38.02 53.37 67.34 MRN (Kim et al., 2016b) 63.18 83.16 39.14 51.33 67.54 DLAIT (not published) 64.83 83.23 40.80 54.32 68.30 Naver Labs (not published) 64.79 83.31 38.70 54.79 69.26 MCB (Fukui et al., 2016) 66.47 83.24 39.47 58.00 70.10 MLB (ours) 66.89 84.61 39.07 57.79 70.29 Human (Antol et al., 2015) 83.30 95.77 83.39 72.67 91.54 7 RELATED WORKS 7.1 COMPACT BILINEAR POOLING
  • 45. DEMO Q: 아니 이게 뭐야? A: 냉장고 입니다.
  • 46. DEMO
  • 47. Q&A