5. VQA: Problem Definition
Example question: "Does it appear to be rainy?"
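- VQA: the input is an image and a question about it; the output is the answer to that question.
- VQA is a cross-modal task that combines image understanding (Computer Vision) and natural language understanding (Natural Language Processing).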
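When the answer is chosen from a fixed set of candidate answers (the multi-class classification setup used by many VQA models), the task can be written compactly. This is a sketch of that formulation only; the symbols $I$ (image), $Q$ (question), $\mathcal{A}$ (candidate answer set) and $\theta$ (model parameters) are introduced here for illustration and do not appear in the original slides. Open-ended answer generation does not fit this form exactly.

```latex
\hat{a} = \operatorname*{arg\,max}_{a \in \mathcal{A}} \; p(a \mid I, Q;\, \theta)
```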
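[Figure] Capabilities required to answer the example question:
① Natural language understanding
② Image understanding
③ Knowledge representation, e.g. supporting facts:
<wet ground, related to, rainy>
<blue sky, related to, sunny>
…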
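The knowledge component can be illustrated with a toy lookup over the supporting-fact triples listed in the figure. This is a minimal sketch, not the method of any cited paper: it assumes the image-understanding step has already produced a set of visual concept names (hard-coded here), and the helpers `retrieve_facts` and `answer_yes_no` are hypothetical names chosen for this example.

```python
from typing import List, Set, Tuple

# A supporting fact is a (head, relation, tail) triple, as on the slide.
Triple = Tuple[str, str, str]

KNOWLEDGE_BASE: List[Triple] = [
    ("wet ground", "related to", "rainy"),
    ("blue sky", "related to", "sunny"),
]


def retrieve_facts(concepts: Set[str], kb: List[Triple]) -> List[Triple]:
    """Return the facts whose head is one of the detected visual concepts."""
    return [fact for fact in kb if fact[0] in concepts]


def answer_yes_no(concepts: Set[str], target: str, kb: List[Triple]) -> str:
    """Answer "Does it appear to be <target>?" using only the retrieved facts."""
    facts = retrieve_facts(concepts, kb)
    return "yes" if any(tail == target for _, _, tail in facts) else "no"


if __name__ == "__main__":
    # Hypothetical output of the image-understanding step.
    detected = {"wet ground", "umbrella"}
    print(answer_yes_no(detected, "rainy", KNOWLEDGE_BASE))  # yes
    print(answer_yes_no(detected, "sunny", KNOWLEDGE_BASE))  # no
```

Real fact-based VQA systems retrieve from much larger knowledge bases and match concepts far less literally; exact string matching is used here only to keep the example self-contained.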
12. VQA: Architecture
- Multi-modal architecture with an attention mechanism
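[Figure: VQA architecture, illustrated with the question "There is a yellow ball behind the red metal cylinder; what is its material?"]
- Image encoder: CNN
- Question encoder: CNN/LSTM
- Attention function: decides which regions of the image are important
- Feature fusion function: fuses the image and language features
- Multi-class classification function: selects the correct answer from the answer candidates (…, rubber, metal, yes, no, 5)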
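The flow in the figure (attention, then feature fusion, then classification over answer candidates) can be sketched in plain Python. This is a toy under stated assumptions: the CNN is assumed to have produced one feature vector per image region and the question encoder a single vector of the same size; attention is plain dot-product attention with a softmax; fusion is an element-wise product (one common choice among several); and the classifier weights are hand-set rather than learned. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: Sequence[Vector], question: Vector) -> Vector:
    """Attention function: weight each region by its relevance to the question,
    then return the weighted sum of the region features."""
    weights = softmax([dot(r, question) for r in regions])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(len(question))]


def fuse(image_vec: Vector, question: Vector) -> Vector:
    """Feature fusion function: element-wise product of image and question features."""
    return [a * b for a, b in zip(image_vec, question)]


def classify(fused: Vector, weight_rows: Sequence[Vector], answers: Sequence[str]) -> str:
    """Multi-class classification: one linear score per candidate answer, pick the best."""
    logits = [dot(row, fused) for row in weight_rows]
    return answers[max(range(len(answers)), key=lambda k: logits[k])]


def vqa_forward(regions: Sequence[Vector], question: Vector,
                weight_rows: Sequence[Vector], answers: Sequence[str]) -> str:
    """Attention -> fusion -> classification, in the order shown in the figure."""
    attended = attend(regions, question)
    return classify(fuse(attended, question), weight_rows, answers)


if __name__ == "__main__":
    # Toy 2-D features: one vector per image region (assumed CNN output).
    regions = [[1.0, 0.0], [0.0, 1.0]]
    question = [0.0, 2.0]  # assumed question-encoder output
    answers = ["rubber", "metal", "yes", "no"]
    weight_rows = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]
    print(vqa_forward(regions, question, weight_rows, answers))  # metal
```

In this toy run the question vector points along the second feature dimension, so the second region receives most of the attention weight (about 0.88) and the classifier selects "metal". Trained models learn all of these weights and use far more regions and much larger feature sizes.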
15. VQA Trend Analysis at CVPR2018
- VQA in CVPR2018: paper list (total: 22)
1 Embodied Question Answering
2 Learning by Asking Questions
3 VizWiz Grand Challenge: Answering Visual Questions From Blind People
4 Textbook Question Answering Under Instructor Guidance With Memory Networks
5 IQA: Visual Question Answering in Interactive Environments
6 Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge
7 Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
8 Learning Answer Embeddings for Visual Question Answering
9 DVQA: Understanding Data Visualizations via Question Answering
10 Cross-Dataset Adaptation for Visual Question Answering
11 Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering
12 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
13 Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
14 Visual Question Generation as Dual Task of Visual Question Answering
15 Focal Visual-Text Attention for Visual Question Answering
16 Motion-Appearance Co-Memory Networks for Video Question Answering
17 Visual Question Answering With Memory-Augmented Networks
18 Visual Question Reasoning on General Dependency Tree
19 Differential Attention for Visual Question Answering
20 Learning Visual Knowledge Memory Networks for Visual Question Answering
21 IVQA: Inverse Visual Question Answering
22 Customized Image Narrative Generation via Interactive Visual Question Generation and Answering
32. References
• [1] Antol, Stanislaw, et al. "VQA: Visual question answering." Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2015.
• [2] Goyal, Yash, et al. "Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
• [3] Johnson, Justin, et al. "CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
• [4] Das, Abhishek, et al. "Embodied question answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [5] Wang, Peng, et al. "FVQA: Fact-based visual question answering." IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
• [6] Kafle, Kushal, et al. "DVQA: Understanding data visualizations via question answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [7] Li, Juzheng, et al. "Textbook Question Answering Under Instructor Guidance With Memory Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [8] Gordon, Daniel, et al. "IQA: Visual question answering in interactive environments." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [9] Misra, Ishan, et al. "Learning by Asking Questions." arXiv preprint arXiv:1712.01238 (2017).
• [10] Anderson, Peter, et al. "Bottom-up and top-down attention for image captioning and visual question answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [11] Nguyen, Duy-Kien, and Takayuki Okatani. "Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [12] Li, Yikang, et al. "Visual question generation as dual task of visual question answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [13] Liu, Feng, et al. "iVQA: Inverse visual question answering." arXiv preprint arXiv:1710.03370 (2017).
• [14] Jayaraman, Dinesh, and Kristen Grauman. "Learning to look around: Intelligently exploring unseen environments for unknown tasks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [15] Su, Zhou, et al. "Learning Visual Knowledge Memory Networks for Visual Question Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [16] Shin, Andrew, Yoshitaka Ushiku, and Tatsuya Harada. "Customized Image Narrative Generation via Interactive Visual Question Generation and Answering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
• [17] Gurari, Danna, et al. "VizWiz Grand Challenge: Answering Visual Questions from Blind People." arXiv preprint arXiv:1802.08218 (2018).