Speaker: 서민준 (Minjoon Seo), Ph.D. student, University of Washington
Date: June 2017
“Reasoning” can be defined as an ability to derive a new formula (statement) through logical operations on several known formulas, which is a crucial requirement for human-level artificial intelligence.
In this talk, I will first argue that natural language, and in particular question answering, is an effective means of teaching machines to reason and of evaluating their performance. Then I will discuss my past approaches to these objectives with regard to three criteria: learning end-to-end, handling syntactic variations, and performing complex inference.
I will especially highlight my observations on the strengths and limitations of fully differentiable architectures, such as deep neural networks, for modeling the discrete nature of logic. I will conclude with the future direction of my research towards designing the ultimate reasoning machine.
Talk video:
http://tv.naver.com/v/2083653
https://youtu.be/r0veZ_WV0sA
4. Examples
• Most neural machine translation systems (Bahdanau et al., 2014)
• Need a very high hidden state size (~1000)
• No need to query the database (context) → very fast
• Most dependency and constituency parsers (Chen et al., 2014; Klein et al., 2003)
• Sentiment classification (Socher et al., 2013)
• Classifying whether a sentence is positive or negative
• Most neural image classification systems
23. Geometry QA
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?
a) 2   b) 4   c) 6   d) 8   e) 10
[Figure: circle with center O and radius 5; diameter AC is perpendicular to chord BD at point E, with CE = 2.]
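A short worked solution (not on the original slide), using O, B, C, D, E as labeled in the figure and the fact that a diameter perpendicular to a chord bisects the chord:

```latex
% Worked solution sketch; assumes E is the intersection of AC and BD, as drawn.
\begin{align*}
  OE &= OC - CE = 5 - 2 = 3, \\
  BE &= \sqrt{OB^{2} - OE^{2}} = \sqrt{5^{2} - 3^{2}} = 4, \\
  BD &= 2 \cdot BE = 8 \quad \text{(choice d).}
\end{align*}
```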
53. Attention Layer
[Figure: model architecture diagram. From bottom to top: Character Embed Layer (Char-CNN) and Word Embed Layer (GLOVE) over context words x1 … xT and query words q1 … qJ; Phrase Embed Layer (LSTM) producing h1 … hT and u1 … uJ; Attention Flow Layer with Query2Context and Context2Query attention (Softmax over similarities for Context2Query, Max + Softmax for Query2Context), producing g1 … gT; Modeling Layer (LSTM) producing m1 … mT; Output Layer predicting Start (Dense + Softmax) and End (LSTM + Softmax).]
54. Attention Layer
• Inputs: phrase-aware context and query words
• Outputs: query-aware representations of the context words
• Context-to-query attention: for each (phrase-aware) context word, choose the most relevant word among the (phrase-aware) query words
• Query-to-context attention: choose the context word that is most relevant to any of the query words (a code sketch of both directions follows this slide)
[Figure: architecture diagram repeated from slide 53.]
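To make the two attention directions concrete, here is a minimal NumPy sketch (not code from the talk). A plain dot product stands in for the model's trainable similarity function, and the names H, U, S, G are mine:

```python
import numpy as np

def bidaf_attention(H, U):
    """Sketch of Context2Query (C2Q) and Query2Context (Q2C) attention.

    H: (T, d) phrase-aware context vectors (h_1 ... h_T)
    U: (J, d) phrase-aware query vectors   (u_1 ... u_J)
    Returns query-aware context representations G of shape (T, 4d).
    """
    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    # Similarity of every context word to every query word.
    # (A dot product stands in for the trainable similarity.)
    S = H @ U.T                                   # (T, J)

    # C2Q: for each context word, a softmax-weighted sum of query words.
    a = softmax(S, axis=1)                        # (T, J)
    U_tilde = a @ U                               # (T, d)

    # Q2C: weight context words by their maximum similarity to any
    # query word, sum them, and tile the result over all T positions.
    b = softmax(S.max(axis=1), axis=0)            # (T,)
    h_tilde = b @ H                               # (d,)
    H_tilde = np.tile(h_tilde, (H.shape[0], 1))   # (T, d)

    # Merge into query-aware context representations.
    G = np.concatenate([H, U_tilde, H * U_tilde, H * H_tilde], axis=1)
    return G                                      # (T, 4d)

# Example shapes: T=3 context words, J=2 query words, d=4.
G = bidaf_attention(np.random.randn(3, 4), np.random.randn(2, 4))
print(G.shape)   # (3, 16)
```

In the full model, G feeds the Modeling Layer, whose outputs the Output Layer uses to predict the start and end of the answer span.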
86. bAbI QA Dataset
• 20 different tasks
• 1k story-question pairs for each task (10k also available)
• Synthetically generated
• Many questions require looking at multiple sentences
• For end-to-end systems supervised by answers only
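For reference, an illustrative story-question pair in the style of bAbI task 1 (single supporting fact): numbered statements, then a question with its answer and the ID of the supporting sentence. This particular example is illustrative, not taken from the slides.

1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary?    bathroom    1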
90. Dialog Datasets
• bAbI Dialog Dataset
• Synthetic
• 5 different tasks
• 1k dialogs for each task
• DSTC2* Dataset
• Real dataset
• Evaluation metric is different from the original DSTC2: response generation instead of “state-tracking”
• Each dialog is 800+ utterances
• 2407 possible responses
98. So… What should we do?
• Disclaimer: completely subjective!
• Logic (reasoning) is discrete
• Modeling logic with a differentiable model is hard
• Relaxation: either hard to optimize, or converges to a bad optimum (poor generalization)
• Estimation: low-bias or low-variance gradient estimators have been proposed (Williams, 1992; Jang et al., 2017), but the improvements are not substantial (see the sketch below)
• Big data: how much do we need? Exponentially many examples?
• Perhaps a new paradigm is needed…
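As a concrete illustration of the estimation point above, here is a minimal NumPy sketch of a Gumbel-Softmax sample (Jang et al., 2017), a relaxed, differentiable stand-in for sampling a discrete category. This is my own sketch, not code from the talk; the temperature tau controls the bias/variance trade-off.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Relaxed 'soft' one-hot sample from a categorical distribution
    (Jang et al., 2017): perturb the logits with Gumbel noise, then apply
    a temperature-controlled softmax. As tau -> 0 the output approaches a
    discrete one-hot vector (low bias, high variance); larger tau gives a
    smoother, lower-variance but more biased sample."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / tau
    y = y - y.max()                 # numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

# Example: relaxed sample over four discrete choices.
logits = np.array([2.0, 0.5, 0.1, -1.0])
print(gumbel_softmax_sample(logits, tau=0.1))   # nearly one-hot
print(gumbel_softmax_sample(logits, tau=5.0))   # much smoother
```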