3. ● ML & DL Engineer (2014 ~ 2017)
○ POSCO Smart Factory machine-learning-based scheduling (2014 ~ 2015)
○ POSCO AI ChatBot (2016 ~ 2017)
○ Deep learning open-source framework - TensorMSA (2016 ~ 2017)
● Android Developer - POSCO mobile systems (2010 ~ 2014)
○ LBS/IPS vehicle & navigation system
○ IPS with deep learning - patent (2016)
● Awards
○ OSS World Challenge 2017 (in the top 12, still in progress)
○ Employee of the Year 2015 and 2017 at POSCO ICT
● Woori Bank AI ('17.11.1 ~)
Session 1 : SeungWoo Kim tmddno1@gmail.com
5. Session 1 - Lecture Goals
Understand the overall chatbot architecture and the background knowledge needed to
build such a service, so that the hands-on chatbot development explained in Session 2
is easier to follow.
The focus is on understanding how chatbots, NLP, deep learning, and implementation relate to one another!
Session 1 - Understand NLP
6. About ChatBot
Session 1 - Understand NLP
[Diagram: User -> natural language -> Natural Language Understanding -> semantic frame -> System;
System -> semantic frame -> Natural Language Generation -> natural language -> User.]
Why do we need NLP in a chatbot system?
7. About ChatBot
Session 1 - Understand NLP
Sorts of chatbots, from easy to hard:
Retrieval-based models -> Generative models
Traditional algorithms -> Deep learning algorithms
Short conversations -> Long conversations
Closed domain -> Open domain
8. About ChatBot
Session 1 - Understand NLP
Retrieval-Based vs Generative Models
Retrieval-based models (easier)
use a repository of predefined responses and some kind of heuristic to pick an
appropriate response based on the input and context. The heuristic could be as
simple as a rule-based expression match, or as complex as an ensemble of Machine
Learning classifiers. These systems don't generate any new text; they just pick a
response from a fixed set.
Generative models (harder)
don’t rely on pre-defined responses. They generate new responses from scratch.
Generative models are typically based on Machine Translation techniques, but
instead of translating from one language to another, we “translate” from an input to
an output (response).
9. About ChatBot
Session 1 - Understand NLP
Use Deep Learning or Not
Using deep learning
Using deep learning does not guarantee better performance in every case compared
with traditional techniques, and gathering enough data and training a heavy model
is more expensive.
Using traditional algorithms
Most current chatbot systems are based on traditional algorithms, which have their
own strengths compared with DL algorithms.
Traditional NLP tasks: morphological analysis, POS tagging, pattern matching, syntactic analysis, semantic analysis, sentiment analysis, dialog processing
Deep learning algorithms: CharCNN, BiLSTM-CRF, Seq2Seq, Word2Vec, RNN, DMN, E2E MemNN, Attention, DNN
Traditional algorithms: TF-IDF, SVM, dictionaries, Bayesian, logistic regression, LSA, HMM
USE BOTH
10. About ChatBot
Session 1 - Understand NLP
Long Conversations vs Short Conversations
Short conversations
The goal is to create a single response to a single input. For example, you
may receive a specific question from a user and reply with an appropriate
answer.
Long conversations
These go through multiple turns, and the bot needs to keep track of what has been
said. Customer support conversations are typically long conversational threads with
multiple questions.
11. About ChatBot
Session 1 - Understand NLP
Open Domain vs Closed Domain
"Closed domain:
You can ask a limited set of questions on specific topics.
(Easier) 'What is the weather in Miami?'"
"Open domain:
I can ask a question about any topic… and expect a relevant response.
(Harder) Think of a long conversation around refinancing my mortgage,
where I could ask anything." - Mark Clark
12. Overview - Session 1 - Understand NLP
[Architecture diagram: the [Retrieval Based] Chat-Bot System. A Messaging Platform exchanges messages with the ChatBot Server, which (1) sends each message to the NLU Server (Understand) and receives a semantic frame (intent & slots), (2) hands the intent & slots to the DM Server, which follows a scenario and connects to BackEnd Service Servers for information, and (3) sends a semantic frame to the NLG Server (Generate), which produces the reply message.]
The session covers three tracks:
○ NLP theory (basic theory): Voice Recognition, Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, Response Generation
○ ML & DL theory (related deep learning theory, explained through examples): Deep Learning Basics, Word Embedding, BilstmCrf, CharCNN, SyntaxNet, Memory Network, Seq2Seq
○ Development: the pipeline (data collection, data preprocessing, model training, model evaluation, model serving) and data handling / ML & DL libraries (Numpy, Pandas, Tensorflow, Scikit Learn, Konlpy)
[Diagram: the [AI Based] Chat-Bot Research Environment, an AI model pipeline fed by train data, with a data mart, monitoring, summary results, an ontology-backed DM, and a legacy database.]
Session 1 - Understand NLP
13. Session 1 - Contents
1. NLP theory
> The linguistic background generally needed to process natural language
2. Deep learning theory
> Deep learning theory matching the problems raised in the NLP theory part
3. Implementation
> Implementing the theory with deep learning, libraries, and so on
14. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Mostly solved: Spam Detection, Text Categorization, Part-of-Speech Tagging
Making good progress: Named Entity Recognition, Information Extraction, Sentiment Analysis, Coreference Resolution, Word Sense Disambiguation, Syntactic Parsing, Machine Translation
Still really hard: Semantic Search, Question & Answering, Textual Inference, Summarization, Discourse & Dialog
15. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Text Categorization
Text classification assigns one or more classes to a document according to its content. Classes are
selected from a previously established taxonomy (a hierarchy of categories or classes).
Spam Detection
Spam detection is also a text classification problem.
Part-of-Speech Tagging
Also known as grammatical tagging or word-category disambiguation: the process of marking up a word
in a text (corpus) as corresponding to a particular part of speech, based on both its definition and
its context.
16. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Low Level Information Extraction
17. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Information Extraction in a Broader View
https://web.stanford.edu/class/cs124/lec/Information_Extraction_and_Named_Entity_Recognition.pptx
[Diagram: information extraction draws on rule-based extraction, named entity recognition, syntax analysis, relation search, and an ontology.]
18. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Coreference Resolution
I did not vote for Donald Trump because I think he is too reckless
Coreference resolution is the task of finding all expressions that refer to the same entity in a
text. It is an important step for a lot of higher level NLP tasks that involve natural language
understanding such as document summarization, question answering, and information
extraction.
Deep Reinforcement Learning for Mention-Ranking Coreference Models
Improving Coreference Resolution by Learning Entity-Level Distributed Representations
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
19. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Word Sense Disambiguation
[Example] The word "bass" can mean:
1. a type of fish
2. tones of low frequency
and the sentences:
1. I went fishing for some sea bass.
2. The bass line of the song is too weak.
http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf
[Figures: a supervised approach with a labeled-data example, and a semi-supervised approach.]
20. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Syntactic Parsing
Syntactic parsing finds structural relationships between the words in a sentence.
https://web.stanford.edu/~jurafsky/slp3/12.pdf
21. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Machine Translation
Machine translation (MT) is automated translation: the process by which computer software
translates a text from one natural language (such as English) to another (such as Spanish).
22. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Semantic Search
Semantic search seeks to improve search accuracy by understanding a searcher’s intent through
contextual meaning.
Question Answering
Answers questions posed in natural language based on knowledge data (usually an ontology).
The best-known example is IBM Watson.
Textual Inference
Recognize, generate, or extract pairs <T, H> of natural language expressions such that a
human who reads (and trusts) T would infer that H is most likely also true.
Summarization
Extract the interesting parts of a text, create a summary from those parts, and allow
rephrasings to make the summary more grammatically correct.
Discourse & Dialog
Hold a conversation while understanding the whole dialog history and the semantic intent of the speaker.
23. Standard Natural Language Process
Session 1 - Understand NLP
Spoken utterance
-> Speech Recognition
Written utterance
-> Lexical (어휘) Analysis: word structure (morphemes, words)
-> Syntactic (구문) Analysis: sentence structure (sentences)
-> Semantic (의미) Analysis: meaning of words & sentences
-> Discourse (대화) Analysis: relationships between sentences (context beyond the sentence)
24. [Roadmap diagram: the NLP theory track (basic theory) around the NLU Server (Understand) and NLG Server (Generate): Voice Recognition, Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, Response Generation.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
25. Session 1 - Understand NLP
AI Speaker Alexa Alexa Microphone System
NLP - Voice Recognition
26. Session 1 - Understand NLP
Deep Learning for Classification Hidden Markov Model for Language Model
NLP - Voice Recognition
27. [Roadmap diagram: the NLP theory track again, with the current position at Lexical Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
28. Session 1 - Understand NLP
NLP - Lexical Analysis
Main factors in lexical analysis:
1. Sentence splitting
2. Tokenizing
3. Morphological analysis
4. Part-of-speech tagging
29. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
[Problems]
What if there is no newline character ('\n')? Where is the end-of-sentence (EOS) point?
What if the sentence is not properly separated into words by spaces?
[Examples]
30. Session 1 - Understand NLP
NLP - Lexical Analysis
Morphing examples: stemming & lemmatization
Word | Stemming | Lemmatization
Love | Lov | Love
Loves | Lov | Love
Loved | Lov | Love
Loving | Lov | Love
Innovation | Innovat | Innovation
Innovations | Innovat | Innovation
Innovate | Innovat | Innovate
Innovates | Innovat | Innovate
Innovative | Innovat | Innovative
Morphology is the process of finding morphemes, the smallest "meaningful units (lexical meaning
or grammatical function)", and other information-carrying features such as stems in a language.
Lexical Analysis
31. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
Ambiguity
"that" can be a subordinating conjunction or a relative pronoun:
- The fact that/IN you're here
- A man that/WDT I know
"Around" can be a preposition, particle, or adverb:
- I bought it at the shop around/IN the corner.
- I never got around/RP to getting a car.
- A new Toyota Prius costs around/RB $25K.
Degree of ambiguity (in the Brown corpus):
- 11.5% of word types (40% of word tokens) are ambiguous
# of tags:  1     2    3   4  5  6 7
# of words: 35340 3760 264 61 12 2 1
The ambiguity problem is much more serious in Korean.
Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into
their parts of speech and labels them according to a tagset, the collection of tags used for POS
tagging. Parts of speech are also known as word classes or lexical categories.
32. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
Analysis result comparison ("하늘을 나는 자동차" across Korean morphological analyzers) / library performance comparison:
Hannanum: 하늘/N 을/J 나/N 는/J 자동차/N
Kkma: 하늘/NNG 을/JKO 날/VV 는/ETD 자동차/NNG
Komoran: 하늘/NNG 을/JKO 나/NP 는/JX 자동차/NNG
Mecab: 하늘/NNG 을/JKO 나/NP 는/JX 자동차/NNG
Twitter: 하늘/Noun 을/Josa 나/Noun 는/Josa 자동차/Noun
34. [Roadmap diagram: the NLP theory track (basic theory) plus the ML & DL theory track (related deep learning theory): Deep Learning Basics, Word Embedding, BilstmCrf, CharCNN, SyntaxNet, Memory Network, Seq2Seq, Response Generation, around the NLU/NLG servers.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
35. Session 1 - Understand NLP
NLP - Lexical Analysis
Sequence Labeling
What sequence labeling is, and what we can do with it:
(1) Word segmentation
(2) POS tagging
(3) Chunking
(4) Clause identification
(5) Named entity recognition
(6) Semantic role labeling
(7) Information extraction
36. Session 1 - Understand NLP
NLP - Lexical Analysis
Word POS Chunk NE
West NNP B-NP B-MISC
Indian NNP I-NP I-MISC
all-around NN I-NP O
Phil NNP I-NP B-PER
Simons NNP I-NP I-PER
took VBD B-VP O
four CD B-NP O
for IN B-PP O
38 CD B-NP O
on IN B-PP O
Friday NNP B-NP O
<IOB data set example>
POS tag meanings:
https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing
Chunk tag meanings:
B: begin of chunk
I: continuation of chunk
E: end of chunk
NP: noun phrase
VP: verb phrase
NER BIO tag meanings:
B: start of a new chunk
I: word inside a chunk
O: outside any chunk
Sequence Labeling
37. Session 1 - Understand NLP
NLP - Lexical Analysis
BiLSTM-CRF Description
Sequence labeling with deep learning
Prerequisites: deep learning basics, word embedding, DL frameworks
38. Session 1 - Understand NLP
NLP - Lexical Analysis
[Video]
Deep Learning Basic
39. Session 1 - Understand NLP
New algorithms: back propagation; CNN, RNN, etc.
Big data: HDFS, MapReduce
Hardware: GPU parallel execution, cloud services
NLP - Lexical Analysis
Deep Learning Basic
40. Session 1 - Understand NLP
(1) Problem: data pairs, e.g. x = 1, 2, 3, 4 -> y = 3, 5, 7, 9
(2) Algorithm: Y = 2 * X + 1
(3) Programming:
function(x)
{
    return x * 2 + 1
}
NLP - Lexical Analysis
Deep Learning Basic
41. Session 1 - Understand NLP
(1) Problem: the same data, x = 1, 2, 3, 4 -> y = 3, 5, 7, 9
(2) Algorithm: Y = w * X + b, where w and b are now learned
(3) Programming: start from initial (random) parameters and optimize them to fit the data
NLP - Lexical Analysis
Deep Learning Basic
42. Session 1 - Understand NLP
Supervised Learning vs Unsupervised Learning vs Reinforcement Learning
[Figure: example images labeled CAT / DOG, illustrating the three learning settings.]
Deep Learning Basic
NLP - Lexical Analysis
43. Session 1 - Understand NLP
1. Perceptron
2. Activation Function
3. Cost
4. Gradient Descent
5. Back Propagation
6. Optimizers
Deep Learning Basic
NLP - Lexical Analysis
44. Session 1 - Understand NLP
Deep Learning Basic - Perceptron
wX + b
NLP - Lexical Analysis
45. Session 1 - Understand NLP
Deep Learning Basic - Perceptron
wX + b -> Activation Function
NLP - Lexical Analysis
46. Session 1 - Understand NLP
Deep Learning Basic - Activation Function
[Figure: logistic regression vs. nonlinear problems]
NLP - Lexical Analysis
47. Session 1 - Understand NLP
Deep Learning Basic - Activation Function
NLP - Lexical Analysis
48. Session 1 - Understand NLP
Deep Learning Basic - Loss (Error)
Y = wX + b, comparing the initial and the optimized fit:
x | y | y~ (initial)
0 | 3 | 7
1 | 5 | 9
2 | 7 | 11
3 | 9 | 13
4 | 11 | 15
5 | 13 | 17
6 | 15 | 19
LOSS is the gap between the predictions and the true values.
NLP - Lexical Analysis
49. Session 1 - Understand NLP
x | y | init | opt
0 | 3 | 7 | 3
1 | 5 | 9 | 5
2 | 7 | 11 | 7
init: ((7-3)^2 + (9-5)^2 + (11-7)^2) / 3 = 16
opt: ((3-3)^2 + (5-5)^2 + (7-7)^2) / 3 = 0
HOW do we move from the initial W, b to the optimized ones?
Deep Learning Basic - Loss (Error)
Minimize Cost(W, b) with respect to W and b.
NLP - Lexical Analysis
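[Code sketch] A minimal NumPy illustration of the "HOW?" above: gradient descent repeatedly nudges w and b downhill on Cost(W, b). The data and the initial/optimized targets match the slide's table; the learning rate of 0.1 is an assumed value.
import numpy as np

# Toy data from the slide: y = 2x + 3 -> (0,3), (1,5), (2,7)
x = np.array([0., 1., 2., 3.])
y = np.array([3., 5., 7., 9.])

w, b = 0.0, 0.0          # "initial" (bad) parameters
lr = 0.1                 # learning rate (assumed)

for step in range(500):
    pred = w * x + b
    loss = ((pred - y) ** 2).mean()        # MSE, the Cost(W, b) above
    grad_w = 2 * ((pred - y) * x).mean()   # d(loss)/dw
    grad_b = 2 * (pred - y).mean()         # d(loss)/db
    w -= lr * grad_w                       # move downhill
    b -= lr * grad_b

print(w, b)  # converges toward w = 2, b = 3, where the loss is 0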
53. Session 1 - Understand NLP
NLP - Lexical Analysis
SGD
Momentum: incorporates the previous update direction, a notion of acceleration
NAG: similar to Momentum, but evaluates the gradient at the moved (look-ahead) position
Adagrad (adaptive family): slow-moving parameters get larger steps, fast-moving ones are updated more carefully
RMSProp: replaces Adagrad's accumulated sum of squared gradients with an exponential average, preventing G from growing without bound
Adadelta: uses exponential averages and the square of the step-size changes (second-order information)
Adam: combines the characteristics of Adadelta/RMSProp and Momentum
Deep Learning Basic - Optimizer
http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html
54. Session 1 - Understand NLP
NLP - Lexical Analysis
https://arxiv.org/pdf/1705.08292.pdf
"Solutions found with gradient descent (GD) or stochastic gradient descent (SGD) generalize
far better than solutions found with adaptive methods (e.g. AdaGrad, RMSProp, and Adam)."
The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia C. Wilson, Rebecca Roelofs,
Mitchell Stern, Nathan Srebro, and Benjamin Recht. University of California, Berkeley / Toyota
Technological Institute at Chicago. May 24, 2017.
There is no optimizer that is best for all cases!
When should you use an adaptive optimizer?
If the input embedding vectors are sparse, it is better to use an adaptive optimizer.
Deep Learning Basic - Optimizer
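[Code sketch] How that choice plays out in TensorFlow 1.x, assuming `cost` is a scalar loss tensor such as the cross-entropy defined on the next slide; the learning rates are assumed values.
import tensorflow as tf

# Dense inputs: plain SGD often generalizes best (per the paper above)
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

# Sparse inputs (e.g. embedding lookups): adaptive methods keep a per-parameter
# step size, so rarely updated embedding rows still learn quickly
# train_op = tf.train.AdagradOptimizer(learning_rate=0.01).minimize(cost)
# train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)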
55. Session 1 - Understand NLP
# tf Graph input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
learning_rate = 0.001  # assumed value, not given on the slide
# Store layers' weights & biases
weights = {
    'h1': tf.Variable(tf.random_normal([784, 256])),
    'h2': tf.Variable(tf.random_normal([256, 256])),
    'out': tf.Variable(tf.random_normal([256, 10]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([256])),
    'b2': tf.Variable(tf.random_normal([256])),
    'out': tf.Variable(tf.random_normal([10]))
}
# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with ReLU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation, followed by softmax
pred = tf.matmul(layer_2, weights['out']) + biases['out']
hypothesis = tf.nn.softmax(pred)
# Define loss (cross entropy) and optimizer
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
[Diagram: a 784-256-256-10 fully connected network: input (784) -> hidden (256, ReLU) -> hidden (256, ReLU) -> output (10) -> softmax, with Y = Activation(W*x + b) at each layer and cross entropy as the error.]
Deep Learning Basic
NLP - Lexical Analysis
56. Session 1 - Understand NLP
START 오늘 날씨 는 ? PAD PAD END
START 오늘 날씨 는 어때 ? PAD END
START 오늘 비가 오 려 나 ? END
With long sentences, the vanishing-gradient problem appears, and padding variable-length
data to one length wastes computing power, which is where the concept of the dynamic RNN
comes in.
A bidirectional LSTM also learns the given data backwards. The Long Short-Term Memory cell
maintains a cell state updated through forget, update, and output gates.
https://brunch.co.kr/@chris-song/9
https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html
NLP - Lexical Analysis
Deep Learning Basic
57. Session 1 - Understand NLP
NLP - Lexical Analysis
Deep Learning Basic
Training techniques, grouped by purpose:
Overfitting: data preprocessing, dropout, batch normalization, ensembles, Adam+SGD, learning-rate decay
Fine tuning / multi-tasking: fully convolutional networks, 1x1 convolutional filters
Network compression: quantized neural networks
AutoML / hyper-parameters: random search, grid search, genetic algorithms
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
https://arxiv.org/pdf/1510.00149.pdf
58. Session 1 - Understand NLP
Session 1 - Now We are Here!
[Roadmap diagram: the NLP theory and ML & DL theory tracks plus the development track (implementation): data processing and ML & DL libraries (Numpy, Pandas, Tensorflow, Scikit Learn, Konlpy).]
60. Session 1 - Understand NLP
NLP - Lexical Analysis - Implementation
Deep Learning Framework Comparison
Dynamic vs. static graph definition, debugging, visualization, and deployment.
61. Session 1 - Understand NLP
NLP - Lexical Analysis - Implementation
Deep Learning Framework - Tensorflow
import numpy as np
import tensorflow as tf

# Assumed toy data and hyper-parameters (not shown on the slide)
rng = np.random
train_X = np.asarray([0., 1., 2., 3.])
train_Y = np.asarray([3., 5., 7., 9.])
n_samples = train_X.shape[0]
learning_rate, training_epochs, logs_path = 0.01, 1000, '/tmp/tf_logs'

with tf.Graph().as_default():
    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")
    pred = tf.add(tf.multiply(X, W), b)
    cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        # Fit all training data
        for epoch in range(training_epochs):
            for (x, y) in zip(train_X, train_Y):
                sess.run(optimizer, feed_dict={X: x, Y: y})

Tensorflow: static graph definition (build the graph first, then run it in a session).
Pytorch: dynamic graph definition (the graph is built as the code runs; see the sketch below).
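[Code sketch] The same linear regression in PyTorch for contrast: there is no separate graph-construction phase; the graph is built and differentiated on the fly each iteration. The data and hyper-parameters are the same assumed toy values as above.
import torch

train_X = torch.tensor([0., 1., 2., 3.])
train_Y = torch.tensor([3., 5., 7., 9.])

W = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
optimizer = torch.optim.SGD([W, b], lr=0.01)

for epoch in range(1000):
    pred = train_X * W + b              # the graph is built here, on the fly
    cost = ((pred - train_Y) ** 2).mean()
    optimizer.zero_grad()
    cost.backward()                     # and differentiated immediately
    optimizer.step()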
65. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Word Embedding.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
66. Session 1 - Understand NLP
What is a word embedding?
A way of representing the units that make up text (phonemes, syllables, words, sentences,
documents) as numeric vectors.
67. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Word Representation
○ Discrete representation: WordNet, one-hot vectors
○ Distributed representation
  count based: full documents -> LSA; windows -> SVD of X, Glove
  direct prediction: Word2Vec, FastText
68. Session 1 - Understand NLP
WordNet
NLP - Lexical Analysis - Word Embedding
In the past, approaches like WordNet were used. WordNet is a tree-structured graph that records relations between words (hypernyms, synonyms).
It was built entirely by hand, so it is subjective and takes a great deal of labor to maintain, which is its main limitation.
70. Session 1 - Understand NLP
LSA (latent semantic analysis) with SVD (singular value decomposition)
NLP - Lexical Analysis - Word Embedding
https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/06/pcasvdlsa/
Term-document matrix (terms are Korean morphemes):
      doc1 doc2 doc3
나     1    0    0
는     1    1    2
학교   1    1    0
에     1    1    0
가     1    1    0
ㄴ     1    0    0
다     1    0    1
영희   0    1    1
좋     0    0    1
SVD -> truncated SVD
LSA (latent semantic analysis)
71. Session 1 - Understand NLP
SVD of X
NLP - Lexical Analysis - Word Embedding
https://swalloow.github.io/cs224d-lecture2
This method slides a window (typically of length 5-10) symmetrically over the text and counts co-occurrences.
● I like deep learning.
● I like NLP.
● I enjoy flying
Given a corpus like the one above, it is expressed as a matrix: simply a count of each word's co-occurrence frequency.
The window-based counts are then dimension-reduced with SVD.
72. Session 1 - Understand NLP
https://www.tensorflow.org/tutorials/word2vec
http://w.elnn.kr/search/
Word2Vec demo site
Strengths: dimensionality reduction, expression of semantic similarity
Weaknesses: homonyms; weak training signal when data is scarce
NLP - Lexical Analysis - Word Embedding
Word2Vec
73. Session 1 - Understand NLP
CBOW
Original text: the quick brown fox jumped over the lazy dog
Data set (window size 1): ([brown, jumped], fox)
[Diagram: input layer (vocab-size one-hot vectors for the context words brown and jumped) -> hidden layer (hidden size) -> output layer predicting the center word fox.]
NLP - Lexical Analysis - Word Embedding
Word2Vec
74. Session 1 - Understand NLP
Skip-Gram
Original text: the quick brown fox jumped over the lazy dog
Data set (window size 1): (fox, brown), (fox, jumped)
[Diagram: input layer (one-hot vector for the center word fox) -> hidden layer (hidden size) -> output layer predicting the context words brown and jumped.]
NLP - Lexical Analysis - Word Embedding
Word2Vec
75. Session 1 - Understand NLP
Doc2Vec variants:
(1) PV-DM: ([paragraph, quick, brown, fox, jumped], over) and ([paragraph, quick, brown, fox, jumped, over], the): the paragraph vector joins the context words in predicting the next word
(2) PV-DBOW: (paragraph, the), (paragraph, quick), (paragraph, brown), (paragraph, fox), (paragraph, jumped): the paragraph vector alone predicts sampled words
(3) DM + DBOW: concatenate the two vectors
(4) AVG(TF-IDF * W2V): average the word vectors weighted by their TF-IDF scores
Example text: the quick brown fox jumped over the lazy dog
NLP - Lexical Analysis - Word Embedding
Doc2Vec
76. Session 1 - Understand NLP
tfidf(t, d, D) = tf(t, d) x idf(t, D)
https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/
http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/
Not exactly a word embedding, but used in NLP with deep learning quite often:
- Document similarity
- Word importance within a document
- Search engines (like Elasticsearch, though it uses BM25 now)
NLP - Lexical Analysis - Word Embedding
TF-IDF
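[Code sketch] TF-IDF with Scikit Learn (one of the libraries in the development track), reusing the three-sentence corpus from the SVD slide. A sketch of the document-similarity use case; nothing here is chatbot-specific.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["I like deep learning.", "I like NLP.", "I enjoy flying"]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)   # tf(t,d) x idf(t,D), one row per document

# Document similarity: cosine similarity between TF-IDF rows
print(cosine_similarity(tfidf[0], tfidf[1]))   # docs sharing "like" score higher
print(cosine_similarity(tfidf[0], tfidf[2]))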
78. Session 1 - Understand NLP
the quick brown fox jumped over the lazy dog
[Diagram: the word "fox" represented two ways, a one-hot encoding (e.g. 0 1 0 0 …) and a Word2Vec vector (e.g. 0.2 0.1 0.4 0.21), with per-character one-hot encodings for f, o, x concatenated alongside.]
1. Word2Vec-style embeddings express semantic relatedness well.
2. One-hot vectors give a strong, clean signal that is effective for training.
3. Word-level embeddings memorize words well.
4. Char-level embeddings handle unseen (untrained) words well.
NLP - Lexical Analysis - Word Embedding
Char + Word Concat
79. Session 1 - Understand NLP
Words that do not exactly match the pretrained dictionary return "UNKNOWN",
so FastText (by Facebook) uses character n-grams in its word embedding algorithm.
Comparing 에어컨 with 에어조단:
에어컨 -> ['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5 n-grams
에어조단 -> ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6 n-grams
Matches: ['$$에', '$에어'] => 2
Score: 2 matches / 9 distinct n-grams overall => 0.2222
NLP - Lexical Analysis - Word Embedding
FastText
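[Code sketch] A toy reimplementation of the character n-gram overlap scoring shown above; it reproduces the slide's 0.2222 for 에어컨 vs 에어조단. Real FastText sums n-gram vectors rather than counting overlaps; this only illustrates the n-gram decomposition.
def char_ngrams(word, n=3, pad='$'):
    padded = pad * (n - 1) + word + pad * (n - 1)
    # For 에어컨: {'$$에', '$에어', '에어컨', '어컨$', '컨$$'}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_score(a, b):
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)   # matches / distinct n-grams overall

print(ngram_score('에어컨', '에어조단'))   # 2 / 9 ≈ 0.2222, as on the slide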
80. Session 1 - Understand NLP
Glove
NLP - Lexical Analysis - Word Embedding
GloVe embeds words so that, given a context word, the dot product of two embedded word
vectors tracks the logarithm of the words' probability of co-occurrence (ratios of
co-occurrence probabilities).
Its core goal can be stated as: "make similarity between embedded word vectors easy to
measure, while better reflecting the statistics of the whole corpus."
https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/09/glove/
81. [Roadmap diagram: NLP theory, ML & DL theory, and development tracks, with the current position at the implementation of word embeddings.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
82. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
One-hot encoding: simple test code showing the concept of one-hot vectors
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
[Code]
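[Code sketch] The concept in a few lines of NumPy, using the small vocabulary from the BiLSTM-CRF example later in the deck.
import numpy as np

vocab = ['김승우', '전화번호', '이메일', '검색']
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0   # a single strong signal per word
    return v

print(one_hot('전화번호'))   # [0. 1. 0. 0.]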
83. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Word2Vec: using the Gensim word2vec package
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
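[Code sketch] A minimal Gensim usage example, assuming pre-tokenized sentences (e.g. the output of a Konlpy morphological analyzer). The parameter is named `size` in gensim 3.x and `vector_size` in gensim 4+.
from gensim.models import Word2Vec

sentences = [['김승우', '전화번호', '검색'],
             ['김승우', '이메일', '검색'],
             ['김승우', '이미지', '검색']]

# sg=1 selects skip-gram; sg=0 would be CBOW
model = Word2Vec(sentences, size=50, window=2, min_count=1, sg=1)

print(model.wv['검색'])                    # the learned 50-dim vector
print(model.wv.most_similar('전화번호'))   # nearest words by cosine similarity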
85. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
FastText: you can load a pretrained vector and fine-tune it
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
86. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
N-grams are simply all combinations of adjacent words or letters of length n that you can
find in your source text.
87. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Training word2vec on a large dataset needs GPU acceleration.
You can also consider using Tensorflow or Keras to train the model:
https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py
https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
88. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at BilstmCrf.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
89. Session 1 - Understand NLP
NLP - Lexical Analysis - DL Algorithms
Paper Model CoNLL 2003 (F1 %)
Collobert et al.(2011) MLP with word embeddings+gazetteer 89.59
Passos et al.(2014) Lexicon Infused Phrase Embeddings 90.90
Chiu and Nichols(2015) Bi-LSTM with word+char+lexicon embeddings 90.77
Luo et al.(2015) Semi-CRF jointly trained with linking 91.20
Lample et al.(2016) Bi-LSTM-CRF with word+char embeddings 90.94
Lample et al.(2016) Bi-LSTM with word+char embeddings 89.15
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
https://arxiv.org/pdf/1708.02709.pdf
NER (Named Entity Recognition) Algorithm Performance
90. NLP - Lexical Analysis - DL Algorithms
What do we want to do with this algorithm?
91. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLSTM-CRF
Plain data:
김승우 전화번호 검색
김승우 이메일 검색
김승우 이미지 검색
IOB data:
김승우 B-PERSON / 전화번호 B-TARGET / 검색 O
김승우 B-PERSON / 이메일 B-TARGET / 검색 O
김승우 B-PERSON / 이미지 B-TARGET / 검색 O
Lexical analysis pipeline: sentence splitting -> tokenizing -> morphing -> part-of-speech tagging.
Each token is then indexed and encoded for the network: one-hot vectors
(김승우 -> 1 0 0 0, 전화번호 -> 0 1 0 0, 이메일 -> 0 0 1 0, …) or Word2Vec,
plus a character index list (김, 승, 우) and the label indexes (B-PERSON, B-TARGET).
92. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLSTM-CRF
[Diagram: the word index list (김승우, 전화번호, 이메일, 검색), the character index list (김, 승, 우), and the labels (B-PERSON, B-TARGET) feeding the model.]
[Code]
95. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
Conditional Random Field vs. Softmax
[Code]
96. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
A probabilistic model for segmenting and labeling sequence data.
https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2
The first method (softmax) makes local choices. In other words, even if we capture some information
from the context in our hidden states thanks to the bi-LSTM, the tagging decision is still local.
We don't make use of the neighboring tagging decisions. For instance, in "New York", the fact that
we are tagging "York" as a location should help us decide that "New" corresponds to the beginning
of a location. Given a sequence of words w_1, …, w_m, a sequence of score vectors s_1, …, s_m, and
a sequence of tags y_1, …, y_m, a linear-chain CRF defines a global score s ∈ R.
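[Code sketch] The global decoding a CRF adds on top of the bi-LSTM scores: a NumPy Viterbi decoder over the emission scores s_1..s_m and a learned tag-transition matrix. Shapes are assumptions for illustration.
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: [m, n_tags] bi-LSTM scores s_1..s_m
       transitions: [n_tags, n_tags] learned tag-to-tag scores."""
    m, n_tags = emissions.shape
    score = emissions[0].copy()                  # best score ending in each tag
    back = np.zeros((m, n_tags), dtype=int)
    for t in range(1, m):
        # score of reaching tag j at step t through tag i at step t-1
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow the back-pointers to recover the globally best tag sequence
    best = [int(score.argmax())]
    for t in range(m - 1, 0, -1):
        best.append(int(back[t][best[-1]]))
    return best[::-1]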
97. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
Real-project BiLSTM result / sample-code prediction test result:
test data not included in the train set is still predicted well.
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
98. [Roadmap diagram: the NLP theory track, with the current position at Syntactic Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
99. Session 1 - Understand NLP
NLP - Lexical Analysis - SyntaxNet
Syntactic parsing (구문 분석) decomposes a sentence into its constituent parts and analyzes
the hierarchical relations between them to determine the structure of the sentence.
Graph-based models vs. transition-based models
(CYK-style parsing, MST-finding algorithms, projective & non-projective models)
100. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models
Given a sentence W, repeat until all words have their head:
- select two target words in the data structure
  (one dependent candidate & one head candidate)
- deterministically predict the next parsing action with the parsing model
- modify the structure according to the parsing action
C0 -> C1 -> C2 -> … -> Cm yields the dependency tree; each transition t1, t2, …, tm
is chosen by an oracle (classifier) that predicts the best transition.
102. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Assume we are given an oracle:
- for any non-terminal configuration, it can predict the correct transition
  (for deterministic parsing)
- that is, it takes two words and magically gives us the dependency
  relation between them, if one exists
103. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move Economic from buffer B to stack S
104. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (had, news, nsubj) to A
Remove news from stack (since it now has head in A)
105. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (ROOT, had, root) to A
keep had in stack : because it can have other dependents on the right
106. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (effect, little, amod) to A
Remove little from stack (since it now has head in A)
107. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, effect, dobj) to A
Keep effect in stack: because it can have other dependents on the right
108. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (effect, on, prep) to A
Keep on in stack : because it can have other dependents on the right
109. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move financial from buffer B to stack S
110. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (market, financial, amod) to A
Remove financial from stack (since it now has head in A)
111. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (on, markets, pmod) to A
Keep markets in stack : because it can have other dependents on the right
112. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Reduce :
Remove markets, on, effect from stack (since they already have head in A)
※ All decisions (right-arc, left-arc, reduce, shift) are made by the oracle.
113. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, period, p) to A
Keep period in stack
Done !
114. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at SyntaxNet.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
115. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Syntax Parsing Algorithm Performance
Parsing type | Paper | Model | WSJ
Dependency parsing | Chen and Manning (2014) | Fully-connected NN with features including POS | 91.8/89.6 (UAS/LAS)
Dependency parsing | Weiss et al. (2015) | Deep fully-connected NN with features including POS | 94.3/92.4 (UAS/LAS)
Dependency parsing | Dyer et al. (2015) | Stack LSTM | 93.1/90.9 (UAS/LAS)
Constituency parsing | Petrov et al. (2006) | Probabilistic context-free grammars (PCFG) | 91.8 (F1 score)
Constituency parsing | Zhu et al. (2013) | Feature-based transition parsing | 91.3 (F1 score)
Constituency parsing | Vinyals et al. (2015b) | seq2seq learning with LSTM+Attention | 93.5 (F1 score)
There are two kinds of parsing: dependency parsing, which links individual words according to the relations between them, and constituency parsing, which recursively splits the text into sub-phrases.
116. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized
below for both the POS and the dependency parsing task) is used to extract sparse features, which
are fed into the network in groups. We show only a small subset of the features to simplify the
presentation in the schematic
Google SyntaxNet with Deep Learning - Pos Tagging
117. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks
https://arxiv.org/pdf/1603.06042.pdf
1 I _ PRP PRP _ 2 nsubj _ _
2 knew _ VBD VBD _ 0 ROOT _ _
3 I _ PRP PRP _ 5 nsubj _ _
4 could _ MD MD _ 5 aux _ _
5 do _ VB VB _ 2 ccomp _ _
6 it _ PRP PRP _ 5 dobj _ _
7 properly _ RB RB _ 5 advmod _ _
8 if _ IN IN _ 9 mark _ _
9 given _ VBN VBN _ 5 advcl _ _
10 the _ DT DT _ 12 det _ _
11 right _ JJ JJ _ 12 amod _ _
12 kind _ NN NN _ 9 dobj _ _
13 of _ IN IN _ 12 prep _ _
14 support _ NN NN _ 13 pobj _ _
15 . _ . . _ 2 punct _ _
[Diagram: the extracted feature groups feeding the network: 18 units from the word features (1), (2), (3); 18 units from the POS features (1), (2), (3); 12 units from the label features (2), (3).]
(1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6
(2) The first and second leftmost / rightmost children of the top two words
on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8
(3) The leftmost of leftmost / rightmost of rightmost children of the top two
words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
118. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Google SyntaxNet with Deep Learning - Local Parser
1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to
the stack.
2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the left. Push the first word back on the stack.
3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the right. Push the second word back on the stack.
119. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
As we describe in the paper, there are several problems with the locally normalized models we just
trained. The most important is the label-bias problem: the model doesn't learn what a good parse
looks like, only what action to take given a history of gold decisions. This is because the scores are
normalized locally using a softmax for each decision.
Google SyntaxNet with Deep Learning - Global Training
120. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
https://www.youtube.com/watch?v=UXW6Cs82UKo
Instead of taking only the best action at each iteration, explore candidate sequences to
the end and choose the one with the maximum score sum. But computing all cases is far too
heavy, so keep only the best few candidates at every step and remove the others (pruning).
This is how we find the globally best prediction.
121. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Greedily following the best choice at every step can miss the globally optimal sequence.
122. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Considering all cases requires too much computing power.
123. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Remove low-scoring cases at every step (pruning).
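[Code sketch] Beam search with pruning in plain Python, assuming we are given per-step action scores; beam_size = 1 degenerates to the greedy strategy criticized above.
import heapq

def beam_search(step_scores, beam_size=3):
    """step_scores: list of dicts {action: score}, one per step.
       Keeps only the best `beam_size` partial sequences per step (pruning)."""
    beams = [(0.0, [])]                  # (cumulative score, action sequence)
    for scores in step_scores:
        candidates = [(total + s, seq + [a])
                      for total, seq in beams
                      for a, s in scores.items()]
        # prune: keep only the highest-scoring few, drop the rest
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams[0]                      # best-scoring full sequence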
124. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
http://universaldependencies.org/
Google SyntaxNet does not support Korean as a default language,
but as shown below, we can train the model on the Sejong corpus data,
though we have to convert the format into one SyntaxNet understands.
Google SyntaxNet with Deep Learning - What about Korean?
125. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Demo site (we also use samples from this site):
http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm
SyntaxNet Korean with Docker (we pretrained on a Korean corpus and set up a web server for the service):
https://github.com/TensorMSA/tensormsa_syntax_docker
Google SyntaxNet with Deep Learning - Test it by yourself
126. [Roadmap diagram: the NLP theory track, with the current position at Semantic Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
127. Session 1 - Understand NLP
NLP - Semantic Analysis
Three perspectives on meaning (what "semantics" is in the study of language):
- Lexical semantics: individual words
- Sentential semantics: individual sentences
- Discourse or pragmatics: longer pieces of text or conversation
NLP tasks for sentential semantics:
- Semantic role labeling (SRL)
- Phrase similarity (paraphrase)
- Sentence classification, sentence emotion analysis, etc.
128. Session 1 - Understand NLP
NLP - Semantic Analysis
What is Semantic Role Labeling (SRL)?
Semantic roles express the abstract role that the arguments of a predicate take in the event.
The police arrested the suspect in the park last night
(Agent - predicate - Theme - Location - Time: who did what to whom, where, when)
Can we figure out that these sentences have the same meaning,
i.e. that bought, sold, and purchase are used in sentences with the same meaning?
XYZ corporation bought the stock.
They sold the stock to XYZ corporation.
The stock was bought by XYZ corporation.
The purchase of the stock by XYZ corporation.
129. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Common Semantic Role Labeling Architecture
http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf
Pipeline: syntactic parse -> prune constituents -> candidates -> argument identification (ML) -> arguments -> argument classification (ML) -> structural inference (ML) -> semantic roles
Step 1 - Candidate selection:
- Parse the sentence
- Prune/filter the parse tree
  (eliminate some tree constituents to speed up execution)
Step 2 - Argument identification:
- A binary classification of each node as Argument or NONE
- Local scoring
Step 3 - Argument classification:
- A multi-class (one-of-N) classification of all the argument candidates
- Global/joint scoring
130. Session 1 - Understand NLP
Paper | Model | CoNLL 2005 (F1 %) | CoNLL 2012 (F1 %)
Collobert et al. (2011) | CNN with parsing features | 76.06 | -
Tackstrom et al. (2015) | Manual features with DP for inference | 78.6 | 79.4
Zhou and Xu (2015) | Bidirectional LSTM | 81.07 | 81.27
He et al. (2017) | Bidirectional LSTM with highway connections | 83.2 | 83.4
Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. For each target verb (predicate), all constituents of the sentence that take a semantic role of the verb are recognized. Typical semantic arguments include agent, patient, and instrument, plus location, time, manner, cause, and so on (Zhou and Xu, 2015). The table shows the performance of several models on the CoNLL 2005 and 2012 datasets.
Traditional SRL systems consist of several stages: producing a parse tree, identifying which tree nodes represent the arguments of a given verb, and finally classifying those nodes to determine the SRL tags. Each classification step usually involves extracting many features and feeding them to a statistical model (Collobert et al., 2011).
Tackstrom et al. (2015) scored constituent spans and their possible roles for a given predicate with a series of parse-tree-based features, and proposed a dynamic programming algorithm for efficient inference. Collobert et al. (2011) achieved comparable results with a CNN augmented by parsing information provided in the form of an additional lookup table. Zhou and Xu (2015) proposed a bidirectional LSTM to model arbitrarily long context, which proved successful even without parse-tree information. He et al. (2017) extended this work further by introducing highway connections.
NLP - Semantic Analysis - Semantic Role Labeling
LSTM is effective for the SRL problem too!
131. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Bidirectional LSTM with highway connections:
stack more layers on the RNN using the highway technique!
https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf
132. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Semantic Role Labeling Applications
Information: "Anna is a friend of mine." (Who - What - Who)
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb
Neo4j insert query (assuming an open `session` from the Neo4j Python driver):
session.run("MATCH (you:Person {name:'You'})"
            "FOREACH (name in ['Anna'] |"
            " CREATE (you)-[:FRIEND]->(:Person {name:name}))")
result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends)"
                     "RETURN you, yourFriends")
Neo4j Jupyter example & visualization
133. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at CharCNN.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
134. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
What kind of problem do we want to solve?
Can we figure out whether these sentences are positive or negative?
돈이 아깝지 않다 (positive: "it was worth the money")
다시는 오지 않을 거야 (negative: "I will never come back")
음식이 정말 맛이 없다 (negative: "the food is really bad")
이 식당은 정말 맛있다 (positive: "this restaurant is really good")
Analyzing positive/negative with a dictionary: the word "않다" is usually negative, but:
돈이 아깝지 않다 => positive
다시는 오지 않을 거야 => negative
135. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
There are many ways of doing text classification:
traditional rule-based methods, machine learning (logistic regression & SVM),
and deep learning (CharCNN, RNN, etc.).
136. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
Paper Model SST-1 SST-2
Socher et al.(2013) Recursive Neural Tensor Network 45.7 85.4
Kim(2014) Multichannel CNN 47.4 88.1
Kalchbrenner et al.(2014) DCNN with k-max pooling 48.5 86.8
Tai et al.(2015) Bidirectional LSTM 48.5 87.2
Le and Mikolov(2014) Paragraph Vector 48.7 87.8
Tai et al.(2015) Constituency Tree-LSTM 51.0 88.0
Kumar et al.(2015) DMN 52.1 88.6
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
https://arxiv.org/pdf/1708.02709.pdf
Semantic Analysis - CharCNN
137. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The deep learning method CharCNN can be a solution for this kind of problem.
138. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Preparing the data for embedding is pretty similar to other neural networks:
1. Word embedding & one-hot did not show much difference in our tests.
2. Personally, I prefer concatenating char one-hot with word2vec vectors.
Example tokens: 오늘 / 메뉴 / 는 / 뭐 / 지? / PAD / PAD
1. You need to define a maximum sentence length.
2. You need padding, as in other NLP neural networks.
139. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Using multiple convolution filter sizes
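[Code sketch] The multi-filter-size idea in TensorFlow 1.x: convolutions of sizes 3, 4, and 5 over an embedded sentence, max-pooled and concatenated before the fully connected layer. Dimensions and class counts are assumed toy values, not the notebook's actual code.
import tensorflow as tf

max_len, embed_dim, num_classes = 15, 50, 3        # e.g. 주문 / 정보 / 예약
x = tf.placeholder(tf.float32, [None, max_len, embed_dim])
x_img = tf.expand_dims(x, -1)                      # treat text as a 1-channel image

pooled = []
for filter_size in [3, 4, 5]:                      # how many tokens each filter sees
    conv = tf.layers.conv2d(x_img, filters=32,
                            kernel_size=[filter_size, embed_dim],
                            activation=tf.nn.relu)
    # max-pool over all positions: keep the strongest feature per filter
    pool = tf.reduce_max(conv, axis=[1, 2])
    pooled.append(pool)

features = tf.concat(pooled, axis=1)               # [batch, 3 * 32]
logits = tf.layers.dense(features, num_classes)    # fully connected, softmax later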
140. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The remaining steps are the same (fully connected -> softmax -> loss -> optimizer).
141. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
You can see that the CharCNN distinguishes the two sentences.
142. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Discourse Analysis / Memory Network.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
143. Session 1 - Understand NLP
NLP - Discourse Analysis
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
Discourse Analysis - End-to-End Memory Network
Paper | Model | bAbI (mean accuracy %) | Farbes (accuracy %)
Fader et al. (2013) | Paraphrase-driven lexicon learning | - | 0.54
Bordes et al. (2014) | Weakly supervised embedding | - | 0.73
Weston et al. (2014) | Memory Networks | 93.3 | 0.83
Sukhbaatar et al. (2015) | End-to-end Memory Networks | 88.4 | -
Kumar et al. (2015) | DMN | 93.6 | -
147. Session 1 - Understand NLP
Convert word indexes to embedding vectors (the training target vectors A, B, C).
[Diagram: an embedding matrix of shape vocab size x dim size maps each of the memory-size context sentences to a vector.]
NLP - Discourse Analysis - Memory Network
148. Session 1 - Understand NLP
Embed the given context sentences with embedding A, then multiply by the embedded input
question (embedding B, which is not defined in this code). ※ This holds for the first layer;
for later layers the input is the output of layer t-1.
[Diagram: the embedded memory and the embedded question are multiplied.]
NLP - Discourse Analysis - Memory Network
149. Session 1 - Understand NLP
NLP - Discourse Analysis - Memory Network
Set embedding C (in the code it is B); this is also a trainable target variable.
150. Session 1 - Understand NLP
Multiply embedding C (in the code it is B) by the softmax result.
NLP - Discourse Analysis - Memory Network
151. Session 1 - Understand NLP
Finally, combine the question with the output of the memory network again.
NLP - Discourse Analysis - Memory Network
153. Session 1 - Understand NLP
Add a fully connected layer and compute the error with softmax cross-entropy.
NLP - Discourse Analysis - Memory Network
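[Code sketch] The single-hop end-to-end memory network computation just walked through, in NumPy. For simplicity each "sentence" here is a single word id (real MemN2N sums the word embeddings of a sentence); all sizes are assumed toy values.
import numpy as np

vocab_size, dim, mem_size = 20, 8, 5
rng = np.random.RandomState(0)

A = rng.randn(vocab_size, dim)     # memory embedding (trainable)
C = rng.randn(vocab_size, dim)     # output embedding (trainable)

story = rng.randint(vocab_size, size=(mem_size,))  # one word id per sentence (toy)
q = rng.randn(dim)                                 # embedded question u

m = A[story]                        # (1) embed context sentences with A
p = np.exp(m @ q); p /= p.sum()     # (2) softmax(m . u): attention over memories
c = C[story]                        # (3) embed the same sentences with C
o = p @ c                           # (4) weighted sum of the output embeddings
answer_input = o + q                # (5) feed o + u to the final (softmax) layer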
154. Session 1 - Understand NLP
In the given code I removed 90% of the data set because we are using a CPU for this class,
so the results may be poor.
NLP - Discourse Analysis - Memory Network
156. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Response Generation / Seq2Seq.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
157. Session 1 - Understand NLP
The Seq2Seq model applies to any case where both the input and the output are sequence
data (machine translation, summarization, simple QA), and with a few simple tricks it
can be used to generate responses.
- Input: 딥 러닝 재미 즐거운 일
- Output: 딥 러닝은 재미있고 즐거운 일이다
https://arxiv.org/pdf/1406.1078.pdf
https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
NLP - Response Generator - Seq2Seq
https://nlp.stanford.edu/pubs/emnlp15_attn.pdf
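[Code sketch] A bare-bones TensorFlow 1.x seq2seq: an LSTM encoder whose final state seeds an LSTM decoder, trained with cross-entropy over the target tokens. A teaching skeleton (no attention, no beam search) with assumed sizes, not the deck's actual model.
import tensorflow as tf

vocab_size, embed_dim, hidden = 1000, 64, 128
enc_in = tf.placeholder(tf.int32, [None, None])   # e.g. "딥 러닝 재미 즐거운 일"
dec_in = tf.placeholder(tf.int32, [None, None])   # decoder input (shifted target)
target = tf.placeholder(tf.int32, [None, None])   # "딥 러닝은 재미있고 즐거운 일이다"

embedding = tf.get_variable("embed", [vocab_size, embed_dim])

with tf.variable_scope("encoder"):
    enc_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    _, enc_state = tf.nn.dynamic_rnn(
        enc_cell, tf.nn.embedding_lookup(embedding, enc_in), dtype=tf.float32)

with tf.variable_scope("decoder"):
    dec_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    dec_out, _ = tf.nn.dynamic_rnn(
        dec_cell, tf.nn.embedding_lookup(embedding, dec_in),
        initial_state=enc_state)          # the thought vector seeds the decoder

logits = tf.layers.dense(dec_out, vocab_size)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)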
169. Session 1 - Understand NLP
NLP - Response Generator - Seq2Seq
Pointer Network
https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264
논문 저자들은 “포인터 네트워크"라는 새로운 뉴럴넷 구조를
제안합니다. 포인터 네트워크는 집중 메커니즘을 가진 seq2seq
구조로, 입력의 "인덱스"를 출력합니다. 출력 보카가 입력 시퀀스의
길이에 따라 달라지므로 다양한 크기의 입력을 다룰 수 있다는
장점이 있습니다. (주석: 기존의 seq2seq나 뉴럴 튜링 머신은
고정된 길이만 다룰 수 있었습니다.) 여기서 사용한 집중
메커니즘은 표준 seq2seq 집중 메커니즘을 살짝 변형했으며
O(n^2)의 시간 복잡도를 갖습니다.
논문 저자들은 제안한 구조를 평가하기 위해 컨벡스 헐, 딜루나이
삼각화, 순환 판매원 문제(TSP) 등 입력의 위치(순서)를 정답으로
출력해야하는 과제를 사용했습니다. 그 결과 포인터 네트워크는 잘
작동했고, 심지어 학습 데이터보다 더 긴 길이의 시퀀스에서도
동작했습니다.
What else ?
171. Session 2 - Lecture Goals
Building on the understanding of NLP from Session 1, understand the overall
architecture with AI applied, and, starting from a pizza-ordering bot, have each
student build a chatbot of their own.
Session 2 - Make ChatBot
172. Session 2 : Susang Kim healess1@gmail.com
● Chatbot Developer
○ Released in POSCO (finding people using NLP/AI)
○ Deep Learning MSA (ML, DNN, CNN, RNN)
● Agile Developer (worked at Pivotal Labs)
○ TDD, CI, pair programming, user stories
● iOS Developer (app ranked 100th in the App Store - 2011, Korea)
● Front-End Developer (React, D3, Typescript and ES6)
● OSS World Challenge 2017 (in the top 12, still in progress)
● POSCO MES … (working at POSCO ICT for 10 years)
173. "Facebook AI shut down after creating their own language"
Paper: https://arxiv.org/abs/1706.05125
174. Remind of Session 1
[Recap diagram, identical to the Session 1 overview (slide 12): the [Retrieval Based] Chat-Bot System (Messaging Platform ↔ ChatBot Server ↔ NLU / DM / NLG servers ↔ BackEnd Service Servers, exchanging messages and semantic frames), the NLP theory / ML & DL theory / development tracks, and the [AI Based] Chat-Bot Research Environment.]
Session 2 - Make ChatBot
175. Session 2 - Make Chatbot
[Source: Deview 2016 - https://deview.kr/2016/schedule#session/176]
Why are chatbots taking off these days?
Intuitive UX
Consistent experience
Connects to voice
No separate app install needed
Connects to a variety of services
Fast feedback
Platform independent
176. Characteristics of chatbots
• Many technologies are needed (NLP, AI, frameworks, text mining, and various development skills)
• For someone studying deep learning, results come quickly
  - fast feedback with little computing, since it is text based
• It's fun (less business dependency than with micro data processing)
  - less data-handling burden than images (CNN) or structured data (DNN),
    assuming preprocessing is easy with a morphological analyzer
• Many application areas (connecting various API-based services: smart management)
  - fill in the intent and slots and you can connect to any service
• Few related open-source projects, so it's a blue ocean (for Korean, you mostly have to build your own)
  - fortunately, many language-independent deep-learning text algorithms are public and usable
• Bot services exist, but they are costly, handle Korean poorly, and cannot be customized
Session 2 - Make ChatBot
177. Session 2 - Understand Chatbot
What makes a chatbot? Implementing one requires technologies from many fields:
AI (patterns, context), linguistics (natural language processing), programming
(data handling with Python), bot frameworks (story/slot design), architecture
(response time), and text mining (data construction).
178. Session 2 - Make Chatbot
Various chatbot platforms already exist.
Building a chatbot without code on API.AI: https://calyfactory.github.io/api.ai-chatbot/
Every chatbot has intent and entity recognition, and for that, data is what matters!
Signing up for api.ai and building a chatbot to grasp the principles is helpful.
179. Session 2 - Make Chatbot
Closed Domain vs Open Domain
[Quadrant diagram: one axis runs from rule-based to retrieval (accuracy), the other from closed domain to open domain / general (abstract). Open + general is "impossible" strong AI; closed + retrieval is weak AI; the level of difficulty rises toward the open side.]
The realistic path: start with a small business domain, raise accuracy, then add more business domains.
180. Session 2 - Make Chatbot
Rule Based vs AI
Rule: Computer(Input, Program) -> Output
Rules must be registered one by one for every condition (name, region, team, ...).
- Accuracy rises, but can you register every possible question?
  (Possible only if you register a million rules.)
You get exact results, but cannot cover every question:
if loc == '판교' and comp == '포스코ICT':
    person = '김수상'
elif loc == '판교' and comp == 'SK':
    person = '가나다'
else:
    person = '홍길동'
AI (ML, DL): Computer(Input, Output) -> Program
With only labeled data you can build a model that produces the results, and it also
handles similar data well (Word2Vec, Glove). Questions of a similar type are answered
reasonably well, and the more data, the higher the accuracy (a learning effect).
Intent: 판교에 근무하는 김수상 찾아줘 ("find Susang Kim who works in Pangyo") => intent: find a person in a specific region
NER: 판교에 근무하는 김수상 찾아줘 => NER: B-Loc O O B-Name O
181. Make ChatBot Now
[The same overall architecture diagram as the Session 1 overview, with "This Lesson" marking the [Retrieval Based] Chat-Bot System portion we build in this session.]
Session 2 - Make ChatBot
182. Session 2 - Make Chatbot
Let's build our own chatbot.
How do we build a pizza-ordering chatbot?
Ordering pizza involves many pizza types, various sizes, a place and a date, side menus,
and more. How can a chatbot handle all of that?
⇒ A story for pizza ordering has to be composed.
⇒ Let's build a pizza-order bot with deep learning plus some suitable logic.
183. Session 2 - Make Chatbot
[Chat-Bot System diagram: Messaging Platform ↔ ChatBot Server ↔ NLU Server (Understand) / DM Server (scenario) / NLG Server (Generate) ↔ BackEnd Service Servers.]
Session 2 - Make Chatbot
An example conversation over text messages:
(1) Question: 판교에 포스코ICT에 배달해줘 ("Deliver to POSCO ICT in Pangyo")
(2) Answer: Please choose a size.
(3) Answer: Please enter the delivery place.
(4) Answer: Your pizza order has been completed.
185. Session 2 - Make Chatbot
Composing the story slots (frame-based DM)
피자 주문하고 싶어 ("I want to order a pizza") -> the pizza-order intent is detected.
Pizza slot: Size / Type / Side menu
The pizza bot's story:
1) What size would you like?
2) What type would you like?
3) Do you need any side menu?
User answer: 페파로니 피자로 라지 사이즈에 콜라 추가해주세요 ("Pepperoni pizza, large size, add a cola")
NER fills the slots: Size = Large, Type = Pepperoni, Side menu = cola
-> connect the service (slot API call)
Showing the slots so the user can select them is also an option (does this require UX skills too?)
186. Session 2 - Make Chatbot
1. 맥북 프로 검색해줘 ("search for a MacBook Pro")
2. Preprocessing -> NER on 맥북 프로
3. 맥북프로 -> map to the representative entity -> MacBook Pro API call
4. Print the search results
5. Print slots for drilling into the service
6. If a new consultation is wanted, click "new consultation"
Printing the slots on screen so users can select them can raise the chatbot's accuracy
dramatically (since choices are restricted to the current frame…)
e.g. type "삼성 노트북" ("Samsung laptop") and select per slot
바로봇 (Barobot): http://www.11st.co.kr/toc/bridge.tmall?method=chatPage
Slot -> Trigger -> API
187. Session 2 - Make Chatbot
[Chat-Bot System diagram again; step (1): the message reaches the NLU Server.]
Session 2 - Make Chatbot
판교에 포스코ICT에 배달해줘: how exactly does the NLU work?
=> To apply AI, the text must be converted into vectors.
188. Session 2 - Make Chatbot
Defining the word representation (so the computer understands it well):
- One-hot gives a strong per-word signal, effective for training (when the scope is small; sparse)
- Word-level embeddings memorize words well (but sparse) / W2V (similarity)
- Glove distinguishes even fine-grained kinds of words (caracal vs. cat)
- Char-level embeddings handle unseen words well (romanizing to English characters to shrink the vector)
- Char-level embedding of romanized Korean shrinks the vector count while also handling English
Word representation for training
(Reference: "A study on word embedding models and parameter tuning suited to Korean"; 15 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구.pdf)
189. Session 2 - Make Chatbot
Business-specific text usually exists, but implementing deep learning requires a very
large amount of clean, taggable text data.
For Korean, the Sejong corpus is commonly used, and additional business vocabulary is trained separately (manual labor):
- Corpus (annotation): Sejong corpus (2007) https://ithub.korean.go.kr/user/main.do
- 물결21 (2001~2014), source not available: http://corpus.korea.ac.kr/
- Web crawling or downloads (Wikipedia, Namu Wiki)
- For domain-specific cases, the text data must be built by hand (augmentation)
Specialized words must be newly trained (e.g. "ㅎㅇ?", "방가방가")
※ New vocabulary such as proper nouns must be registered as it appears.
How do we get the data?
190. Session 2 - Make Chatbot
The Ministry of Culture and the National Institute of Korean Language push the "2nd Sejong Plan"
One core of AI, the foundation of the 4th industrial revolution, is free communication
between people and machines.
For a computer to properly understand and respond to human speech and writing, it needs
a vast language database covering natural language as humans speak and write it.
Such a language database is called a corpus.
The accuracy of the rapidly spreading voice-recognition AIs depends on how richly and
precisely these corpora are built.
The Ministry of Culture, Sports and Tourism and the National Institute of Korean Language
announced on the 9th a language-informatization plan to build a corpus of 15.47 billion
eojeol (word units) over 2018~2022 for the advancement of Korean AI technology.
191. Session 2 - Make Chatbot
After choosing the training vectors, features must be extracted:
cleansing -> feature engineering -> training
(remove special characters case by case; derive meaningful words: tagging)
Extracting only the words relevant to the intent or entities improves performance
(cuts training cost and improves the model) and also reduces the embedding dimension
(dense representation: SVD).
About 70 characters suffice: a-z, 0-9, ?, !, (, ), quotes, spaces, etc.
Splitting Korean characters into initial/medial/final consonants is difficult;
using .lower() is another way to shrink the vector.
Composition of the training data
192. Session 2 - Make Chatbot
판교에 포스코ICT에 배달해줘: the amount of data is small, so how do we get clean, refined data?
[Diagram: the [AI Based] Chat-Bot Research Environment, an AI model pipeline with a data mart and monitoring.]
Session 2 - Make Chatbot
193. Session 2 - Make Chatbot
Data augmentation for AI (intent - tag)
Story definition: 판교에 오늘 피자 주문해줘 ("order pizza to Pangyo today"); intent: pizza order (주문)
Intent mapping: 주문 해줘 -> entity mapping: menu: 피자, place: 판교, date: 오늘
Preprocessing: 판교 오늘 피자 주문 -> story key value (주문) -> tagloc tagdate tagmenu 주문
Pattern generation:
tagloc tagdate tagmenu 주문
tagloc tagdate 주문
tagdate tagmenu 주문
tagloc tagmenu 주문
Model training (Char-CNN), with 30% of the training data held out for evaluation and
hyper-parameter selection.
Prediction: tagloc tagdate 주문 tagmenu -> intent = 주문
194. Session 2 - Make Chatbot
Data flow for the model in AI (NER - BIO)
Story definition: 판교에 오늘 피자 주문해줘 -> tagloc tagdate tagmenu 주문
Preprocessing: 판교 오늘 피자 주문 -> BIO mapping: B_Loc / B_Date / B_menu
The text generator / pattern matching produces the training patterns:
tagloc tagdate tagmenu 주문 -> B-loc B-date B-menu 주문
tagloc tagdate 주문 -> B-loc B-date 주문
tagdate tagmenu 주문 -> B-date B-menu 주문
tagloc tagmenu 주문 -> B-loc B-menu 주문
Model training (Bi-LSTM) on W2V inputs, with 30% of the training data held out for
evaluation and hyper-parameter selection.
Prediction: 판교 오늘 피자 주문 -> entity recognition scores, e.g. 피자: 0.12, 장소: 0.7, 메뉴: 0.3
-> B_loc O B_Date B_menu 주문 O
195. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
We have the data; now how do we detect the intent?
196. How to detect the intent (Text Classification)
피자주문 하고 싶어 / 여행 정보 알려줘 / 호텔 예약해줘
("I want to order pizza" / "Give me travel info" / "Book a hotel")
-> three intents: 주문 (order), 정보 (information), 예약 (reservation)
Searching for individual words in the sentence works up to a point, but has limits:
e.g., 피쟈 시켜먹고 싶어 / 여행 좋은데 알려줘 (misspellings, paraphrases)
Deep learning can solve these problems: classify with Char + CNN
(the CNN extracts features for 주문/정보/예약; word similarity covers 피자~피쟈, 정보~갈만한데)
197. How to detect the intent (Text Classification: composing the data)
Input words: 피자 / 주문 / 하고 / 싶어
- Romanized pronunciation (when there would be too many vectors): PIJA / JUMUN / HAGO / SIPO
  (digits, special characters, whitespace, etc. must all be considered)
- W2V (pretrained): 피자 (0.12, 0.54, 0.72) / 주문 (0.56, 0.65, 0.64) / 하고 (0.67, 0.91, 0.13) / 싶어 (0.89, 0.14, 0.11)
- One-hot encoding (word level or syllable level): (0100000000) / (0000010000) / (0010000000) / (0000000100)
- One-hot encoding (A~Z vector): the same idea with one dimension per letter
198. Char CNN?
CNNs are mostly used to extract and recognize features from images, but an image is ultimately a vector, and so is text. Given that, a CNN can extract features from text as well.
199. Text Classification - Char CNN
Input tokens: 지금 / 피자 / 주문 / 하고 / 싶어 -> classes: 예약 / 주문 / 정보
[Architecture diagram: W2V vectors (length / dimension / window; static, non-static, or random); convolution filters of size [3, 4, 5] (the number of words each filter looks at) extract features; pooling abstracts them; a classification layer outputs the intent]
[Paper: Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882]
Let's detect the intent with a Char-CNN.
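A compact tf.keras sketch of this filter-bank architecture (all sizes and the vocabulary are illustrative, not the talk's production model):

    import tensorflow as tf

    VOCAB, EMB_DIM, MAX_LEN, N_INTENTS = 1000, 128, 30, 3   # illustrative sizes

    inp = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    emb = tf.keras.layers.Embedding(VOCAB, EMB_DIM)(inp)    # non-static embeddings

    # One branch per filter size [3, 4, 5], i.e. how many words each filter sees.
    pooled = []
    for size in (3, 4, 5):
        conv = tf.keras.layers.Conv1D(100, size, activation="relu")(emb)
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))   # pooling = abstraction

    x = tf.keras.layers.Concatenate()(pooled)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(N_INTENTS, activation="softmax")(x) # 예약 / 주문 / 정보

    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")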
Char-CNN을 활용하여 의도를 파악해보자
200. Why Char-CNN??
Char-CNN이 일반적인 다른 알고리즘과 비교하여 좋은 성능을 보임
논문 Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882
202. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
How do we extract the entities?
203. Understanding RNNs
Useful for modeling sequential data: because the input is a sequence, backpropagation also runs through time (BPTT).
http://aikorea.org/blog/rnn-tutorial-3/
207. Named Entity Recognition
Bidirectional LSTM (bidirectional layers)
- An RNN-based model
- Well suited to tagging the word at a specific position
An effective way to handle meaning that depends on where a word sits in the sentence.
[한국어 정보처리 학술대회 - https://sites.google.com/site/2016hclt/jalyosil]
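A minimal Bi-LSTM sequence-tagging sketch in tf.keras (sizes and tag count are illustrative; the talk's stack pairs the Bi-LSTM with a CRF layer, omitted here for brevity):

    import tensorflow as tf

    VOCAB, EMB_DIM, MAX_LEN, N_TAGS = 1000, 64, 20, 7   # tags like B-Loc, B-Date, ..., O

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, EMB_DIM, mask_zero=True, input_length=MAX_LEN),
        # Bidirectional: every token sees both its left and right context.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        # One BIO-tag distribution per token position.
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(N_TAGS, activation="softmax")),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")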
209. Named Entity Recognition: BIO tagging with brat
피자 주문하고 싶어 -> B-Pizza B-Order O O
여행 정보 알려줘 -> B-Travel B-Information O
호텔 예약해줘 -> B-Hotel B-Reserve O
Tag scheme:
B - first token of an entity
I - continuation token
O - not an entity / whitespace (OUT)
U - unknown (used when there is no word embedding)
※ Edge cases: multi-token entities such as "New York" (B- then I-), ambiguous words such as 수상하다
[Brat - http://brat.nlplab.org/examples.html / https://wapiti.limsi.fr/]
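A tiny sketch of turning annotated tokens into BIO labels (the helper and its input format are hypothetical; whitespace tokenization is assumed):

    def to_bio(tokens, entities):
        """entities maps token index -> entity type; consecutive same-type tokens get B- then I-."""
        labels, prev = [], None
        for i, _tok in enumerate(tokens):
            ent = entities.get(i)
            if ent is None:
                labels.append("O"); prev = None
            else:
                labels.append(("I-" if ent == prev else "B-") + ent); prev = ent
        return labels

    print(to_bio(["피자", "주문하고", "싶어"], {0: "Pizza", 1: "Order"}))
    # ['B-Pizza', 'B-Order', 'O']
    print(to_bio(["New", "York"], {0: "Loc", 1: "Loc"}))
    # ['B-Loc', 'I-Loc']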
210. Reinforce the dictionary with Bi-LSTM -> retrain the model
Labeled seeds:
피자 주문하고 싶어 -> B-Pizza B-Order O O
여행 정보 알려줘 -> B-Travel B-Info O
호텔 예약해줘 -> B-Hotel B-Reserve O
New utterances: 피이쟈 주문하고 싶어 / 놀러갈 정보 알려줘 / 숙소 예약해줘
The Bi-LSTM still tags 피이쟈 / 놀러갈 / 숙소 in the positions of 피자 / 여행 / 호텔, so it surfaces new vocabulary; feed those words back into the training data to keep improving the model.
211. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
We've detected the intent and extracted the entities; now let's build the service.
212. Session 2 - Make Chatbot
Chatbot Architecture
Build application layers such as the ChatBot Layer on top of the Deep Learning Layer; each application layer calls into the DL Layer for the functions it needs.
- ChatBot Layer: Bot Builder, NLP, Context Analyzer, Decision Maker, Response Generator; backed by the Bot DB (bot config) and log files
- Deep Learning Layer (GPU): Bi-LSTM, CRF, Char-CNN, SVM, Seq2Seq, Attention, Residual, VGG; Train and Predict paths for Intent / NER; model and dictionary files on NAS
※ Models such as Residual (ResNet) are used for image search.
214. Session 2 - Make Chatbot
Web Service Architecture (MSA)
- Infrastructure: Docker (Ubuntu) on AWS EC2 (c4.8xlarge / p2.xlarge GPU); Nginx as LB in front of the APs; GPU servers (HDF5 model files)
- Chatbot Server: Django, Python, Tensorflow, Konlpy, Gensim; Celery with RabbitMQ; Rest endpoints
- Bot Builder (analysis): React, Bootstrap, D3, SCSS front-end
- Storage: Postgres SQL and Hbase DB servers; NAS for log files, model files, dictionary files
- Services: Java / Node / Python backends over Rest, plus a Java trigger service
215. Session 2 - Make Chatbot
Bot Builder and UX (Story)
216. Session 2 - Make Chatbot
Bot Builder DB
Tables: ChatBot Definition, ChatBot Intent, ChatBot Service, ChatBot Intent Entity, ChatBot Story, ChatBot Response, ChatBot Model, ChatBot Tagging, ChatBot Entity Relation, ChatBot Synonym
Keep the schema as common (bot-agnostic) as possible so the service can scale out.
217. Session 2 - Make Chatbot
Chatbot API (Rest)
Client request:
{
  "Input Data": "페파로니 피자 주문할께",
  "Intent": "",
  "Intent_History": ["", ""],
  "story_slot_entity": {"메뉴": "", "사이즈": "", "사이드": ""},
  "request_type": "text",
  "service_type": "",
  "output_data": ""
}
Server response:
{
  "Input Data": "페파로니 피자 주문할께",
  "Intent": "피자주문",
  "Intent_History": ["피자주문", ""],
  "story_slot_entity": {"메뉴": "피자", "사이즈": "라지", "사이드": "콜라"},
  "request_type": "text",
  "service_type": "",
  "output_data": "주문완료"
}
※ Only the required values travel as JSON; everything else is managed in the Dialog Manager (log).
218. Session 2 - Make Chatbot
Test Codes for Chatbot
Implement test coverage per change case:
1. Logic changes (unit tests)
2. Model changes (hyperparameters)
3. Data changes (slots, dictionaries, entities, synonyms)
4. Property changes (thresholds, rule criteria)
Unlike plain logic changes, data and model changes need a way to be verified continuously; to keep accuracy up in production, continuous integration is essential (Jenkins / Travis CI, etc.).
Per story (피자주문, 호텔예약, 여행정보): check intent -> check NER -> check slots, as sketched below.
e.g., input 판교에 피자주문할께 -> intent: 피자주문, slot: {메뉴, 크기, 사이드 - extra}
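A sketch of such a regression test in pytest (the chatbot_api client and the expected values are hypothetical):

    import pytest

    # Hypothetical client wrapping the Rest API from slide 217.
    from chatbot_api import get_intent, get_slots

    @pytest.mark.parametrize("text,intent", [
        ("판교에 피자주문할께", "피자주문"),
        ("호텔 예약해줘", "호텔예약"),
        ("여행 정보 알려줘", "여행정보"),
    ])
    def test_intent(text, intent):
        # Re-run after every model/data change, not only logic changes.
        assert get_intent(text) == intent

    def test_slots():
        slots = get_slots("판교에 피자주문할께")
        assert set(slots) >= {"메뉴"}   # required slots filled; extras allowed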
220. Ensemble and Voting
To raise model accuracy, complement a single model with multiple models and logic (scoring / voting): when detecting the intent, run several models and compare them for the closest answer. Combine text mining with an ensemble for fine tuning.
Example for 포스코ICT에 지금 피자 배달해줘 (see the sketch below):
- Run Char-CNN, SVM (multi-class), and naive_bayes.MultinomialNB in parallel
- Apply per-model weights and vote: 메뉴배달 / 메뉴배달 / 여행정보 -> result 메뉴배달 (피자 배달)
- Then compare the slots per intent: delivery, for instance, requires a place and a time
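A hard-voting sketch with scikit-learn (the toy data and feature choice are illustrative; in the real pipeline the Char-CNN would join as an additional custom voter):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.ensemble import VotingClassifier

    texts = ["피자 배달해줘", "여행 정보 알려줘", "치킨 배달해줘"]   # toy data
    labels = ["메뉴배달", "여행정보", "메뉴배달"]

    vote = VotingClassifier(
        estimators=[
            ("nb", make_pipeline(TfidfVectorizer(), MultinomialNB())),
            ("svm", make_pipeline(TfidfVectorizer(), LinearSVC())),
            ("lr", make_pipeline(TfidfVectorizer(), LogisticRegression())),
        ],
        voting="hard",   # majority vote; the weights= argument adds per-model weights
    )
    vote.fit(texts, labels)
    print(vote.predict(["포스코ICT에 지금 피자 배달해줘"]))   # -> ['메뉴배달'] (expected)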
221. Trigger handling (사랑 "love", image search)
1. When a message contains the word 사랑 (love):
<Actual production exchange>
Employee: Send employee XXX a 포스톡 message saying I love them.
Chatbot: Don't fall in love so easily.
Employee: Who are you to lecture me about my love?
Chatbot: I'm still learning, so there's a lot I don't know yet.
Employee: ㅋㅋㅋㅋ
Chatbot: ㅋㅋㅋ
Attach triggers to words like [안녕, 사랑, ㅋㅋㅋ], train the data collected this way into a Seq2Seq model, and use it as an NLP preprocessing model.
https://www.youtube.com/watch?v=x9bvkXJ-JeQ
2. On image search, call the ResNet model.
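The trigger layer can be as simple as a keyword check in front of the main NLU; a sketch with hypothetical handlers:

    # Hypothetical handlers; in the real system these call the Seq2Seq / ResNet models.
    def seq2seq_chat(msg): return "학습중이라 아직 잘 모르는게 많아요."
    def resnet_search(img): return "image search result"
    def main_nlu(msg): return "intent/NER pipeline"

    TRIGGERS = {"사랑": seq2seq_chat, "안녕": seq2seq_chat, "ㅋㅋ": seq2seq_chat}

    def route(message, attachment=None):
        if attachment is not None:        # an image attached -> ResNet model call
            return resnet_search(attachment)
        for word, handler in TRIGGERS.items():
            if word in message:           # a trigger word bypasses the main NLU
                return handler(message)
        return main_nlu(message)

    print(route("니가 먼제 내 사랑을 논해"))   # -> small-talk (Seq2Seq) response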
222. Use a Tone Generator where needed
It varies the speaking style (regional, honorific, subordinate tone):
주문이 완료되었습니다 (plain) / 주문이 완료되었단다 (familiar) / 주문이 완료되었어요 (polite) / 주문이 완료되었다니깐 (irritated)
Use a Seq2Seq model: compose the encoder input from the content words (nouns, etc.) and have the decoder emit nouns plus particles and endings. The Response Generator is in effect an application of a morphological analyzer.
223. Synonym handling (N-Gram)
페파로니 - Pepperoni, 폐파로니, 페파피자... / Mac Book Pro - 맥프로, 맥북프로...
Customers use all sorts of variants, but API calls must use the canonical value. Use N-grams (typically trigrams) to look up the learned synonyms in the dictionary.
Link: https://www.simplicity.be/article/throwing-dices-recognizing-west-flemish-and-other-languages/
Tune N and the threshold appropriately per entity.
※ threshold: the smaller it is, the closer the matches it finds.
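A character-trigram matching sketch (Jaccard similarity is one scoring choice, an assumption here; note it is a similarity, so larger means closer, the inverse of a distance-style threshold):

    def ngrams(text, n=3):
        text = f" {text} "                    # pad so word edges form n-grams too
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def similarity(a, b, n=3):
        ga, gb = ngrams(a, n), ngrams(b, n)
        return len(ga & gb) / len(ga | gb)    # Jaccard overlap of trigram sets

    CANONICAL = {"페파로니": ["Pepperoni", "폐파로니", "페파피자"]}   # toy dictionary

    def canonicalize(word, threshold=0.3):    # tune n and threshold per entity
        best, score = None, threshold
        for canon, variants in CANONICAL.items():
            for v in [canon] + variants:
                s = similarity(word, v)
                if s >= score:
                    best, score = canon, s
        return best or word

    print(canonicalize("폐파로니"))   # -> 페파로니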
224. Response Speed
- Set up an LB (using Nginx)
- Use an appropriate number of threads and APs
- Cache data in memory for API use
- Enforce the maximum time the chatbot is allowed to take
225. Coding for parallel training
Pin operations to devices with tf.device and balance CPU and GPU appropriately; more GPUs do not automatically mean faster training.
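A minimal tf.device sketch (TF 1.x style, matching the stack above; the placement shown is illustrative):

    import tensorflow as tf   # TF 1.x graph-mode placement

    # Light preprocessing ops often stay on the CPU...
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0]], name="features")

    # ...while the heavy matmuls go to a GPU.
    with tf.device("/gpu:0"):
        w = tf.constant([[0.5], [0.5]], name="weights")
        y = tf.matmul(a, w)

    # log_device_placement prints where each op actually ran;
    # allow_soft_placement falls back to CPU when no GPU is present.
    config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(y))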
227. Wrap-up
● Using hot technologies matters when building a chatbot, but above all you must understand what each domain's data means and put it in a form the computer can understand well.
● Keeping the patterns of the training data and the prediction data consistent is crucial.
● Deep learning hinges on securing large amounts of cleansed data.
● Deep learning can be a sufficient answer for improving performance.
228. When the singularity comes...
Google IO17 : https://www.youtube.com/watch?v=Y2VF8tmLFHw
229. Reference
모두를 위한 딥러닝 - http://hunkim.github.io/ml/
제28회 한글 및 한국어 정보처리 학술대회 - incl. 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구
Stanford University CS231n - http://cs231n.stanford.edu/
Creating AI chat bot with Python 3 and Tensorflow [신정규] - https://speakerdeck.com/inureyes/building-ai-chat-bot-using-python-3-and-tensorflow
파이썬으로 챗봇 만들기 [김선동] - https://www.slideshare.net/KimSungdong1/20170227-72644192?next_slideshow=1
딥러닝을 이용한 지역 컨텍스트 검색 [김진호] - http://www.slideshare.net/deview/221-67605830
Developing Korean Chatbot 101 [조재민] - https://www.slideshare.net/JaeminCho6/developing-korean-chatbot-101-71013451
Tensorflow-Tutorials - https://github.com/golbin/TensorFlow-Tutorials