3. ● ML & DL Engineer (2014 ~ 2017)
○ POSCO Smart Factory machine-learning-based scheduling (2014 ~ 2015)
○ POSCO AI ChatBot (2016 ~ 2017)
○ Deep learning open-source framework - TensorMSA (2016 ~ 2017)
● Android Developer - POSCO mobile systems (2010 ~ 2014)
○ LBS/IPS vehicle & navigation system
○ IPS with deep learning - patent (2016)
● Awards
○ OSS World Challenge 2017 (in the top 12, still in progress)
○ Employee of the Year 2015 and 2017 at POSCO ICT
● Woori Bank AI ('17.11.1 ~)
Session 1 : SeungWoo Kim tmddno1@gmail.com
5. Session 1 - Lecture Goals
Understand the overall chatbot architecture and the background knowledge needed to
build such a service, so that the hands-on chatbot development explained in Session 2
is easier to follow.
The focus is on understanding how chatbots, NLP, deep learning, and implementation relate to one another!
Session 1 - Understand NLP
6. About ChatBot
Session 1 - Understand NLP
[Diagram: User -> natural language -> Natural Language Understanding -> semantic frame -> System;
System -> semantic frame -> Natural Language Generation -> natural language -> User.]
Why do we need NLP in a chatbot system?
7. About ChatBot
Session 1 - Understand NLP
Sorts of chatbots, from easy to hard:
Retrieval-based models -> Generative models
Traditional algorithms -> Deep learning algorithms
Short conversations -> Long conversations
Closed domain -> Open domain
8. About ChatBot
Session 1 - Understand NLP
Retrieval-Based vs Generative Models
Retrieval-based models (easier)
use a repository of predefined responses and some kind of heuristic to pick an
appropriate response based on the input and context. The heuristic could be as
simple as a rule-based expression match, or as complex as an ensemble of Machine
Learning classifiers. These systems don't generate any new text; they just pick a
response from a fixed set.
Generative models (harder)
don’t rely on pre-defined responses. They generate new responses from scratch.
Generative models are typically based on Machine Translation techniques, but
instead of translating from one language to another, we “translate” from an input to
an output (response).
9. About ChatBot
Session 1 - Understand NLP
Use Deep Learning or Not
Using deep learning
Using deep learning does not guarantee better performance in every case compared
with traditional techniques, and gathering enough data and training a heavy model
is more expensive.
Using traditional algorithms
Most current chatbot systems are based on traditional algorithms, which have their
own strengths compared with DL algorithms.
Traditional NLP tasks: morphological analysis, POS tagging, pattern matching, syntactic analysis, semantic analysis, sentiment analysis, dialog processing
Deep learning algorithms: CharCNN, BiLSTM-CRF, Seq2Seq, Word2Vec, RNN, DMN, E2E MemNN, Attention, DNN
Traditional algorithms: TF-IDF, SVM, dictionaries, Bayesian, logistic regression, LSA, HMM
USE BOTH
10. About ChatBot
Session 1 - Understand NLP
Long Conversations vs Short Conversations
Short conversations
The goal is to create a single response to a single input. For example, you
may receive a specific question from a user and reply with an appropriate
answer.
Long conversations
These go through multiple turns, and the bot needs to keep track of what has been
said. Customer support conversations are typically long conversational threads with
multiple questions.
11. About ChatBot
Session 1 - Understand NLP
Open Domain vs Closed Domain
"Closed domain:
You can ask a limited set of questions on specific topics.
(Easier) 'What is the weather in Miami?'"
"Open domain:
I can ask a question about any topic… and expect a relevant response.
(Harder) Think of a long conversation around refinancing my mortgage,
where I could ask anything." - Mark Clark
12. Overview - Session 1 - Understand NLP
[Architecture diagram: the [Retrieval Based] Chat-Bot System. A Messaging Platform exchanges messages with the ChatBot Server, which (1) sends each message to the NLU Server (Understand) and receives a semantic frame (intent & slots), (2) hands the intent & slots to the DM Server, which follows a scenario and connects to BackEnd Service Servers for information, and (3) sends a semantic frame to the NLG Server (Generate), which produces the reply message.]
The session covers three tracks:
○ NLP theory (basic theory): Voice Recognition, Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, Response Generation
○ ML & DL theory (related deep learning theory, explained through examples): Deep Learning Basics, Word Embedding, BilstmCrf, CharCNN, SyntaxNet, Memory Network, Seq2Seq
○ Development: the pipeline (data collection, data preprocessing, model training, model evaluation, model serving) and data handling / ML & DL libraries (Numpy, Pandas, Tensorflow, Scikit Learn, Konlpy)
[Diagram: the [AI Based] Chat-Bot Research Environment, an AI model pipeline fed by train data, with a data mart, monitoring, summary results, an ontology-backed DM, and a legacy database.]
Session 1 - Understand NLP
13. Session 1 - Contents
1. NLP theory
> The linguistic background generally needed to process natural language
2. Deep learning theory
> Deep learning theory matching the problems raised in the NLP theory part
3. Implementation
> Implementing the theory with deep learning, libraries, and so on
14. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Mostly solved: Spam Detection, Text Categorization, Part-of-Speech Tagging
Making good progress: Named Entity Recognition, Information Extraction, Sentiment Analysis, Coreference Resolution, Word Sense Disambiguation, Syntactic Parsing, Machine Translation
Still really hard: Semantic Search, Question & Answering, Textual Inference, Summarization, Discourse & Dialog
15. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Text Categorization
Text classification assigns one or more classes to a document according to its content. Classes are
selected from a previously established taxonomy (a hierarchy of categories or classes).
Spam Detection
Spam detection is also a text classification problem.
Part-of-Speech Tagging
Also known as grammatical tagging or word-category disambiguation: the process of marking up a word
in a text (corpus) as corresponding to a particular part of speech, based on both its definition and
its context.
16. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Low Level Information Extraction
17. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Information Extraction in a Broader View
https://web.stanford.edu/class/cs124/lec/Information_Extraction_and_Named_Entity_Recognition.pptx
[Diagram: information extraction draws on rule-based extraction, named entity recognition, syntax analysis, relation search, and an ontology.]
18. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Coreference Resolution
I did not vote for Donald Trump because I think he is too reckless
Coreference resolution is the task of finding all expressions that refer to the same entity in a
text. It is an important step for a lot of higher level NLP tasks that involve natural language
understanding such as document summarization, question answering, and information
extraction.
Deep Reinforcement Learning for Mention-Ranking Coreference Models
Improving Coreference Resolution by Learning Entity-Level Distributed Representations
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30
19. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Word Sense Disambiguation
[Example] The word "bass" can mean:
1. a type of fish
2. tones of low frequency
and the sentences:
1. I went fishing for some sea bass.
2. The bass line of the song is too weak.
http://www.cs.cornell.edu/courses/cs4740/2014sp/lectures/wsd-1.pdf
[Figures: a supervised approach with a labeled-data example, and a semi-supervised approach.]
20. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Syntactic Parsing
Syntactic parsing finds structural relationships between the words in a sentence.
https://web.stanford.edu/~jurafsky/slp3/12.pdf
21. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Machine Translation
Machine translation (MT) is automated translation: the process by which computer software
translates a text from one natural language (such as English) to another (such as Spanish).
22. About NLP (Natural Language Processing)
Session 1 - Understand NLP
Semantic Search
Semantic search seeks to improve search accuracy by understanding a searcher’s intent through
contextual meaning.
Question Answering
Answers questions posed in natural language based on knowledge data (usually an ontology).
The best-known example is IBM Watson.
Textual Inference
Recognize, generate, or extract pairs <T, H> of natural language expressions such that a
human who reads (and trusts) T would infer that H is most likely also true.
Summarization
Extract the interesting parts of a text, create a summary from those parts, and allow
rephrasings to make the summary more grammatically correct.
Discourse & Dialog
Hold a conversation while understanding the whole dialog history and the semantic intent of the speaker.
23. Standard Natural Language Process
Session 1 - Understand NLP
Spoken utterance
-> Speech Recognition
Written utterance
-> Lexical (어휘) Analysis: word structure (morphemes, words)
-> Syntactic (구문) Analysis: sentence structure (sentences)
-> Semantic (의미) Analysis: meaning of words & sentences
-> Discourse (대화) Analysis: relationships between sentences (context beyond the sentence)
24. [Roadmap diagram: the NLP theory track (basic theory) around the NLU Server (Understand) and NLG Server (Generate): Voice Recognition, Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Analysis, Response Generation.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
25. Session 1 - Understand NLP
AI Speaker Alexa Alexa Microphone System
NLP - Voice Recognition
26. Session 1 - Understand NLP
Deep Learning for Classification Hidden Markov Model for Language Model
NLP - Voice Recognition
27. [Roadmap diagram: the NLP theory track again, with the current position at Lexical Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
28. Session 1 - Understand NLP
NLP - Lexical Analysis
Main factors in lexical analysis:
1. Sentence splitting
2. Tokenizing
3. Morphological analysis
4. Part-of-speech tagging
29. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
[Problems]
What if there is no newline character ('\n')? Where is the end-of-sentence (EOS) point?
What if the sentence is not properly separated into words by spaces?
[Examples]
30. Session 1 - Understand NLP
NLP - Lexical Analysis
Morphing examples: stemming & lemmatization
Word | Stemming | Lemmatization
Love | Lov | Love
Loves | Lov | Love
Loved | Lov | Love
Loving | Lov | Love
Innovation | Innovat | Innovation
Innovations | Innovat | Innovation
Innovate | Innovat | Innovate
Innovates | Innovat | Innovate
Innovative | Innovat | Innovative
Morphology is the process of finding morphemes, the smallest "meaningful units (lexical meaning
or grammatical function)", and other information-carrying features such as stems in a language.
Lexical Analysis
31. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
Ambiguity
"that" can be a subordinating conjunction or a relative pronoun:
- The fact that/IN you're here
- A man that/WDT I know
"Around" can be a preposition, particle, or adverb:
- I bought it at the shop around/IN the corner.
- I never got around/RP to getting a car.
- A new Toyota Prius costs around/RB $25K.
Degree of ambiguity (in the Brown corpus):
- 11.5% of word types (40% of word tokens) are ambiguous
# of tags:  1     2    3   4  5  6 7
# of words: 35340 3760 264 61 12 2 1
The ambiguity problem is much more serious in Korean.
Part-of-speech tagging is one of the most important text analysis tasks: it classifies words into
their parts of speech and labels them according to a tagset, the collection of tags used for POS
tagging. Parts of speech are also known as word classes or lexical categories.
32. Session 1 - Understand NLP
NLP - Lexical Analysis
Lexical Analysis
Analysis result comparison ("하늘을 나는 자동차" across Korean morphological analyzers) / library performance comparison:
Hannanum: 하늘/N 을/J 나/N 는/J 자동차/N
Kkma: 하늘/NNG 을/JKO 날/VV 는/ETD 자동차/NNG
Komoran: 하늘/NNG 을/JKO 나/NP 는/JX 자동차/NNG
Mecab: 하늘/NNG 을/JKO 나/NP 는/JX 자동차/NNG
Twitter: 하늘/Noun 을/Josa 나/Noun 는/Josa 자동차/Noun
34. [Roadmap diagram: the NLP theory track (basic theory) plus the ML & DL theory track (related deep learning theory): Deep Learning Basics, Word Embedding, BilstmCrf, CharCNN, SyntaxNet, Memory Network, Seq2Seq, Response Generation, around the NLU/NLG servers.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
35. Session 1 - Understand NLP
NLP - Lexical Analysis
Sequence Labeling
What sequence labeling is, and what we can do with it:
(1) Word segmentation
(2) POS tagging
(3) Chunking
(4) Clause identification
(5) Named entity recognition
(6) Semantic role labeling
(7) Information extraction
36. Session 1 - Understand NLP
NLP - Lexical Analysis
Word POS Chunk NE
West NNP B-NP B-MISC
Indian NNP I-NP I-MISC
all-around NN I-NP O
Phil NNP I-NP B-PER
Simons NNP I-NP I-PER
took VBD B-VP O
four CD B-NP O
for IN B-PP O
38 CD B-NP O
on IN B-PP O
Friday NNP B-NP O
<IOB data set example>
POS tag meanings:
https://docs.google.com/spreadsheet/ccc?key=0ApcJghR6UMXxdEdURGY2YzIwb3dSZ290RFpSaUkzZ0E&usp=sharing
Chunk tag meanings:
B: begin of chunk
I: continuation of chunk
E: end of chunk
NP: noun phrase
VP: verb phrase
NER BIO tag meanings:
B: start of a new chunk
I: word inside a chunk
O: outside any chunk
Sequence Labeling
37. Session 1 - Understand NLP
NLP - Lexical Analysis
BiLSTM-CRF Description
Sequence labeling with deep learning
Prerequisites: deep learning basics, word embedding, DL frameworks
38. Session 1 - Understand NLP
NLP - Lexical Analysis
[Video]
Deep Learning Basic
39. Session 1 - Understand NLP
New algorithms: back propagation; CNN, RNN, etc.
Big data: HDFS, MapReduce
Hardware: GPU parallel execution, cloud services
NLP - Lexical Analysis
Deep Learning Basic
40. Session 1 - Understand NLP
(1) Problem: data pairs, e.g. x = 1, 2, 3, 4 -> y = 3, 5, 7, 9
(2) Algorithm: Y = 2 * X + 1
(3) Programming:
function(x)
{
    return x * 2 + 1
}
NLP - Lexical Analysis
Deep Learning Basic
41. Session 1 - Understand NLP
(1) Problem: the same data, x = 1, 2, 3, 4 -> y = 3, 5, 7, 9
(2) Algorithm: Y = w * X + b, where w and b are now learned
(3) Programming: start from initial (random) parameters and optimize them to fit the data
NLP - Lexical Analysis
Deep Learning Basic
42. Session 1 - Understand NLP
Supervised Learning vs Unsupervised Learning vs Reinforcement Learning
[Figure: example images labeled CAT / DOG, illustrating the three learning settings.]
Deep Learning Basic
NLP - Lexical Analysis
43. Session 1 - Understand NLP
1. Perceptron
2. Activation Function
3. Cost
4. Gradient Descent
5. Back Propagation
6. Optimizers
Deep Learning Basic
NLP - Lexical Analysis
44. Session 1 - Understand NLP
Deep Learning Basic - Perceptron
wX + b
NLP - Lexical Analysis
45. Session 1 - Understand NLP
Deep Learning Basic - Perceptron
wX + b -> Activation Function
NLP - Lexical Analysis
46. Session 1 - Understand NLP
Deep Learning Basic - Activation Function
[Figure: logistic regression vs. nonlinear problems]
NLP - Lexical Analysis
47. Session 1 - Understand NLP
Deep Learning Basic - Activation Function
NLP - Lexical Analysis
48. Session 1 - Understand NLP
Deep Learning Basic - Loss (Error)
Y = wX + b, comparing the initial and the optimized fit:
x | y | y~ (initial)
0 | 3 | 7
1 | 5 | 9
2 | 7 | 11
3 | 9 | 13
4 | 11 | 15
5 | 13 | 17
6 | 15 | 19
LOSS is the gap between the predictions and the true values.
NLP - Lexical Analysis
49. Session 1 - Understand NLP
x | y | init | opt
0 | 3 | 7 | 3
1 | 5 | 9 | 5
2 | 7 | 11 | 7
init: ((7-3)^2 + (9-5)^2 + (11-7)^2) / 3 = 16
opt: ((3-3)^2 + (5-5)^2 + (7-7)^2) / 3 = 0
HOW do we move from the initial W, b to the optimized ones?
Deep Learning Basic - Loss (Error)
Minimize Cost(W, b) with respect to W and b.
NLP - Lexical Analysis
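[Code sketch] A minimal NumPy illustration of the "HOW?" above: gradient descent repeatedly nudges w and b downhill on Cost(W, b). The data and the initial/optimized targets match the slide's table; the learning rate of 0.1 is an assumed value.
import numpy as np

# Toy data from the slide: y = 2x + 3 -> (0,3), (1,5), (2,7)
x = np.array([0., 1., 2., 3.])
y = np.array([3., 5., 7., 9.])

w, b = 0.0, 0.0          # "initial" (bad) parameters
lr = 0.1                 # learning rate (assumed)

for step in range(500):
    pred = w * x + b
    loss = ((pred - y) ** 2).mean()        # MSE, the Cost(W, b) above
    grad_w = 2 * ((pred - y) * x).mean()   # d(loss)/dw
    grad_b = 2 * (pred - y).mean()         # d(loss)/db
    w -= lr * grad_w                       # move downhill
    b -= lr * grad_b

print(w, b)  # converges toward w = 2, b = 3, where the loss is 0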
53. Session 1 - Understand NLP
NLP - Lexical Analysis
SGD
Momentum: incorporates the previous update direction, a notion of acceleration
NAG: similar to Momentum, but evaluates the gradient at the moved (look-ahead) position
Adagrad (adaptive family): slow-moving parameters get larger steps, fast-moving ones are updated more carefully
RMSProp: replaces Adagrad's accumulated sum of squared gradients with an exponential average, preventing G from growing without bound
Adadelta: uses exponential averages and the square of the step-size changes (second-order information)
Adam: combines the characteristics of Adadelta/RMSProp and Momentum
Deep Learning Basic - Optimizer
http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html
54. Session 1 - Understand NLP
NLP - Lexical Analysis
https://arxiv.org/pdf/1705.08292.pdf
"Solutions found with gradient descent (GD) or stochastic gradient descent (SGD) generalize
far better than solutions found with adaptive methods (e.g. AdaGrad, RMSProp, and Adam)."
The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia C. Wilson, Rebecca Roelofs,
Mitchell Stern, Nathan Srebro, and Benjamin Recht. University of California, Berkeley / Toyota
Technological Institute at Chicago. May 24, 2017.
There is no optimizer that is best for all cases!
When should you use an adaptive optimizer?
If the input embedding vectors are sparse, it is better to use an adaptive optimizer.
Deep Learning Basic - Optimizer
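[Code sketch] How that choice plays out in TensorFlow 1.x, assuming `cost` is a scalar loss tensor such as the cross-entropy defined on the next slide; the learning rates are assumed values.
import tensorflow as tf

# Dense inputs: plain SGD often generalizes best (per the paper above)
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

# Sparse inputs (e.g. embedding lookups): adaptive methods keep a per-parameter
# step size, so rarely updated embedding rows still learn quickly
# train_op = tf.train.AdagradOptimizer(learning_rate=0.01).minimize(cost)
# train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)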
55. Session 1 - Understand NLP
# tf Graph input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
learning_rate = 0.001  # assumed value, not given on the slide
# Store layers' weights & biases
weights = {
    'h1': tf.Variable(tf.random_normal([784, 256])),
    'h2': tf.Variable(tf.random_normal([256, 256])),
    'out': tf.Variable(tf.random_normal([256, 10]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([256])),
    'b2': tf.Variable(tf.random_normal([256])),
    'out': tf.Variable(tf.random_normal([10]))
}
# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with ReLU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation, followed by softmax
pred = tf.matmul(layer_2, weights['out']) + biases['out']
hypothesis = tf.nn.softmax(pred)
# Define loss (cross entropy) and optimizer
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=1))
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
[Diagram: a 784-256-256-10 fully connected network: input (784) -> hidden (256, ReLU) -> hidden (256, ReLU) -> output (10) -> softmax, with Y = Activation(W*x + b) at each layer and cross entropy as the error.]
Deep Learning Basic
NLP - Lexical Analysis
56. Session 1 - Understand NLP
START 오늘 날씨 는 ? PAD PAD END
START 오늘 날씨 는 어때 ? PAD END
START 오늘 비가 오 려 나 ? END
With long sentences, the vanishing-gradient problem appears, and padding variable-length
data to one length wastes computing power, which is where the concept of the dynamic RNN
comes in.
A bidirectional LSTM also learns the given data backwards. The Long Short-Term Memory cell
maintains a cell state updated through forget, update, and output gates.
https://brunch.co.kr/@chris-song/9
https://blog.altoros.com/the-magic-behind-google-translate-sequence-to-sequence-models-and-tensorflow.html
NLP - Lexical Analysis
Deep Learning Basic
57. Session 1 - Understand NLP
NLP - Lexical Analysis
Deep Learning Basic
Training techniques, grouped by purpose:
Overfitting: data preprocessing, dropout, batch normalization, ensembles, Adam+SGD, learning-rate decay
Fine tuning / multi-tasking: fully convolutional networks, 1x1 convolutional filters
Network compression: quantized neural networks
AutoML / hyper-parameters: random search, grid search, genetic algorithms
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
https://arxiv.org/pdf/1510.00149.pdf
58. Session 1 - Understand NLP
Session 1 - Now We are Here!
[Roadmap diagram: the NLP theory and ML & DL theory tracks plus the development track (implementation): data processing and ML & DL libraries (Numpy, Pandas, Tensorflow, Scikit Learn, Konlpy).]
60. Session 1 - Understand NLP
NLP - Lexical Analysis - Implementation
Deep Learning Framework Comparison
Dynamic vs. static graph definition, debugging, visualization, and deployment.
61. Session 1 - Understand NLP
NLP - Lexical Analysis - Implementation
Deep Learning Framework - Tensorflow
import numpy as np
import tensorflow as tf

# Assumed toy data and hyper-parameters (not shown on the slide)
rng = np.random
train_X = np.asarray([0., 1., 2., 3.])
train_Y = np.asarray([3., 5., 7., 9.])
n_samples = train_X.shape[0]
learning_rate, training_epochs, logs_path = 0.01, 1000, '/tmp/tf_logs'

with tf.Graph().as_default():
    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")
    pred = tf.add(tf.multiply(X, W), b)
    cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        # Fit all training data
        for epoch in range(training_epochs):
            for (x, y) in zip(train_X, train_Y):
                sess.run(optimizer, feed_dict={X: x, Y: y})

Tensorflow: static graph definition (build the graph first, then run it in a session).
Pytorch: dynamic graph definition (the graph is built as the code runs; see the sketch below).
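[Code sketch] The same linear regression in PyTorch for contrast: there is no separate graph-construction phase; the graph is built and differentiated on the fly each iteration. The data and hyper-parameters are the same assumed toy values as above.
import torch

train_X = torch.tensor([0., 1., 2., 3.])
train_Y = torch.tensor([3., 5., 7., 9.])

W = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
optimizer = torch.optim.SGD([W, b], lr=0.01)

for epoch in range(1000):
    pred = train_X * W + b              # the graph is built here, on the fly
    cost = ((pred - train_Y) ** 2).mean()
    optimizer.zero_grad()
    cost.backward()                     # and differentiated immediately
    optimizer.step()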
65. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Word Embedding.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
66. Session 1 - Understand NLP
What is a word embedding?
A way of representing the units that make up text (phonemes, syllables, words, sentences,
documents) as numeric vectors.
67. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Word Representation
○ Discrete representation: WordNet, one-hot vectors
○ Distributed representation
  count based: full documents -> LSA; windows -> SVD of X, Glove
  direct prediction: Word2Vec, FastText
68. Session 1 - Understand NLP
WordNet
NLP - Lexical Analysis - Word Embedding
In the past, approaches like WordNet were used. WordNet is a tree-structured graph that records relations between words (hypernyms, synonyms).
It was built entirely by hand, so it is subjective and takes a great deal of labor to maintain, which is its main limitation.
70. Session 1 - Understand NLP
LSA (latent semantic analysis) with SVD (singular value decomposition)
NLP - Lexical Analysis - Word Embedding
https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/06/pcasvdlsa/
Term-document matrix (terms are Korean morphemes):
      doc1 doc2 doc3
나     1    0    0
는     1    1    2
학교   1    1    0
에     1    1    0
가     1    1    0
ㄴ     1    0    0
다     1    0    1
영희   0    1    1
좋     0    0    1
SVD -> truncated SVD
LSA (latent semantic analysis)
71. Session 1 - Understand NLP
SVD of X
NLP - Lexical Analysis - Word Embedding
https://swalloow.github.io/cs224d-lecture2
This method slides a window (typically of length 5-10) symmetrically over the text and counts co-occurrences.
● I like deep learning.
● I like NLP.
● I enjoy flying
Given a corpus like the one above, it is expressed as a matrix: simply a count of each word's co-occurrence frequency.
The window-based counts are then dimension-reduced with SVD.
72. Session 1 - Understand NLP
https://www.tensorflow.org/tutorials/word2vec
http://w.elnn.kr/search/
Word2Vec demo site
Strengths: dimensionality reduction, expression of semantic similarity
Weaknesses: homonyms; weak training signal when data is scarce
NLP - Lexical Analysis - Word Embedding
Word2Vec
73. Session 1 - Understand NLP
CBOW
Original text: the quick brown fox jumped over the lazy dog
Data set (window size 1): ([brown, jumped], fox)
[Diagram: input layer (vocab-size one-hot vectors for the context words brown and jumped) -> hidden layer (hidden size) -> output layer predicting the center word fox.]
NLP - Lexical Analysis - Word Embedding
Word2Vec
74. Session 1 - Understand NLP
Skip-Gram
Original text: the quick brown fox jumped over the lazy dog
Data set (window size 1): (fox, brown), (fox, jumped)
[Diagram: input layer (one-hot vector for the center word fox) -> hidden layer (hidden size) -> output layer predicting the context words brown and jumped.]
NLP - Lexical Analysis - Word Embedding
Word2Vec
75. Session 1 - Understand NLP
Doc2Vec variants:
(1) PV-DM: ([paragraph, quick, brown, fox, jumped], over) and ([paragraph, quick, brown, fox, jumped, over], the): the paragraph vector joins the context words in predicting the next word
(2) PV-DBOW: (paragraph, the), (paragraph, quick), (paragraph, brown), (paragraph, fox), (paragraph, jumped): the paragraph vector alone predicts sampled words
(3) DM + DBOW: concatenate the two vectors
(4) AVG(TF-IDF * W2V): average the word vectors weighted by their TF-IDF scores
Example text: the quick brown fox jumped over the lazy dog
NLP - Lexical Analysis - Word Embedding
Doc2Vec
76. Session 1 - Understand NLP
tfidf(t, d, D) = tf(t, d) x idf(t, D)
https://thinkwarelab.wordpress.com/2016/11/14/ir-tf-idf-%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B4%85%EC%8B%9C%EB%8B%A4/
http://www.popit.kr/bm25-elasticsearch-5-0%EC%97%90%EC%84%9C-%EA%B2%80%EC%83%89%ED%95%98%EB%8A%94-%EC%83%88%EB%A1%9C%EC%9A%B4-%EB%B0%A9%EB%B2%95/
Not exactly a word embedding, but used in NLP with deep learning quite often:
- Document similarity
- Word importance within a document
- Search engines (like Elasticsearch, though it uses BM25 now)
NLP - Lexical Analysis - Word Embedding
TF-IDF
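[Code sketch] TF-IDF with Scikit Learn (one of the libraries in the development track), reusing the three-sentence corpus from the SVD slide. A sketch of the document-similarity use case; nothing here is chatbot-specific.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["I like deep learning.", "I like NLP.", "I enjoy flying"]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)   # tf(t,d) x idf(t,D), one row per document

# Document similarity: cosine similarity between TF-IDF rows
print(cosine_similarity(tfidf[0], tfidf[1]))   # docs sharing "like" score higher
print(cosine_similarity(tfidf[0], tfidf[2]))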
78. Session 1 - Understand NLP
the quick brown fox jumped over the lazy dog
[Diagram: the word "fox" represented two ways, a one-hot encoding (e.g. 0 1 0 0 …) and a Word2Vec vector (e.g. 0.2 0.1 0.4 0.21), with per-character one-hot encodings for f, o, x concatenated alongside.]
1. Word2Vec-style embeddings express semantic relatedness well.
2. One-hot vectors give a strong, clean signal that is effective for training.
3. Word-level embeddings memorize words well.
4. Char-level embeddings handle unseen (untrained) words well.
NLP - Lexical Analysis - Word Embedding
Char + Word Concat
79. Session 1 - Understand NLP
Words that do not exactly match the pretrained dictionary return "UNKNOWN",
so FastText (by Facebook) uses character n-grams in its word embedding algorithm.
Comparing 에어컨 with 에어조단:
에어컨 -> ['$$에', '$에어', '에어컨', '어컨$', '컨$$'] => 5 n-grams
에어조단 -> ['$$에', '$에어', '에어조', '어조단', '조단$', '단$$'] => 6 n-grams
Matches: ['$$에', '$에어'] => 2
Score: 2 matches / 9 distinct n-grams overall => 0.2222
NLP - Lexical Analysis - Word Embedding
FastText
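[Code sketch] A toy reimplementation of the character n-gram overlap scoring shown above; it reproduces the slide's 0.2222 for 에어컨 vs 에어조단. Real FastText sums n-gram vectors rather than counting overlaps; this only illustrates the n-gram decomposition.
def char_ngrams(word, n=3, pad='$'):
    padded = pad * (n - 1) + word + pad * (n - 1)
    # For 에어컨: {'$$에', '$에어', '에어컨', '어컨$', '컨$$'}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_score(a, b):
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)   # matches / distinct n-grams overall

print(ngram_score('에어컨', '에어조단'))   # 2 / 9 ≈ 0.2222, as on the slide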
80. Session 1 - Understand NLP
Glove
NLP - Lexical Analysis - Word Embedding
GloVe embeds words so that, given a context word, the dot product of two embedded word
vectors tracks the logarithm of the words' probability of co-occurrence (ratios of
co-occurrence probabilities).
Its core goal can be stated as: "make similarity between embedded word vectors easy to
measure, while better reflecting the statistics of the whole corpus."
https://ratsgo.github.io/from%20frequency%20to%20semantics/2017/04/09/glove/
81. [Roadmap diagram: NLP theory, ML & DL theory, and development tracks, with the current position at the implementation of word embeddings.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
82. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
One-hot encoding: simple test code showing the concept of one-hot vectors
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
[Code]
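[Code sketch] The concept in a few lines of NumPy, using the small vocabulary from the BiLSTM-CRF example later in the deck.
import numpy as np

vocab = ['김승우', '전화번호', '이메일', '검색']
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0   # a single strong signal per word
    return v

print(one_hot('전화번호'))   # [0. 1. 0. 0.]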
83. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Word2Vec: using the Gensim word2vec package
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
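[Code sketch] A minimal Gensim usage example, assuming pre-tokenized sentences (e.g. the output of a Konlpy morphological analyzer). The parameter is named `size` in gensim 3.x and `vector_size` in gensim 4+.
from gensim.models import Word2Vec

sentences = [['김승우', '전화번호', '검색'],
             ['김승우', '이메일', '검색'],
             ['김승우', '이미지', '검색']]

# sg=1 selects skip-gram; sg=0 would be CBOW
model = Word2Vec(sentences, size=50, window=2, min_count=1, sg=1)

print(model.wv['검색'])                    # the learned 50-dim vector
print(model.wv.most_similar('전화번호'))   # nearest words by cosine similarity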
85. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
FastText: you can load a pretrained vector and fine-tune it
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/wordembedding/
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
86. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
N-grams are simply all combinations of adjacent words or letters of length n that you can
find in your source text.
87. Session 1 - Understand NLP
NLP - Lexical Analysis - Word Embedding
Training word2vec on a large dataset needs GPU acceleration.
You can also consider using Tensorflow or Keras to train the model:
https://github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/word2veckeras/word2veckeras.py
https://github.com/tensorflow/models/blob/master/tutorials/embedding/word2vec.py
88. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at BilstmCrf.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
89. Session 1 - Understand NLP
NLP - Lexical Analysis - DL Algorithms
Paper Model CoNLL 2003 (F1 %)
Collobert et al.(2011) MLP with word embeddings+gazetteer 89.59
Passos et al.(2014) Lexicon Infused Phrase Embeddings 90.90
Chiu and Nichols(2015) Bi-LSTM with word+char+lexicon embeddings 90.77
Luo et al.(2015) Semi-CRF jointly trained with linking 91.20
Lample et al.(2016) Bi-LSTM-CRF with word+char embeddings 90.94
Lample et al.(2016) Bi-LSTM with word+char embeddings 89.15
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
https://arxiv.org/pdf/1708.02709.pdf
NER (Named Entity Recognition) Algorithm Performance
90. NLP - Lexical Analysis - DL Algorithms
What do we want to do with this algorithm?
91. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLSTM-CRF
Plain data:
김승우 전화번호 검색
김승우 이메일 검색
김승우 이미지 검색
IOB data:
김승우 B-PERSON / 전화번호 B-TARGET / 검색 O
김승우 B-PERSON / 이메일 B-TARGET / 검색 O
김승우 B-PERSON / 이미지 B-TARGET / 검색 O
Lexical analysis pipeline: sentence splitting -> tokenizing -> morphing -> part-of-speech tagging.
Each token is then indexed and encoded for the network: one-hot vectors
(김승우 -> 1 0 0 0, 전화번호 -> 0 1 0 0, 이메일 -> 0 0 1 0, …) or Word2Vec,
plus a character index list (김, 승, 우) and the label indexes (B-PERSON, B-TARGET).
92. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLSTM-CRF
[Diagram: the word index list (김승우, 전화번호, 이메일, 검색), the character index list (김, 승, 우), and the labels (B-PERSON, B-TARGET) feeding the model.]
[Code]
95. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
Conditional Random Field vs. Softmax
[Code]
96. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
A probabilistic model for segmenting and labeling sequence data.
https://www.slideshare.net/kanimozhiu/tdm-probabilistic-models-part-2
The first method (softmax) makes local choices. In other words, even if we capture some information
from the context in our hidden states thanks to the bi-LSTM, the tagging decision is still local.
We don't make use of the neighboring tagging decisions. For instance, in "New York", the fact that
we are tagging "York" as a location should help us decide that "New" corresponds to the beginning
of a location. Given a sequence of words w_1, …, w_m, a sequence of score vectors s_1, …, s_m, and
a sequence of tags y_1, …, y_m, a linear-chain CRF defines a global score s ∈ R.
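[Code sketch] The global decoding a CRF adds on top of the bi-LSTM scores: a NumPy Viterbi decoder over the emission scores s_1..s_m and a learned tag-transition matrix. Shapes are assumptions for illustration.
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: [m, n_tags] bi-LSTM scores s_1..s_m
       transitions: [n_tags, n_tags] learned tag-to-tag scores."""
    m, n_tags = emissions.shape
    score = emissions[0].copy()                  # best score ending in each tag
    back = np.zeros((m, n_tags), dtype=int)
    for t in range(1, m):
        # score of reaching tag j at step t through tag i at step t-1
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # follow the back-pointers to recover the globally best tag sequence
    best = [int(score.argmax())]
    for t in range(m - 1, 0, -1):
        best.append(int(back[t][best[-1]]))
    return best[::-1]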
97. Session 1 - Understand NLP
NLP - Lexical Analysis - BiLstmCrf
Real-project BiLSTM result / sample-code prediction test result:
test data not included in the train set is still predicted well.
http://ip:8888/tree/tensormsa_jupyter/chap05_nlp/sequence_tagging/
98. [Roadmap diagram: the NLP theory track, with the current position at Syntactic Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
99. Session 1 - Understand NLP
NLP - Lexical Analysis - SyntaxNet
Syntactic parsing (구문 분석) decomposes a sentence into its constituent parts and analyzes
the hierarchical relations between them to determine the structure of the sentence.
Graph-based models vs. transition-based models
(CYK-style parsing, MST-finding algorithms, projective & non-projective models)
100. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models
Given a sentence W, repeat until all words have their head:
- select two target words in the data structure
  (one dependent candidate & one head candidate)
- deterministically predict the next parsing action with the parsing model
- modify the structure according to the parsing action
C0 -> C1 -> C2 -> … -> Cm yields the dependency tree; each transition t1, t2, …, tm
is chosen by an oracle (classifier) that predicts the best transition.
102. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Assume we are given an oracle:
- for any non-terminal configuration, it can predict the correct transition
  (for deterministic parsing)
- that is, it takes two words and magically gives us the dependency
  relation between them, if one exists
103. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move Economic from buffer B to stack S
104. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (had, news, nsubj) to A
Remove news from stack (since it now has head in A)
105. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (ROOT, had, root) to A
keep had in stack : because it can have other dependents on the right
106. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (effect, little, amod) to A
Remove little from stack (since it now has head in A)
107. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, effect, dobj) to A
Keep effect in stack: because it can have other dependents on the right
108. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (effect, on, prep) to A
Keep on in stack : because it can have other dependents on the right
109. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Shift :
Move financial from buffer B to stack S
110. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Left-arc :
Add left-arc (market, financial, amod) to A
Remove financial from stack (since it now has head in A)
111. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (on, markets, pmod) to A
Keep markets in stack : because it can have other dependents on the right
112. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Reduce :
Remove markets, on, effect from stack (since they already have head in A)
※ All decisions (right-arc, left-arc, reduce, shift) are made by the oracle.
113. Session 1 - Understand NLP
NLP - Syntactic Analysis
Transition-Based Models - Arc Eager Transition System
Right-arc :
Add right-arc (had, period, p) to A
Keep period in stack
Done !
114. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at SyntaxNet.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
115. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Syntax Parsing Algorithm Performance
Parsing type | Paper | Model | WSJ
Dependency parsing | Chen and Manning (2014) | Fully-connected NN with features including POS | 91.8/89.6 (UAS/LAS)
Dependency parsing | Weiss et al. (2015) | Deep fully-connected NN with features including POS | 94.3/92.4 (UAS/LAS)
Dependency parsing | Dyer et al. (2015) | Stack LSTM | 93.1/90.9 (UAS/LAS)
Constituency parsing | Petrov et al. (2006) | Probabilistic context-free grammars (PCFG) | 91.8 (F1 score)
Constituency parsing | Zhu et al. (2013) | Feature-based transition parsing | 91.3 (F1 score)
Constituency parsing | Vinyals et al. (2015b) | seq2seq learning with LSTM+Attention | 93.5 (F1 score)
There are two kinds of parsing: dependency parsing, which links individual words according to the relations between them, and constituency parsing, which recursively splits the text into sub-phrases.
116. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
We show this layout in the schematic below: the state of the system (a stack and a buffer, visualized
below for both the POS and the dependency parsing task) is used to extract sparse features, which
are fed into the network in groups. We show only a small subset of the features to simplify the
presentation in the schematic
Google SyntaxNet with Deep Learning - Pos Tagging
117. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Google SyntaxNet with Deep Learning - A Fast and Accurate Dependency Parser using Neural Networks
https://arxiv.org/pdf/1603.06042.pdf
1 I _ PRP PRP _ 2 nsubj _ _
2 knew _ VBD VBD _ 0 ROOT _ _
3 I _ PRP PRP _ 5 nsubj _ _
4 could _ MD MD _ 5 aux _ _
5 do _ VB VB _ 2 ccomp _ _
6 it _ PRP PRP _ 5 dobj _ _
7 properly _ RB RB _ 5 advmod _ _
8 if _ IN IN _ 9 mark _ _
9 given _ VBN VBN _ 5 advcl _ _
10 the _ DT DT _ 12 det _ _
11 right _ JJ JJ _ 12 amod _ _
12 kind _ NN NN _ 9 dobj _ _
13 of _ IN IN _ 12 prep _ _
14 support _ NN NN _ 13 pobj _ _
15 . _ . . _ 2 punct _ _
[Diagram: the extracted feature groups feeding the network: 18 units from the word features (1), (2), (3); 18 units from the POS features (1), (2), (3); 12 units from the label features (2), (3).]
(1) The top 3 words on the stack and buffer: s1, s2, s3, b1, b2, b3; => 6
(2) The first and second leftmost / rightmost children of the top two words
on the stack: lc1(si), rc1(si), lc2(si), rc2(si), i = 1, 2. => 8
(3) The leftmost of leftmost / rightmost of rightmost children of the top two
words on the stack: lc1(lc1(si)), rc1(rc1(si)), i = 1, 2. => 4
118. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Google SyntaxNet with Deep Learning - Local Parser
1. SHIFT: Push another word onto the top of the stack, i.e. shifting one token from the buffer to
the stack.
2. LEFT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the left. Push the first word back on the stack.
3. RIGHT_ARC: Pop the top two words from the stack. Attach the second to the first, creating an
arc pointing to the right. Push the second word back on the stack.
119. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
As we describe in the paper, there are several problems with the locally normalized models we just
trained. The most important is the label-bias problem: the model doesn't learn what a good parse
looks like, only what action to take given a history of gold decisions. This is because the scores are
normalized locally using a softmax for each decision.
Google SyntaxNet with Deep Learning - Global Training
120. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
https://www.youtube.com/watch?v=UXW6Cs82UKo
Instead of taking only the best action at each iteration, explore candidate sequences to
the end and choose the one with the maximum score sum. But computing all cases is far too
heavy, so keep only the best few candidates at every step and remove the others (pruning).
This is how we find the globally best prediction.
121. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Greedily following the best choice at every step can miss the globally optimal sequence.
122. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Considering all cases requires too much computing power.
123. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
What is the beam search algorithm for RNNs?
Remove low-scoring cases at every step (pruning).
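[Code sketch] Beam search with pruning in plain Python, assuming we are given per-step action scores; beam_size = 1 degenerates to the greedy strategy criticized above.
import heapq

def beam_search(step_scores, beam_size=3):
    """step_scores: list of dicts {action: score}, one per step.
       Keeps only the best `beam_size` partial sequences per step (pruning)."""
    beams = [(0.0, [])]                  # (cumulative score, action sequence)
    for scores in step_scores:
        candidates = [(total + s, seq + [a])
                      for total, seq in beams
                      for a, s in scores.items()]
        # prune: keep only the highest-scoring few, drop the rest
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams[0]                      # best-scoring full sequence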
124. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
http://universaldependencies.org/
Google SyntaxNet does not support Korean as a default language,
but as shown below, we can train the model on the Sejong corpus data,
though we have to convert the format into one SyntaxNet understands.
Google SyntaxNet with Deep Learning - What about Korean?
125. Session 1 - Understand NLP
NLP - Syntactic Analysis - SyntaxNet
Demo site (we also use samples from this site):
http://sejongpsg.ddns.net/syntaxnet/psg_tree.htm
SyntaxNet Korean with Docker (we pretrained on a Korean corpus and set up a web server for the service):
https://github.com/TensorMSA/tensormsa_syntax_docker
Google SyntaxNet with Deep Learning - Test it by yourself
126. [Roadmap diagram: the NLP theory track, with the current position at Semantic Analysis.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
127. Session 1 - Understand NLP
NLP - Semantic Analysis
Three perspectives on meaning (what "semantics" is in the study of language):
- Lexical semantics: individual words
- Sentential semantics: individual sentences
- Discourse or pragmatics: longer pieces of text or conversation
NLP tasks for sentential semantics:
- Semantic role labeling (SRL)
- Phrase similarity (paraphrase)
- Sentence classification, sentence emotion analysis, etc.
128. Session 1 - Understand NLP
NLP - Semantic Analysis
What is Semantic Role Labeling (SRL)?
Semantic roles express the abstract role that the arguments of a predicate take in the event.
The police arrested the suspect in the park last night
(Agent - predicate - Theme - Location - Time: who did what to whom, where, when)
Can we figure out that these sentences have the same meaning,
i.e. that bought, sold, and purchase are used in sentences with the same meaning?
XYZ corporation bought the stock.
They sold the stock to XYZ corporation.
The stock was bought by XYZ corporation.
The purchase of the stock by XYZ corporation.
129. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Common Semantic Role Labeling Architecture
http://naacl2013.naacl.org/Documents/semantic-role-labeling-part-1-naacl-2013-tutorial.pdf
Pipeline: syntactic parse -> prune constituents -> candidates -> argument identification (ML) -> arguments -> argument classification (ML) -> structural inference (ML) -> semantic roles
Step 1 - Candidate selection:
- Parse the sentence
- Prune/filter the parse tree
  (eliminate some tree constituents to speed up execution)
Step 2 - Argument identification:
- A binary classification of each node as Argument or NONE
- Local scoring
Step 3 - Argument classification:
- A multi-class (one-of-N) classification of all the argument candidates
- Global/joint scoring
130. Session 1 - Understand NLP
Paper | Model | CoNLL 2005 (F1 %) | CoNLL 2012 (F1 %)
Collobert et al. (2011) | CNN with parsing features | 76.06 | -
Tackstrom et al. (2015) | Manual features with DP for inference | 78.6 | 79.4
Zhou and Xu (2015) | Bidirectional LSTM | 81.07 | 81.27
He et al. (2017) | Bidirectional LSTM with highway connections | 83.2 | 83.4
Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence. For each target verb (predicate), all constituents of the sentence that take a semantic role of the verb are recognized. Typical semantic arguments include agent, patient, and instrument, plus location, time, manner, cause, and so on (Zhou and Xu, 2015). The table shows the performance of several models on the CoNLL 2005 and 2012 datasets.
Traditional SRL systems consist of several stages: producing a parse tree, identifying which tree nodes represent the arguments of a given verb, and finally classifying those nodes to determine the SRL tags. Each classification step usually involves extracting many features and feeding them to a statistical model (Collobert et al., 2011).
Tackstrom et al. (2015) scored constituent spans and their possible roles for a given predicate with a series of parse-tree-based features, and proposed a dynamic programming algorithm for efficient inference. Collobert et al. (2011) achieved comparable results with a CNN augmented by parsing information provided in the form of an additional lookup table. Zhou and Xu (2015) proposed a bidirectional LSTM to model arbitrarily long context, which proved successful even without parse-tree information. He et al. (2017) extended this work further by introducing highway connections.
NLP - Semantic Analysis - Semantic Role Labeling
LSTM is effective for the SRL problem too!
131. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Bidirectional LSTM with highway connections:
stack more layers on the RNN using the highway technique!
https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf
132. Session 1 - Understand NLP
NLP - Semantic Analysis - Semantic Role Labeling
Semantic Role Labeling Applications
Information: "Anna is a friend of mine." (Who - What - Who)
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/neo4j/neo4j_basic.ipynb
Neo4j insert query (assuming an open `session` from the Neo4j Python driver):
session.run("MATCH (you:Person {name:'You'})"
            "FOREACH (name in ['Anna'] |"
            " CREATE (you)-[:FRIEND]->(:Person {name:name}))")
result = session.run("MATCH (you {name:'You'})-[:FRIEND]->(yourFriends)"
                     "RETURN you, yourFriends")
Neo4j Jupyter example & visualization
133. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at CharCNN.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
134. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
What kind of problem do we want to solve?
Can we figure out whether these sentences are positive or negative?
돈이 아깝지 않다 (positive: "it was worth the money")
다시는 오지 않을 거야 (negative: "I will never come back")
음식이 정말 맛이 없다 (negative: "the food is really bad")
이 식당은 정말 맛있다 (positive: "this restaurant is really good")
Analyzing positive/negative with a dictionary: the word "않다" is usually negative, but:
돈이 아깝지 않다 => positive
다시는 오지 않을 거야 => negative
135. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
There are many ways of doing text classification:
traditional rule-based methods, machine learning (logistic regression & SVM),
and deep learning (CharCNN, RNN, etc.).
136. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
Paper Model SST-1 SST-2
Socher et al.(2013) Recursive Neural Tensor Network 45.7 85.4
Kim(2014) Multichannel CNN 47.4 88.1
Kalchbrenner et al.(2014) DCNN with k-max pooling 48.5 86.8
Tai et al.(2015) Bidirectional LSTM 48.5 87.2
Le and Mikolov(2014) Paragraph Vector 48.7 87.8
Tai et al.(2015) Constituency Tree-LSTM 51.0 88.0
Kumar et al.(2015) DMN 52.1 88.6
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
https://arxiv.org/pdf/1708.02709.pdf
Semantic Analysis - CharCNN
137. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The deep learning method CharCNN can be a solution for this kind of problem.
138. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Preparing the data for embedding is pretty similar to other neural networks:
1. Word embedding & one-hot did not show much difference in our tests.
2. Personally, I prefer concatenating char one-hot with word2vec vectors.
Example tokens: 오늘 / 메뉴 / 는 / 뭐 / 지? / PAD / PAD
1. You need to define a maximum sentence length.
2. You need padding, as in other NLP neural networks.
139. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
Using multiple convolution filter sizes
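[Code sketch] The multi-filter-size idea in TensorFlow 1.x: convolutions of sizes 3, 4, and 5 over an embedded sentence, max-pooled and concatenated before the fully connected layer. Dimensions and class counts are assumed toy values, not the notebook's actual code.
import tensorflow as tf

max_len, embed_dim, num_classes = 15, 50, 3        # e.g. 주문 / 정보 / 예약
x = tf.placeholder(tf.float32, [None, max_len, embed_dim])
x_img = tf.expand_dims(x, -1)                      # treat text as a 1-channel image

pooled = []
for filter_size in [3, 4, 5]:                      # how many tokens each filter sees
    conv = tf.layers.conv2d(x_img, filters=32,
                            kernel_size=[filter_size, embed_dim],
                            activation=tf.nn.relu)
    # max-pool over all positions: keep the strongest feature per filter
    pool = tf.reduce_max(conv, axis=[1, 2])
    pooled.append(pool)

features = tf.concat(pooled, axis=1)               # [batch, 3 * 32]
logits = tf.layers.dense(features, num_classes)    # fully connected, softmax later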
140. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
http://localhost:8888/notebooks/tensormsa_jupyter/chap05_nlp/charcnn/charcnn.ipynb
The remaining steps are the same (fully connected -> softmax -> loss -> optimizer).
141. Session 1 - Understand NLP
NLP - Semantic Analysis - CharCNN
You can see that the CharCNN distinguishes the two sentences.
142. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Discourse Analysis / Memory Network.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
143. Session 1 - Understand NLP
NLP - Discourse Analysis
https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/
Discourse Analysis - End-to-End Memory Network
Paper | Model | bAbI (mean accuracy %) | Farbes (accuracy %)
Fader et al. (2013) | Paraphrase-driven lexicon learning | - | 0.54
Bordes et al. (2014) | Weakly supervised embedding | - | 0.73
Weston et al. (2014) | Memory Networks | 93.3 | 0.83
Sukhbaatar et al. (2015) | End-to-end Memory Networks | 88.4 | -
Kumar et al. (2015) | DMN | 93.6 | -
147. Session 1 - Understand NLP
Convert word indexes to embedding vectors (the training target vectors A, B, C).
[Diagram: an embedding matrix of shape vocab size x dim size maps each of the memory-size context sentences to a vector.]
NLP - Discourse Analysis - Memory Network
148. Session 1 - Understand NLP
Embed the given context sentences with embedding A, then multiply by the embedded input
question (embedding B, which is not defined in this code). ※ This holds for the first layer;
for later layers the input is the output of layer t-1.
[Diagram: the embedded memory and the embedded question are multiplied.]
NLP - Discourse Analysis - Memory Network
149. Session 1 - Understand NLP
NLP - Discourse Analysis - Memory Network
Set embedding C (in the code it is B); this is also a trainable target variable.
150. Session 1 - Understand NLP
Multiply embedding C (in the code it is B) by the softmax result.
NLP - Discourse Analysis - Memory Network
151. Session 1 - Understand NLP
Finally, combine the question with the output of the memory network again.
NLP - Discourse Analysis - Memory Network
153. Session 1 - Understand NLP
Add a fully connected layer and compute the error with softmax cross-entropy.
NLP - Discourse Analysis - Memory Network
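[Code sketch] The single-hop end-to-end memory network computation just walked through, in NumPy. For simplicity each "sentence" here is a single word id (real MemN2N sums the word embeddings of a sentence); all sizes are assumed toy values.
import numpy as np

vocab_size, dim, mem_size = 20, 8, 5
rng = np.random.RandomState(0)

A = rng.randn(vocab_size, dim)     # memory embedding (trainable)
C = rng.randn(vocab_size, dim)     # output embedding (trainable)

story = rng.randint(vocab_size, size=(mem_size,))  # one word id per sentence (toy)
q = rng.randn(dim)                                 # embedded question u

m = A[story]                        # (1) embed context sentences with A
p = np.exp(m @ q); p /= p.sum()     # (2) softmax(m . u): attention over memories
c = C[story]                        # (3) embed the same sentences with C
o = p @ c                           # (4) weighted sum of the output embeddings
answer_input = o + q                # (5) feed o + u to the final (softmax) layer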
154. Session 1 - Understand NLP
In the given code I removed 90% of the data set because we are using a CPU for this class,
so the results may be poor.
NLP - Discourse Analysis - Memory Network
156. [Roadmap diagram: NLP theory and ML & DL theory tracks, with the current position at Response Generation / Seq2Seq.]
Session 1 - Understand NLP
Session 1 - Now We are Here!
157. Session 1 - Understand NLP
The Seq2Seq model applies to any case where both the input and the output are sequence
data (machine translation, summarization, simple QA), and with a few simple tricks it
can be used to generate responses.
- Input: 딥 러닝 재미 즐거운 일
- Output: 딥 러닝은 재미있고 즐거운 일이다
https://arxiv.org/pdf/1406.1078.pdf
https://www.slideshare.net/KeonKim/attention-mechanisms-with-tensorflow
NLP - Response Generator - Seq2Seq
https://nlp.stanford.edu/pubs/emnlp15_attn.pdf
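[Code sketch] A bare-bones TensorFlow 1.x seq2seq: an LSTM encoder whose final state seeds an LSTM decoder, trained with cross-entropy over the target tokens. A teaching skeleton (no attention, no beam search) with assumed sizes, not the deck's actual model.
import tensorflow as tf

vocab_size, embed_dim, hidden = 1000, 64, 128
enc_in = tf.placeholder(tf.int32, [None, None])   # e.g. "딥 러닝 재미 즐거운 일"
dec_in = tf.placeholder(tf.int32, [None, None])   # decoder input (shifted target)
target = tf.placeholder(tf.int32, [None, None])   # "딥 러닝은 재미있고 즐거운 일이다"

embedding = tf.get_variable("embed", [vocab_size, embed_dim])

with tf.variable_scope("encoder"):
    enc_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    _, enc_state = tf.nn.dynamic_rnn(
        enc_cell, tf.nn.embedding_lookup(embedding, enc_in), dtype=tf.float32)

with tf.variable_scope("decoder"):
    dec_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden)
    dec_out, _ = tf.nn.dynamic_rnn(
        dec_cell, tf.nn.embedding_lookup(embedding, dec_in),
        initial_state=enc_state)          # the thought vector seeds the decoder

logits = tf.layers.dense(dec_out, vocab_size)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)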
169. Session 1 - Understand NLP
NLP - Response Generator - Seq2Seq
Pointer Network
https://medium.com/@devnag/pointer-networks-in-tensorflow-with-sample-code-14645063f264
논문 저자들은 “포인터 네트워크"라는 새로운 뉴럴넷 구조를
제안합니다. 포인터 네트워크는 집중 메커니즘을 가진 seq2seq
구조로, 입력의 "인덱스"를 출력합니다. 출력 보카가 입력 시퀀스의
길이에 따라 달라지므로 다양한 크기의 입력을 다룰 수 있다는
장점이 있습니다. (주석: 기존의 seq2seq나 뉴럴 튜링 머신은
고정된 길이만 다룰 수 있었습니다.) 여기서 사용한 집중
메커니즘은 표준 seq2seq 집중 메커니즘을 살짝 변형했으며
O(n^2)의 시간 복잡도를 갖습니다.
논문 저자들은 제안한 구조를 평가하기 위해 컨벡스 헐, 딜루나이
삼각화, 순환 판매원 문제(TSP) 등 입력의 위치(순서)를 정답으로
출력해야하는 과제를 사용했습니다. 그 결과 포인터 네트워크는 잘
작동했고, 심지어 학습 데이터보다 더 긴 길이의 시퀀스에서도
동작했습니다.
What else ?
171. Session 2 - Lecture Goals
Building on the understanding of NLP from Session 1, understand the overall
architecture with AI applied, and, starting from a pizza-ordering bot, have each
student build a chatbot of their own.
Session 2 - Make ChatBot
172. Session 2 : Susang Kim healess1@gmail.com
● Chatbot Developer
○ Released in POSCO (finding people using NLP/AI)
○ Deep Learning MSA (ML, DNN, CNN, RNN)
● Agile Developer (worked at Pivotal Labs)
○ TDD, CI, pair programming, user stories
● iOS Developer (app ranked 100th in the App Store - 2011, Korea)
● Front-End Developer (React, D3, Typescript and ES6)
● OSS World Challenge 2017 (in the top 12, still in progress)
● POSCO MES … (working at POSCO ICT for 10 years)
173. "Facebook AI shut down after creating their own language"
Paper: https://arxiv.org/abs/1706.05125
174. Remind of Session 1
[Recap diagram, identical to the Session 1 overview (slide 12): the [Retrieval Based] Chat-Bot System (Messaging Platform ↔ ChatBot Server ↔ NLU / DM / NLG servers ↔ BackEnd Service Servers, exchanging messages and semantic frames), the NLP theory / ML & DL theory / development tracks, and the [AI Based] Chat-Bot Research Environment.]
Session 2 - Make ChatBot
175. Session 2 - Make Chatbot
[Source: Deview 2016 - https://deview.kr/2016/schedule#session/176]
Why are chatbots taking off these days?
Intuitive UX
Consistent experience
Connects to voice
No separate app install needed
Connects to a variety of services
Fast feedback
Platform independent
176. Characteristics of chatbots
• Many technologies are needed (NLP, AI, frameworks, text mining, and various development skills)
• For someone studying deep learning, results come quickly
  - fast feedback with little computing, since it is text based
• It's fun (less business dependency than with micro data processing)
  - less data-handling burden than images (CNN) or structured data (DNN),
    assuming preprocessing is easy with a morphological analyzer
• Many application areas (connecting various API-based services: smart management)
  - fill in the intent and slots and you can connect to any service
• Few related open-source projects, so it's a blue ocean (for Korean, you mostly have to build your own)
  - fortunately, many language-independent deep-learning text algorithms are public and usable
• Bot services exist, but they are costly, handle Korean poorly, and cannot be customized
Session 2 - Make ChatBot
177. Session 2 - Understand Chatbot
What makes a chatbot? Implementing one requires technologies from many fields:
AI (patterns, context), linguistics (natural language processing), programming
(data handling with Python), bot frameworks (story/slot design), architecture
(response time), and text mining (data construction).
178. Session 2 - Make Chatbot
Various chatbot platforms already exist.
Building a chatbot without code on API.AI: https://calyfactory.github.io/api.ai-chatbot/
Every chatbot has intent and entity recognition, and for that, data is what matters!
Signing up for api.ai and building a chatbot to grasp the principles is helpful.
179. Session 2 - Make Chatbot
Closed Domain vs Open Domain
[Quadrant diagram: one axis runs from rule-based to retrieval (accuracy), the other from closed domain to open domain / general (abstract). Open + general is "impossible" strong AI; closed + retrieval is weak AI; the level of difficulty rises toward the open side.]
The realistic path: start with a small business domain, raise accuracy, then add more business domains.
180. Session 2 - Make Chatbot
Rule Based vs AI
Rule: Computer(Input, Program) -> Output
Rules must be registered one by one for every condition (name, region, team, ...).
- Accuracy rises, but can you register every possible question?
  (Possible only if you register a million rules.)
You get exact results, but cannot cover every question:
if loc == '판교' and comp == '포스코ICT':
    person = '김수상'
elif loc == '판교' and comp == 'SK':
    person = '가나다'
else:
    person = '홍길동'
AI (ML, DL): Computer(Input, Output) -> Program
With only labeled data you can build a model that produces the results, and it also
handles similar data well (Word2Vec, Glove). Questions of a similar type are answered
reasonably well, and the more data, the higher the accuracy (a learning effect).
Intent: 판교에 근무하는 김수상 찾아줘 ("find Susang Kim who works in Pangyo") => intent: find a person in a specific region
NER: 판교에 근무하는 김수상 찾아줘 => NER: B-Loc O O B-Name O
181. Make ChatBot Now
[The same overall architecture diagram as the Session 1 overview, with "This Lesson" marking the [Retrieval Based] Chat-Bot System portion we build in this session.]
Session 2 - Make ChatBot
182. Session 2 - Make Chatbot
Let's build our own chatbot.
How do we build a pizza-ordering chatbot?
Ordering pizza involves many pizza types, various sizes, a place and a date, side menus,
and more. How can a chatbot handle all of that?
⇒ A story for pizza ordering has to be composed.
⇒ Let's build a pizza-order bot with deep learning plus some suitable logic.
183. Session 2 - Make Chatbot
[Chat-Bot System diagram: Messaging Platform ↔ ChatBot Server ↔ NLU Server (Understand) / DM Server (scenario) / NLG Server (Generate) ↔ BackEnd Service Servers.]
Session 2 - Make Chatbot
An example conversation over text messages:
(1) Question: 판교에 포스코ICT에 배달해줘 ("Deliver to POSCO ICT in Pangyo")
(2) Answer: Please choose a size.
(3) Answer: Please enter the delivery place.
(4) Answer: Your pizza order has been completed.
185. Session 2 - Make Chatbot
Composing the story slots (frame-based DM)
피자 주문하고 싶어 ("I want to order a pizza") -> the pizza-order intent is detected.
Pizza slot: Size / Type / Side menu
The pizza bot's story:
1) What size would you like?
2) What type would you like?
3) Do you need any side menu?
User answer: 페파로니 피자로 라지 사이즈에 콜라 추가해주세요 ("Pepperoni pizza, large size, add a cola")
NER fills the slots: Size = Large, Type = Pepperoni, Side menu = cola
-> connect the service (slot API call)
Showing the slots so the user can select them is also an option (does this require UX skills too?)
186. Session 2 - Make Chatbot
1. 맥북 프로 검색해줘 ("search for a MacBook Pro")
2. Preprocessing -> NER on 맥북 프로
3. 맥북프로 -> map to the representative entity -> MacBook Pro API call
4. Print the search results
5. Print slots for drilling into the service
6. If a new consultation is wanted, click "new consultation"
Printing the slots on screen so users can select them can raise the chatbot's accuracy
dramatically (since choices are restricted to the current frame…)
e.g. type "삼성 노트북" ("Samsung laptop") and select per slot
바로봇 (Barobot): http://www.11st.co.kr/toc/bridge.tmall?method=chatPage
Slot -> Trigger -> API
187. Session 2 - Make Chatbot
[Chat-Bot System diagram again; step (1): the message reaches the NLU Server.]
Session 2 - Make Chatbot
판교에 포스코ICT에 배달해줘: how exactly does the NLU work?
=> To apply AI, the text must be converted into vectors.
188. Session 2 - Make Chatbot
Defining the word representation (so the computer understands it well):
- One-hot gives a strong per-word signal, effective for training (when the scope is small; sparse)
- Word-level embeddings memorize words well (but sparse) / W2V (similarity)
- Glove distinguishes even fine-grained kinds of words (caracal vs. cat)
- Char-level embeddings handle unseen words well (romanizing to English characters to shrink the vector)
- Char-level embedding of romanized Korean shrinks the vector count while also handling English
Word representation for training
(Reference: "A study on word embedding models and parameter tuning suited to Korean"; 15 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구.pdf)
189. Session 2 - Make Chatbot
Business-specific text usually exists, but implementing deep learning requires a very
large amount of clean, taggable text data.
For Korean, the Sejong corpus is commonly used, and additional business vocabulary is trained separately (manual labor):
- Corpus (annotation): Sejong corpus (2007) https://ithub.korean.go.kr/user/main.do
- 물결21 (2001~2014), source not available: http://corpus.korea.ac.kr/
- Web crawling or downloads (Wikipedia, Namu Wiki)
- For domain-specific cases, the text data must be built by hand (augmentation)
Specialized words must be newly trained (e.g. "ㅎㅇ?", "방가방가")
※ New vocabulary such as proper nouns must be registered as it appears.
How do we get the data?
190. Session 2 - Make Chatbot
The Ministry of Culture and the National Institute of Korean Language push the "2nd Sejong Plan"
One core of AI, the foundation of the 4th industrial revolution, is free communication
between people and machines.
For a computer to properly understand and respond to human speech and writing, it needs
a vast language database covering natural language as humans speak and write it.
Such a language database is called a corpus.
The accuracy of the rapidly spreading voice-recognition AIs depends on how richly and
precisely these corpora are built.
The Ministry of Culture, Sports and Tourism and the National Institute of Korean Language
announced on the 9th a language-informatization plan to build a corpus of 15.47 billion
eojeol (word units) over 2018~2022 for the advancement of Korean AI technology.
191. Session 2 - Make Chatbot
After choosing the training vectors, features must be extracted:
cleansing -> feature engineering -> training
(remove special characters case by case; derive meaningful words: tagging)
Extracting only the words relevant to the intent or entities improves performance
(cuts training cost and improves the model) and also reduces the embedding dimension
(dense representation: SVD).
About 70 characters suffice: a-z, 0-9, ?, !, (, ), quotes, spaces, etc.
Splitting Korean characters into initial/medial/final consonants is difficult;
using .lower() is another way to shrink the vector.
Composition of the training data
192. Session 2 - Make Chatbot
판교에 포스코ICT에 배달해줘: the amount of data is small, so how do we get clean, refined data?
[Diagram: the [AI Based] Chat-Bot Research Environment, an AI model pipeline with a data mart and monitoring.]
Session 2 - Make Chatbot
193. Session 2 - Make Chatbot
Data augmentation for AI (intent - tag)
Story definition: 판교에 오늘 피자 주문해줘 ("order pizza to Pangyo today"); intent: pizza order (주문)
Intent mapping: 주문 해줘 -> entity mapping: menu: 피자, place: 판교, date: 오늘
Preprocessing: 판교 오늘 피자 주문 -> story key value (주문) -> tagloc tagdate tagmenu 주문
Pattern generation:
tagloc tagdate tagmenu 주문
tagloc tagdate 주문
tagdate tagmenu 주문
tagloc tagmenu 주문
Model training (Char-CNN), with 30% of the training data held out for evaluation and
hyper-parameter selection.
Prediction: tagloc tagdate 주문 tagmenu -> intent = 주문
194. Session 2 - Make Chatbot
Data flow for the model in AI (NER - BIO)
Story definition: 판교에 오늘 피자 주문해줘 -> tagloc tagdate tagmenu 주문
Preprocessing: 판교 오늘 피자 주문 -> BIO mapping: B_Loc / B_Date / B_menu
The text generator / pattern matching produces the training patterns:
tagloc tagdate tagmenu 주문 -> B-loc B-date B-menu 주문
tagloc tagdate 주문 -> B-loc B-date 주문
tagdate tagmenu 주문 -> B-date B-menu 주문
tagloc tagmenu 주문 -> B-loc B-menu 주문
Model training (Bi-LSTM) on W2V inputs, with 30% of the training data held out for
evaluation and hyper-parameter selection.
Prediction: 판교 오늘 피자 주문 -> entity recognition scores, e.g. 피자: 0.12, 장소: 0.7, 메뉴: 0.3
-> B_loc O B_Date B_menu 주문 O
195. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
We have the data; now how do we detect the intent?
196. How to detect the intent (Text Classification)
피자주문 하고 싶어 / 여행 정보 알려줘 / 호텔 예약해줘
("I want to order pizza" / "Give me travel info" / "Book a hotel")
-> three intents: 주문 (order), 정보 (information), 예약 (reservation)
Searching for individual words in the sentence works up to a point, but has limits:
e.g., 피쟈 시켜먹고 싶어 / 여행 좋은데 알려줘 (misspellings, paraphrases)
Deep learning can solve these problems: classify with Char + CNN
(the CNN extracts features for 주문/정보/예약; word similarity covers 피자~피쟈, 정보~갈만한데)
197. How to detect the intent (Text Classification: composing the data)
Input words: 피자 / 주문 / 하고 / 싶어
- Romanized pronunciation (when there would be too many vectors): PIJA / JUMUN / HAGO / SIPO
  (digits, special characters, whitespace, etc. must all be considered)
- W2V (pretrained): 피자 (0.12, 0.54, 0.72) / 주문 (0.56, 0.65, 0.64) / 하고 (0.67, 0.91, 0.13) / 싶어 (0.89, 0.14, 0.11)
- One-hot encoding (word level or syllable level): (0100000000) / (0000010000) / (0010000000) / (0000000100)
- One-hot encoding (A~Z vector): the same idea with one dimension per letter
198. Char CNN?
CNNs are mostly used to extract and recognize features from images, but an image is ultimately a vector, and so is text. Given that, a CNN can extract features from text as well.
199. Text Classification - Char CNN
Input tokens: 지금 / 피자 / 주문 / 하고 / 싶어 -> classes: 예약 / 주문 / 정보
[Architecture diagram: W2V vectors (length / dimension / window; static, non-static, or random); convolution filters of size [3, 4, 5] (the number of words each filter looks at) extract features; pooling abstracts them; a classification layer outputs the intent]
[Paper: Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882]
Let's detect the intent with a Char-CNN.
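A compact tf.keras sketch of this filter-bank architecture (all sizes and the vocabulary are illustrative, not the talk's production model):

    import tensorflow as tf

    VOCAB, EMB_DIM, MAX_LEN, N_INTENTS = 1000, 128, 30, 3   # illustrative sizes

    inp = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    emb = tf.keras.layers.Embedding(VOCAB, EMB_DIM)(inp)    # non-static embeddings

    # One branch per filter size [3, 4, 5], i.e. how many words each filter sees.
    pooled = []
    for size in (3, 4, 5):
        conv = tf.keras.layers.Conv1D(100, size, activation="relu")(emb)
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))   # pooling = abstraction

    x = tf.keras.layers.Concatenate()(pooled)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(N_INTENTS, activation="softmax")(x) # 예약 / 주문 / 정보

    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")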
Char-CNN을 활용하여 의도를 파악해보자
200. Why Char-CNN??
Char-CNN이 일반적인 다른 알고리즘과 비교하여 좋은 성능을 보임
논문 Convolutional Neural Networks for Sentence Classification - Yoon Kim - https://arxiv.org/abs/1408.5882
202. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
How do we extract the entities?
203. Understanding RNNs
Useful for modeling sequential data: because the input is a sequence, backpropagation also runs through time (BPTT).
http://aikorea.org/blog/rnn-tutorial-3/
207. Named Entity Recognition
Bidirectional LSTM (bidirectional layers)
- An RNN-based model
- Well suited to tagging the word at a specific position
An effective way to handle meaning that depends on where a word sits in the sentence.
[한국어 정보처리 학술대회 - https://sites.google.com/site/2016hclt/jalyosil]
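A minimal Bi-LSTM sequence-tagging sketch in tf.keras (sizes and tag count are illustrative; the talk's stack pairs the Bi-LSTM with a CRF layer, omitted here for brevity):

    import tensorflow as tf

    VOCAB, EMB_DIM, MAX_LEN, N_TAGS = 1000, 64, 20, 7   # tags like B-Loc, B-Date, ..., O

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, EMB_DIM, mask_zero=True, input_length=MAX_LEN),
        # Bidirectional: every token sees both its left and right context.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        # One BIO-tag distribution per token position.
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(N_TAGS, activation="softmax")),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")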
209. Named Entity Recognition: BIO tagging with brat
피자 주문하고 싶어 -> B-Pizza B-Order O O
여행 정보 알려줘 -> B-Travel B-Information O
호텔 예약해줘 -> B-Hotel B-Reserve O
Tag scheme:
B - first token of an entity
I - continuation token
O - not an entity / whitespace (OUT)
U - unknown (used when there is no word embedding)
※ Edge cases: multi-token entities such as "New York" (B- then I-), ambiguous words such as 수상하다
[Brat - http://brat.nlplab.org/examples.html / https://wapiti.limsi.fr/]
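A tiny sketch of turning annotated tokens into BIO labels (the helper and its input format are hypothetical; whitespace tokenization is assumed):

    def to_bio(tokens, entities):
        """entities maps token index -> entity type; consecutive same-type tokens get B- then I-."""
        labels, prev = [], None
        for i, _tok in enumerate(tokens):
            ent = entities.get(i)
            if ent is None:
                labels.append("O"); prev = None
            else:
                labels.append(("I-" if ent == prev else "B-") + ent); prev = ent
        return labels

    print(to_bio(["피자", "주문하고", "싶어"], {0: "Pizza", 1: "Order"}))
    # ['B-Pizza', 'B-Order', 'O']
    print(to_bio(["New", "York"], {0: "Loc", 1: "Loc"}))
    # ['B-Loc', 'I-Loc']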
210. Reinforce the dictionary with Bi-LSTM -> retrain the model
Labeled seeds:
피자 주문하고 싶어 -> B-Pizza B-Order O O
여행 정보 알려줘 -> B-Travel B-Info O
호텔 예약해줘 -> B-Hotel B-Reserve O
New utterances: 피이쟈 주문하고 싶어 / 놀러갈 정보 알려줘 / 숙소 예약해줘
The Bi-LSTM still tags 피이쟈 / 놀러갈 / 숙소 in the positions of 피자 / 여행 / 호텔, so it surfaces new vocabulary; feed those words back into the training data to keep improving the model.
211. Session 2 - Make Chatbot
[Same Chat-Bot System architecture diagram as slide 187]
판교에 포스코ICT에 배달해줘
We've detected the intent and extracted the entities; now let's build the service.
212. Session 2 - Make Chatbot
Chatbot Architecture
Build application layers such as the ChatBot Layer on top of the Deep Learning Layer; each application layer calls into the DL Layer for the functions it needs.
- ChatBot Layer: Bot Builder, NLP, Context Analyzer, Decision Maker, Response Generator; backed by the Bot DB (bot config) and log files
- Deep Learning Layer (GPU): Bi-LSTM, CRF, Char-CNN, SVM, Seq2Seq, Attention, Residual, VGG; Train and Predict paths for Intent / NER; model and dictionary files on NAS
※ Models such as Residual (ResNet) are used for image search.
214. Session 2 - Make Chatbot
Web Service Architecture (MSA)
- Infrastructure: Docker (Ubuntu) on AWS EC2 (c4.8xlarge / p2.xlarge GPU); Nginx as LB in front of the APs; GPU servers (HDF5 model files)
- Chatbot Server: Django, Python, Tensorflow, Konlpy, Gensim; Celery with RabbitMQ; Rest endpoints
- Bot Builder (analysis): React, Bootstrap, D3, SCSS front-end
- Storage: Postgres SQL and Hbase DB servers; NAS for log files, model files, dictionary files
- Services: Java / Node / Python backends over Rest, plus a Java trigger service
215. Session 2 - Make Chatbot
Bot Builder and UX (Story)
216. Session 2 - Make Chatbot
Bot Builder DB
Tables: ChatBot Definition, ChatBot Intent, ChatBot Service, ChatBot Intent Entity, ChatBot Story, ChatBot Response, ChatBot Model, ChatBot Tagging, ChatBot Entity Relation, ChatBot Synonym
Keep the schema as common (bot-agnostic) as possible so the service can scale out.
217. Session 2 - Make Chatbot
Chatbot API (Rest)
Client request:
{
  "Input Data": "페파로니 피자 주문할께",
  "Intent": "",
  "Intent_History": ["", ""],
  "story_slot_entity": {"메뉴": "", "사이즈": "", "사이드": ""},
  "request_type": "text",
  "service_type": "",
  "output_data": ""
}
Server response:
{
  "Input Data": "페파로니 피자 주문할께",
  "Intent": "피자주문",
  "Intent_History": ["피자주문", ""],
  "story_slot_entity": {"메뉴": "피자", "사이즈": "라지", "사이드": "콜라"},
  "request_type": "text",
  "service_type": "",
  "output_data": "주문완료"
}
※ Only the required values travel as JSON; everything else is managed in the Dialog Manager (log).
218. Session 2 - Make Chatbot
Test Codes for Chatbot
Implement test coverage per change case:
1. Logic changes (unit tests)
2. Model changes (hyperparameters)
3. Data changes (slots, dictionaries, entities, synonyms)
4. Property changes (thresholds, rule criteria)
Unlike plain logic changes, data and model changes need a way to be verified continuously; to keep accuracy up in production, continuous integration is essential (Jenkins / Travis CI, etc.).
Per story (피자주문, 호텔예약, 여행정보): check intent -> check NER -> check slots, as sketched below.
e.g., input 판교에 피자주문할께 -> intent: 피자주문, slot: {메뉴, 크기, 사이드 - extra}
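A sketch of such a regression test in pytest (the chatbot_api client and the expected values are hypothetical):

    import pytest

    # Hypothetical client wrapping the Rest API from slide 217.
    from chatbot_api import get_intent, get_slots

    @pytest.mark.parametrize("text,intent", [
        ("판교에 피자주문할께", "피자주문"),
        ("호텔 예약해줘", "호텔예약"),
        ("여행 정보 알려줘", "여행정보"),
    ])
    def test_intent(text, intent):
        # Re-run after every model/data change, not only logic changes.
        assert get_intent(text) == intent

    def test_slots():
        slots = get_slots("판교에 피자주문할께")
        assert set(slots) >= {"메뉴"}   # required slots filled; extras allowed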
220. Ensemble and Voting
To raise model accuracy, complement a single model with multiple models and logic (scoring / voting): when detecting the intent, run several models and compare them for the closest answer. Combine text mining with an ensemble for fine tuning.
Example for 포스코ICT에 지금 피자 배달해줘 (see the sketch below):
- Run Char-CNN, SVM (multi-class), and naive_bayes.MultinomialNB in parallel
- Apply per-model weights and vote: 메뉴배달 / 메뉴배달 / 여행정보 -> result 메뉴배달 (피자 배달)
- Then compare the slots per intent: delivery, for instance, requires a place and a time
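A hard-voting sketch with scikit-learn (the toy data and feature choice are illustrative; in the real pipeline the Char-CNN would join as an additional custom voter):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.ensemble import VotingClassifier

    texts = ["피자 배달해줘", "여행 정보 알려줘", "치킨 배달해줘"]   # toy data
    labels = ["메뉴배달", "여행정보", "메뉴배달"]

    vote = VotingClassifier(
        estimators=[
            ("nb", make_pipeline(TfidfVectorizer(), MultinomialNB())),
            ("svm", make_pipeline(TfidfVectorizer(), LinearSVC())),
            ("lr", make_pipeline(TfidfVectorizer(), LogisticRegression())),
        ],
        voting="hard",   # majority vote; the weights= argument adds per-model weights
    )
    vote.fit(texts, labels)
    print(vote.predict(["포스코ICT에 지금 피자 배달해줘"]))   # -> ['메뉴배달'] (expected)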
221. Trigger handling (사랑 "love", image search)
1. When a message contains the word 사랑 (love):
<Actual production exchange>
Employee: Send employee XXX a 포스톡 message saying I love them.
Chatbot: Don't fall in love so easily.
Employee: Who are you to lecture me about my love?
Chatbot: I'm still learning, so there's a lot I don't know yet.
Employee: ㅋㅋㅋㅋ
Chatbot: ㅋㅋㅋ
Attach triggers to words like [안녕, 사랑, ㅋㅋㅋ], train the data collected this way into a Seq2Seq model, and use it as an NLP preprocessing model.
https://www.youtube.com/watch?v=x9bvkXJ-JeQ
2. On image search, call the ResNet model.
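The trigger layer can be as simple as a keyword check in front of the main NLU; a sketch with hypothetical handlers:

    # Hypothetical handlers; in the real system these call the Seq2Seq / ResNet models.
    def seq2seq_chat(msg): return "학습중이라 아직 잘 모르는게 많아요."
    def resnet_search(img): return "image search result"
    def main_nlu(msg): return "intent/NER pipeline"

    TRIGGERS = {"사랑": seq2seq_chat, "안녕": seq2seq_chat, "ㅋㅋ": seq2seq_chat}

    def route(message, attachment=None):
        if attachment is not None:        # an image attached -> ResNet model call
            return resnet_search(attachment)
        for word, handler in TRIGGERS.items():
            if word in message:           # a trigger word bypasses the main NLU
                return handler(message)
        return main_nlu(message)

    print(route("니가 먼제 내 사랑을 논해"))   # -> small-talk (Seq2Seq) response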
222. Use a Tone Generator where needed
It varies the speaking style (regional, honorific, subordinate tone):
주문이 완료되었습니다 (plain) / 주문이 완료되었단다 (familiar) / 주문이 완료되었어요 (polite) / 주문이 완료되었다니깐 (irritated)
Use a Seq2Seq model: compose the encoder input from the content words (nouns, etc.) and have the decoder emit nouns plus particles and endings. The Response Generator is in effect an application of a morphological analyzer.
223. Synonym handling (N-Gram)
페파로니 - Pepperoni, 폐파로니, 페파피자... / Mac Book Pro - 맥프로, 맥북프로...
Customers use all sorts of variants, but API calls must use the canonical value. Use N-grams (typically trigrams) to look up the learned synonyms in the dictionary.
Link: https://www.simplicity.be/article/throwing-dices-recognizing-west-flemish-and-other-languages/
Tune N and the threshold appropriately per entity.
※ threshold: the smaller it is, the closer the matches it finds.
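A character-trigram matching sketch (Jaccard similarity is one scoring choice, an assumption here; note it is a similarity, so larger means closer, the inverse of a distance-style threshold):

    def ngrams(text, n=3):
        text = f" {text} "                    # pad so word edges form n-grams too
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def similarity(a, b, n=3):
        ga, gb = ngrams(a, n), ngrams(b, n)
        return len(ga & gb) / len(ga | gb)    # Jaccard overlap of trigram sets

    CANONICAL = {"페파로니": ["Pepperoni", "폐파로니", "페파피자"]}   # toy dictionary

    def canonicalize(word, threshold=0.3):    # tune n and threshold per entity
        best, score = None, threshold
        for canon, variants in CANONICAL.items():
            for v in [canon] + variants:
                s = similarity(word, v)
                if s >= score:
                    best, score = canon, s
        return best or word

    print(canonicalize("폐파로니"))   # -> 페파로니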
224. Response Speed
- Set up an LB (using Nginx)
- Use an appropriate number of threads and APs
- Cache data in memory for API use
- Enforce the maximum time the chatbot is allowed to take
225. Coding for parallel training
Pin operations to devices with tf.device and balance CPU and GPU appropriately; more GPUs do not automatically mean faster training.
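A minimal tf.device sketch (TF 1.x style, matching the stack above; the placement shown is illustrative):

    import tensorflow as tf   # TF 1.x graph-mode placement

    # Light preprocessing ops often stay on the CPU...
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0]], name="features")

    # ...while the heavy matmuls go to a GPU.
    with tf.device("/gpu:0"):
        w = tf.constant([[0.5], [0.5]], name="weights")
        y = tf.matmul(a, w)

    # log_device_placement prints where each op actually ran;
    # allow_soft_placement falls back to CPU when no GPU is present.
    config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(y))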
227. Wrap-up
● Using hot technologies matters when building a chatbot, but above all you must understand what each domain's data means and put it in a form the computer can understand well.
● Keeping the patterns of the training data and the prediction data consistent is crucial.
● Deep learning hinges on securing large amounts of cleansed data.
● Deep learning can be a sufficient answer for improving performance.
228. When the singularity comes...
Google IO17 : https://www.youtube.com/watch?v=Y2VF8tmLFHw
229. Reference
모두를 위한 딥러닝 - http://hunkim.github.io/ml/
제28회 한글 및 한국어 정보처리 학술대회 - incl. 한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구
Stanford University CS231n - http://cs231n.stanford.edu/
Creating AI chat bot with Python 3 and Tensorflow [신정규] - https://speakerdeck.com/inureyes/building-ai-chat-bot-using-python-3-and-tensorflow
파이썬으로 챗봇 만들기 [김선동] - https://www.slideshare.net/KimSungdong1/20170227-72644192?next_slideshow=1
딥러닝을 이용한 지역 컨텍스트 검색 [김진호] - http://www.slideshare.net/deview/221-67605830
Developing Korean Chatbot 101 [조재민] - https://www.slideshare.net/JaeminCho6/developing-korean-chatbot-101-71013451
Tensorflow-Tutorials - https://github.com/golbin/TensorFlow-Tutorials