DL Chatbot seminar
Day 03
Seq2Seq / Attention
hello!
I am Jaemin Cho
● Vision & Learning Lab @ SNU
● NLP / ML / Generative Model
● Looking for Ph.D. / Research programs
You can find me at:
● heythisischo@gmail.com
● j-min
● J-min Cho
● Jaemin Cho
Today We will cover
✘ RNN Encoder-Decoder for Sequence Generation (Seq2Seq)
✘ Advanced Seq2Seq Architectures
✘ Attention Mechanism
○ PyTorch Demo
✘ Advanced Attention architectures
1.
RNN Encoder-Decoder
for Sequence Generation
RNN Encoder-Decoder
Neural Conversation Model
Alternative Objective: MMI
Sequence-to-Sequence
✘ Goal
○ Take a source sentence as input and output a target sentence
“Sequence to Sequence Learning with Neural Networks” (2014)
The source sentence input ends here!
Everything after this point is the target sentence!
Stop decoding now!
“Sequence to Sequence Learning with Neural Networks” (2014)
(Diagram, built up over several slides with tensor shapes: at each step a token embedding (Batch x embed) feeds the RNN, which updates its hidden state (Batch x hidden) and is projected to a distribution over the vocabulary (Batch x Vocab).)
(Diagram: the decoder output after softmax is a probability vector of length Vocab size, e.g. [0.1, 0.8, 0.1, 0.0].)
“Sequence to Sequence Learning with Neural Networks” (2014)
(Diagram: training compares the predicted distribution [0.1, 0.8, 0.1, 0.0] against the one-hot target [0, 1, 0, 0], both of length Vocab size.)
“Sequence to Sequence Learning with Neural Networks” (2014)
(Diagram: from the output distribution [0.1, 0.8, 0.1, 0.0], select the word at idx=1 (greedy sampling), then fetch that word's vector [1.3, 0.4, -0.1, -1.6] (length = embed size) from the word embedding matrix to feed the next decoder step. A code sketch follows.)
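✘ A minimal PyTorch sketch of this greedy decoding loop. The encoder/decoder modules and the SOS_ID / EOS_ID token ids are illustrative stand-ins, not the seminar's demo code:

```python
import torch

def greedy_decode(encoder, decoder, src, max_len=20, SOS_ID=1, EOS_ID=2):
    # src: (Batch, src_len) source token ids
    _, hidden = encoder(src)                     # encode source; keep final hidden state
    token = torch.full((src.size(0),), SOS_ID, dtype=torch.long)
    outputs = []
    for _ in range(max_len):
        logits, hidden = decoder(token, hidden)  # logits: (Batch, Vocab)
        token = logits.argmax(dim=1)             # greedy sampling: pick idx with max probability
        outputs.append(token)
        if (token == EOS_ID).all():              # every sequence emitted <EOS>: stop decoding
            break
    return torch.stack(outputs, dim=1)           # (Batch, out_len)
```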
Neural Conversation Model
“A Neural Conversational Model” (2015)
IT Helpdesk Troubleshooting dataset
✘ First Seq2Seq Chatbot
✘ Too generic responses
✘ Source constraint on generation process
○ Only source of variation is at the output
✘ No persona
✘ Cannot capture ‘higher-level’ representations
Drawbacks of Seq2Seq for Chatbot
✘ Standard Seq2Seq Objective
○ Maximize log-likelihood (MLE)
✘ Only selects for targets given sources, not the converse
✘ Does not capture actual objective in human communication
Too Generic Response
“A Diversity-Promoting Objective Function for Neural Conversation Models” (2015)
No grasp of context; the model just ends up saying frequent words over and over
✘ Alternative objective
○ Maximum Mutual Information (MMI)
✘ With Hyperparameter λ
✘ With Bayes Theorem
○ Weighted MMI is rewritten as below
Maximum Mutual Information (MMI)
Avoid too generic response
=> Anti-Language Model (MMI-antiLM)
“A Diversity-Promoting Objective Function for Neural Conversation Models” (2015)
=> MMI-bidi
Penalizes high-prior (frequently occurring) expressions
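✘ In equation form (following the paper, with weighting hyperparameter λ):

```latex
\hat{T} = \arg\max_{T} \bigl[ \log p(T \mid S) - \lambda \log p(T) \bigr]
\quad \text{(MMI-antiLM)}

\hat{T} = \arg\max_{T} \bigl[ (1-\lambda)\log p(T \mid S) + \lambda \log p(S \mid T) \bigr]
\quad \text{(MMI-bidi, via Bayes' theorem)}
```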
✘ Generated more diverse response
Maximum Mutual Information (MMI)
“A Diversity-Promoting Objective Function for Neural Conversation Models” (2015)
2.
Advanced Seq2Seq Architectures
Image Captioning: Show and Tell
Hierarchical Seq2Seq: HRED / VHRED
Personalized embedding: Persona-Based Neural Conversation Model
Learned in Translation: CoVe
Show And Tell
“Show and Tell: A Neural Image Caption Generator” (2015)
Hierarchical Recurrent Encoder-Decoder (HRED)
“A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion” (2015)
Variational Autoencoder (VAE)
“Auto-Encoding Variational Bayes” (2014)
Latent Variable HRED (VHRED)
“A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues” (2016)
✘ Speaker embedding
✘ Can infer information that is not in the chat data
✘ Ex) Rob and Josh, who live in the UK
○ Both live in the UK
=> similar speaker embeddings
○ Only Rob has ever told the chatbot that he lives in the UK
○ But because the speaker embeddings are similar,
the model can infer that Josh probably lives in the UK too
Persona-Based Conversation Model
“A Persona-Based Neural Conversation Model” (2016)
CoVe (Context Vector)
“Learned in Translation: Contextualized Word Vectors” (2017)
✘ Transfer Learning
○ A model trained on large-scale data can transfer what it has learned to other models
○ Pretraining / Fine-tuning
✘ MT-LSTM
○ A Seq2Seq + Attention translation model should produce well-distributed vectors for arbitrary text
○ So just use that translation model's encoder as a feature extractor!
(Diagram: GloVe word vectors feed the MT-LSTM encoder.)
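✘ A minimal sketch of the idea, assuming a pretrained bidirectional MT-LSTM encoder built with batch_first=True; the names here are illustrative, not the paper's released API:

```python
import torch
import torch.nn as nn

class CoVeFeatures(nn.Module):
    def __init__(self, glove: nn.Embedding, mt_encoder: nn.LSTM):
        super().__init__()
        self.glove = glove            # frozen GloVe lookup table
        self.mt_encoder = mt_encoder  # encoder of a trained Seq2Seq+Attention MT model
        for p in self.parameters():
            p.requires_grad = False   # use purely as a fixed feature extractor

    def forward(self, tokens):               # tokens: (Batch, Seq)
        g = self.glove(tokens)               # (Batch, Seq, glove_dim)
        cove, _ = self.mt_encoder(g)         # (Batch, Seq, 2 * hidden), bidirectional
        return torch.cat([g, cove], dim=-1)  # downstream models consume [GloVe; CoVe]
```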
CoVe (Context Vector)
“Learned in Translation: Contextualized Word Vectors” (2017)
3.
Attention Mechanism
Attention Mechanism
Different attention scoring
Global / Local attention
✘ Representing the source sentence with only the encoder RNN's final output loses a lot of information
✘ At every decoder step, re-vectorize the source sentence to fit the current context, and generate the next word from that
✘ Interactive demo of distill.pub
Attention
Attention
“Learned in Translation: Contextualized Word Vectors” (2017)
Attention
“Learned in Translation: Contextualized Word Vectors” (2017)
(Recap diagram: the Seq2Seq tensor shapes, Batch x embed => Batch x hidden => Batch x Vocab.)
“Sequence to Sequence Learning with Neural Networks” (2014)
“Effective Approaches to Attention-based Neural Machine Translation” (2015)
(Diagram, built up across several slides: scoring the decoder state against the encoder outputs yields attention weights (Batch x Seq); their weighted sum over the encoder states yields a context vector (Batch x hidden); the context vector is concatenated with the decoder hidden state (Batch x hidden * 2) and projected back to Batch x hidden before the output layer (Batch x Vocab). A code sketch follows.)
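✘ A minimal sketch of one attention decoder step with the shapes above, assuming dot-product (Luong) scoring; W_c (Linear(2H, H)) and W_out (Linear(H, Vocab)) are hypothetical layers:

```python
import torch
import torch.nn.functional as F

def attention_step(dec_hidden, enc_outputs, W_c, W_out):
    # dec_hidden:  (B, H)     current decoder hidden state
    # enc_outputs: (B, S, H)  all encoder hidden states
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)  # (B, S)
    attn = F.softmax(scores, dim=1)                                      # Batch x Seq
    context = torch.bmm(attn.unsqueeze(1), enc_outputs).squeeze(1)       # Batch x hidden
    combined = torch.cat([context, dec_hidden], dim=1)                   # Batch x hidden * 2
    attn_hidden = torch.tanh(W_c(combined))                              # Batch x hidden
    return W_out(attn_hidden), attn                                      # Batch x Vocab, weights
```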
Bahdanau Attention
“Neural machine translation by jointly learning to align and translate” (2015)
Different scoring methods
“Effective Approaches to Attention-based Neural Machine Translation” (2015)
✘ Luong
○ h_t: target state
○ h_s: source states
✘ Bahdanau
○ s: target state
○ h: source state
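✘ Sketches of the three Luong scoring variants (dot / general / concat); W_a and v_a are illustrative nn.Linear layers:

```python
import torch

def score_dot(h_t, h_s):
    # h_t: (B, H) target state, h_s: (B, S, H) source states
    return torch.bmm(h_s, h_t.unsqueeze(2)).squeeze(2)           # (B, S)

def score_general(h_t, h_s, W_a):                                # W_a: Linear(H, H)
    return torch.bmm(h_s, W_a(h_t).unsqueeze(2)).squeeze(2)      # (B, S)

def score_concat(h_t, h_s, W_a, v_a):                            # W_a: Linear(2H, H), v_a: Linear(H, 1)
    h_t_rep = h_t.unsqueeze(1).expand_as(h_s)                    # (B, S, H)
    energy = torch.tanh(W_a(torch.cat([h_t_rep, h_s], dim=2)))   # (B, S, H)
    return v_a(energy).squeeze(2)                                # (B, S)
```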
Global and Local Attention
“Effective Approaches to Attention-based Neural Machine Translation” (2015)
4.
Advanced Attention Mechanism
Image Captioning: Show, Attend and Tell
Pointing Attention: Pointer Networks
Copying Mechanism: CopyNet
Bidirectional Attention: Bidirectional Attention Flow (BIDAF)
Self-Attention: Transformer
Show, Attend And Tell
“Show, Attend and Tell: Neural Image Caption Generation with Visual Attention” (2015)
Pointing Attention (Pointer Networks)
“Pointer Networks” (2015)
(Diagram: Seq2Seq + Attention vs. Pointer Networks.)
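✘ A minimal sketch of the pointer head: instead of projecting to a fixed output vocabulary, the attention distribution over input positions is itself the output. The additive-scoring layers W1, W2, v are illustrative:

```python
import torch
import torch.nn.functional as F

def pointer_step(dec_hidden, enc_outputs, W1, W2, v):
    # dec_hidden: (B, H), enc_outputs: (B, S, H)
    # W1, W2: Linear(H, H); v: Linear(H, 1)
    energy = torch.tanh(W1(enc_outputs) + W2(dec_hidden).unsqueeze(1))  # (B, S, H)
    scores = v(energy).squeeze(2)                                       # (B, S)
    return F.softmax(scores, dim=1)  # probability over *input positions*, not a vocabulary
```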
Copying Mechanism (CopyNet)
“Incorporating Copying Mechanism in Sequence-to-Sequence Learning” (2016)
Bi-directional Attention Flow (BIDAF)
“Bi-Directional Attention Flow for Machine Comprehension” (2016)
✘ Question Answering
○ SQuAD SOTA as of late 2016
○ The answer is a sub-phrase of the passage
○ Find the 'start' and 'end' positions, i.e. 'underline' the answer span
✘ Bi-attention
○ Which question word is closest to the passage words currently being read?
○ Which passage word is closest to the question words currently being read?
✘ Embedding
○ Word-embedding
○ Char-embedding
○ Highway Network
✘ Decoder
○ Pointer-like Network
Bi-directional Attention Flow (BIDAF)
“Bi-Directional Attention Flow for Machine Comprehension” (2016)
Transformer
“Attention Is All You Need” (2017)
✘ Machine Translation
✘ Self-attention Encoder-Decoder
✘ Multi-layer “Scaled Dot-product Multi-head” attention
✘ Positional Embeddings
✘ Residual Connection
✘ Byte-pair Encoding (BPE)
Transformer
“Attention Is All You Need” (2017)
✘ Weaknesses of prior CNN/RNN-based language understanding
○ Long path lengths
■ the distance in the network between one word node and another
■ the longer the path, the harder it is to capture long-term dependencies
○ Dilated convolution, attention, etc. have been used to shorten path lengths
✘ Transformer
○ an encoder-decoder built from self-attention alone
○ produces one hidden representation per input token
Transformer
“Attention Is All You Need” (2017)
✘ Positional Encoding
○ models the order of the tokens
■ i: the index of the element within the vector
✘ OpenNMT-py’s implementation
✘ Visualization
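✘ A minimal sketch of the sinusoidal encoding, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(...); assumes an even d_model:

```python
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len).unsqueeze(1).float()      # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()               # even element indices
    angle = pos / torch.pow(10000.0, i / d_model)         # (max_len, d_model / 2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                        # even dimensions: sin
    pe[:, 1::2] = torch.cos(angle)                        # odd dimensions: cos
    return pe                                             # added to the token embeddings
```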
Transformer
“Attention Is All You Need” (2017)
✘ Attention
○ Attention layers stacked 6 deep in both the encoder and decoder
○ 3 inputs
■ Q, K, V (Query, Key, Value)
■ similar to End-to-End Memory Networks
○ Attention Weight
■ dot product of Q and K, followed by softmax
■ scaled by d_k^0.5 (smoothing)
● prevents attention from concentrating only on the token itself
○ Output: the weighted sum of the V vectors (a sketch follows below)
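✘ A minimal sketch of this scaled dot-product attention, softmax(Q Kᵀ / d_k^0.5) V; the optional mask argument (e.g. a causal mask in the decoder) is an assumption, not on the slide:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: (B, Lq, d_k), K: (B, Lk, d_k), V: (B, Lk, d_v)
    scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(Q.size(-1))  # (B, Lq, Lk)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)  # attention weights over K positions
    return torch.bmm(weights, V)         # (B, Lq, d_v): weighted sum of V vectors
```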
Transformer
“Attention Is All You Need” (2017)
✘ Attention inputs
○ Encoder
■ Q, K, V are all the source sentence
○ Decoder
■ first layer
● Q, K, V are all the target sentence
■ later layers
● Q: target sentence
● K, V: source sentence
Transformer
“Attention Is All You Need” (2017)
✘ Multi-head Attention
○ Output of the previous layer
■ Q, K, V
■ all of dimension d_model
○ h sets of matrices W_i^Q, W_i^K, W_i^V project them to dimensions d_k, d_k, d_v
○ For i = 1..h
■ Q_i = Q @ W_i^Q (d_model => d_k)
■ K_i = K @ W_i^K (d_model => d_k)
■ V_i = V @ W_i^V (d_model => d_v)
○ Concatenate the h attention outputs (similar to Inception; see the sketch below)
✘ The paper uses the following values
○ h = 8
○ d_k = d_v = d_model / h = 64
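✘ A minimal multi-head sketch; the h per-head projections W_i^Q, W_i^K, W_i^V are packed into single Linear layers for brevity:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, h=8):
        super().__init__()
        assert d_model % h == 0                  # d_k = d_v = d_model / h
        self.h, self.d_k = h, d_model // h
        self.W_q = nn.Linear(d_model, d_model)   # packs all h W_i^Q projections
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)   # output projection after concat

    def forward(self, Q, K, V):                  # each: (B, L, d_model)
        B = Q.size(0)
        split = lambda x: x.view(B, -1, self.h, self.d_k).transpose(1, 2)     # (B, h, L, d_k)
        q, k, v = split(self.W_q(Q)), split(self.W_k(K)), split(self.W_v(V))
        w = torch.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)  # (B, h, Lq, Lk)
        heads = (w @ v).transpose(1, 2).reshape(B, -1, self.h * self.d_k)     # concat h heads
        return self.W_o(heads)                                                # (B, Lq, d_model)
```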
Transformer
“Attention Is All You Need” (2017)
✘ Other details
○ Position-wise Feed-Forward
■ 2-layer NN + ReLU (see the sketch after this list)
○ Layer norm / Residual Connection
■ each sub-layer computes LayerNorm(x + Sublayer(x))
○ Embedding
■ scaled by d_model^0.5
○ Optimizer
■ Adam
○ Regularization
■ Residual Dropout
● p = 0.1
■ Attention Dropout
■ Label smoothing
● ε = 0.1
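✘ A minimal sketch of one position-wise feed-forward sub-layer with the residual + layer-norm wrapping; d_ff = 2048 is the value from the paper:

```python
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, p=0.1):
        super().__init__()
        self.ffn = nn.Sequential(                      # 2-layer NN + ReLU
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(p)                      # residual dropout, p = 0.1

    def forward(self, x):
        return self.norm(x + self.drop(self.ffn(x)))   # LayerNorm(x + Sublayer(x))
```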
Transformer
“Attention Is All You Need” (2017)
✘ Complexity / Maximum Path Length
○ n: sequence length
○ r: restricted size of the neighborhood (local attention)
✘ BLEU Score on WMT 2014
Seq2Seq Implementations in PyTorch
✘ https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation
✘ https://github.com/OpenNMT/OpenNMT-py
✘ https://github.com/eladhoffer/seq2seq.pytorch
✘ https://github.com/IBM/pytorch-seq2seq
✘ https://github.com/allenai/allennlp
✘ Also, check out the curated ‘Awesome’ lists.
○ https://github.com/ritchieng/the-incredible-pytorch
○ https://github.com/bharathgs/Awesome-pytorch-list
thanks!
Any questions?