Introduction Language Modeling Machine Translation End
Contemporary Models of Natural Language Processing
Ekaterina Vylomova
June, 2017
Contents
1 Introduction
2 Language Modeling
N-Grams
Distributional Semantics
Learning Representations
Neural Language Models
Evaluation
3 Machine Translation
Statistical Machine Translation
Neural Machine Translation
Attentional Mechanism
Comparison of MT systems
Google MT
NLP and CL
Natural Language Processing: the art of solving engineering problems that need to analyze (or generate) natural language text
Computational Linguistics: computational methods to answer the scientific questions of linguistics
Tasks
Language Modeling
Sentiment Analysis
Machine Translation
POS Tagging
Text Classification
Question Answering
Recommender Systems
... and many others!
N-Grams
Language Modeling
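A probability distribution over sequences of words: $P(w_1 w_2 w_3 \ldots w_n) = ?$
Markov Assumption: $P(w_1 w_2 w_3 \ldots w_n) \approx \prod_i P(w_i \mid w_{i-k} \ldots w_{i-1})$
N-Grams:
Chain rule (exact): $P(w_1 w_2 w_3) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2, w_1)$
Unigram: $P(w_1 w_2 w_3) = P(w_1)\, P(w_2)\, P(w_3)$
Bigram: $P(w_1 w_2 w_3) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2)$
... 3-grams, 4-grams, etc.
where $P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$ (Maximum Likelihood Estimation)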
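A minimal sketch of the maximum likelihood bigram estimate, in Python. The toy corpus and the name `bigram_mle` are illustrative assumptions, not part of the slides; the context count is taken over positions that have a successor, so the estimates for each context sum to one.

```python
from collections import Counter

def bigram_mle(sentences):
    """Return a function computing P(cur | prev) = count(prev, cur) / count(prev)."""
    bigrams = Counter()
    contexts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1  # prev is counted only where it has a successor

    def prob(cur, prev):
        if contexts[prev] == 0:
            return 0.0  # unseen context: the MLE is undefined, return 0 here
        return bigrams[(prev, cur)] / contexts[prev]

    return prob

# Toy corpus (hypothetical)
corpus = ["a dog chases a cat", "a dog eats meat", "a car drives"]
p = bigram_mle(corpus)
print(p("dog", "a"))      # "a" is followed by "dog" in 2 of its 4 occurrences
print(p("unicorn", "a"))  # unseen bigram gets probability 0 under MLE
```

The zero for unseen bigrams is exactly what smoothing is meant to repair.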
Smoothing
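(simple) Laplacian:
$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\delta + \mathrm{count}(w_{i-n+1}^{i-1}, w_i)}{\delta |V| + \sum_{w'} \mathrm{count}(w_{i-n+1}^{i-1}, w')}$
(advanced) Kneser-Ney (bigrams):
$p_{KN}(w_i \mid w_{i-1}) = \frac{\max(\mathrm{count}(w_{i-1}, w_i) - \delta, 0)}{\sum_{w'} \mathrm{count}(w_{i-1}, w')} + \lambda_{w_{i-1}}\, p_{KN}(w_i)$
$p_{KN}(w_i) = \frac{|\{w' : 0 < \mathrm{count}(w', w_i)\}|}{|\{(w', w'') : 0 < \mathrm{count}(w', w'')\}|}$
$\lambda_{w_{i-1}} = \frac{\delta}{\sum_{w'} \mathrm{count}(w_{i-1}, w')}\, |\{w' : 0 < \mathrm{count}(w_{i-1}, w')\}|$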
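A minimal sketch of the simple (additive) case for bigrams, in Python; Kneser-Ney is not covered here. The toy corpus, the name `additive_bigram` and the choice delta = 1 are illustrative assumptions.

```python
from collections import Counter

def additive_bigram(sentences, delta=1.0):
    """P(cur | prev) = (delta + count(prev, cur)) / (delta * |V| + sum_w count(prev, w))."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1

    size = len(vocab)

    def prob(cur, prev):
        return (delta + bigrams[(prev, cur)]) / (delta * size + contexts[prev])

    return prob, vocab

# Toy corpus (hypothetical)
corpus = ["a dog chases a cat", "a dog eats meat", "a car drives"]
p_add, vocab = additive_bigram(corpus, delta=1.0)
print(p_add("dog", "a"))   # seen bigram, discounted relative to the MLE
print(p_add("meat", "a"))  # unseen bigram now gets a small non-zero probability
```

The probabilities for a fixed context still sum to one over the vocabulary, which is a quick sanity check for any smoothing scheme.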
N-Grams
Insufficient because of:
Language has long-distance dependencies
Not enough generalisation
Surface forms instead of meanings
Distributional Semantics
Where does meaning come from?
How do we get the meaning of a word, a phrase, a sentence?
Distributional approach
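Frege, Firth, Harris: the meaning of a word ≈ the meaning of its contexts
An example for Russian: contexts of word forms sharing the stem бег-/беж- ("run"; the stem was set in bold on the slide)
...он не пробежит быстрее гепарда... (...he will not run faster than a cheetah...)
...перед пробегом лыжники и ветераны... (...before the run, skiers and veterans...)
...они выбегают на проезжую часть... (...they run out onto the roadway...)
...есть , бежать , спать и дышать... (...to eat, to run, to sleep and to breathe...)
...бегать и прыгать на платформу... (...to run and jump onto the platform...)
...мы быстро побежали к пляжу... (...we quickly ran to the beach...)
...дневник : почему именно бег... (...diary: why running in particular...)
...бежать вверх , я струсил что ли... (...to run upwards, did I chicken out or what...)
...побежал к своей маме... (...ran to his mom...)
...где дети могут бегать и играть... (...where children can run and play...)
...затем он выбегал , опрокидывал... (...then he would run out, knock over...)
...дети из любых семей сбегают из дому... (...children from all kinds of families run away from home...)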
Learning Representations
Distributed Representations
Moving away from local representations:
cat = [0000100]
dog = [0100000]
to distributed ones:
doc1: A dog eating meat.
doc2: A dog chases a cat.
doc3: A car drives.
Recall the Term-Document matrix (rows: documents, columns: terms):
      a  car  cat  chases  dog  drives  eating  meat
doc1  1  0    0    0       1    0       1       1
doc2  2  0    1    1       1    0       0       0
doc3  1  1    0    0       0    1       0       0
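A minimal sketch of how such a count matrix can be built from the three toy documents, in Python. The `tokenize` helper (lowercasing, dropping the final period) is an assumption made for this example.

```python
from collections import Counter

docs = {
    "doc1": "A dog eating meat.",
    "doc2": "A dog chases a cat.",
    "doc3": "A car drives.",
}

def tokenize(text):
    # Lowercase and drop periods; enough for these toy sentences.
    return text.lower().replace(".", "").split()

# Vocabulary in alphabetical order gives the column order.
vocab = sorted({tok for text in docs.values() for tok in tokenize(text)})

# One row of term counts per document.
matrix = {}
for name, text in docs.items():
    counts = Counter(tokenize(text))
    matrix[name] = [counts[term] for term in vocab]

print(vocab)
for name, row in matrix.items():
    print(name, row)
```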
doc1: A dog eating meat.
doc2: A dog chases a cat.
doc3: A car drives.
⇒ Term-Term matrix: set the context! Window size in [1, 10]. Let's take 1 (one word on each side, counted symmetrically).
Shorter windows (1-3) ⇒ more syntactic
Longer windows (4-10) ⇒ more semantic
        a  car  cat  chases  dog  drives  eating  meat
a       0  1    1    1       2    0       0       0
car     1  0    0    0       0    1       0       0
cat     1  0    0    0       0    0       0       0
chases  1  0    0    0       1    0       0       0
dog     2  0    0    1       0    0       1       0
drives  0  1    0    0       0    0       0       0
eating  0  0    0    0       1    0       0       1
meat    0  0    0    0       0    0       1       0
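A minimal sketch of window-based co-occurrence counting, in Python. It assumes a symmetric window (the same number of words on each side), lowercasing and no punctuation; the function name `cooccurrence` is illustrative. With a symmetric window the resulting counts are symmetric by construction.

```python
from collections import defaultdict

docs = ["A dog eating meat.", "A dog chases a cat.", "A car drives."]

def cooccurrence(documents, window=1):
    """Count how often two words appear within `window` positions of each other."""
    counts = defaultdict(int)
    for text in documents:
        tokens = text.lower().replace(".", "").split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

counts = cooccurrence(docs, window=1)
print(counts[("a", "dog")])      # "a dog" occurs in doc1 and doc2
print(counts[("chases", "a")])   # "chases a" occurs in doc2
```

Widening `window` changes only the neighbourhood size, which is the knob the slide describes as trading syntactic for semantic similarity.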
Raw counts are very skewed ⇒ use Pointwise Mutual Information (PMI)
Negative PMI values are problematic, so we will use Positive PMI:
PPMI(x, y) = max(PMI(x, y), 0)
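A minimal sketch of PPMI over a table of co-occurrence counts, in Python, using the standard definition PMI(x, y) = log2(P(x, y) / (P(x) P(y))) with probabilities estimated from the counts. The toy counts and the function name `ppmi` are illustrative assumptions.

```python
import math

def ppmi(counts):
    """counts: dict mapping (word, context) -> co-occurrence count.
    Returns PPMI(x, y) = max(log2(P(x, y) / (P(x) P(y))), 0) for every observed pair."""
    total = sum(counts.values())
    row, col = {}, {}
    for (x, y), c in counts.items():
        row[x] = row.get(x, 0) + c   # marginal count of the word
        col[y] = col.get(y, 0) + c   # marginal count of the context
    scores = {}
    for (x, y), c in counts.items():
        if c == 0:
            continue  # PMI is undefined (minus infinity) for unseen pairs
        pmi = math.log2(c * total / (row[x] * col[y]))
        scores[(x, y)] = max(pmi, 0.0)
    return scores

# Toy counts (hypothetical)
toy = {
    ("dog", "a"): 2, ("dog", "eating"): 1,
    ("cat", "a"): 1,
    ("car", "a"): 1, ("car", "drives"): 1,
}
scores = ppmi(toy)
print(scores[("dog", "eating")])  # informative pair, positive score
print(scores[("car", "a")])       # negative PMI is clipped to 0
```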
The matrix is too large (typically millions of tokens/types)
The matrix is too sparse!
Curse of dimensionality
We want dense and short representations!
SVD/PCA
approximate N-dimensional data with fewer (most important) dimensions by rotating the axes into a new space
the first dimension captures the most variance in the original data
the next dimension captures the next most variance, etc.
Principal Component Analysis
Singular Value Decomposition
Keep the top k singular values instead of all m dimensions, so each row of W is k-dimensional
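A minimal sketch of truncating an SVD to k dimensions, in Python. It assumes NumPy is available; the 4x4 matrix holds made-up counts, and scaling the left singular vectors by the singular values to form W is one common convention, not necessarily the one used on the slide.

```python
import numpy as np

# Toy count matrix (hypothetical values); rows are words, columns are contexts.
M = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])

def truncated_svd(matrix, k):
    """Keep the k largest singular values; return k-dim row vectors and the rank-k approximation."""
    U, S, Vt = np.linalg.svd(matrix, full_matrices=False)  # S is sorted in descending order
    W = U[:, :k] * S[:k]       # each row is a k-dimensional representation
    approx = W @ Vt[:k, :]     # rank-k reconstruction of the original matrix
    return W, approx

W, approx = truncated_svd(M, k=2)
print(W.shape)
print(np.linalg.norm(M - approx))  # reconstruction error of the rank-2 approximation
```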
Dense vs. Sparse representations
Dense vectors lead to:
denoising
better generalization
easier for classifiers to properly weight the dimensions for the task
better at capturing higher order co-occurrence
Neural Language Models
Models
HLBL: A Scalable Hierarchical Distributed Language Model (Mnih, 2009)
$J = \frac{1}{T} \sum_{i=1}^{T} \frac{\exp(\tilde{w}_i^{\top} w_i + b_i)}{\sum_{k=1}^{V} \exp(\tilde{w}_i^{\top} w_k + b_k)}$,
where $\tilde{w}_i = \sum_{j=1}^{n-1} C_j w_{i-j}$ is the context embedding, $\{C_j\}$ are scaling matrices and $b_*$ are bias terms.
SENNA (CW): Natural Language Processing (almost) from Scratch (Collobert, 2011)
$J = \frac{1}{T} \sum_{i=1}^{T} \sum_{k=1}^{V} \max\left[0,\; 1 - f(w_{i-c}, \ldots, w_{i-1}, w_i) + f(w_{i-c}, \ldots, w_{i-1}, w_k)\right]$,
where the last c − 1 words are used as context, and f(x) is a non-linear function of the input
RNN: Linguistic Regularities in Continuous Space Word Representations (Mikolov, 2013)
Recurrent Neural Language Model (Inspired by Elman, 1992)
$s(t) = f(U w(t) + W s(t-1))$, $y(t) = g(V s(t))$
$f(z) = \frac{1}{1 + e^{-z}}$, $g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}$
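A minimal sketch of one time step of this recurrence, in plain Python. The matrix sizes and values are made up for illustration; w(t) is taken to be a one-hot vector for the current word, which is an assumption about the input encoding.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def rnn_step(U, W, V, w_t, s_prev):
    """s(t) = f(U w(t) + W s(t-1)),  y(t) = g(V s(t)) with f = sigmoid, g = softmax."""
    pre = [a + b for a, b in zip(matvec(U, w_t), matvec(W, s_prev))]
    s_t = [sigmoid(z) for z in pre]
    y_t = softmax(matvec(V, s_t))
    return s_t, y_t

# Toy sizes (hypothetical): vocabulary of 3 words, hidden state of 2 units.
U = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
W = [[0.5, 0.0], [-0.3, 0.2]]
V = [[1.0, -1.0], [0.0, 0.5], [-0.5, 0.2]]
s, y = rnn_step(U, W, V, w_t=[0, 1, 0], s_prev=[0.0, 0.0])
print(s)  # new hidden state
print(y)  # distribution over the next word
```

Feeding `s` back in as `s_prev` for the next word is what lets the state carry context of unbounded length.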
Learning Representations: predictive models
Efficient Estimation of Word Representations in Vector Space (Mikolov, 2013)
Word2Vec: CBOW and Skip-gram models (Mikolov, 2013)
Skip-Gram Model (Mikolov, 2013)
Learning Representations: Skip-Gram with Negative Sampling
where σ(x) = 1/(1 + exp(−x))
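The objective that this "where" clause belongs to did not survive in the transcript. For reference, the per-pair negative-sampling objective as given by Mikolov et al. (2013) is shown below; that this is the exact form on the slide is an assumption. Here $v_{w_I}$ is the input vector of the centre word, $v'_{w_O}$ the output vector of an observed context word, and $k$ negative words are drawn from a noise distribution $P_n(w)$.

```latex
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
    \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

The first term pushes observed (word, context) pairs together; the second pushes randomly sampled pairs apart.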
GloVe: Global Vectors for Word Representation (Pennington, 2014)
$J = \frac{1}{2} \sum_{i,j=1}^{V} f(P_{ij}) \left(w_i^{\top} \tilde{w}_j - \log P_{ij}\right)^2$
$w_i$ is a vector for the left context, $\tilde{w}_j$ a vector for the right context, $P_{ij}$ the relative frequency of word j in the context of word i, and f a weighting function
Evaluation
Intrinsic
Perplexity: how well a probability distribution or probability model predicts a sample.
$2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$,
where H(p) is the entropy of the distribution and x ranges over events.
Cross-Entropy: $H(\tilde{p}, q) = -\sum_x \tilde{p}(x) \log_2 q(x)$,
where q is the model, $\tilde{p}(x)$ is the empirical distribution of the test sample
Extrinsic
Word Analogy Tasks
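A minimal sketch of the perplexity computation, in Python. It assumes the usual practice of plugging the cross-entropy between the empirical test distribution and the model into the exponent, so that each of the N test tokens contributes with weight 1/N; the probability lists are made up.

```python
import math

def cross_entropy(model_probs):
    """H(p~, q) = -(1/N) * sum_i log2 q(x_i), where q(x_i) is the probability
    the model assigned to the i-th token of the test sample."""
    n = len(model_probs)
    return -sum(math.log2(q) for q in model_probs) / n

def perplexity(model_probs):
    return 2.0 ** cross_entropy(model_probs)

# Probabilities a hypothetical model assigned to each token of a 4-token sample.
uniform = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(uniform))  # 2 bits per token
print(perplexity(uniform))     # as uncertain as a uniform choice among 4 words
```

Lower perplexity means the model spreads less probability mass over wrong continuations.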
Linguistic Regularities in Continuous Space Word Representations (Mikolov, 2013)
Word Analogy Task, initially designed for word2vec
king is to man as queen is to ?
good is to best as smart is to ?
china is to beijing as russia is to ?
Word Analogy Task: vector(king) − vector(man) + vector(woman) ≈ vector(queen)
Use cosine similarity: $\hat{x} = \arg\max_{x} \cos(x, a^* - a + b)$, where $a^*$, $a$, $b$ are typically excluded from the candidates
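A minimal sketch of this nearest-neighbour search, in plain Python. The two-dimensional vectors are hand-made for illustration (real embeddings have hundreds of dimensions), and the function name `analogy` is an assumption.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(vectors, a, a_star, b):
    """Return argmax_x cos(x, a* - a + b), excluding a, a* and b themselves."""
    target = [s - p + q for s, p, q in zip(vectors[a_star], vectors[a], vectors[b])]
    candidates = [w for w in vectors if w not in (a, a_star, b)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-made vectors (hypothetical): axis 0 ~ "royalty", axis 1 ~ "female".
vectors = {
    "king":  [0.9, 0.1],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
    "queen": [0.9, 0.9],
    "apple": [0.5, 0.05],
}
print(analogy(vectors, a="man", a_star="king", b="woman"))
```

Excluding the three query words matters in practice: without it the nearest neighbour of the target is often one of the inputs.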
Comparison of Neural Language Models
Is word2vec better than SVD and GloVe?
Levy & Goldberg, 2014: Neural Word Embedding as Implicit Matrix Factorization (word2vec skip-gram with negative sampling)
Vylomova, 2016: word2vec performs similarly to SVD-PPMI on semantic and syntactic evaluation tasks
Deep Models
Now let's get deeper...
Deep Models: LSTMs
Problem: vanishing gradients in RNNs (for long-term dependencies). Solution: LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units), which perform similarly.
More: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Deep Models: CNNs
Originating from Computer Vision
Deep Models: CNN-Highway (Kim et al, 2015)
Char-CNN+biLSTM
Statistical Machine Translation
Parallel Corpus
Popular corpora: Europarl, CommonCrawl
MT Workshop: http://www.statmt.org/
Learning Alignment
Alignment → word/phrase translation probabilities
Alignment for phrase-based MT
SMT Model: Noisy Channel Model
Noisy Channel from Information Theory
Recall Bayes' Theorem: $P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$
Bayesian Approach
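Applied to translation, the rule gives the standard noisy-channel decomposition. The symbols $f$ (source sentence) and $e$ (target sentence) are notation assumed here, not taken from the slide.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \;
                       \underbrace{P(e)}_{\text{language model}}
```

$P(f)$ does not depend on $e$, so it drops out of the maximization; the translation model is estimated from the parallel corpus and the language model from monolingual target text.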
Neural Machine Translation
First Neural MT models
Encoder-Decoder: encode the source sentence into a single vector using an RNN, then decode iteratively until the EOS symbol is produced.
Sutskever et al., 2014: Sequence to Sequence Learning with Neural Networks
Deep LSTMs with 4 layers, 1000 cells at each layer and 1000-dimensional word embeddings
$|V_e| = 160{,}000$, $|V_f| = 80{,}000$. The resulting LSTM has 384M parameters.
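A back-of-the-envelope accounting that arrives at the 384M figure, in Python. The breakdown is an assumption, not something stated on the slide: biases are ignored, the encoder and decoder are separate 4-layer LSTMs, and the vocabulary-sized tables are a source embedding, a target embedding and an output softmax.

```python
hidden = 1000          # cells per layer
embed = 1000           # embedding dimension
layers = 4
src_vocab, tgt_vocab = 160_000, 80_000

# Each LSTM layer has 4 gates, each with input and recurrent weights (biases ignored).
per_layer = 4 * hidden * (embed + hidden)
lstm_weights = 2 * layers * per_layer        # encoder + decoder

# Assumed breakdown of the vocabulary-sized tables.
embeddings = src_vocab * embed + tgt_vocab * embed
softmax = hidden * tgt_vocab

total = lstm_weights + embeddings + softmax
print(lstm_weights, embeddings, softmax, total)
```

Under these assumptions most parameters sit in the vocabulary tables rather than in the recurrent layers.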
Vector Space
Attentional Mechanism
Bahdanau et al., 2014: Neural Machine Translation by Jointly Learning to Align and Translate
Let's learn the alignment!
Attentional Mechanism
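A minimal sketch of additive attention for one decoder step, in plain Python: score each encoder state against the previous decoder state, normalize the scores with a softmax to get alignment weights, and take the weighted sum of encoder states as the context vector. The parameter names `W_a`, `U_a`, `v_a` and all numeric values are illustrative assumptions.

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(s_prev, encoder_states, W_a, U_a, v_a):
    """e_j = v_a . tanh(W_a s_prev + U_a h_j); alpha = softmax(e); c = sum_j alpha_j h_j."""
    query = matvec(W_a, s_prev)
    scores = []
    for h in encoder_states:
        key = matvec(U_a, h)
        hidden = [math.tanh(q + k) for q, k in zip(query, key)]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    alpha = softmax(scores)  # alignment weights over source positions
    dim = len(encoder_states[0])
    context = [sum(a * h[d] for a, h in zip(alpha, encoder_states)) for d in range(dim)]
    return alpha, context

# Toy values (hypothetical): 3 source positions, 2-d states, 2-d attention layer.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[0.5, -0.5], [0.3, 0.8]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]
alpha, context = additive_attention([0.2, -0.1], H, W_a, U_a, v_a)
print(alpha)    # one weight per source position, summing to 1
print(context)  # weighted average of the encoder states
```

Collecting `alpha` for every decoder step gives a matrix of alignment weights that can be plotted directly.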
Attentional Mechanism (shamelessly stolen from nvidia tutorial)
Good news: the ability to interpret and visualize what the model is doing (the alignment weights)
An example of Alignment Matrix
Other applications of attention
Other great papers to read
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Xu et al., 2015)
Grammar as a Foreign Language (Vinyals et al., 2014)
Teaching Machines to Read and Comprehend (Hermann et al., 2015)
Comparison of MT systems
Comparison of the SMT/Neural MT Models
Phrase-Based SMT baseline vs. Attentional vs. Seq2Seq (taken from Sutskever's paper)
Google MT
Google Translator
Google Translation is officially neural!
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine
Translation (Wu and many others, 2016)
Thank you!
And many others ...
Subword models (character-level, morpheme-level)
Dialogue systems (iPavlov challenge from MIPT)
Transfer Learning, NLP for Low-Resource Languages
Other models such as memory networks, adversarial networks, etc.
Great researchers: Yoshua Bengio, Geoffrey Hinton, Tomas Mikolov, Chris Dyer, Russ Salakhutdinov, Kyunghyun Cho, Chris Manning, Hinrich Schuetze, Dan Jurafsky
RuSSIR-2017! Deadline: June 25th

Mais conteúdo relacionado

Mais procurados

GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextrudolf eremyan
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015RIILP
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
Word Embedding to Document distances
Word Embedding to Document distancesWord Embedding to Document distances
Word Embedding to Document distancesGanesh Borle
 
Methods in Unsupervised Dependency Parsing
Methods in Unsupervised Dependency ParsingMethods in Unsupervised Dependency Parsing
Methods in Unsupervised Dependency ParsingMohammad Sadegh Rasooli
 
Ekaterina vylomova-what-do-neural models-know-about-language-p1
Ekaterina vylomova-what-do-neural models-know-about-language-p1Ekaterina vylomova-what-do-neural models-know-about-language-p1
Ekaterina vylomova-what-do-neural models-know-about-language-p1Katerina Vylomova
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesFelipe Moraes
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimEdgar Marca
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Marcin Junczys-Dowmunt
 
OUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingOUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingFlorian Leitner
 
Fasttext 20170720 yjy
Fasttext 20170720 yjyFasttext 20170720 yjy
Fasttext 20170720 yjy재연 윤
 
Propositional logic & inference
Propositional logic & inferencePropositional logic & inference
Propositional logic & inferenceSlideshare
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Jinpyo Lee
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Daniele Di Mitri
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsBhaskar Mitra
 
natural language processing
natural language processing natural language processing
natural language processing sunanthakrishnan
 

Mais procurados (20)

GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastTextGDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
GDG Tbilisi 2017. Word Embedding Libraries Overview: Word2Vec and fastText
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Word Embedding to Document distances
Word Embedding to Document distancesWord Embedding to Document distances
Word Embedding to Document distances
 
Methods in Unsupervised Dependency Parsing
Methods in Unsupervised Dependency ParsingMethods in Unsupervised Dependency Parsing
Methods in Unsupervised Dependency Parsing
 
Ekaterina vylomova-what-do-neural models-know-about-language-p1
Ekaterina vylomova-what-do-neural models-know-about-language-p1Ekaterina vylomova-what-do-neural models-know-about-language-p1
Ekaterina vylomova-what-do-neural models-know-about-language-p1
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r...
 
OUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingOUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language Modeling
 
Fasttext 20170720 yjy
Fasttext 20170720 yjyFasttext 20170720 yjy
Fasttext 20170720 yjy
 
Nlp
NlpNlp
Nlp
 
Propositional logic & inference
Propositional logic & inferencePropositional logic & inference
Propositional logic & inference
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
 
Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
natural language processing
natural language processing natural language processing
natural language processing
 

Semelhante a Contemporary Models of Natural Language Processing

Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityIDES Editor
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and ApplicationsICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and ApplicationsForward Gradient
 
Deep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmDeep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmMeetupDataScienceRoma
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptLamhotNaibaho3
 
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...asahiushio1
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing Rajnish Raj
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translationguest873a50
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNetDomyoung Lee
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture NotesFellowBuddy.com
 
Pointing the Unknown Words
Pointing the Unknown WordsPointing the Unknown Words
Pointing the Unknown Wordshytae
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2Viral Gupta
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
Master's Thesis Alessandro Calmanovici
Master's Thesis Alessandro CalmanoviciMaster's Thesis Alessandro Calmanovici
Master's Thesis Alessandro CalmanoviciAlessandro Calmanovici
 

Semelhante a Contemporary Models of Natural Language Processing (20)

Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic AmbiguityEffective Approach for Disambiguating Chinese Polyphonic Ambiguity
Effective Approach for Disambiguating Chinese Polyphonic Ambiguity
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Esa act
Esa actEsa act
Esa act
 
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and ApplicationsICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
 
Deep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigmDeep Learning for Machine Translation - A dramatic turn of paradigm
Deep Learning for Machine Translation - A dramatic turn of paradigm
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.ppt
 
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
 
A Simple Explanation of XLNet
A Simple Explanation of XLNetA Simple Explanation of XLNet
A Simple Explanation of XLNet
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture Notes
 
Pointing the Unknown Words
Pointing the Unknown WordsPointing the Unknown Words
Pointing the Unknown Words
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
Master's Thesis Alessandro Calmanovici
Master's Thesis Alessandro CalmanoviciMaster's Thesis Alessandro Calmanovici
Master's Thesis Alessandro Calmanovici
 

Mais de Katerina Vylomova

Documenting and modeling inflectional paradigms in under-resourced languages
Documenting and modeling inflectional paradigms in under-resourced languages Documenting and modeling inflectional paradigms in under-resourced languages
Documenting and modeling inflectional paradigms in under-resourced languages Katerina Vylomova
 
The UniMorph Project and Morphological Reinflection Task: Past, Present, and ...
The UniMorph Project and Morphological Reinflection Task: Past, Present, and ...The UniMorph Project and Morphological Reinflection Task: Past, Present, and ...
The UniMorph Project and Morphological Reinflection Task: Past, Present, and ...Katerina Vylomova
 
Sigmorphon 2021. Keynote. UniMorph, Morphological inflection
Sigmorphon 2021. Keynote. UniMorph, Morphological inflectionSigmorphon 2021. Keynote. UniMorph, Morphological inflection
Sigmorphon 2021. Keynote. UniMorph, Morphological inflectionKaterina Vylomova
 
The Secret Life of Words: Exploring Regularity and Systematicity (joint talk ...
The Secret Life of Words: Exploring Regularity and Systematicity (joint talk ...The Secret Life of Words: Exploring Regularity and Systematicity (joint talk ...
The Secret Life of Words: Exploring Regularity and Systematicity (joint talk ...Katerina Vylomova
 
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological InflectionSIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological InflectionKaterina Vylomova
 
Ekaterina vylomova-what-do-neural models-know-about-language-p2
Ekaterina vylomova-what-do-neural models-know-about-language-p2Ekaterina vylomova-what-do-neural models-know-about-language-p2
Ekaterina vylomova-what-do-neural models-know-about-language-p2Katerina Vylomova
 
Evaluation of Semantic Change of Harm-Related Concepts in Psychology
Evaluation of Semantic Change of Harm-Related Concepts in PsychologyEvaluation of Semantic Change of Harm-Related Concepts in Psychology
Evaluation of Semantic Change of Harm-Related Concepts in PsychologyKaterina Vylomova
 
Contextualization of Morphological Inflection
Contextualization of Morphological InflectionContextualization of Morphological Inflection
Contextualization of Morphological InflectionKaterina Vylomova
 
Paradigm Completion for Derivational Morphology
Paradigm Completion for Derivational MorphologyParadigm Completion for Derivational Morphology
Paradigm Completion for Derivational MorphologyKaterina Vylomova
 
Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal A...
Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal A...Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal A...
Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal A...Katerina Vylomova
 
Context-Aware Derivation Prediction // EACL 2017
Context-Aware Derivation Prediction // EACL 2017Context-Aware Derivation Prediction // EACL 2017
Context-Aware Derivation Prediction // EACL 2017Katerina Vylomova
 
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec...
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec...Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec...
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec...Katerina Vylomova
 
Neural models for recognition of basic units of semiographic chants
Neural models for recognition of basic units of semiographic chantsNeural models for recognition of basic units of semiographic chants
Neural models for recognition of basic units of semiographic chantsKaterina Vylomova
 
Russia, Russians and Russian language
Russia, Russians and Russian languageRussia, Russians and Russian language
Russia, Russians and Russian languageKaterina Vylomova
 
Ekaterina Vylomova/Brown Bag seminar presentation
Ekaterina Vylomova/Brown Bag seminar presentationEkaterina Vylomova/Brown Bag seminar presentation
Ekaterina Vylomova/Brown Bag seminar presentationKaterina Vylomova
 

Contemporary Models of Natural Language Processing

  • 1. Introduction Language Modeling Machine Translation End Contemporary Models of Natural Language Processing Ekaterina Vylomova June, 2017
  • 2. Contents: 1 Introduction; 2 Language Modeling (N-Grams, Distributional Semantics, Learning Representations, Neural Language Models, Evaluation); 3 Machine Translation (Statistical Machine Translation, Neural Machine Translation, Attentional Mechanism, Comparison of MT systems, Google MT)
  • 3. NLP and CL. Natural Language Processing: the art of solving engineering problems that need to analyze (or generate) natural language text. Computational Linguistics: computational methods to answer the scientific questions of linguistics.
  • 4. Tasks: Language Modeling, Sentiment Analysis, Machine Translation, POS Tagging, Text Classification, Question Answering, Recommender Systems, ... and many others!
  • 5. Contents: 1 Introduction; 2 Language Modeling (N-Grams, Distributional Semantics, Learning Representations, Neural Language Models, Evaluation); 3 Machine Translation (Statistical Machine Translation, Neural Machine Translation, Attentional Mechanism, Comparison of MT systems, Google MT)
  • 6. N-Grams: Language Modeling. A language model is a probability distribution over sequences of words: what is $P(w_1 w_2 w_3 \dots w_n)$? Markov assumption: $P(w_1 w_2 w_3 \dots w_n) \approx \prod_i P(w_i \mid w_{i-k} \dots w_{i-1})$. N-Grams: $P(w_1 w_2 w_3) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_2, w_1)$. Unigram: $P(w_1 w_2 w_3) = P(w_1) P(w_2) P(w_3)$. Bigram: $P(w_1 w_2 w_3) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_2)$. ... 3-grams, 4-grams, etc., where $P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$ (Maximum Likelihood Estimation).
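The maximum likelihood estimate for bigrams can be sketched in a few lines. This is a minimal illustration on a toy corpus; the function name `bigram_mle` and the corpus are assumptions made here, not part of the slides.

```python
from collections import Counter

def bigram_mle(tokens):
    """Return p(word, prev) = count(prev, word) / count(prev)."""
    # Count only words that have a successor, so that the estimates
    # for each context sum to one over the vocabulary.
    context = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))

    def prob(word, prev):
        if context[prev] == 0:
            return 0.0  # unseen context: MLE gives no estimate, 0.0 is a convention here
        return bigrams[(prev, word)] / context[prev]

    return prob

tokens = "a dog chases a cat".split()  # toy corpus (hypothetical)
p = bigram_mle(tokens)
print(p("dog", "a"), p("cat", "a"), p("meat", "a"))
```

Any bigram that never occurs in the training data gets probability zero, which is what motivates smoothing.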
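  • 7. N-Grams: Smoothing. (simple) Laplacian: $P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\delta + \mathrm{count}(w_{i-n+1}^{i-1}, w_i)}{\delta |V| + \sum_{w_i} \mathrm{count}(w_{i-n+1}^{i-1}, w_i)}$. (advanced) Kneser-Ney (bigrams): $p_{KN}(w_i \mid w_{i-1}) = \frac{\max(\mathrm{count}(w_{i-1}, w_i) - \delta, 0)}{\sum_{w'} \mathrm{count}(w_{i-1}, w')} + \lambda_{w_{i-1}} p_{KN}(w_i)$, with $p_{KN}(w_i) = \frac{|\{w' : 0 < \mathrm{count}(w', w_i)\}|}{|\{(w', w'') : 0 < \mathrm{count}(w', w'')\}|}$ and $\lambda_{w_{i-1}} = \frac{\delta}{\sum_{w'} c(w_{i-1}, w')} \, |\{w' : 0 < c(w_{i-1}, w')\}|$.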
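Both smoothing schemes can be tried on a toy corpus. The sketch below assumes bigrams only; the corpus, the function names and the default values of `delta` are illustrative choices, not taken from the slides.

```python
from collections import Counter

tokens = "a dog chases a cat".split()       # toy corpus (hypothetical)
vocab = sorted(set(tokens))
bigrams = Counter(zip(tokens, tokens[1:]))  # count(w_{i-1}, w_i)
context = Counter(tokens[:-1])              # sum over w of count(w_{i-1}, w)

def p_laplace(word, prev, delta=1.0):
    """Add-delta estimate: every bigram receives delta pseudo-counts."""
    return (delta + bigrams[(prev, word)]) / (delta * len(vocab) + context[prev])

def p_continuation(word):
    """Share of distinct bigram types that end in `word`."""
    return sum(1 for (_, w) in bigrams if w == word) / len(bigrams)

def p_kneser_ney(word, prev, delta=0.75):
    """Discounted bigram estimate interpolated with the continuation probability."""
    total = context[prev]
    if total == 0:
        # Unseen context: fall back to the continuation probability
        # (a choice made for this sketch).
        return p_continuation(word)
    followers = sum(1 for (p, _) in bigrams if p == prev)  # distinct words seen after prev
    lam = delta / total * followers
    return max(bigrams[(prev, word)] - delta, 0) / total + lam * p_continuation(word)

for w in vocab:
    print(w, round(p_laplace(w, "a"), 4), round(p_kneser_ney(w, "a"), 4))
```

In both cases the probabilities for a fixed context still sum to one, while unseen bigrams such as ("a", "chases") now receive non-zero mass.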
  • 8. N-Grams. N-grams are insufficient because: language has long-distance dependencies; there is not enough generalisation; they model surface forms instead of meanings.
  • 9. Distributional Semantics. Where does meaning come from? How do we get the meaning of a word, a phrase, a sentence?
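  • 10. Distributional Semantics: Distributional approach. Frege, Firth, Harris: the meaning of a word ≈ the meaning of its contexts. An example for Russian (each fragment contains a form with the stem бег-/беж-, "run"):
      ...он не пробежит быстрее гепарда... (he will not run faster than a cheetah)
      ...перед пробегом лыжники и ветераны... (before the run, skiers and veterans)
      ...они выбегают на проезжую часть... (they run out onto the roadway)
      ...есть, бежать, спать и дышать... (to eat, to run, to sleep and to breathe)
      ...бегать и прыгать на платформу... (to run and jump onto the platform)
      ...мы быстро побежали к пляжу... (we quickly ran to the beach)
      ...дневник: почему именно бег... (diary: why running in particular)
      ...бежать вверх, я струсил что ли... (to run upward, did I chicken out or what)
      ...побежал к своей маме... (ran to his mother)
      ...где дети могут бегать и играть... (where children can run and play)
      ...затем он выбегал, опрокидывал... (then he would run out, knock things over)
      ...дети из любых семей сбегают из дому... (children from all kinds of families run away from home)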
  • 11. Learning Representations: Distributed Representations. Moving away from local representations (cat = [0000100], dog = [0100000]) to distributed ones. doc1: A dog eating meat. doc2: A dog chases a cat. doc3: A car drives. Recall the term-document matrix:
              a  car  cat  chases  dog  drives  eating  meat
      doc1    1   0    0     0      1     0       1      1
      doc2    2   0    1     1      1     0       0      0
      doc3    1   1    0     0      0     1       0      0
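The term-document counts for the three example documents can be reproduced with a short script. The tokenizer below (lowercasing and dropping full stops) is an assumption of this sketch.

```python
from collections import Counter

docs = {
    "doc1": "A dog eating meat.",
    "doc2": "A dog chases a cat.",
    "doc3": "A car drives.",
}

def tokenize(text):
    # Minimal tokenizer (an assumption): lowercase, strip full stops, split on spaces.
    return text.lower().replace(".", "").split()

vocab = sorted({w for text in docs.values() for w in tokenize(text)})

def term_document_matrix(docs, vocab):
    """Map each document name to its row of term counts, ordered by `vocab`."""
    matrix = {}
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        matrix[name] = [counts[w] for w in vocab]
    return matrix

matrix = term_document_matrix(docs, vocab)
print(vocab)
for name, row in matrix.items():
    print(name, row)
```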
  • 12. Learning Representations: Distributed Representations. doc1: A dog eating meat. doc2: A dog chases a cat. doc3: A car drives. ⇒ Term-term matrix: set the context! Window size is in [1, 10]; let's take 1. Shorter windows (1-3) ⇒ syntactic; longer windows (4-10) ⇒ semantic.
                a  car  cat  chases  dog  drives  eating  meat
      a         0   1    1     0      2     0       0      0
      car       1   0    0     0      0     0       0      0
      cat       1   0    0     0      0     0       0      0
      chases    1   0    0     0      1     0       0      0
      dog       1   0    0     0      0     0       1      0
      drives    0   1    0     0      0     0       0      0
      eating    0   0    0     0      0     0       0      1
      meat      0   0    0     0      0     0       1      0
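A term-term matrix with a configurable window can be built as follows. This sketch assumes a symmetric window (neighbours on both sides are counted) and a minimal tokenizer; with those assumptions the counts are symmetric, so individual cells need not match the table on the slide.

```python
from collections import Counter

docs = ["A dog eating meat.", "A dog chases a cat.", "A car drives."]

def tokenize(text):
    # Minimal tokenizer (an assumption): lowercase, strip full stops, split on spaces.
    return text.lower().replace(".", "").split()

def cooccurrence(docs, window=1):
    """Count how often each ordered pair (word, neighbour) occurs within `window` positions."""
    counts = Counter()
    for doc in docs:
        toks = tokenize(doc)
        for i, word in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, toks[j])] += 1
    return counts

counts = cooccurrence(docs, window=1)
vocab = sorted({w for doc in docs for w in tokenize(doc)})
for w in vocab:
    print(w.ljust(7), [counts[(w, c)] for c in vocab])
```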
  • 13. Learning Representations. Raw counts are very skewed ⇒ use Pointwise Mutual Information, $\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x) P(y)}$. Negative values are problematic, so we will use Positive PMI: $\mathrm{PPMI}(x, y) = \max(\mathrm{PMI}(x, y), 0)$
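PPMI reweighting of a count matrix can be sketched in plain Python. The sketch assumes the standard definition PMI(x, y) = log2 of P(x, y) / (P(x) P(y)), with probabilities estimated from the counts; the function name is illustrative.

```python
import math

def ppmi(counts):
    """counts: list of rows (word x context) of non-negative co-occurrence counts."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)  # PMI would be -infinity; PPMI clips it to 0
                continue
            # P(x,y) / (P(x) P(y)) simplifies to c * total / (row_sum * col_sum).
            pmi = math.log2(c * total / (row_sums[i] * col_sums[j]))
            out.append(max(pmi, 0.0))
        result.append(out)
    return result

print(ppmi([[2, 0], [0, 2]]))
print(ppmi([[1, 1], [1, 1]]))
```

In the first matrix each word occurs only with its own context, so the diagonal cells get positive PPMI; in the second, words and contexts are independent and every cell is zero.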
  • 14. Learning Representations. The matrix is too large (typically millions of tokens/types). The matrix is too sparse! Curse of dimensionality. We want dense and short representations!
  • 15. Learning Representations. SVD/PCA: approximate N-dimensional data with fewer (most important) dimensions by rotating the axes into a new space; the first (highest-order) dimension captures the most variance in the original data, the next dimension captures the next most variance, etc.
  • 16. Learning Representations: Principal Component Analysis (figure)
  • 17. Learning Representations: Singular Value Decomposition. Store the top k singular values instead of all m dimensions, so each row of W is k-dimensional.
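A truncated SVD of a small count matrix can be sketched with NumPy. The toy matrix is made up for illustration, and scaling the left singular vectors by the singular values is one common choice among several, not something the slide prescribes.

```python
import numpy as np

def svd_embeddings(matrix, k):
    """Return k-dimensional row vectors from the top-k singular directions."""
    U, S, Vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    # Keep the k largest singular values (NumPy returns them in descending order).
    return U[:, :k] * S[:k]

# Toy word-by-context count matrix (hypothetical values).
counts = np.array([
    [0, 1, 1, 2],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [2, 0, 1, 0],
], dtype=float)

W = svd_embeddings(counts, k=2)
print(W.shape)
print(W)
```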
  • 18. Learning Representations: Dense vs. Sparse representations. Dense vectors lead to: denoising; better generalization; easier for classifiers to properly weight the dimensions for the task; better at capturing higher-order co-occurrence.
  • 19. Neural Language Models: Models. HLBL: A Scalable Hierarchical Distributed Language Model (Mnih, 2009): $J = \frac{1}{T} \sum_{i=1}^{T} \frac{\exp(\tilde{w}_i^\top w_i + b_i)}{\sum_{k=1}^{V} \exp(\tilde{w}_i^\top w_k + b_k)}$, where $\tilde{w}_i = \sum_{j=1}^{n-1} C_j w_{i-j}$ is the context embedding, $\{C_j\}$ are scaling matrices and $b_*$ are bias terms. SENNA (CW): Natural Language Processing (almost) from Scratch (Collobert, 2011): $J = \frac{1}{T} \sum_{i=1}^{T} \sum_{k=1}^{V} \max\left[0,\, 1 - f(w_{i-c}, \dots, w_{i-1}, w_i) + f(w_{i-c}, \dots, w_{i-1}, w_k)\right]$, where the last $c - 1$ words are used as context, and $f(x)$ is a non-linear function of the input.
  • 20. Neural Language Models. RNN: Linguistic Regularities in Continuous Space Word Representations (Mikolov, 2013). Recurrent Neural Language Model (inspired by Elman, 1992): $s(t) = f(U w(t) + W s(t-1))$, $y(t) = g(V s(t))$, with $f(z) = \frac{1}{1 + e^{-z}}$ and $g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}$.
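One step of this recurrence can be written out directly. The weights below are arbitrary toy values (vocabulary of 3 words, 2 hidden units), chosen only to make the sketch runnable.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-x)) for x in z]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(U, W, V, w_t, s_prev):
    """s(t) = f(U w(t) + W s(t-1)),  y(t) = g(V s(t))."""
    s_t = sigmoid([a + b for a, b in zip(matvec(U, w_t), matvec(W, s_prev))])
    y_t = softmax(matvec(V, s_t))
    return s_t, y_t

# Toy parameters (hypothetical): 3-word vocabulary, 2 hidden units.
U = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
W = [[0.5, -0.2], [0.1, 0.3]]
V = [[0.2, 0.1], [-0.3, 0.4], [0.0, 0.5]]

state = [0.0, 0.0]
for word in ([1, 0, 0], [0, 0, 1]):  # one-hot inputs w(t)
    state, y = rnn_step(U, W, V, word, state)
    print(state, y)
```

The output `y` is a distribution over the vocabulary for the next word, and `state` carries the history forward to the next step.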
  • 21. Neural Language Models: Learning Representations, predictive models. Efficient Estimation of Word Representations in Vector Space (Mikolov, 2013). Word2Vec: CBOW and Skip-gram models (Mikolov, 2013)
  • 22. Neural Language Models: Learning Representations. Skip-Gram Model (Mikolov, 2013)
  • 23. Neural Language Models: Learning Representations. Skip-Gram with Negative Sampling, where $\sigma(x) = 1 / (1 + \exp(-x))$
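The objective itself is not in the transcript. As a hedged sketch, the per-pair objective usually given for skip-gram with negative sampling is log σ(c · w) plus the sum over k sampled negatives of log σ(−n · w), where w is the centre-word vector, c the observed context vector and n a sampled "noise" context vector. The names and toy vectors below are assumptions of this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgns_objective(center, context, negatives):
    """log sigma(context . center) + sum over negatives of log sigma(-neg . center)."""
    positive = math.log(sigmoid(dot(context, center)))
    negative = sum(math.log(sigmoid(-dot(neg, center))) for neg in negatives)
    return positive + negative

# Toy vectors (hypothetical): the observed context agrees with the centre word,
# the sampled negative points the other way.
good = sgns_objective([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = sgns_objective([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(good, bad)
```

Training maximizes this quantity, pulling observed (word, context) pairs together and pushing sampled pairs apart.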
  • 24. Neural Language Models. GloVe: Global Vectors for Word Representation (Pennington, 2014): $J = \frac{1}{2} \sum_{i,j=1}^{V} f(P_{ij}) (w_i^\top \tilde{w}_j - \log P_{ij})^2$, where $w_i$ is a vector for the left context, $\tilde{w}_j$ is a vector for the right context, $P_{ij}$ is the relative frequency of word $j$ in the context of word $i$, and $f$ is a weighting function.
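The objective can be evaluated for given vectors as below. This is a sketch of the formula as written on the slide (no bias terms); the weighting function is passed in by the caller, and skipping zero entries of P is a choice made here because the logarithm is undefined for them.

```python
import math

def glove_loss(W, W_tilde, P, f):
    """0.5 * sum_ij f(P_ij) * (w_i . w~_j - log P_ij)^2 over entries with P_ij > 0."""
    total = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p <= 0:
                continue  # log P_ij undefined; such entries are skipped in this sketch
            score = sum(a * b for a, b in zip(W[i], W_tilde[j]))
            total += f(p) * (score - math.log(p)) ** 2
    return 0.5 * total

# Toy example (hypothetical values) with a constant weighting function.
P = [[math.e, 1.0], [1.0, math.e]]
W = [[1.0, 0.0], [0.0, 1.0]]
W_tilde = [[1.0, 0.0], [0.0, 1.0]]
print(glove_loss(W, W_tilde, P, f=lambda p: 1.0))
```

Here every dot product equals log P_ij, so the loss is zero; any other choice of vectors gives a positive value.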
  • 25. Neural Language Models: Evaluation. Intrinsic: Perplexity, i.e. how well a probability distribution or probability model predicts a sample: $2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$, where $H(p)$ is the entropy of the distribution and $x$ ranges over events. Cross-entropy: $H(\tilde{p}, q) = -\sum_x \tilde{p}(x) \log_2 q(x)$, where $q$ is the model and $\tilde{p}(x)$ is the empirical distribution of the test sample. Extrinsic: Word Analogy Tasks.
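These quantities are easy to compute for small discrete distributions. The distributions below are toy values chosen for illustration.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p_emp, q):
    """H(p~, q) = -sum_x p~(x) log2 q(x); q must be non-zero wherever p~ is."""
    return -sum(px * math.log2(q[x]) for x, px in p_emp.items() if px > 0)

def perplexity(bits):
    return 2.0 ** bits

# Toy distributions (hypothetical): empirical test distribution and a model.
p_emp = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
model = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

print(entropy(p_emp), perplexity(entropy(p_emp)))
print(cross_entropy(p_emp, model), perplexity(cross_entropy(p_emp, model)))
```

A uniform distribution over four events has perplexity 4; a model that does not match the empirical distribution always has a cross-entropy at least as large as the entropy, hence a higher perplexity.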
  • 26. Neural Language Models. Linguistic Regularities in Continuous Space Word Representations (Mikolov, 2013). Word Analogy Task, initially designed for word2vec: king is to man as queen is to ? good is to best as smart is to ? china is to beijing as russia is to ?
  • 27. Neural Language Models. Linguistic Regularities in Continuous Space Word Representations (Mikolov, 2013). Word Analogy Task: $\mathrm{vector}(king) - \mathrm{vector}(man) + \mathrm{vector}(woman) \approx \mathrm{vector}(queen)$. Use cosine similarity: $x^* = \arg\max_{x'} \cos(x', a^* - a + b)$, where $a^*$, $a$, $b$ are typically excluded from the candidates.
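The analogy search can be sketched with hand-made vectors. The three-dimensional toy embeddings below are invented for illustration; real embeddings would come from a trained model.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(vectors, a, a_star, b):
    """Solve a : a_star :: b : ? by maximizing cos(x, a_star - a + b)."""
    target = [x - y + z for x, y, z in zip(vectors[a_star], vectors[a], vectors[b])]
    # The three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in {a, a_star, b}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy embeddings (hypothetical), dimensions roughly: royal, male, female.
toy = {
    "king": [1.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [1.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}

print(analogy(toy, a="man", a_star="king", b="woman"))
```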
  • 28. Neural Language Models: Comparison of Neural Language Models
  • 29. Neural Language Models: Comparison of Neural Language Models
  • 30. Neural Language Models. Is word2vec better than SVD and GloVe? Levy and Goldberg, 2014: Neural Word Embedding as Implicit Matrix Factorization (word2vec skip-gram with negative sampling). Vylomova, 2016: word2vec performs similarly to SVD-PPMI on semantic and syntactic evaluation tasks.
  • 31. Neural Language Models: Deep Models. Now let's get deeper...
  • 32. Neural Language Models: Deep Models: LSTMs. Problem: vanishing gradients in RNNs (for long-term dependencies). Solution: LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units), which perform similarly. More: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 33. Neural Language Models: Deep Models: CNNs. Originating from Computer Vision.
  • 34. Neural Language Models: Deep Models: CNN-Highway (Kim et al, 2015). Char-CNN+biLSTM
  • 35. Neural Language Models: Deep Models: CNN-Highway (Kim et al, 2015). Char-CNN+biLSTM
  • 36. Contents: 1 Introduction; 2 Language Modeling (N-Grams, Distributional Semantics, Learning Representations, Neural Language Models, Evaluation); 3 Machine Translation (Statistical Machine Translation, Neural Machine Translation, Attentional Mechanism, Comparison of MT systems, Google MT)
  • 37. Statistical Machine Translation: Parallel Corpus. Popular corpora: Europarl, CommonCrawl. MT Workshop: http://www.statmt.org/
  • 38. Statistical Machine Translation: Learning Alignment. Alignment: word/phrase translation probabilities. Alignment for phrase-based MT.
  • 39. Statistical Machine Translation: SMT Model: Noisy Channel Model. Noisy Channel from Information Theory.
  • 40. Statistical Machine Translation: SMT Model: Noisy Channel Model. Recall Bayes' Theorem: $P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)}$. Bayesian Approach.
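Applied to translation, Bayes' theorem gives the usual noisy channel decomposition. In the worked equation below, the symbols $f$ (source sentence) and $e$ (candidate translation) are notation assumed here, not taken from the slide; $P(f \mid e)$ plays the role of the translation model and $P(e)$ of the language model.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator $P(f)$ can be dropped because it does not depend on $e$.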
  • 41. Neural Machine Translation: First Neural MT models. Encoder-Decoder: encode the source sentence into a single vector using an RNN, then decode iteratively until the EOS symbol is produced.
  • 42. Neural Machine Translation. Sutskever et al, 2014: Sequence to Sequence Learning with Neural Networks. Deep LSTMs with 4 layers, 1000 cells at each layer and 1000-dimensional word embeddings; $|V_e| = 160{,}000$, $|V_f| = 80{,}000$. The resulting LSTM has 384M parameters.
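The 384M figure can be reproduced with simple arithmetic. The breakdown below is an assumption of this sketch rather than something stated on the slide: bias terms are ignored, each LSTM layer is taken to have four gates with 1000-dimensional input and recurrent weights, and the target side is taken to have both an embedding table and an output softmax layer.

```python
layers, cells, embed_dim = 4, 1000, 1000
source_vocab, target_vocab = 160_000, 80_000

# Four gates per layer, each with input-to-hidden and hidden-to-hidden weights
# (assumes the layer input is also 1000-dimensional; biases ignored).
per_layer = 4 * (cells * embed_dim + cells * cells)
recurrent = 2 * layers * per_layer           # encoder LSTM + decoder LSTM

input_embeddings = source_vocab * embed_dim  # source word embeddings
output_embeddings = target_vocab * embed_dim # target word embeddings fed to the decoder
softmax = target_vocab * cells               # output layer over the target vocabulary

total = recurrent + input_embeddings + output_embeddings + softmax
print(f"{total / 1e6:.0f}M parameters")
```

Under these assumptions most parameters sit in the embedding and softmax tables rather than in the recurrent layers.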
  • 43. Neural Machine Translation. Sutskever et al, 2014: Sequence to Sequence Learning with Neural Networks. Vector Space.
  • 44. Attentional Mechanism. Bahdanau et al., 2014: Neural Machine Translation by Jointly Learning to Align and Translate. Let's learn the alignment! Attentional Mechanism.
  • 45. Attentional Mechanism. Bahdanau et al., 2014: Neural Machine Translation by Jointly Learning to Align and Translate. Attentional Mechanism (shamelessly stolen from an NVIDIA tutorial).
  • 46. Attentional Mechanism. Bahdanau et al., 2014: Neural Machine Translation by Jointly Learning to Align and Translate. Good news: the ability to interpret and visualize what the model is doing (the alignment weights). An example of an Alignment Matrix.
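The alignment weights can be sketched as a softmax over scores between the previous decoder state and each encoder state. The additive scoring form used below, v · tanh(W s + U h), and all parameter names and values are assumptions of this sketch.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Return alignment weights over encoder states H and the resulting context vector."""
    projected_state = matvec(W_a, s_prev)
    scores = []
    for h_j in H:
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, matvec(U_a, h_j))]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    alphas = softmax(scores)  # one weight per source position
    context = [sum(alpha * h_j[d] for alpha, h_j in zip(alphas, H))
               for d in range(len(H[0]))]
    return alphas, context

# Toy values (hypothetical): 3 source positions, 2-dimensional states.
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_a = [[0.3, -0.1], [0.2, 0.4]]
U_a = [[0.5, 0.1], [-0.2, 0.6]]
v_a = [1.0, -1.0]

alphas, context = additive_attention([0.2, -0.4], H, W_a, U_a, v_a)
print(alphas, context)
```

Collecting the `alphas` for every decoding step, one row per target word, yields an alignment matrix of the kind visualized on the slide.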
  • 47. Attentional Mechanism: Other applications of attention. Other great papers to read: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Xu et al., 2015); Grammar as a Foreign Language (Vinyals et al., 2014); Teaching Machines to Read and Comprehend (Hermann et al., 2015)
  • 48. Comparison of MT systems: Comparison of the SMT/Neural MT Models. Phrase-Based SMT baseline vs. Attentional vs. Seq2Seq (taken from Sutskever's paper)
  • 49. Google MT: Google Translator. Google Translation is officially neural! Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (Wu and many others, 2016)
  • 50. Thank you!
  • 51. And many others ... Subword models (character-level, morpheme-level); Dialogue systems (iPavlov challenge from MIPT); Transfer Learning, NLP for Low-Resource Languages; other models such as memory networks, adversarial networks, etc. Great researchers: Yoshua Bengio, Geoffrey Hinton, Tomas Mikolov, Chris Dyer, Russ Salakhutdinov, Kyunghyun Cho, Chris Manning, Hinrich Schuetze, Dan Jurafsky. RuSSIR-2017! Deadline: June 25th