Representation Learning of Vectors of Words and Phrases
Felipe Moraes
felipemoraes@dcc.ufmg.br
LATIN - LAboratory for Treating INformation
Agenda
• Motivation
• One-hot-encoding
• Language Models
• Neural Language Models
• Neural Net Language Models (NN-LMs) (Bengio et al., ’03)
• Word2Vec (Mikolov et al., ’13).
• Supervised Prediction Tasks
• Recursive NNs (Socher et al., ’11).
• Paragraph Vector (Le & Mikolov, ’14).
Motivation
Input → Learning Algorithm → Output
Let's start with words for now
How to learn good
representations for texts?
One-hot encoding
• Form a vocabulary that maps each (lemmatized) word to a
unique ID (the word's position in the vocabulary)
• typical vocabulary sizes vary between 10 000 and 250 000
One-hot encoding
• From its word ID, we get a basic representation of a word through
the one-hot encoding of the ID
• the one-hot vector of an ID is a vector filled with 0s, except for a 1
at the position associated with the ID
• ex.: for vocabulary size D=10, the one-hot vector of word ID w=4
is

e(w) = [ 0 0 0 1 0 0 0 0 0 0 ]
• a one-hot encoding makes no assumption about word similarity
• all words are equally different from each other
• this is a natural representation to start with, though a poor one
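A minimal sketch of this encoding (the toy vocabulary and word IDs below are illustrative):

import numpy as np

def one_hot(word_id, vocab_size):
    # vector of zeros with a single 1 at the position of the word's ID
    v = np.zeros(vocab_size, dtype=int)
    v[word_id] = 1
    return v

vocab = {"the": 0, "a": 1, "smart": 2, "person": 3}   # toy vocabulary; real ones have ~100 000 entries
# reproduces the slide's e(w) = [0 0 0 1 0 0 0 0 0 0] (0-based ID 3, i.e. the 4th position)
print(one_hot(vocab["person"], vocab_size=10))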
One-hot encoding
• The major problem with the one-hot representation is that it is very
high-dimensional
• the dimensionality of e(w) is the size of the vocabulary
• a typical vocabulary size is ≈100 000
• a window of 10 words would correspond to an input vector of
at least 1 000 000 units
• This has 2 consequences:
• vulnerability to overfitting
• millions of inputs means millions of parameters to train in a regular neural
network
• computationally expensive
Language Modeling
• A language model is a probabilistic model that assigns a
probability to any sequence of words p(w1, ..., wT)
• language modeling is the task of learning a language model
that assigns high probabilities to well-formed sentences
• plays a crucial role in speech recognition and machine
translation systems
• ex. (machine translation of the Portuguese phrase "uma pessoa inteligente"):
"a smart person" → high probability (well formed)
"a person smart" → low probability (ill formed)
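For reference, the standard decomposition behind these models: the chain rule factorizes the sequence probability, and an n-gram model then truncates each context to the previous n−1 words.

p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})
                  \approx \prod_{t=1}^{T} p(w_t \mid w_{t-(n-1)}, \dots, w_{t-1})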
N-gram Model
• An n-gram is a sequence of n words
• unigrams (n=1): "is", "a", "sequence", etc.
• bigrams (n=2): ["is", "a"], ["a", "sequence"], etc.
• trigrams (n=3): ["is", "a", "sequence"], ["a", "sequence", "of"], etc.
• n-gram models estimate the conditional p(wt | wt−(n−1), ..., wt−1) from n-gram counts
• the counts are obtained from a training corpus (a data set of text)
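A minimal count-based sketch of these estimates (the tokenized toy corpus and names are illustrative):

from collections import Counter

def ngram_counts(tokens, n):
    # count every n-gram (as a tuple) in the token stream
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def conditional_prob(tokens, context, word):
    # maximum-likelihood estimate: p(word | context) = count(context + word) / count(context)
    n = len(context) + 1
    num = ngram_counts(tokens, n)[tuple(context) + (word,)]
    den = ngram_counts(tokens, n - 1)[tuple(context)]
    return num / den if den else 0.0

tokens = "this is a sequence of words this is a test".split()
print(conditional_prob(tokens, ["is", "a"], "sequence"))  # trigram estimate: 0.5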

N-gram Model
• Issue: data sparsity
• we want n to be large, for the model to be realistic
• however, for large values of n, it is likely that a given n-gram
will not have been observed in the training corpora
• smoothing the counts can help
• combine count(w1, w2, w3, w4), count(w2, w3, w4), count(w3, w4), and
count(w4) to estimate p(w4 | w1, w2, w3)
• this only partly solves the problem
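One common way to combine the counts is linear interpolation of the different orders. A self-contained sketch of that idea (the mixture weights are illustrative; in practice they are tuned on held-out data):

from collections import Counter

def counts(tokens, n):
    return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

def interpolated_prob(tokens, context, word, lambdas=(0.5, 0.3, 0.15, 0.05)):
    # mix maximum-likelihood estimates from the full context down to the unigram count
    ests = []
    for k in range(len(context)):
        ctx = tuple(context[k:])                    # drop the oldest context words one at a time
        den = counts(tokens, len(ctx))[ctx]
        ests.append(counts(tokens, len(ctx) + 1)[ctx + (word,)] / den if den else 0.0)
    ests.append(tokens.count(word) / len(tokens))   # unigram estimate
    return sum(lam * e for lam, e in zip(lambdas, ests))

tokens = "this is a sequence of words this is a test".split()
print(interpolated_prob(tokens, ["this", "is", "a"], "sequence"))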
Neural Network Language Model
• Solution:
• model the conditional p(wt | wt−(n−1), ..., wt−1) with a neural network
• learn word representations to allow transfer to n-grams not observed
in the training corpus
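A minimal sketch of a Bengio-style architecture: embed the previous n−1 words, concatenate the embeddings, pass them through a non-linear hidden layer, and score every word in the vocabulary. PyTorch is used here only for illustration; layer names and sizes are arbitrary.

import torch
import torch.nn as nn

class NNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, context_size=3, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(context_size * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)    # output layer is vocabulary-sized (the bottleneck)

    def forward(self, context_ids):                 # context_ids: (batch, context_size)
        x = self.emb(context_ids).flatten(1)        # concatenate the context embeddings
        h = torch.tanh(self.hidden(x))
        return self.out(h)                          # unnormalized scores over the vocabulary

logits = NNLM(vocab_size=10_000)(torch.randint(0, 10_000, (2, 3)))
probs = torch.softmax(logits, dim=-1)               # p(wt | wt−3, wt−2, wt−1)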
Neural Network Language Model
Bengio, Y., Schwenk, H., Senécal, J.-S., Morin, F., & Gauvain, J.-L. (2006). Neural probabilistic
language models. In Innovations in Machine Learning (pp. 137-186). Springer Berlin Heidelberg.
NNLMs
• Predicting the probability of each next word is slow
in NNLMs because the output layer of the network
is the size of the dictionary.
Word2Vec
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed
Representations of Words and Phrases and their Compositionality. NIPS, 2013.
Word2Vec
[figure: word and phrase vectors in the embedding space, e.g. "San Francisco", "France"]
• It was recently shown that the word vectors capture many linguistic regularities, for
example:
• the vector operation vector('Paris') - vector('France') + vector('Italy') results in a vector
that is very close to vector('Rome')
• vector('king') - vector('man') + vector('woman') is close to vector('queen')
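Analogy queries like these can be run with a trained word2vec model. A minimal sketch using gensim (assuming the gensim 4.x API; the toy corpus is far too small for analogies to actually emerge and only shows the calls):

from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["a", "man", "and", "a", "woman"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram

# analogy query: vector('king') - vector('man') + vector('woman') ≈ ?
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))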
From Words to Phrases
• How could we learn representations for phrases of arbitrary
length?
• can we model relationships between words and multiword
expressions?
• ex.: "consider" ≈ "take into account"
• can we extract a representation of full sentences that
preserves some of their semantic meaning?
• ex.: "word representations were learned from a corpus"
≈ "we trained word representations on a text data set"
Recursive Neural Networks
• Idea: recursively merge pairs of word/phrase representations
• We need 2 things:
• a model that merges pairs of representations
• a model that determines the tree structure
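A minimal sketch of the pairwise merge, assuming the tree structure is already given (e.g. by a parser); the shared composition matrix W and the dimensions are illustrative:

import numpy as np

rng = np.random.default_rng(0)
d = 50
W = rng.standard_normal((d, 2 * d)) * 0.01   # shared composition matrix
b = np.zeros(d)

def merge(left, right):
    # parent representation from two child representations, same dimensionality d
    return np.tanh(W @ np.concatenate([left, right]) + b)

# "a smart person": merge ("smart", "person") first, then merge the result with "a"
a, smart, person = (rng.standard_normal(d) for _ in range(3))
phrase = merge(a, merge(smart, person))
print(phrase.shape)   # (50,)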
Paragraph Vector
Paragraph Vector: Applications
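As a rough illustration of the Paragraph Vector idea (Le & Mikolov, '14), a hedged sketch using gensim's Doc2Vec (assuming the gensim 4.x API; the documents and parameters are illustrative):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["word", "representations", "were", "learned"], tags=["d0"]),
        TaggedDocument(words=["we", "trained", "word", "representations"], tags=["d1"])]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)
vec = model.infer_vector(["learning", "representations", "of", "text"])  # fixed-size vector for a new paragraph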
