An Introduction to Recent Advances in the Field of NLP
1. What is the best a machine can do with text?
Introduction to recent advances in the field of NLP
Rrubaa Panchendrarajan
Ph.D. Student
National University of Singapore
3. Natural Language Processing (NLP)
• A sub-field of Artificial Intelligence (AI)
• Aim: To build intelligent computers that can interact with human beings like a human being
• Interactions are either written or spoken (text/audio)
5. Why does Preprocessing play a major role?
• Machines can understand only numbers
• Text is unstructured
• Natural language is highly ambiguous
6. Ambiguity at word level
A world record (noun: a best-ever achievement)
A record of the conversation (noun: a stored account)
Record it (verb: to capture)
7. Ambiguity at sentence level
“I saw the man on the hill with a telescope”
1. I saw the man. The man was on the hill. I was using a telescope.
2. I saw the man. I was on the hill. I was using a telescope.
3. I saw the man. The man was on the hill. The hill had a telescope.
4. I saw the man. I was on the hill. The hill had a telescope.
5. I saw the man. The man was on the hill. I saw him using a telescope.
8. Why does Preprocessing play a major role?
• Machines can understand only numbers
• Text is unstructured
• Natural language is highly ambiguous
• Language evolves with time
9. Core research areas
1. Lemmatization
2. Stemming
3. Sentence breaking
4. Morphological Analysis
5. Part-of-speech Tagging
6. Named-entity recognition
7. Word sense disambiguation
8. And lots more…
10. Named-entity recognition (NER)
• Task of identifying proper names in text and classifying them into a set of predefined categories of interest
Lady Gaga is playing a concert for the Bushes in Texas next September
[Lady Gaga → Person] [the Bushes → Person] [Texas → Location] [next September → Time]
• Applications
1. Question Answering (When is Lady Gaga playing …? The answer is obviously a time)
2. Machine Translation (named entities do not need to be translated)
3. etc.
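A quick way to try NER in practice is a pretrained pipeline such as spaCy's; a minimal sketch, assuming the spaCy library and its small English model en_core_web_sm are installed (one tool among several, not the method used in these slides):

# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # pretrained English pipeline with an NER component
doc = nlp("Lady Gaga is playing a concert for the Bushes in Texas next September")
for ent in doc.ents:                 # entities detected in the sentence
    print(ent.text, ent.label_)     # e.g. "Lady Gaga PERSON", "Texas GPE", "next September DATE"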
12. How to convert words to numbers?
• Straightforward option: One-hot vector representation
“the cat sat on the mat”
Vocabulary = {the, cat, sat, on, mat}
the = [1,0,0,0,0]
cat = [0,1,0,0,0]
sat = [0,0,1,0,0]
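A minimal sketch of how such one-hot vectors can be built with NumPy, assuming the vocabulary ordering above:

import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
index = {w: i for i, w in enumerate(vocab)}   # word -> position in the vocabulary

def one_hot(word):
    v = np.zeros(len(vocab))   # vector of length V, all zeros
    v[index[word]] = 1         # a single 1 at the word's index
    return v

print(one_hot("cat"))          # [0. 1. 0. 0. 0.]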
13. Issue with One-Hot vector representation
• Curse of dimensionality
Problems arise as the dimension (vocabulary size) increases, e.g. memory, performance, processing time
• Not meaningful
Each word is represented arbitrarily & independently (similarity between any two vectors is 0)
e.g. happy = [1,0,0], joy = [0,1,0]
Cosine similarity = 1*0 + 0*1 + 0*0 = 0
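The zero similarity is easy to verify numerically; a small check using the happy/joy vectors from the example:

import numpy as np

happy = np.array([1, 0, 0])
joy = np.array([0, 1, 0])
cos = happy @ joy / (np.linalg.norm(happy) * np.linalg.norm(joy))
print(cos)   # 0.0 -- any two distinct one-hot vectors are orthogonal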
14. Better solution
• Learn a matrix W of size V*N (V: vocabulary size, N: fixed & small, e.g. 100)
• The i-th row of W is the vector (array) representation of the i-th word
• Train a model to learn W
• W is referred to as “Word Embedding”
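In code, looking up a word's embedding is simply selecting a row of W; a minimal sketch with illustrative sizes (V = 5, N = 4, chosen only for this example):

import numpy as np

V, N = 5, 4                 # vocabulary size, embedding dimension
W = np.random.randn(V, N)   # the word embedding matrix, learnt during training

word_id = 2                 # e.g. index of "sat" in the earlier vocabulary
vector = W[word_id]         # i-th row = vector representation of the i-th word
print(vector.shape)         # (4,)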
15. Neural Networks came into play
• Organized as layers
• Each layer contains a set of neurons
• The job of a single neuron is to process all of its inputs and pass the result to all the neurons in the next layer
• When the number of hidden layers is increased, the network becomes “deep”
16. Neuron
• Each w is called a weight
• Weights are initialized to random values and learnt during the learning process
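What a single neuron computes can be written in a few lines; a sketch assuming a sigmoid activation (other activations are equally common):

import numpy as np

def neuron(inputs, weights, bias):
    z = inputs @ weights + bias    # weighted sum of all inputs to the neuron
    return 1 / (1 + np.exp(-z))    # sigmoid activation squashes the result into (0, 1)

x = np.array([0.5, -1.0, 2.0])     # three inputs
w = np.random.randn(3)             # weights start random and are learnt during training
print(neuron(x, w, bias=0.0))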
17. Neural Network
For a network with layer sizes 3 -> 4 -> 3 -> 1:
the first layer learns 3*4 weights, the second layer 4*3, and the third layer 3*1.
In general, each layer learns a weight matrix of size input_size*neuron_size.
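These counts follow directly from the layer sizes; a small sketch of the forward pass for the 3 -> 4 -> 3 -> 1 network described above:

import numpy as np

sizes = [3, 4, 3, 1]                            # input, two hidden layers, output
weights = [np.random.randn(a, b)                # 3*4, 4*3 and 3*1 matrices: input_size*neuron_size
           for a, b in zip(sizes, sizes[1:])]

x = np.random.randn(3)                          # one input example
for W in weights:
    x = np.tanh(x @ W)                          # each layer: multiply by its weight matrix, apply an activation
print(sum(W.size for W in weights))             # 12 + 12 + 3 = 27 weights in total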
18. Word2vec in 2013
• Created by a team of researchers led by Tomas Mikolov at Google
• Proposed two models to learn word embeddings
1. CBOW (Continuous Bag of Words)
2. Skip-gram
19. Word2Vec
• Given a word in a sentence, its N surrounding words are called its “context”
• Skip-gram trains a single-hidden-layer neural network to predict the context words from a given word (CBOW does the reverse: it predicts a word from its context)
Context size = 2
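A sketch of how the (word, context) training pairs are extracted with a context size of 2:

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))   # (input word, one of its context words)
    return pairs

print(skipgram_pairs("I like to eat apple a lot".split()))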
20. Idea behind Word2Vec
I like to eat apple a lot
I like to eat orange a lot
• The context of both is the same: {to, eat, a, lot}
• In such cases, the model learns similar vector representations for apple & orange
21. Skip Gram Model
• Input word is represented as a one-hot vector of size V
• We define a small hidden size N, e.g. 100 to 1000
• Output is a probability distribution over the V words
• Size of the weight matrix = V*N
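A minimal sketch of that forward pass with toy sizes (V = 5, N = 3, illustrative only), omitting the training loop:

import numpy as np

V, N = 5, 3                   # tiny vocabulary and hidden size, for illustration
W_in = np.random.randn(V, N)  # V*N weight matrix -- its rows become the word embeddings
W_out = np.random.randn(N, V) # second weight matrix, mapping back to vocabulary size

x = np.zeros(V)
x[1] = 1                      # one-hot input word
h = x @ W_in                  # hidden layer = the input word's embedding row
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()   # softmax: distribution over all V words
print(probs.sum())            # 1.0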
23. In practice
• Word2Vec was trained using a Google News corpus of 6B tokens
• The most frequent 1M words are used as the vocabulary
• Another embedding, GloVe, was released in 2014
• Both are publicly available & commonly used
Word2Vec - https://code.google.com/archive/p/word2vec/
GloVe - https://nlp.stanford.edu/projects/glove/
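Both can be loaded out of the box, e.g. with the gensim library; a sketch assuming the pretrained Google News binary (GoogleNews-vectors-negative300.bin, from the Word2Vec link above) has been downloaded:

# pip install gensim
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

print(kv.most_similar("apple", topn=3))   # nearest neighbours in the embedding space
print(kv.similarity("apple", "orange"))   # high similarity, unlike the one-hot case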
24. Next focus of the community?
• Words in a sentence are ordered
• The focus was on handling long sequential information using neural networks
• Different types of neural networks have been exploited over time: RNN -> LSTM -> GRU
• Adopting these architectures led to human-level performance in many applications
25. Language Modelling (LM)
• Given a sequence of words, predict the next word
• X = a sequence of words, Y = the next word in the corpus
• The first layer always learns the word embedding matrix
The model learns that breakfast, lunch & dinner are similar terms.
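A minimal sketch of such a language model in Keras (sizes are illustrative, not those of any published model): the first layer is the embedding matrix, an LSTM handles the word order, and a softmax scores every word in the vocabulary:

# pip install tensorflow
import tensorflow as tf

V, N = 10000, 100                                     # vocabulary size, embedding dimension
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(V, N),                  # first layer learns the word embedding matrix
    tf.keras.layers.LSTM(128),                        # processes the ordered sequence of words
    tf.keras.layers.Dense(V, activation="softmax"),   # probability of each word being next
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")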
26. Popular Language Models
• GPT-3 – Released by OpenAI
The largest model so far, with 175B parameters in its neural network
Trained using text scraped from across the internet
The team itself warned about possible misuse of the model
• BERT – Released by Google
• CodeBERT – Released by Microsoft
27. Was it written by a machine or a human?
Talk to Transformer: https://app.inferkit.com/demo
28. Can these models replace humans?
• Chatbots for customer assistance are already here
• Soon they are going to challenge many other professions that involve writing, including software development1
1. https://analyticsindiamag.com/5-jobs-that-gpt-3-might-challenge/
29. Future of NLP
• Human-level performance in existing research areas
• Replacing humans with machines wherever possible
• Scaling performance as content on the internet grows
• Expanding the research to more fields, e.g. medical, literature
• Understanding users through what they write