What is the best a machine
can do with text?
Introduction to recent advances in the field of NLP
Rrubaa Panchendrarajan
Ph.D. Student
National University of Singapore
NLP tools in day-to-day life
Natural Language Processing (NLP)
• A subfield of Artificial Intelligence (AI)
• Aim : To build intelligent computers that can interact with human
beings the way a human being would
• Interactions are either written or spoken (text/audio)
Dealing with text
Preprocessing -> Learning -> Application
Why does Preprocessing play a major role?
• Machines can work only with numbers
• Text is unstructured
• Natural language is highly ambiguous
Ambiguity at word level
A world record
A record of the conversation
Record it
Ambiguity at sentence level
“I saw the man on the hill with a telescope”
1. I saw the man. The man was on the hill. I was using a telescope.
2. I saw the man. I was on the hill. I was using a telescope.
3. I saw the man. The man was on the hill. The hill had a telescope.
4. I saw the man. I was on the hill. The hill had a telescope.
5. I saw the man. The man was on the hill. I saw him using a telescope.
Why does Preprocessing play a major role?
• Machines can work only with numbers
• Text is unstructured
• Natural language is highly ambiguous
• Language evolves with time
Core research areas
1. Lemmatization
2. Stemming
3. Sentence breaking
4. Morphological Analysis
5. Part-of-speech Tagging
6. Named-entity recognition
7. Word sense disambiguation
8. Many more…
Named-entity recognition (NER)
• Task of identifying proper names in text and classifying them into a set of
predefined categories of interest
Lady Gaga [Person] is playing a concert for the Bushes [Person] in Texas [Location] next September [Time]
• Applications
1. Question Answering (When is Lady Gaga playing … ? The answer must be a time)
2. Machine Translation (named entities usually do not need to be translated)
3. etc.
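As a rough illustration of the input and output of NER on the sentence above, here is a toy dictionary (gazetteer) lookup. This is only a sketch: the `GAZETTEER` table and the `tag_entities` helper are hypothetical names made up for this example, and real NER systems learn to recognise entities they have never seen rather than looking them up.

```python
# Toy gazetteer: a hand-written table of known entities (an assumption for
# illustration; real NER models are trained, not hard-coded).
GAZETTEER = {
    "Lady Gaga": "Person",
    "the Bushes": "Person",
    "Texas": "Location",
    "next September": "Time",
}


def tag_entities(sentence, gazetteer=GAZETTEER):
    """Return (phrase, category) pairs in the order they occur in the sentence."""
    found = []
    for phrase, label in gazetteer.items():
        position = sentence.find(phrase)
        if position != -1:
            found.append((position, phrase, label))
    # Sort by position so the result follows the sentence order.
    return [(phrase, label) for _, phrase, label in sorted(found)]


sentence = "Lady Gaga is playing a concert for the Bushes in Texas next September"
for phrase, label in tag_entities(sentence):
    print(phrase, "->", label)
```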
Libraries for preprocessing
• NLTK, Gensim & spaCy for Python
• Apache OpenNLP & Stanford CoreNLP for Java
How to convert words to numbers?
• Straightforward option : One-hot vector representation
“the cat sat on the mat”
Vocabulary = {the, cat, sat, on, mat}
the = [1,0,0,0,0]
cat = [0,1,0,0,0]
sat = [0,0,1,0,0]
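The one-hot scheme above can be sketched in a few lines of plain Python. The helper names `build_vocabulary` and `one_hot` are made up for this illustration.

```python
def build_vocabulary(sentence):
    """Return the unique words of a sentence in order of first appearance."""
    vocabulary = []
    for word in sentence.split():
        if word not in vocabulary:
            vocabulary.append(word)
    return vocabulary


def one_hot(word, vocabulary):
    """Return a list with 1 at the word's vocabulary index and 0 elsewhere."""
    return [1 if entry == word else 0 for entry in vocabulary]


vocabulary = build_vocabulary("the cat sat on the mat")
vectors = {word: one_hot(word, vocabulary) for word in vocabulary}

print(vocabulary)      # ['the', 'cat', 'sat', 'on', 'mat']
print(vectors["cat"])  # [0, 1, 0, 0, 0]
```

Each vector has as many entries as the vocabulary has words, which is why this representation grows quickly with vocabulary size.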
Issues with one-hot vector representation
• Curse of dimensionality
Problems arise as the dimension (vocabulary size) grows
e.g. memory, performance, processing time
• Not meaningful
Each word is represented arbitrarily & independently (similarity between any two
distinct vectors is 0)
e.g. happy = [1,0,0], joy = [0,1,0]
Cosine similarity = (1*0 + 0*1 + 0*0) / (1 * 1) = 0
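A minimal sketch of the cosine similarity calculation, using only the standard library. The function name `cosine_similarity` is chosen for this example; it divides the dot product by the product of the vector lengths, which are both 1 for one-hot vectors.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


happy = [1, 0, 0]
joy = [0, 1, 0]

print(cosine_similarity(happy, joy))    # 0.0, although the words are related
print(cosine_similarity(happy, happy))  # 1.0, a vector compared with itself
```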
Better solution
• Learn a matrix W of size V*N, where V : vocabulary size, N : fixed & small e.g. 100
• The i-th row of W is the vector (array) representation of the i-th word
• Train a model to learn W
• W is referred to as “Word Embedding”
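The sketch below shows how a word's vector is read from W, and that multiplying a one-hot vector by W selects the same row. W is filled with random numbers here because no model has been trained; N = 3 is an arbitrary small value for display, and the helper names are made up for this example.

```python
import random

random.seed(0)

vocabulary = ["the", "cat", "sat", "on", "mat"]
V = len(vocabulary)  # vocabulary size
N = 3                # embedding size, kept tiny here (the slide suggests e.g. 100)

# W has V rows and N columns. Random values stand in for learnt ones.
W = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(V)]


def embed(word):
    """Look up the row of W that belongs to the word."""
    return W[vocabulary.index(word)]


def one_hot_times_W(word):
    """Multiply the word's one-hot vector (size V) by W (size V*N)."""
    one_hot = [1 if entry == word else 0 for entry in vocabulary]
    return [sum(one_hot[i] * W[i][j] for i in range(V)) for j in range(N)]


print(embed("cat"))
print(one_hot_times_W("cat"))  # same values as the row lookup
```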
Neural Networks came into play
• Organized as layers
• Each layer contains a set of neurons
• The job of a single neuron is to process all of its inputs and pass the result to all
the neurons in the next layer
• When the number of hidden layers is increased, the network becomes “deep”
Neuron
• Each w is called a weight
• Weights are initialized to random values and learnt during
training
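A single neuron can be sketched as a weighted sum of its inputs followed by an activation function. The sigmoid activation and the bias term are assumptions made for this example, since the slide text names only the weights.

```python
import math
import random


def init_weights(n_inputs, seed=0):
    """Start with random weights; training would adjust them later."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid (assumed activation)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


weights = init_weights(3)
output = neuron([0.5, -1.0, 2.0], weights)
print(output)  # a value between 0 and 1, passed on to the next layer
```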
Neural Network
• Network learns 3*4 weights in the first layer, 4*3 in the second and 3*1 in the third
• Each layer learns a weight matrix of size input_size*neuron_size
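The weight counts can be checked with a small sketch. It assumes a network with 3 inputs, hidden layers of 4 and 3 neurons, and 1 output, which matches the sizes 3*4, 4*3 and 3*1. The tanh activation and the random initial weights are assumptions made for this example.

```python
import math
import random

random.seed(0)

layer_sizes = [3, 4, 3, 1]  # inputs, hidden layer 1, hidden layer 2, output


def init_weights(input_size, neuron_size):
    """One weight matrix of size input_size*neuron_size."""
    return [[random.uniform(-1.0, 1.0) for _ in range(neuron_size)]
            for _ in range(input_size)]


weights = [init_weights(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
shapes = [(len(W), len(W[0])) for W in weights]
print(shapes)  # [(3, 4), (4, 3), (3, 1)]


def forward(x):
    """Pass an input vector through every layer in turn."""
    for W in weights:
        x = [math.tanh(sum(x[i] * W[i][j] for i in range(len(W))))
             for j in range(len(W[0]))]
    return x


print(forward([1.0, 0.5, -0.5]))  # a single output value in a list
```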
Word2vec in 2013
• Created by a team of researchers led by Tomas Mikolov at Google
• Proposed two models to learn word embeddings
1. CBOW (Continuous Bag of Words)
2. Skip-gram
Word2Vec
• Given a word in a sentence, its N surrounding words are called
“context”
• Given a word, Skip-gram trains a neural network with a single hidden layer to
predict the words in its context (CBOW does the reverse: it predicts the word from its context)
Context size = 2
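A sketch of how (word, context word) training pairs could be generated for Skip-gram. It assumes that a context size of 2 means up to two words on each side of the input word; `skip_gram_pairs` is a name made up for this example.

```python
def skip_gram_pairs(sentence, context_size=2):
    """Pair every word with each word at most context_size positions away."""
    words = sentence.split()
    pairs = []
    for i, center in enumerate(words):
        start = max(0, i - context_size)
        end = min(len(words), i + context_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, words[j]))
    return pairs


pairs = skip_gram_pairs("I like to eat apple a lot")
apple_pairs = [pair for pair in pairs if pair[0] == "apple"]
print(apple_pairs)
# [('apple', 'to'), ('apple', 'eat'), ('apple', 'a'), ('apple', 'lot')]
```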
Idea behind Word2Vec
I like to eat apple a lot
I like to eat orange a lot
• The context of both words is the same: {to, eat, a, lot}
• In such cases, the model learns similar vector representations for apple &
orange
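The shared context can be verified directly. This sketch assumes a context of two words on each side; `context` is a hypothetical helper for this example.

```python
def context(sentence, target, context_size=2):
    """Return the set of words within context_size positions of the target."""
    words = sentence.split()
    i = words.index(target)
    before = words[max(0, i - context_size):i]
    after = words[i + 1:i + 1 + context_size]
    return set(before + after)


apple_context = context("I like to eat apple a lot", "apple")
orange_context = context("I like to eat orange a lot", "orange")

print(sorted(apple_context))            # ['a', 'eat', 'lot', 'to']
print(apple_context == orange_context)  # True
```

Because the model sees the two words with identical prediction targets, training pushes their vectors in the same direction.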
Skip Gram Model
• Input word is represented as a one-hot vector of size V
• We define a small N, e.g. 100 to 1000
• Size of the weight matrix = V*N
• Output is a probability distribution over the V words
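A minimal, untrained sketch of the Skip-gram forward pass. The second matrix `W_out` of size N*V and the softmax step are assumptions based on the usual formulation of the model; the slide text names only the V*N matrix. The weights are random, so the probabilities printed here carry no meaning yet.

```python
import math
import random

random.seed(0)

vocabulary = ["i", "like", "to", "eat", "apple", "a", "lot"]
V = len(vocabulary)
N = 4  # tiny embedding size for illustration (the slide suggests 100 to 1000)

# W_in (V*N) holds the word embeddings; W_out (N*V) is an assumed output matrix.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict_context(word):
    """One-hot input selects a row of W_in; W_out scores every vocabulary word."""
    hidden = W_in[vocabulary.index(word)]
    scores = [sum(hidden[k] * W_out[k][j] for k in range(N)) for j in range(V)]
    return softmax(scores)


probabilities = predict_context("apple")
for word, probability in zip(vocabulary, probabilities):
    print(word, round(probability, 3))
```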
Power of word embedding vectors
In practice
• Word2Vec was trained on a Google News corpus of about 6B tokens
• The 1M most frequent words were set as the vocabulary
• Another embedding, GloVe, was released in 2014
• Both are publicly available & commonly used
Word2Vec - https://code.google.com/archive/p/word2vec/
GloVe - https://nlp.stanford.edu/projects/glove/
Next focus of the community?
• Words in a sentence are ordered
• Focus was on handling long sequential information using neural
networks
• Different types of neural networks have been explored over time
RNN -> GRU -> LSTM
• Adopting these architectures led to human-level performance in many
applications
Language Modelling (LM)
• Given a sequence of words, predict the next word
• X = sequence of words, Y = next word in the corpus
• First layer always learns word embedding matrix
The model knows that breakfast, lunch & dinner are similar terms.
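The X and Y pairs used for language modelling can be sketched as a sliding window over the corpus. The example sentence, the window length of 3 and the name `next_word_examples` are all assumptions made for this illustration.

```python
def next_word_examples(corpus, sequence_length=3):
    """Build (X, Y) pairs: X is a run of words, Y is the word that follows it."""
    words = corpus.split()
    examples = []
    for i in range(len(words) - sequence_length):
        x = words[i:i + sequence_length]
        y = words[i + sequence_length]
        examples.append((x, y))
    return examples


examples = next_word_examples("I had eggs for breakfast today")
for x, y in examples:
    print(x, "->", y)
# ['I', 'had', 'eggs'] -> for
# ['had', 'eggs', 'for'] -> breakfast
# ['eggs', 'for', 'breakfast'] -> today
```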
Popular Language Models
• GPT-3 – Released by OpenAI
Largest model at the time, with 175B parameters in the neural network
Trained on a large volume of text, mostly scraped from the internet
The team itself warned about misuse of the model
• BERT – Released by Google
• CodeBERT – Released by Microsoft
Was it written by a machine or a human?
Talk to Transformers: https://app.inferkit.com/demo
Can these models replace humans?
• Chatbots for customer assistance are already in use
• Soon they are going to challenge many other professions that involve
writing, including software developers [1]
[1] https://analyticsindiamag.com/5-jobs-that-gpt-3-might-challenge/
Future of NLP
• Human-level performance in existing research areas
• Replacing humans with machines wherever possible
• Scaling performance as the amount of content on the internet grows
• Expanding the research to more fields e.g. medicine, literature
• Understanding users from what they write
Thank You

An Introduction to Recent Advances in the Field of NLP

  • 1. What is the best a machine can do with text? Introduction to recent advances in the filed of NLP Rrubaa Panchendrarajan Ph.D. Student National University of Singapore
  • 2. NLP tools in day-to-day life
  • 3. Natural Language Processing (NLP) • A sub-filed of Artificial Intelligence (AI) • Aim : To build intelligent computers that can interact with human being like a human being • Interactions are either as writing or speaking (text/audio)
  • 4. Dealing with text Preprocessing Learning Application
  • 5. Why does Preprocessing play a major role? • Machines can understand only the numbers • Text is unstructured • Natural language is highly ambiguous
  • 6. Ambiguity at word level A world record A record of the conversation Record it
  • 7. Ambiguity at sentence level “I saw the man on the hill with a telescope” 1. I saw the man. The man was on the hill. I was using a telescope. 2. I saw the man. I was on the hill. I was using a telescope. 3. I saw the man. The man was on the hill. The hill had a telescope. 4. I saw the man. I was on the hill. The hill had a telescope. 5. I saw the man. The man was on the hill. I saw him using a telescope.
  • 8. Why does Preprocessing play a major role? • Machines can understand only the numbers • Text is unstructured • Natural language is highly ambiguous • Language evolves with time
  • 9. Core research areas 1. Lemmatization 2. Stemming 3. Sentence breaking 4. Morphological Analysis 5. Part-of-speech Tagging 6. Named-entity recognition 7. Word sense disambiguation 8. Lot more….
  • 10. Named-entity recognition (NER) • Task of identifying proper names in text and classifying into set of predefined categories of interest Lady Gaga is playing a concert for the Bushes in Texas next September Person Person Location Time • Applications 1. Question Answering (When is Lady Gaga playing … ? Obviously a time) 2. Machine Translation (Do not need to translate named entities) 3. etc.
  • 11. Libraries for preprocessing • NLTK, Gensim & spaCy for Python • Apache OpenNLP & Stanford CoreNLP for Java
  • 12. How to convert words to numbers? • Straightforward option: one-hot vector representation • Example: “the cat sat on the mat”, Vocabulary = {the, cat, sat, on, mat}, the = [1,0,0,0,0], cat = [0,1,0,0,0], sat = [0,0,1,0,0]
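The one-hot scheme on this slide can be sketched in a few lines of plain Python. This is only an illustration: the helper name `one_hot_vectors` is hypothetical, and the vocabulary order (first appearance in the sentence) is an assumption chosen to match the slide's example.

```python
def one_hot_vectors(sentence):
    """Map each distinct word to a one-hot list, ordered by first appearance."""
    vocabulary = []
    for word in sentence.split():
        if word not in vocabulary:
            vocabulary.append(word)
    size = len(vocabulary)
    # Position `index` is 1 for the word itself and 0 everywhere else.
    return {
        word: [1 if i == index else 0 for i in range(size)]
        for index, word in enumerate(vocabulary)
    }


vectors = one_hot_vectors("the cat sat on the mat")
print(vectors["the"])  # [1, 0, 0, 0, 0]
print(vectors["cat"])  # [0, 1, 0, 0, 0]
print(vectors["sat"])  # [0, 0, 1, 0, 0]
```

Each vector is as long as the vocabulary, which is why this representation grows unwieldy for realistic vocabularies.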
  • 13. Issues with one-hot vector representation • Curse of dimensionality: problems arise as the dimension (vocabulary size) increases, e.g. memory, performance, processing time • Not meaningful: each word is represented arbitrarily & independently, so the similarity between any two distinct word vectors is 0, e.g. happy = [1,0,0], joy = [0,1,0], cosine similarity = (1*0 + 0*1 + 0*0) / (1 * 1) = 0
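The similarity argument can be checked with a small cosine similarity function. This is a minimal sketch using only the standard library; the function name is an illustrative choice, and the vectors are the toy ones from the slide.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


happy = [1, 0, 0]
joy = [0, 1, 0]
print(cosine_similarity(happy, joy))    # 0.0
print(cosine_similarity(happy, happy))  # 1.0
```

Because every one-hot vector has length 1 and shares no non-zero position with any other, related words such as "happy" and "joy" look exactly as dissimilar as unrelated ones.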
  • 14. Better solution • Learn a matrix W of size V*N, where V is the vocabulary size and N is fixed & small, e.g. 100 • The i-th row of W is the vector (array) representation of the i-th word • Train a model to learn W • W is referred to as the “word embedding”
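To make the role of W concrete, the sketch below shows that multiplying a one-hot vector by W simply selects one row of W. The sizes (V = 5, N = 3) and the random initial values are assumptions for illustration; in practice W would be learned by a model rather than left random.

```python
import random

V, N = 5, 3  # toy vocabulary size and embedding size (assumed values)
random.seed(0)
# W has one row per word; here the rows are random placeholders, not learned values.
W = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(V)]


def embed(word_index):
    """Look up the embedding of a word: the word_index-th row of W."""
    return W[word_index]


def one_hot_times_matrix(word_index):
    """Multiply the word's one-hot vector by W the long way."""
    one_hot = [1 if i == word_index else 0 for i in range(V)]
    return [sum(one_hot[i] * W[i][j] for i in range(V)) for j in range(N)]


print(embed(1) == one_hot_times_matrix(1))  # True
```

This is why embedding layers are usually implemented as a table lookup rather than an actual matrix multiplication.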
  • 15. Neural Networks came into play • Organized as layers • Each layer contains a set of neurons • The job of a single neuron is to process all of its inputs and pass the result to all the neurons in the next layer • When the number of hidden layers is increased, the network becomes “deep”
  • 16. Neuron • Each w is called a weight • Weights are initialized to random values and learnt during the learning process
  • 17. Neural Network • Network learns 3*4 weights here • Network learns 4*3 weights here • Network learns 3*1 weights here • Each layer learns a weight matrix of size input_size*neuron_size
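The weight counts on this slide can be reproduced with a small fully connected network. This is a sketch under assumptions: the layer sizes 3, 4, 3, 1 are inferred from the slide's weight counts, bias terms are left out, and `tanh` is an arbitrary choice of activation.

```python
import math
import random

layer_sizes = [3, 4, 3, 1]  # input size followed by neurons per layer (inferred)
random.seed(0)
# One weight matrix per layer, of size input_size * neuron_size.
weights = [
    [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]


def forward(inputs):
    """Pass the inputs through every layer; each neuron sees all inputs of its layer."""
    activations = inputs
    for matrix in weights:
        n_in, n_out = len(matrix), len(matrix[0])
        activations = [
            math.tanh(sum(activations[i] * matrix[i][j] for i in range(n_in)))
            for j in range(n_out)
        ]
    return activations


shapes = [(len(m), len(m[0])) for m in weights]
print(shapes)                               # [(3, 4), (4, 3), (3, 1)]
print(sum(rows * cols for rows, cols in shapes))  # 27
print(len(forward([0.5, -0.2, 0.1])))       # 1
```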
  • 18. Word2Vec in 2013 • Created by a team of researchers led by Tomas Mikolov at Google • Proposed two models to learn word embeddings 1. CBOW 2. Skip-gram
  • 19. Word2Vec • Given a word in a sentence, its N surrounding words are called its “context” • Given a word, Skip-gram trains a neural network with a single hidden layer to predict the words in its context (CBOW does the reverse and predicts a word from its context) • Example shown with context size = 2
  • 20. Idea behind Word2Vec • “I like to eat apple a lot” / “I like to eat orange a lot” • The context of both words is the same: {to, eat, a, lot} • In such cases, the model learns similar vector representations for apple & orange
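The notion of context used above can be illustrated by generating (word, context word) training pairs with a context size of 2 on each side. The function name `skip_gram_pairs` is hypothetical, and whitespace tokenization is a simplifying assumption.

```python
def skip_gram_pairs(tokens, context_size=2):
    """Return (center word, context word) pairs for every position in tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - context_size)
        end = min(len(tokens), i + context_size + 1)
        for j in range(start, end):
            if j != i:  # a word is not part of its own context
                pairs.append((center, tokens[j]))
    return pairs


def context_of(word, sentence):
    """Collect the context words of `word` in a whitespace-tokenized sentence."""
    pairs = skip_gram_pairs(sentence.split())
    return [context for center, context in pairs if center == word]


print(context_of("apple", "I like to eat apple a lot"))    # ['to', 'eat', 'a', 'lot']
print(context_of("orange", "I like to eat orange a lot"))  # ['to', 'eat', 'a', 'lot']
```

Since both words produce identical context pairs, a model trained on such pairs is pushed toward giving them similar vectors.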
  • 21. Skip-gram Model • The input word is represented as a one-hot vector of size V • We define a small N, e.g. 100 to 1000 • The output is a probability distribution over the V words • Size of the weight matrix = V*N
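A forward pass through a model of this shape might look like the sketch below. It is an untrained toy: V = 5 and N = 3 are assumed sizes, the weights are random, and the names `W_in` and `W_out` are illustrative labels for the input (V*N) and output (N*V) matrices, not names taken from the slides.

```python
import math
import random

V, N = 5, 3  # toy vocabulary size and hidden size (assumed values)
random.seed(0)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]   # V*N
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]  # N*V


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_context_distribution(word_index):
    """One-hot input -> hidden vector of size N -> distribution over V words."""
    hidden = W_in[word_index]  # same as multiplying the one-hot vector by W_in
    scores = [sum(hidden[k] * W_out[k][j] for k in range(N)) for j in range(V)]
    return softmax(scores)


probs = predict_context_distribution(2)
print(len(probs))            # 5
print(round(sum(probs), 6))  # 1.0
```

Training would adjust both matrices so that actual context words receive high probability; after training, the rows of `W_in` serve as the word embeddings.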
  • 22. Power of word embedding vectors
  • 23. In practice • Word2Vec was trained on a Google News corpus of about 6B tokens • The 1M most frequent words were used as the vocabulary • Another embedding model, GloVe, was released in 2014 • Both are publicly available & commonly used • Word2Vec - https://code.google.com/archive/p/word2vec/ • GloVe - https://nlp.stanford.edu/projects/glove/
  • 24. Next focus of the community? • Words in a sentence are ordered • Focus was on handling long sequential information using neural networks • Different types of neural networks were explored over time: RNN -> GRU -> LSTM • Adopting these architectures led to human-level performance in many applications
  • 25. Language Modelling (LM) • Given a sequence of words, predict the next word • X = sequence of words, Y = next word in the corpus • The first layer always learns the word embedding matrix • The model learns that breakfast, lunch & dinner are similar terms
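The X and Y of language modelling can be built from raw text with a sliding window. This is a minimal sketch: the function name and the fixed `history_size` of 3 are assumptions for illustration, and the sentence is reused from the Word2Vec slides.

```python
def next_word_examples(tokens, history_size=3):
    """Build (X, Y) pairs: X is a window of words, Y is the word that follows."""
    examples = []
    for i in range(history_size, len(tokens)):
        examples.append((tokens[i - history_size:i], tokens[i]))
    return examples


tokens = "I like to eat apple a lot".split()
examples = next_word_examples(tokens)
for x, y in examples:
    print(x, "->", y)
# ['I', 'like', 'to'] -> eat
# ['like', 'to', 'eat'] -> apple
# ['to', 'eat', 'apple'] -> a
# ['eat', 'apple', 'a'] -> lot
```

No manual labelling is needed: the corpus itself supplies the target word for every window, which is what makes training on very large text collections feasible.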
  • 26. Popular Language Models • GPT-3 – Released by OpenAI. Largest model at the time of this talk, with 175B parameters in the neural network. Trained on a large collection of text scraped from the internet. The team itself warned about misuse of the model • BERT – Released by Google • CodeBERT – Released by Microsoft
  • 27. Was it written by a machine or a human? Talk to Transformers: https://app.inferkit.com/demo
  • 28. Can these models replace humans? • Chatbots for customer assistance are already in use • Soon they are going to challenge many other professions that involve writing, including software development [1] • [1] https://analyticsindiamag.com/5-jobs-that-gpt-3-might-challenge/
  • 29. Future of NLP • Human-level performance in existing research areas • Replacing humans with machines wherever possible • Scaling performance as the amount of content on the internet increases • Expanding the research to more fields, e.g. medicine, literature • Understanding users from what they write