Natural Language Processing: Techniques, Current Trends and
Applications in Industry
Rajkiran Veluri
What we will cover
• We will cover some of the common techniques used by NLP
practitioners

• We will discuss some interesting research trends

• We will discuss a few industry cases to illustrate the potential
of NLP

• Natural Language Processing is a very wide discipline. Hence,
we may not be able to cover the entire spectrum of NLP.
What is NLP
• Methods and Techniques that enable machines to analyse
and understand natural (human) language. Involves the
following concepts:

• Understanding language

• Reasoning about language

• Generating language

• Translating language
What is NLP
[Diagram: Natural Language Processing at the intersection of Computer
Science, Linguistics and Machine Learning]
NLP: Main Components
• Morphology: Analysis and description of the structure of words. Morpheme:
smallest linguistic unit with semantic meaning. Examples: un, install. Lexeme:
unit that corresponds to the set of forms taken by a word. Example: install ->
install, installed, installation, installing
• Lexicon: A particular meaning or properties associated with a single word 

• Syntax: The structure and order in which words can be combined to form
sentences

• Semantics: Combination of morphology and syntax with lexical meaning to form
the meaning of words and sentences

• Pragmatics: Use of language in a particular context. 

• Discourse Analysis: Analysis of relationship between sentences as they occur in a
sequence. Could be a monologue (one person) or dialogue (multiple people)
A bit of history….
• Machine Translation was one of the earliest applications (1940s).
Based on a dictionary lookup, a sentence in one language could be
translated into another.

• Machine Translation as code-breaking. A carry over of the Second
World War research on code-breaking. Most important was
German to English translation. Problems: Ambiguity in language
was a challenge to the MT approach.

• Linguistics was one of the main sources of contributions to NLP.
Noam Chomsky - Generative Grammar approach to understand
and generate language (1957)

• Contrasting approaches to NLP: statistical and linguistic (1960s)
A bit of history….
• Systems (1960 - 1980) focused on Case Grammars (linking
verbs and nouns by prepositions), Augmented Transition
Networks (using knowledge of language grammars to parse
sentences) and Semantic Representations (Conceptual
Dependency between parts of a sentence). They combined
domain knowledge and statistical inference to design
rule-based systems.

• Current systems (2000-Present): Machine Learning and
Deep Learning with faster CPUs, GPUs and storage. Combine
linguistics and statistics in machine learning models.
Research on contextual understanding and reasoning.
Ambiguity in Natural Language
• She wore small shoes and socks.

• Two interpretations for the noun modifier

• Source: https://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
• Part-of-speech tags for the sentence: PRP VBD JJ NN CC NN
Ambiguity in Natural Language
• Coreference Resolution: “The trophy doesn’t fit into the brown suitcase
because it is too [large/small]”

• Need to go beyond syntax and semantics

• Source: Eisenstein J. (2018), “Natural Language Processing”, Ch 1.1, pg 3
Main components of an NLP Pipeline
• Sentence Detection
• Text Cleaning
• Tokenization
• Domain Specific Feature Extraction
• Stopword Removal
• Stemming/Lemmatization
• Semantic Role Labeling (SRL)
• Word Sense Disambiguation
• Part of Speech / Dependency Tagging
• Modeling with Machine Learning Based / Rule Based Algorithm
• Downstream tasks
• Spell Correction
• Language Detection
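A few of the early pipeline stages (sentence detection, tokenization, stopword removal and stemming) can be sketched with the Python standard library alone. This is a minimal illustration, not a production pipeline: the stopword list, the suffix rules and all function names are assumptions made up for this example, and real systems would typically use a dedicated NLP library.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def split_sentences(text):
    # Sentence detection: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Text cleaning + tokenization: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    # Very crude suffix stripping; real stemmers use far richer rules.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # Run the stages in order and return one token list per sentence.
    return [
        [naive_stem(t) for t in remove_stopwords(tokenize(s))]
        for s in split_sentences(text)
    ]

result = preprocess("The capital of India is New Delhi. She installed the updates.")
print(result)
```

The output of `preprocess` is what a later tagging or modeling stage would consume.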
Some Examples of Downstream Tasks
• Named Entity Recognition (The capital of India is New Delhi -> India = Country, New Delhi = City)

• Sentiment Analysis (The movie was too good, I liked it very much -> Sentiment = Positive)

• Dialogue Generation (Application: Chatbots) (Example, User: I need to reset my password. ChatBot:
I can certainly help you with it)

• Question Answering (Example, given a snippet: Where does Bob live? -> Answer: New York)

• Sentence or Document Classification (Tweets, Emails and so on) (Example, given an email ->
Classification = Spam)

• Machine Translation (Example: (English) where have you been? -> (German) wo bist du gewesen?)

• Natural Language Inference (NLI) (Sentence 1: Father and son are walking to the store. Sentence 2:
Three people are walking to the store -> Inference: Contradiction)

• Topic Modeling (Example, given an Airbnb dataset -> Reviews of private rooms)
Applications of NLP: HealthCare
• Vast amounts of patient data are generated in healthcare by
clinicians, nurses and laboratory reports

• A lot of this data is captured in patient Electronic Health Records
(EHRs). EHRs preserve historical patient information across
hospital visits within an EHR system/Healthcare provider.

• EHRs have a lot of unstructured textual data and the format
varies a lot across hospital systems

• Domain-specific abbreviations, non-standard observations in
short text fragments, hypotheses, clinician notes during patient
visit (outpatient) as well as nurse notes (inpatient)
Applications of NLP: HealthCare
Information Extraction
Source: https://ctakes.apache.org/whycTAKES.html
• Using a medical lexicon

• Match terms of interest from lexicon

• Disease/Diagnosis Lexicon: Examined, Normal, Enlarged, ...

• Example-> ENT: Examined and Normal

• Using regular expressions (regex)

• Capture terms of interest based on regular expression patterns

• Regular Expression: (a-zA-Z)[:](a-zA-Z)([ and,.]? (not|no)?(a-zA-Z ){1-3} )*

• Example-> Extremities: Ankle scar, no joint damage
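The lexicon and regex approaches on this slide can be sketched in a few lines of Python. The pattern shown on the slide is written informally, so the patterns below are an assumed simplification rather than the author's exact expression, and the lexicon contents and function names are hypothetical. This is not how cTAKES itself is implemented.

```python
import re

# Hypothetical mini-lexicon of finding terms (illustrative only).
LEXICON = {"examined", "normal", "enlarged"}

# Assumed "Section: findings" pattern, a simplification of the slide's regex.
SECTION_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+)$")
NEGATION_RE = re.compile(r"^(no|not)\s+", re.IGNORECASE)

def lexicon_matches(text):
    # Lexicon lookup: keep only tokens that appear in the lexicon.
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [t for t in tokens if t in LEXICON]

def parse_findings(line):
    # Regex extraction: split a note line into a section and its findings,
    # flagging findings that start with a negation word.
    m = SECTION_RE.match(line)
    if m is None:
        return None
    section, body = m.group(1), m.group(2)
    findings = []
    for part in re.split(r",|\band\b", body):
        part = part.strip(" .")
        if not part:
            continue
        negated = bool(NEGATION_RE.match(part))
        findings.append((NEGATION_RE.sub("", part), negated))
    return section, findings

print(lexicon_matches("ENT: Examined and Normal"))
print(parse_findings("Extremities: Ankle scar, no joint damage"))
```

Rules like these are quick to write but brittle: a note that uses a different separator or an unlisted abbreviation would be missed.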
Machine Learning
Why Machine Learning?
• Rules are easy to create but need extensive testing for coverage
• Rules are difficult to maintain-> If format of dataset changes, rules need to be changed
• A machine learning algorithm with good generalisation can outperform rule-based
systems
• Machine Learning algorithms need a good amount of data to be trained
• Data is usually labeled examples from which the machine learning model can learn
parameters. It can then use these parameters to generalise to unseen examples.
• Labeled training data in some domains is hard to obtain!
• HealthCare - Hard and Expensive to obtain data
• News, Government Records (Public) - Might be easier to obtain
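To make "learning parameters from labeled examples" concrete, here is a minimal sketch of a Naive Bayes text classifier in plain Python. Naive Bayes is chosen here only because it is short to write; the slides do not name a specific algorithm, and the training sentences and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    # The learnt "parameters" are word counts per label and label counts.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data.
EXAMPLES = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(EXAMPLES)
print(predict("claim your free prize", *model))
print(predict("agenda for the team meeting", *model))
```

Neither test sentence appears in the training data, which is the sense in which the model generalises to unseen examples. With only four training examples this is a toy; the point about needing a good amount of labeled data still stands.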
Deep Learning
Using a BiLSTM-CRF-LSTM character-embedding model for information extraction from a clinical note
Source: https://miro.medium.com/max/1320/1*3OHMG4dTYpGLwcAcyl6t2Q.png
Some other applications of
NLP in Healthcare
• Analysing medical transcription records

• Clinical trial matching 

• Data mining for research on disease information and
public health

• Computer assisted code generation for automated billing

• Biomarker discovery and computational phenotyping

• Clinical decision-making
Applications of NLP: Brand
Monitoring
• User sentiment and opinion analysis

• Going beyond star-ratings. Collect user preferences in
detail.

• Develop product recommendations from user opinions

• Discovering user group sentiments about a particular
product at a particular time period

• Strategy for new product development and existing
product improvement
Brand Monitoring: NLP Overview
• User Review Data sources: Twitter, Facebook, Company Website, Other
websites / blogs
• Sentiment analysis on reviews and comments
• Classify reviews by product, product type, geography
• Calculate similarities within users and products and cluster them
• Sentiment Analysis outputs: User preferences, product review metrics of
self and competitor products
• Teams using the results: Customer Service; Strategy, Marketing,
Development
• Identify top complaints from users
• Identify products that have bad reviews
• Identify if specific customer segments show specific sentiment
(location, user type etc)
• Develop products for specific groups of customers that show similar
preferences
• Identify products that need improvement over competitor products
• Recommend similar products, or recommend products to similar customers
Word Embeddings -
Word2Vec
• Unsupervised method - Provides a notion of relatedness
between two words by capturing co-occurrences between
words and projecting them onto a vector space.

• Shallow neural network. Two models:

• CBOW - Predicts the centre word from the surrounding
context words. Example: A chair made of wood

• Skip Gram - Predicts the surrounding context words from
the centre word. Example: A chair made of wood
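The difference between the two models shows up in how training examples are built from a sentence. The sketch below generates CBOW and Skip Gram training pairs for the example sentence; the window size of 2 and the function name are assumptions for illustration, and no neural network is trained here.

```python
def training_pairs(tokens, window=2):
    # CBOW: (context words -> centre word).
    # Skip Gram: (centre word -> each context word).
    cbow, skipgram = [], []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        cbow.append((context, centre))
        skipgram.extend((centre, c) for c in context)
    return cbow, skipgram

tokens = "a chair made of wood".split()
cbow, skipgram = training_pairs(tokens)

# CBOW example for the centre word "made".
print(cbow[2])
# Skip Gram examples with "made" as the centre word.
print([pair for pair in skipgram if pair[0] == "made"])
```

For the centre word "made", CBOW yields a single example with four context words as input, while Skip Gram yields four separate (centre, context) examples.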
Word2Vec Visualization
Image Source: https://blog.plaid.com/making-sense-of-messy-data/
• Each learnt word is represented by an n-dimensional vector
• KING = [0.43 0.57 0.238 0.66 …. 0.5]
• Word embeddings can be used as inputs for tasks like classification,
Named Entity Recognition
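Relatedness between two word vectors is commonly measured with cosine similarity. The sketch below uses three-dimensional vectors with made-up values purely for illustration; they are not real Word2Vec outputs, and real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (values invented for this example).
VECTORS = {
    "king": [0.43, 0.57, 0.24],
    "queen": [0.40, 0.60, 0.30],
    "apple": [0.90, 0.05, 0.10],
}

king_queen = cosine_similarity(VECTORS["king"], VECTORS["queen"])
king_apple = cosine_similarity(VECTORS["king"], VECTORS["apple"])
print(round(king_queen, 3), round(king_apple, 3))
```

With these toy values "king" is closer to "queen" than to "apple", which is the kind of relationship a trained embedding space is expected to capture.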
Why do we need contextual
word embeddings
• Non-contextual word embeddings (like word2vec) do not capture multiple
meanings of a word.

• For example, (1) ship -> dispatch (2) ship -> vehicle for navigating an ocean. 

• Context 1: The ship sailed across the Indian Ocean.

• Context 2: I will ship the required items today.

• Contextual word embeddings capture the context in which a word occurs
in a sentence. They take the whole sentence into account before
assigning the word a vector, so the same word can receive different
vectors in different sentences. This matches the intuition that the
meaning of a word depends on the context in which it is used.

• Examples of contextual word embeddings: BERT, ELMo, GPT-2
An example of Contextual
Embeddings - BERT
• BERT is developed on the concept of language models coupled with
Transformers

• The main motivation of BERT is transfer learning: to provide a
contextual basis for the learnt embeddings so that they could be
used to improve accuracy on downstream tasks.

• At publication, BERT improved the state of the art on many downstream
tasks: Natural Language Inference (MultiNLI) by 4.6% absolute accuracy,
Question Answering (SQuAD v2.0) by 5.1 points F1, and several other
NLP tasks

• BERT Paper: https://arxiv.org/pdf/1810.04805.pdf

• Transformer Paper: https://arxiv.org/abs/1706.03762
Transformer
Source: https://medium.com/inside-machine-learning/what-is-a-transformer-d07dd1fbec04
BERT - Overview
Image Source: https://jalammar.github.io/illustrated-bert/
BERT (Bidirectional Encoder Representations from Transformers) is a trained
Transformer Encoder stack developed by Google (2018). BERT-Base has 12 layers
whereas BERT-Large has 24 layers
Using BERT - Fine Tuning
Image Credit: https://jalammar.github.io/illustrated-bert/
Using BERT - Feature Extraction
Image Source: https://jalammar.github.io/illustrated-bert/
BERT - Training
Image Source: https://jalammar.github.io/illustrated-bert/
Training tasks: Word Masking and Next Sentence Prediction
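The word masking task can be sketched as follows: some input tokens are hidden and the model is trained to predict the originals. In the BERT paper about 15% of tokens are selected, and of those 80% become [MASK], 10% become a random token and 10% are left unchanged. The sketch below is a deliberate simplification that always substitutes [MASK]; the function name and seed handling are assumptions for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Returns (masked input, labels). A label is the original token at a
    # masked position and None elsewhere (positions the loss would ignore).
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the ship sailed across the indian ocean".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Because the model sees the words on both sides of each masked position, it can use left and right context when predicting, which is where the "Bidirectional" in BERT comes from.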
Model Interpretability
• What is Model Interpretability?

• Model interpretability is the ability of humans to understand and
explain how a machine learning algorithm arrives at a decision.

• Motivations

• Reducing training data bias and improving fairness of models

• Transparency for ethical and legal reasons. Example: Why was a
person’s loan application rejected?

• Understanding generalisation and improving model performance
LIME Model: Local Interpretable Model-Agnostic Explanations
LIME-General Procedure of Implementation
• Take a point which we want to interpret, P
• Sample instances around P and
weight them by their proximity to P
• Fit a linear model to these weighted samples
• This linear model is a good local representation of the vicinity of
P but may not be generalisable globally
• Check out this link if you want to use Lime with sklearn:
https://marcotcr.github.io/lime/tutorials/Lime%20-%20basic%20usage%2C%20two%20class%20case.html
Image Source: https://github.com/marcotcr/lime
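The procedure above can be sketched for a one-dimensional input using only the standard library. This is an illustration of the idea, not the `lime` package's API: the black-box function, the Gaussian sampling, the exponential kernel and all parameter names are assumptions chosen for this example, and the real library works on interpretable features such as the presence of words.

```python
import math
import random

def black_box(x):
    # Stand-in for any opaque model; here a simple non-linear function.
    return x * x

def explain_locally(f, p, n_samples=500, sample_sigma=0.5,
                    kernel_width=0.5, seed=0):
    rng = random.Random(seed)
    # 1. Sample instances around the point P.
    xs = [p + rng.gauss(0.0, sample_sigma) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    # 2. Weight samples by proximity to P (closer samples count more).
    ws = [math.exp(-((x - p) ** 2) / (2 * kernel_width ** 2)) for x in xs]
    # 3. Fit a weighted least-squares line y = slope * x + intercept.
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = explain_locally(black_box, p=3.0)
print(slope, intercept)
```

Near P = 3 the fitted slope comes out close to 6, the local rate of change of x squared, even though a straight line is a poor description of the function globally. This is the sense in which the linear model is a good local representation but may not generalise.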
LIME Model: Local Interpretable Model-Agnostic Explanations
Removing “Posting” and “NNTP” from the input text is expected to reduce the
class prediction probability of “atheism” from 0.58 to
0.58 - (0.15 + 0.11) = 0.32
Image Source: https://github.com/marcotcr/lime
Thank you
• Email : rjk.veluri@gmail.com

 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Natural Language Processing, Techniques, Current Trends and Applications in Industry

  • 1. Natural Language Processing: Techniques, Current Trends and Applications in Industry Rajkiran Veluri
  • 2. What we will cover • We will cover some of the common techniques used by NLP practitioners • We will discuss some interesting research trends • We will discuss a few industry cases to illustrate the potential of NLP • Natural Language Processing is a very wide discipline. Hence, we may not be able to cover the entire spectrum of NLP.
  • 3. What is NLP • Methods and Techniques that enable machines to analyse and understand natural (human) language. Involves the following concepts: • Understanding language • Reasoning about language • Generating language • Translating language NATURAL LANGUAGE PROCESSING
  • 5. NLP: Main Components • Morphology: Analysis and description of the structure of words. Morpheme: smallest linguistic unit with semantic meaning. Examples: un, install. Lexeme: unit that corresponds to the set of forms taken by a word. Example: install -> install, installed, installation, installing • Lexicon: A particular meaning or properties associated with a single word • Syntax: The structure and order in which words can be combined to form sentences • Semantics: Combination of morphology and syntax with lexical meaning to form the meaning of words and sentences • Pragmatics: Use of language in a particular context. • Discourse Analysis: Analysis of the relationship between sentences as they occur in a sequence. Could be a monologue (one person) or a dialogue (multiple people)
  • 6. A bit of history… • Machine Translation was one of the earliest applications (1940s). Based on a dictionary lookup, a sentence in one language could be translated into another. • Machine Translation as code-breaking: a carry-over of the Second World War research on code-breaking. Most important was German to English translation. Problems: Ambiguity in language was a challenge to the MT approach. • Linguistics was one of the main sources of contributions to NLP. Noam Chomsky - Generative Grammar approach to understand and generate language (1957) • Contrasting approaches to NLP: statistical and linguistic (1960s)
  • 7. A bit of history… • Systems (1960 - 1980) focused on Case Grammars (linking verbs and nouns by prepositions), Augmented Transition Networks (using knowledge of language grammars to parse sentences) and Semantic Representations (Conceptual Dependency between parts of a sentence). Combining domain knowledge and statistical inference to design rule-based systems. • Current systems (2000-Present): Machine Learning and Deep Learning with faster CPUs, GPUs and storage. Combine linguistics and statistics in machine learning models. Research on contextual understanding and reasoning.
  • 8. Ambiguity in Natural Language • She wore small shoes and socks. • Two interpretations for the noun modifier: "small" may modify only "shoes", or both "shoes" and "socks" • POS tags shown in the figure: PRP VBD JJ NN CC NN • Source: https://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
  • 9. Ambiguity in Natural Language • Coreference Resolution: "The trophy doesn’t fit into the brown suitcase because it is too [large/small]." • Need to go beyond syntax and semantics • Source: Eisenstein J. (2018), "Natural Language Processing", Ch 1.1, pg 3
  • 10. Main components of an NLP Pipeline (diagram labels): Sentence Detection; Text Cleaning; Tokenization; Domain Specific Feature Extraction; Stopword Removal; Stemming / Lemmatization; Semantic Role Labeling (SRL); Word Sense Disambiguation; Tagging (Part of Speech / Dependency); Modeling with Machine Learning Based / Rule Based Algorithm; Downstream tasks; Spell Correction; Language Detection
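
A few of the preprocessing stages named on this slide (sentence detection, tokenization, stopword removal, stemming) can be sketched in plain Python. This is only a minimal illustration: the stopword list, the suffix rules and all function names are assumptions made for the example, not part of the talk, and a real pipeline would normally rely on an NLP library.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def split_sentences(text):
    """Naive sentence detection: split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the stages in order and return one list of stems per sentence."""
    return [
        [naive_stem(t) for t in remove_stopwords(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    print(preprocess("The patient is walking. Scars healed!"))
```

The output of such a function would typically feed the tagging and modelling stages listed later in the pipeline.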
  • 11. Some Examples of Downstream Tasks • Named Entity Recognition (The capital of India is New Delhi -> India = Country, New Delhi = City) • Sentiment Analysis (The movie was too good, I liked it very much -> Sentiment = Positive) • Dialogue Generation (Application: Chatbots) (Example, User: I need to reset my password. ChatBot: I can certainly help you with it) • Question Answering (Example, given a snippet: Where does Bob live? -> Answer: New York) • Sentence or Document Classification (Tweets, Emails and so on) (Example, given an email -> Classification = Spam) • Machine Translation (Example: (English) where have you been? -> (German) wo bist du gewesen?) • Natural Language Inference (NLI) (Sentence 1: Father and son are walking to the store. Sentence 2: Three people are walking to the store -> Inference: Contradiction) • Topic Modeling (Example, given an Airbnb dataset -> Reviews of private rooms)
  • 12. Applications of NLP: HealthCare • Vast amounts of patient data are generated in healthcare by clinicians, nurses and laboratory reports • A lot of this data is captured in patient Electronic Health Records (EHRs). EHRs preserve historical patient information across hospital visits within an EHR system/Healthcare provider. • EHRs contain a lot of unstructured textual data, and the format varies widely across hospital systems • Domain-specific abbreviations, non-standard observations in short text fragments, hypotheses, clinician notes during patient visits (outpatient) as well as nurse notes (inpatient)
  • 13. Applications of NLP: HealthCare Source: https://ctakes.apache.org/whycTAKES.html
  • 14. Information Extraction • Using a medical lexicon • Match terms of interest from a Disease/Diagnosis lexicon (entries shown: Examined, Normal, Enlarged, …) • Example -> ENT: Examined and Normal • Using regular expressions (regex) • Capture terms of interest based on regular expression patterns • Pattern shown on the slide: (a-zA-Z)[:](a-zA-Z)([ and,.]? (not|no)?(a-zA-Z ){1-3} )* • Example -> Extremities: Ankle scar, no joint damage
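
The two approaches on this slide can be sketched as follows. The pattern on the slide is schematic rather than valid regex syntax, so the expressions below are a hypothetical reworking of the same idea ("Section: findings", with an optional "no"/"not" in front of a finding). The three-word lexicon and the function names are assumptions for illustration only.

```python
import re

# Hypothetical mini-lexicon; a real system would load a curated medical vocabulary.
LEXICON = {"examined", "normal", "enlarged"}

# "Section: free text", where the section label is a single alphabetic word.
SECTION_RE = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.+)$")
# A negation cue at the start of a single finding.
NEGATION_RE = re.compile(r"^(no|not)\s+", re.IGNORECASE)


def lexicon_matches(note_line):
    """Return (section, terms), keeping only body words found in the lexicon."""
    m = SECTION_RE.match(note_line)
    if not m:
        return None
    section, body = m.groups()
    terms = [w for w in re.findall(r"[A-Za-z]+", body) if w.lower() in LEXICON]
    return section, terms


def regex_findings(note_line):
    """Return (section, [(finding, negated), ...]) split on commas and 'and'."""
    m = SECTION_RE.match(note_line)
    if not m:
        return None
    section, body = m.groups()
    findings = []
    for part in re.split(r"\s*(?:,|\band\b)\s*", body):
        part = part.strip(" .")
        if not part:
            continue
        negated = bool(NEGATION_RE.match(part))
        findings.append((NEGATION_RE.sub("", part), negated))
    return section, findings


if __name__ == "__main__":
    print(lexicon_matches("ENT: Examined and Normal"))
    print(regex_findings("Extremities: Ankle scar, no joint damage"))
```

Carrying the negation flag along with each finding matters in clinical text, since "no joint damage" and "joint damage" mean opposite things even though they share the same terms.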
  • 15. Machine Learning: Why Machine Learning? • Rules are easy to create but need extensive testing for coverage • Rules are difficult to maintain -> If the format of the dataset changes, the rules need to be changed • A machine learning algorithm with good generalisation can outperform rule-based systems • Machine Learning algorithms need a good amount of data to be trained • The data usually consists of labeled examples from which the machine learning model learns its parameters. The model then uses these parameters to generalise to unseen examples. • Labeled training data in some domains is hard to obtain! • HealthCare - Hard and Expensive to obtain data • News, Government Records (Public) - Might be easier to obtain
  • 16. Deep Learning Using a BiLSTM-CRF-LSTM character-embedding model for information extraction from a clinical note Source: https://miro.medium.com/max/1320/1*3OHMG4dTYpGLwcAcyl6t2Q.png
  • 17. Some other applications of NLP in Healthcare • Analysing medical transcription records • Clinical trial matching • Data mining for research on disease information and public health • Computer assisted code generation for automated billing • Biomarker discovery and computational phenotyping • Clinical decision-making
  • 18. Applications of NLP: Brand Monitoring • User sentiment and opinion analysis • Going beyond star-ratings. Collect user preferences in detail. • Develop product recommendations from user opinions • Discovering user group sentiments about a particular product at a particular time period • Strategy for new product development and existing product improvement
  • 19. Brand Monitoring: NLP Overview • Sources of User Review Data: Twitter, Facebook, Company Website, Other websites / blogs • Processing: Sentiment analysis on reviews and comments; Classify reviews by product, product type, geography; Calculate similarities within users and products and cluster them • Outputs: User preferences, product review metrics of own and competitor products • Consumers: Customer Service, Strategy, Marketing, Development • Identify top complaints from users • Identify products that have bad reviews • Identify if specific customer segments show specific sentiment (location, user type etc.) • Develop products for specific groups of customers that show similar preferences • Identify products that need improvement over competitor products • Recommend similar products, or recommend products to similar customers
  • 20. Word Embeddings - Word2Vec • Unsupervised method - Provides a notion of relatedness between two words by capturing co-occurrences between words and projecting them onto a vector space. • Shallow neural network. Two models: • CBOW - Predicts the centre word from the surrounding context words. Example: A chair made of wood • Skip Gram - Predicts the surrounding context words from the centre word. Example: A chair made of wood
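
The difference between the two models is easiest to see in the training examples each one derives from a sentence. The sketch below builds those examples for the slide's sentence "A chair made of wood". It only prepares the (input, target) pairs and does not train a network; the window size of 2 and the function names are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs: the centre word is used to predict each neighbour."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens, window=2):
    """(context_words, centre) examples: the neighbours jointly predict the centre word."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, centre))
    return examples


if __name__ == "__main__":
    tokens = "a chair made of wood".split()
    # CBOW: the four neighbours of "made" are the input, "made" is the target.
    print(cbow_examples(tokens)[2])
    # Skip-gram: "made" is the input, each neighbour is a separate target.
    print([p for p in skipgram_pairs(tokens) if p[0] == "made"])
```

With "made" as the centre word, CBOW produces a single example with four context words, while skip-gram produces four separate pairs.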
  • 21. Word2Vec Visualization. Image Source: https://blog.plaid.com/making-sense-of-messy-data/ • Each learnt word is represented by an n-dimensional vector: KING = [0.43 0.57 0.238 0.66 …. 0.5] • Word embeddings can be used as inputs for tasks such as classification and Named Entity Recognition
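
Relatedness between two embedded words is commonly measured with cosine similarity between their vectors. The sketch below shows that computation. The 4-dimensional vectors are made up for illustration and do not come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real word2vec vectors usually have 100+ dimensions.
EMBEDDINGS = {
    "king": [0.43, 0.57, 0.24, 0.66],
    "queen": [0.41, 0.60, 0.30, 0.62],
    "wood": [0.90, 0.05, 0.70, 0.02],
}

if __name__ == "__main__":
    print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
    print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["wood"]))
```

In this toy data "king" scores higher against "queen" than against "wood", which is the kind of ordering a trained embedding is expected to produce for related and unrelated words.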
  • 22. Why do we need contextual word embeddings • Non-contextual word embeddings (like word2vec) assign a single vector to each word, so they do not capture multiple meanings of a word. • For example, (1) ship -> dispatch (2) ship -> vehicle for navigating an ocean. • Context 1: The ship sailed across the Indian Ocean. • Context 2: I will ship the required items today. • Contextual word embeddings also capture the context of a word in the sentence in which it occurs: they take the whole sentence into account before assigning the word a vector. It is natural to assume that the meaning of a word depends on the context in which it is used. • Examples of contextual word embeddings: BERT, ELMo, GPT-2
  • 23. An example of Contextual Embeddings - BERT • BERT builds on language-model pre-training combined with the Transformer architecture • The main motivation of BERT is transfer learning: to provide a contextual basis for the learnt embeddings so that they can be used to improve accuracy on downstream tasks. • BERT improves results on many downstream tasks, for example Natural Language Inference (MultiNLI): 4.6% absolute accuracy improvement, Question Answering (SQuAD v2.0): 5.1 point absolute F1 improvement, and several other NLP tasks • BERT paper: https://arxiv.org/pdf/1810.04805.pdf • Transformer paper: https://arxiv.org/abs/1706.03762
  • 25. BERT - Overview. Image Source: https://jalammar.github.io/illustrated-bert/ • BERT stands for Bidirectional Encoder Representations from Transformers. • BERT is a trained Transformer Encoder stack developed by Google (2018). BERT-Base has 12 layers whereas BERT-Large has 24 layers.
  • 26. Using BERT-Fine Tuning Image Credit: https://jalammar.github.io/illustrated-bert/
  • 27. Using BERT-Feature Extraction Image Source: https://jalammar.github.io/illustrated-bert/
  • 28. BERT - Training. Image Source: https://jalammar.github.io/illustrated-bert/ • Pre-training tasks shown: Word Masking, Next Sentence Prediction
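
The two pre-training tasks named on this slide can be illustrated by how their training examples are built. The sketch below is a simplification under stated assumptions: every selected token is replaced by [MASK] (the BERT paper replaces only 80% of selected tokens with [MASK], swaps 10% for a random token and leaves 10% unchanged), and the negative sentence for next sentence prediction is drawn from the same small list. The function names are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with [MASK] with probability mask_prob.

    Returns (masked_tokens, labels); labels holds the original token at
    masked positions and None elsewhere (positions that are not scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def nsp_example(sentences, index, rng):
    """Pair sentence `index` with its true successor or a random other sentence.

    Assumes at least three sentences and index < len(sentences) - 1.
    Returns (first, second, is_next).
    """
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], True
    candidates = [s for j, s in enumerate(sentences) if j not in (index, index + 1)]
    return first, rng.choice(candidates), False


if __name__ == "__main__":
    print(mask_tokens("the ship sailed across the indian ocean".split(), mask_prob=0.3))
    sents = ["I need to reset my password.", "Can you help?", "The movie was good."]
    print(nsp_example(sents, 0, random.Random(1)))
```

During pre-training the model is asked to recover the tokens stored in `labels` and to predict the `is_next` flag.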
  • 29. Model Interpretability • What is Model Interpretability? • Model interpretability is the ability of humans to understand and explain how a machine learning algorithm arrives at a decision. • Motivations • Reducing training data bias and improving fairness of models • Transparency for ethical and legal reasons. Example: Why was a person’s loan application rejected? • Understanding generalisation and improving model performance
  • 30. LIME Model: Local Interpretable Model-Agnostic Explanations. LIME - General Procedure of Implementation • Take a point which we want to interpret, P • Sample instances around P and weigh them by distance to P • Learn a linear model on these weighted samples • This linear model is a good local representation of the vicinity of P but may not be generalisable globally • Check out this link if you want to use Lime with sklearn: https://marcotcr.github.io/lime/tutorials/Lime%20-%20basic%20usage%2C%20two%20class%20case.html Image Source: https://github.com/marcotcr/lime
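
The procedure above can be sketched for a one-dimensional black-box function. This is not the `lime` library's implementation: it is a minimal illustration of the same three steps (sample around P, weight by distance, fit a weighted linear model), and the Gaussian sampling, the exponential kernel, the parameter values and all names are assumptions.

```python
import math
import random


def explain_locally(predict, point, n_samples=500, sigma=0.1, kernel_width=0.25, seed=0):
    """Fit a weighted linear surrogate y ~ intercept + slope * x around `point`."""
    rng = random.Random(seed)
    # Step 1: sample instances around the point P.
    xs = [point + rng.gauss(0.0, sigma) for _ in range(n_samples)]
    ys = [predict(x) for x in xs]
    # Step 2: weight each sample by its distance to P (closer means heavier).
    ws = [math.exp(-((x - point) ** 2) / kernel_width ** 2) for x in xs]
    # Step 3: closed-form weighted least squares for a single feature.
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


def black_box(x):
    """Stand-in for an opaque model; its true local slope at x is 2 * x."""
    return x ** 2


if __name__ == "__main__":
    slope, intercept = explain_locally(black_box, 1.0)
    print("local slope near P = 1:", slope)
    # Far from P the surrogate is a poor approximation of the black box.
    print("surrogate at x = 3:", intercept + slope * 3.0, "black box:", black_box(3.0))
```

Near P = 1 the fitted slope is close to 2, the true local slope of the stand-in function, while at x = 3 the surrogate is far from the black box, which illustrates why the explanation is only valid locally.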
  • 31. LIME Model: Local Interpretable Model-Agnostic Explanations • Removing “Posting” and “NNTP” from the input text is expected to reduce the predicted probability of the class “atheism” from 0.58 to about 0.58 - (0.15 + 0.11) = 0.32. Image Source: https://github.com/marcotcr/lime
  • 32. Thank you • Email : rjk.veluri@gmail.com