2. What we will cover
• Some of the common techniques used by NLP practitioners
• Some interesting research trends
• A few industry cases that illustrate the potential of NLP
• Natural Language Processing is a very wide discipline, so we may not be able to cover the entire spectrum of NLP.
3. What is NLP
• Methods and techniques that enable machines to analyse and understand natural (human) language. NLP involves the following concepts:
• Understanding language
• Reasoning about language
• Generating language
• Translating language
5. NLP: Main Components
• Morphology: Analysis and description of the structure of words. Morpheme: the smallest linguistic unit with semantic meaning. Examples: un-, install. Lexeme: the unit that corresponds to the set of forms taken by a word. Example: install -> install, installed, installation, installing
• Lexicon: The particular meanings and properties associated with individual words
• Syntax: The structure and order in which words can be combined to form sentences
• Semantics: The combination of morphology and syntax with lexical meaning to form the meaning of words and sentences
• Pragmatics: The use of language in a particular context
• Discourse Analysis: Analysis of the relationships between sentences as they occur in a sequence, which could be a monologue (one person) or a dialogue (multiple people)
6. A bit of history….
• Machine Translation was one of the earliest applications (1940s). Based on a dictionary lookup, a sentence in one language could be translated into another.
• Machine Translation as code-breaking: a carry-over of Second World War research on code-breaking, with German-to-English translation the most important case. Problem: ambiguity in language was a challenge for this approach to MT.
• Linguistics was one of the main sources of contributions to NLP. Noam Chomsky introduced the Generative Grammar approach to understanding and generating language (1957).
• Contrasting approaches to NLP emerged: statistical and linguistic (1960s)
7. A bit of history….
• Systems (1960-1980) focused on Case Grammars (linking verbs and nouns by prepositions), Augmented Transition Networks (using knowledge of language grammars to parse sentences) and Semantic Representations (conceptual dependency between parts of a sentence). They combined domain knowledge and statistical inference to design rule-based systems.
• Current systems (2000-present) use Machine Learning and Deep Learning, enabled by faster CPUs, GPUs and storage. They combine linguistics and statistics in machine learning models. Research continues on contextual understanding and reasoning.
8. Ambiguity in Natural Language
• She wore small shoes and socks.
• Two interpretations for the noun modifier: "small" may describe only the shoes, or both the shoes and the socks.
• Source: https://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
[Figure: part-of-speech tags for the sentence - PRP VBD JJ NN CC NN - with a question mark over the attachment of the modifier]
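A quick way to see why tags alone cannot settle this: an off-the-shelf tagger produces the same tag sequence under either reading. A minimal sketch using NLTK's tagger (assuming the 'punkt' and 'averaged_perceptron_tagger' data packages have been downloaded):

```python
# Tag the ambiguous sentence with NLTK's off-the-shelf POS tagger.
import nltk

tokens = nltk.word_tokenize("She wore small shoes and socks.")
print(nltk.pos_tag(tokens))
# Tags such as She/PRP wore/VBD small/JJ shoes/NNS and/CC socks/NNS are
# identical under both readings, so the ambiguity survives tagging and
# must be resolved at a deeper level of analysis.
```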
9. Ambiguity in Natural Language
• Coreference Resolution: "The trophy doesn't fit into the brown suitcase because it is too [large/small]" - whether "it" refers to the trophy or the suitcase depends on which adjective is used.
• We need to go beyond syntax and semantics
• Source: Eisenstein J. (2018), "Natural Language Processing", Ch 1.1, pg 3
10. Main Components of an NLP Pipeline
• Language detection
• Sentence detection
• Text cleaning
• Spell correction
• Tokenization
• Stopword removal
• Stemming / lemmatization
• Part-of-speech / dependency tagging
• Word sense disambiguation
• Semantic role labeling (SRL)
• Domain-specific feature extraction
• Modeling with a machine-learning-based or rule-based algorithm
• Downstream tasks
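A minimal sketch of a few of the early stages above (tokenization, stopword removal, lemmatization) using NLTK; it assumes the 'punkt', 'stopwords' and 'wordnet' data packages have been downloaded via nltk.download().

```python
# Sketch of three pipeline stages with NLTK.
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

def preprocess(text):
    tokens = word_tokenize(text.lower())            # tokenization
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens
              if t.isalpha() and t not in stops]    # cleaning + stopword removal
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]  # lemmatization

print(preprocess("The ships were sailing across the Indian Ocean."))
```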
11. Some Examples of Downstream Tasks
• Named Entity Recognition (The capital of India is New Delhi -> India = Country, New Delhi = City)
• Sentiment Analysis (The movie was very good, I liked it very much -> Sentiment = Positive)
• Dialogue Generation (application: chatbots) (Example - User: I need to reset my password. Chatbot: I can certainly help you with that)
• Question Answering (Example - given a snippet: Where does Bob live? -> Answer: New York)
• Sentence or Document Classification (tweets, emails and so on) (Example - given an email -> Classification = Spam)
• Machine Translation (Example: (English) Where have you been? -> (German) Wo bist du gewesen?)
• Natural Language Inference (NLI) (Sentence 1: Father and son are walking to the store. Sentence 2: Three people are walking to the store -> Inference: Contradiction)
• Topic Modeling (Example - given an Airbnb dataset -> a topic such as reviews of private rooms)
12. Applications of NLP: HealthCare
• Vast amounts of patient data are generated in healthcare by clinicians, nurses and laboratory reports
• Much of this data is captured in patient Electronic Health Records (EHRs). EHRs preserve historical patient information across hospital visits within an EHR system / healthcare provider.
• EHRs contain a lot of unstructured textual data, and the format varies widely across hospital systems
• The text includes domain-specific abbreviations, non-standard observations in short text fragments, hypotheses, clinician notes taken during patient visits (outpatient) and nurse notes (inpatient)
13. Applications of NLP: HealthCare
Source: https://ctakes.apache.org/whycTAKES.html
14. Information Extraction
• Using a medical lexicon
  • Match terms of interest from a disease/diagnosis lexicon (e.g. Examined, Normal, Enlarged, …)
  • Example -> ENT: Examined and Normal
• Using regular expressions (regex)
  • Capture terms of interest based on regular expression patterns, e.g. (a-zA-Z)[:](a-zA-Z)([ and,.]? (not|no)?(a-zA-Z ){1-3})*
  • Example -> Extremities: Ankle scar, no joint damage
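A small sketch of both approaches in Python; the lexicon terms and the regex below are illustrative rather than the exact pattern from the slide.

```python
# Lexicon matching and regex extraction on a toy clinical-note fragment.
import re

note = "ENT: Examined and Normal. Extremities: Ankle scar, no joint damage."

# 1) Lexicon matching: flag any word found in a small diagnosis lexicon.
lexicon = {"examined", "normal", "enlarged", "scar"}
found = [w for w in re.findall(r"[A-Za-z]+", note) if w.lower() in lexicon]
print(found)  # ['Examined', 'Normal', 'scar']

# 2) Regex extraction: capture "Section: finding" fragments, allowing
#    simple negations such as "no ...".
pattern = re.compile(r"([A-Za-z]+):\s*((?:(?:no|not)\s+)?[A-Za-z ]+)")
print(pattern.findall(note))
# [('ENT', 'Examined and Normal'), ('Extremities', 'Ankle scar')]
```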
15. Machine Learning
Why Machine Learning?
• Rules are easy to create but need extensive testing for coverage
• Rules are difficult to maintain -> if the format of the dataset changes, the rules need to be changed
• A machine learning algorithm with good generalisation can outperform rule-based systems
• Machine learning algorithms need a good amount of data to be trained
• The data usually consists of labeled examples from which the model learns its parameters; it can then use these parameters to generalise to unseen examples.
• Labeled training data in some domains is hard to obtain!
  • HealthCare - hard and expensive to obtain
  • News, government records (public) - may be easier to obtain
16. Deep Learning
Using a BiLSTM-CRF model with LSTM character embeddings for information extraction from a clinical note
Source: https://miro.medium.com/max/1320/1*3OHMG4dTYpGLwcAcyl6t2Q.png
17. Some other applications of NLP in Healthcare
• Analysing medical transcription records
• Clinical trial matching
• Data mining for research on disease information and
public health
• Computer assisted code generation for automated billing
• Biomarker discovery and computational phenotyping
• Clinical decision-making
18. Applications of NLP: Brand Monitoring
• User sentiment and opinion analysis
• Going beyond star ratings: collect user preferences in detail
• Develop product recommendations from user opinions
• Discover user-group sentiment about a particular product in a particular time period
• Inform strategy for new product development and existing product improvement
19. Brand Monitoring: NLP Overview
• Data sources (user review data): Twitter, Facebook, the company website, other websites / blogs
• Sentiment analysis on reviews and comments; classify reviews by product, product type and geography; calculate similarities within users and products and cluster them
• User preferences and product review metrics for the company's own and competitor products feed into customer service, strategy, marketing and development:
  • Identify top complaints from users
  • Identify products that have bad reviews
  • Identify whether specific customer segments (location, user type etc.) show specific sentiment
  • Develop products for specific groups of customers that show similar preferences
  • Identify products that need improvement over competitor products
  • Recommend similar products, or products to similar customers
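To make the sentiment-analysis step concrete, here is a sketch using NLTK's VADER analyzer (assumes the 'vader_lexicon' data package has been downloaded); the review texts are invented.

```python
# Lexicon-based sentiment scoring of product reviews with NLTK's VADER.
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
reviews = ["Battery life is fantastic, totally worth it!",
           "Stopped working after two days. Terrible."]
for r in reviews:
    scores = sia.polarity_scores(r)   # neg / neu / pos / compound scores
    print(scores["compound"], r)      # compound > 0 -> positive overall
```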
20. Word Embeddings - Word2Vec
• Unsupervised method - provides a notion of relatedness between two words by capturing co-occurrences between words and projecting them onto a vector space.
• Shallow neural network. Two models:
  • CBOW - predicts the centre word from the surrounding context words. Example: predict the centre word "made" from the context "A chair ___ of wood"
  • Skip-gram - predicts the surrounding context words from the centre word. Example: predict the context "A chair ___ of wood" from the centre word "made"
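A minimal training sketch with gensim (assuming the gensim 4.x API); the toy corpus is far too small for meaningful vectors but shows the moving parts.

```python
# Train a tiny word2vec model with gensim.
from gensim.models import Word2Vec

corpus = [["a", "chair", "made", "of", "wood"],
          ["a", "table", "made", "of", "wood"],
          ["a", "ship", "made", "of", "steel"]]

# sg=0 -> CBOW (predict centre word from context); sg=1 -> skip-gram.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
print(model.wv["chair"][:5])           # first 5 dimensions of the vector
print(model.wv.most_similar("chair"))  # nearest neighbours in vector space
```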
21. Word2Vec Visualization
Image Source: https://blog.plaid.com/making-sense-of-messy-data/
Each learnt word is represented by an n-dimensional vector, e.g. KING = [0.43 0.57 0.238 0.66 …. 0.5]
Word embeddings can be used as inputs for tasks like classification and Named Entity Recognition
22. Why do we need contextual word embeddings?
• Non-contextual word embeddings (like word2vec) do not capture multiple meanings of a word.
• For example: (1) ship -> to dispatch; (2) ship -> a vehicle for navigating an ocean.
  • Context 1: The ship sailed across the Indian Ocean.
  • Context 2: I will ship the required items today.
• Contextual word embeddings capture the context of a word in the sentence in which it occurs: they take the whole sentence into account before assigning the word a vector value. It is natural to assume that the meaning of a word depends on the context in which it is used.
• Examples of contextual word embeddings: BERT, ELMo, GPT-2
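A sketch of the difference in practice, using the Hugging Face transformers library and the public bert-base-uncased checkpoint to extract the vector for "ship" in the two contexts above.

```python
# Contextual vectors for the same word in two different sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def ship_vector(sentence):
    enc = tok(sentence, return_tensors="pt")
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index("ship")                   # position of "ship"
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # shape (1, seq_len, 768)
    return hidden[0, idx]

v1 = ship_vector("The ship sailed across the Indian Ocean.")
v2 = ship_vector("I will ship the required items today.")
# Unlike word2vec, the two "ship" vectors differ with their contexts.
print(torch.cosine_similarity(v1, v2, dim=0))
```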
23. An example of Contextual Embeddings - BERT
• BERT builds on the concept of language modelling coupled with Transformers
• The main motivation of BERT is transfer learning: to provide a contextual basis for the learnt embeddings so that they can be used to improve accuracy on downstream tasks.
• BERT improves accuracy on many downstream tasks - Natural Language Inference by 4.6%, Question Answering (SQuAD) by 5.1%, and several other NLP tasks
• BERT paper: https://arxiv.org/pdf/1810.04805.pdf
• Transformer paper: https://arxiv.org/abs/1706.03762
25. BERT - Overview
Bidirectional Encoder Representations from Transformers
BERT is a trained Transformer encoder stack developed by Google (2018). BERT-Base has 12 layers whereas BERT-Large has 24 layers.
Image Source: https://jalammar.github.io/illustrated-bert/
28. BERT - Training
BERT is trained on two tasks: word masking (predicting masked-out words) and next sentence prediction.
Image Source: https://jalammar.github.io/illustrated-bert/
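The masked-word objective can be poked at directly with the Hugging Face fill-mask pipeline; a quick sketch using the public bert-base-uncased checkpoint:

```python
# Ask BERT to fill in a masked word.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The capital of India is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
# The model should rank plausible city names highest.
```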
29. Model Interpretability
• What is model interpretability?
  • Model interpretability is the ability of humans to understand and explain how a machine learning algorithm arrives at a decision.
• Motivations
  • Reducing training-data bias and improving the fairness of models
  • Transparency for ethical and legal reasons. Example: why was a person's loan application rejected?
  • Understanding generalisation and improving model performance
30. LIME - Local Interpretable Model-Agnostic Explanations
LIME - general procedure:
• Take the point we want to interpret, P
• Sample instances around P and weigh them by their distance to P
• Learn a linear model from these weighted samples
• This linear model is a good local representation of the vicinity of P but may not generalise globally
• Check out this link if you want to use LIME with sklearn: https://marcotcr.github.io/lime/tutorials/Lime%20-%20basic%20usage%2C%20two%20class%20case.html
Image Source: https://github.com/marcotcr/lime
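A small end-to-end sketch with the lime package and a scikit-learn text classifier; the tiny dataset and class names are invented for illustration.

```python
# Explain a text classifier's prediction with LIME.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["god religion faith", "space nasa orbit",
         "church belief prayer", "rocket launch moon"]
labels = [0, 1, 0, 1]  # 0 = religion-style topic, 1 = space-style topic
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["religion", "space"])
exp = explainer.explain_instance("nasa launch prayer", clf.predict_proba,
                                 num_features=3)
print(exp.as_list())  # (word, weight) pairs from the local linear model
```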
31. LIME - Local Interpretable Model-Agnostic Explanations
Removing "Posting" and "NNTP" from the input text reduces the predicted probability of the class "atheism" from 0.58 to 0.58 - (0.15 + 0.11) = 0.32
Image Source: https://github.com/marcotcr/lime