A review of Natural Language Processing tasks, with examples of why NLP is so hard, followed by a detailed look at text categorization and, in particular, sentiment analysis. A few common approaches to predicting sentiment are discussed, going further into the statistical machine learning algorithms behind them.
2. AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● Engineering approach
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
3. Natural Language Processing
● Enables interaction between computers and
humans through natural languages
or “The branch of information science that deals
with natural language information”
● Natural language understanding - enabling
computers to derive meaning from human input
● Natural language generation
(Not neuro-linguistic programming, though some magic still applies)
4. NLP is everywhere
Google translate
Google ads
Google search
Siri / Question Answering
Chat bots
Spam generation / spam filtering
Gene and protein detection
Surveillance / marketing
5. Text Classification
● Automatically assign a piece of text to one or
more classes.
● History: Guess the author based on text
specifics and author style
1901: “One author prefers 'em' as short for
'them' – let's use this as a feature!”
1970s: Who wrote “The Federalist Papers”?
6. Text Classification
● Spam or not spam
● News analysis: politics, sports, business
● Google ads verticals
26 root categories, 2200 subcategories
● Terrorist or not
Yes, they read your Facebook, and yes, they know...
7. Also Text Classification
● Detect truth / lie / sarcasm / joke
● Determine medical condition from hospital
records, patient description
● Guess stock prices
● “How will this press release affect company shares price”
● Sentiment analysis
8. Sentiment Analysis
● Determining writer's attitude
● Overall document: positive / negative / neutral
“We totally enjoyed our stay there!”
● Towards a target:
“Battery sucks, bends really well though”
● Detecting emotions: sad, happy, angry, excited
● Scales:
● Number of stars / -10 to +10 / percentage
● Subjective vs Objective
9. Classification for engineers
● Why bother with AI, keep it simple:
IF text contains " em "
AND NOT text contains " them "
author is X
ELSE author is Y
● But what if...
10. Classification for engineers
● If author X decided to use “them” once?
Let's try a list of words that only author X uses
IF text contains a word from listX
author is X
ELSE try other rules
Find all the features !!!
11. Classification for engineers
● Build a super smart system of if-else
statements to classify each document correctly
● Solving the problem algorithmically
● An “expert system”
● Still used in practice for many applications
● Twitter “sentiment analysis” only rule: if text contains :) or :(
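The rule-based approach above can be sketched in a few lines. This is a minimal illustration of the single-emoticon rule mentioned on the slide; the function name is ours, not from the talk:

```python
# Minimal rule-based ("expert system") sentiment classifier:
# the only rule is the emoticon check from the slide.
def rule_based_sentiment(text):
    """Classify text by emoticons: :) -> positive, :( -> negative."""
    if ":)" in text:
        return "positive"
    if ":(" in text:
        return "negative"
    return "neutral"
```

It fails on everything else ("loved it!!" is neutral here), which is exactly the "but what if..." problem from the previous slides.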
12. When to do engineering
● For very narrow tasks
● Determine if text is a url or email address
● For a very specific domain
● “If text contains a name of any US president, it's a legislation”
● To create a proof-of-concept
● Twitter “sentiment analysis” only rule: if text contains :) or :(
● When it's hard to get enough data (explained later)
13. AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
14. Supervised learning - Regression
“In statistics, regression
analysis is a statistical process
for estimating the relationships
among variables.”
● Create a hypothesis function based on the blue dots
● When a new X appears, calculate Y
The graph: X values are features, Y values are target values.
15. Linear Regression Example
● Let X be temperature
● Let Y be chance of rain
Create a function that predicts chance of rain, given temperature
(In reality X is a vector with many feature values)
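A hedged sketch of the slide's example: fitting a line to (temperature, chance-of-rain) points with ordinary least squares. The data values are made up for illustration:

```python
# Ordinary least squares for one feature: hypothesis y = a*x + b.
def fit_line(xs, ys):
    """Return slope and intercept minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy training data (the "blue dots"): temperature -> chance of rain.
temps = [10, 15, 20, 25, 30]
rain = [0.8, 0.6, 0.5, 0.3, 0.1]
a, b = fit_line(temps, rain)

def predict(x):
    """When a new X appears, calculate Y."""
    return a * x + b
```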
19. Supervised Learning -
Classification
“identifying to which of a set of
categories a new observation
belongs, on the basis of a training
set of data containing observations
(or instances) whose category
membership is known.”
Given a set of training instances, predict a discrete
class label for new ones.
The graph: x1 and x2 are features, dot color is the target class.
20. Classification Example
● Let X1 be temperature
● Let X2 be humidity
Create a function that predicts rain or no rain.
(In reality X is a vector with many feature values)
21. 2D Example
● Let X be humidity
● Let Y = 0 for no rain
● Let Y = 1 for rain
Linear hypothesis function doesn't really make sense now.
Logistic function can approximate better.
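A small sketch of the point above: the logistic (sigmoid) function squashes any input into (0, 1), so its output can be read as a probability, unlike a line. The weights below are made-up values; in practice they are learned from training data:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_rain(humidity, w=0.2, b=-14.0):
    """P(rain) given humidity %; w and b are illustrative, not learned."""
    return sigmoid(w * humidity + b)
```

High humidity pushes the output toward 1 (rain), low humidity toward 0 (no rain), with a smooth transition in between instead of a line that runs off past 0 and 1.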
23. AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
24. Agenda Explained
● Until now:
● What is text classification
● What is supervised learning (classification)
● Up next:
● How to apply supervised learning to text?
25. Statistical Sentiment Analysis
● Document: A piece of text
● Corpus: Set of documents
● Target: Y, positive/negative, emotion, percentage
● Training corpus: Set of documents for which we know Y
● What is X?
● How to convert a document to a (real-valued) vector?
● Building training corpus
● Find “enough” data
26. Defining Features
● Each word: one-hot vector
● I = [0, 0, 0, 1, 0, 0, 0, …, 0]
● like = [1, 0, 0, 0, 0, 0, 0, …, 0]
● cookies = [0, 0, 0, 0, 0, 0, 1, …, 0]
● Number of dimensions = size of vocabulary
● Document: bag of words
● Order of words is lost
● Count of words can be added
● Term frequency / inverse document frequency
"I like cookies" = [1, 0, 0, 1, 0, 0, 1, …, 0]
27. Feature Engineering
● Ngrams (as one-hot)
● I, like, cookies - unigrams
● “I like” = [0, 0, 0, 0, 1, 0, …, 0] - bigrams
● “I like cookies” - trigrams
● Character n-grams:
● li, ik, ke, lik, ike
● Dictionaries:
● Great value for sentiment analysis
● Very good for domain specific text
If document contains any of:
{love, like, good, cool}
add this one: [0, 0, 1, 0, …, 0]
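The n-gram features above are easy to extract. A sketch, with whitespace tokenization as a simplification:

```python
# Word n-grams: unigrams (n=1), bigrams (n=2), trigrams (n=3).
def word_ngrams(text, n):
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Character n-grams, e.g. "like" -> li, ik, ke (n=2) or lik, ike (n=3).
def char_ngrams(word, n):
    return [word[i:i + n] for i in range(len(word) - n + 1)]
```

Each extracted n-gram then gets its own one-hot dimension, exactly like single words; dictionary features work the same way, with one dimension per dictionary.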
28. Feature Engineering
● Simple features
● Document Length
● Emoticons
● elooongated words
● ALL-CAPS
● Stopwords
● Through other classification methods:
● Parts of speech
● Negation contexts “I don't like cookies”
● Named Entities
● Approximate dimensions of X: 100k – 10m
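One common way to realize the negation-context feature above is negation marking: words following a negation get a suffix, so "like" and "like_NEG" become distinct features. This sketch is a simplified version (real implementations also stop marking at punctuation); the negation word list is illustrative:

```python
# Mark words that appear after a negation with a _NEG suffix.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't"}

def mark_negation(text):
    out, negated = [], False
    for word in text.split():
        if word.lower() in NEGATIONS:
            negated = True
            out.append(word)
        elif negated:
            out.append(word + "_NEG")
        else:
            out.append(word)
    return " ".join(out)
```

"I don't like cookies" becomes "I don't like_NEG cookies_NEG", so the classifier no longer sees the positive feature "like".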
29. Work Process
● Assemble training corpus
● Separate test corpus
● Invent new features
● Generate model (supervised learning)
● Test performance
● Repeat
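The first two steps above amount to a random hold-out split. A minimal sketch (any supervised learner and evaluation metric can then be plugged into the train/test loop):

```python
import random

# Hold out a fraction of the corpus for testing; the model must
# never see the test documents during training.
def split_corpus(documents, test_fraction=0.2, seed=42):
    docs = documents[:]                 # don't mutate the caller's list
    random.Random(seed).shuffle(docs)   # fixed seed -> reproducible split
    cut = int(len(docs) * (1 - test_fraction))
    return docs[:cut], docs[cut:]
```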
30. Tips & Tricks
● Performance is usually measured as
● precision / recall / accuracy / F-measure
● Simple Machine Learning with tons of features
● Even a linear classifier works
● Marketing
● Everyone uses a different corpus (so accuracies can't be compared)
● Showing only what you're sure about
● Generalizing: “overall, 70% of your customers like you”
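The measures named above, computed from raw counts of true/false positives and negatives:

```python
# tp/fp/tn/fn = true/false positives/negatives for one class.
def precision(tp, fp):
    """Of everything predicted positive, how much really was?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything really positive, how much did we find?"""
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

Reporting only precision on confident predictions (the "showing only what you're sure about" trick above) can make a system look far better than its recall would suggest.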
31. AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
32. A.I. - Why is it not working?
“Algorithmically solvable: A decision problem that can be
solved by an algorithm that halts on all inputs in a finite
number of steps.”
“Unsolvable problem: A problem that cannot be solved for
all cases by any algorithm whatsoever.”
● Artificial Intelligence: Develop intelligent systems, deal with
real world problems. It works... kind of...
- “Siri, will you marry me?”
- “My End User License Agreement does not cover marriage.
My apologies”
33. Challenges
● Annotation Guidelines
● Inter-annotator agreement
● SemEval
● Sentiment analysis corpus (~14k tweets)
● For 40% of tweets annotators didn't agree
"I don't know half of you half as well as I should like; and I like less
than half of you half as well as you deserve.”
Bilbo Baggins
34. Still not convinced?
● Context issues
● Narrowing the domain helps
● “beer is cool”, “soup is cool”
● “No babies yet!” - condoms / fertility drugs
● “Obama goes full Bush on Syria”
● User generated content SUCKS!
● “Polynesian sauce from chik fila a be so bomb”
● Common sense
“I tried the banana slicer and found it unacceptable. […] the
slicer is curved from left to right. All of my bananas are bent
the other way.”
35. AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
36. Word representations
● One-hot is sparse and meaningless
● N-dimensional vector for each word
● “Ubuntu” close to “Debian”
● “king” to “queen” = “man” to “woman”
● Based solely on word co-occurrence
n = 50 to 1000
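The "king is to queen as man is to woman" property above can be demonstrated with toy 3-dimensional vectors (made up so the arithmetic works out; real embeddings have 50 to 1000 dimensions learned from co-occurrence statistics):

```python
import math

# Invented toy embeddings; real ones come from word2vec, GloVe, etc.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for same direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The analogy test: king - man + woman should land near queen.
analogy = [k - m + w for k, m, w in
           zip(vectors["king"], vectors["man"], vectors["woman"])]
```

Nearest-neighbor lookups under cosine similarity are also how "Ubuntu close to Debian" is measured.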
37. Deep Learning
● Artificial Neural Networks
● Input - word embeddings
● Output – target class
● Complex layer structure
● No feature engineering
38. Tools
● NLTK – NLP in python
● GATE – NLP in java + GUI
● Stanford CoreNLP – NLP in java + deep neural networks
● AlchemyAPI – commercial API for NLP (free demo)
● MetaMind – enterprise sentiment analysis and computer vision (deep
neural networks)
● WolframAlpha – Smart question answering (knows maths)