**Update 7/24/2014**
Cleaned it up. Likely to be the final version unless I figure out image processing. That may be an entirely different presentation though.
Presentation given for BarcampNOLA7.
Touched on a variety of topics like natural language processing, sentiment analysis, and ethics. Chose the context of Twitter since I'm more familiar with text processing than image processing. Twitter has some unique problems that make it hard to apply the techniques covered here directly to its data.
Overall, the presentation went longer than expected (the time frame was 15-30 min). Didn't have much time for discussion, although one did start about the inaccuracy of sentiment analysis. There were no questions, which I blame on the length of the presentation. I was also expecting the room to be mostly computer programmers, but there were some people from business, sales, and marketing.
Next time I present, I'll give myself more time (45 minutes seems more reasonable) to elaborate on important topics I had to skim through (ethics, programming algorithms, consumer psychology and irrational behavior).
2. Profiling in General
Uses observable qualities to estimate the person (or people) behind them.
Example uses:
Criminal Profiling (Law and Order, Criminal Minds)
Video Games (Bartle Gamer Types)
Marketing (Intelligent Ad Retargeting)
Human Resources
3. Definitions
Consumer profile – a person's product likes and dislikes and their spending habits, based on their age, gender, marital status, etc.
Sentiment analysis – measuring how one or more people feel about something; the result can be positive, negative, or neutral.
6. Straight From Twitter's Privacy Policy
Advertising:
“To help us deliver ads, measure their performance, and make them more relevant to you based on criteria like your activity on Twitter and visits to our ad partners' websites”
People may object to this as an invasion of privacy, but you're stepping into their environment and trying to play by your own rules.
7. Why Twitter?
“Tweets” are limited to 140 characters, providing succinct tidbits of information.
Accessible on mobile devices
Good for events such as concerts, conventions, etc.
Easily accessible through a JSON API.
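Since the data comes back as JSON, reading a tweet in Python only needs the standard `json` module. The payload below is a made-up, trimmed example; its field names (`text`, `created_at`, `user`, `screen_name`, `followers_count`, `friends_count`, `verified`, `retweet_count`, `favorite_count`) are modeled on the classic v1.1 tweet object, but treat them as assumptions and check the current API documentation before relying on them. `summarize_tweet` is a hypothetical helper, not part of any Twitter library.

```python
import json

# Made-up, trimmed tweet payload. Field names are modeled on the
# v1.1 tweet object and are assumptions; verify against the API docs.
SAMPLE_TWEET = """
{
  "created_at": "Thu Jul 24 18:30:00 +0000 2014",
  "text": "I dislike my iPhone. The battery life is too short.",
  "retweet_count": 2,
  "favorite_count": 5,
  "user": {
    "screen_name": "example_user",
    "followers_count": 120,
    "friends_count": 80,
    "verified": false
  }
}
"""


def summarize_tweet(raw_json):
    """Hypothetical helper: pull out fields a profiling pipeline might use."""
    tweet = json.loads(raw_json)
    user = tweet.get("user", {})
    return {
        "text": tweet.get("text", ""),
        "created_at": tweet.get("created_at"),
        "author": user.get("screen_name"),
        "followers": user.get("followers_count", 0),
        "following": user.get("friends_count", 0),  # accounts this user follows
        "verified": user.get("verified", False),
        "retweets": tweet.get("retweet_count", 0),
        "favorites": tweet.get("favorite_count", 0),
    }


if __name__ == "__main__":
    print(summarize_tweet(SAMPLE_TWEET))
```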
8. Why not Twitter?
“Tweets” are limited to 140 characters, sometimes not enough for a complete sentence.
Can limit descriptiveness
Fake profiles can give false information
Difficulty in traditional natural language processing ->
9. Natural Language Processing Methods
-Part-of-speech tagging – use a dictionary (e.g. WordNet) to identify words as nouns, verbs, etc.
-doesn't work for every word, since some words have several parts of speech or meanings, for example:
Fire:
As a verb: I will fire a gun.
As a verb: I will fire that individual.
As a noun: I didn't start the fire.
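To make the problem concrete, here's a toy tagger. A plain dictionary lookup says "fire" can be a noun or a verb, so something has to use context. The mini-lexicon and the two context rules below are hypothetical and much cruder than a real tagger trained on annotated text. Even when it gets the part of speech right, it can't tell the two verb senses apart (shooting vs. dismissing someone); that's a word-sense problem, not a part-of-speech one.

```python
# Hypothetical mini-lexicon: each word maps to the parts of speech a
# dictionary might list for it. "fire" is the ambiguous one.
LEXICON = {
    "i": {"pronoun"},
    "will": {"modal"},
    "fire": {"noun", "verb"},
    "a": {"determiner"},
    "gun": {"noun", "verb"},
    "that": {"determiner"},  # simplified; "that" has other uses too
    "individual": {"noun", "adjective"},
    "didn't": {"auxiliary"},
    "start": {"noun", "verb"},
    "the": {"determiner"},
}


def tag(sentence):
    """Tag each word, using the previous tag to break ties (toy rules only)."""
    words = sentence.lower().rstrip(".").split()
    tags = []
    for word in words:
        options = LEXICON.get(word, {"unknown"})
        prev = tags[-1] if tags else None
        if len(options) == 1:
            choice = next(iter(options))
        elif prev == "determiner" and "noun" in options:
            choice = "noun"  # "the fire", "a gun"
        elif prev in ("modal", "auxiliary") and "verb" in options:
            choice = "verb"  # "will fire", "didn't start"
        else:
            choice = "ambiguous"  # dictionary alone can't decide
        tags.append(choice)
    return list(zip(words, tags))


if __name__ == "__main__":
    for s in ["I will fire a gun.", "I will fire that individual.",
              "I didn't start the fire."]:
        print(tag(s))
```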
10. Sentiment Analysis
Twitter provides sentiment analysis, but it's crude since it only looks for smiley and frowny emoticons.
Computers handle denotation easily; connotation is another story.
-Negation
-Adverb/Adjective Modifiers
-Sarcasm
Basic sentiment analysis searches for emotionally charged words using a dictionary. More advanced versions use machine learning to train the computer.
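A minimal sketch of the dictionary approach, with crude handling of two of the problems listed above (negation and adverb modifiers). The word scores imitate the -5 to +5 style of a list like AFINN but are made up here, not taken from it; the negator and intensifier lists and their weights are assumptions too. Sarcasm isn't handled at all.

```python
import re

# Made-up scores in an AFINN-like style (-5 very negative .. +5 very positive).
WORD_SCORES = {"love": 3, "like": 2, "good": 3, "great": 3,
               "dislike": -2, "bad": -3, "hate": -3, "short": -1}
NEGATORS = {"not", "never", "don't", "isn't", "no"}
# Hypothetical multipliers for adverb modifiers.
INTENSIFIERS = {"very": 1.5, "really": 1.5, "too": 1.5, "slightly": 0.5}


def sentiment(text):
    """Sum word scores, scaling after an intensifier and flipping after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        score = WORD_SCORES.get(tok)
        if score is None:
            continue
        # Look back at the two preceding tokens, e.g. "not very good".
        window = tokens[max(0, i - 2):i]
        for prev in window:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
        if any(prev in NEGATORS for prev in window):
            score = -score
        total += score
    return total


def label(score):
    """Map a numeric score to positive / negative / neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `sentiment("The battery life is too short")` scores "short" as -1 and scales it by 1.5 for "too".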
11. Brief Word on Machine Learning
-Involves teaching the program which conditions produce a certain result.
-Generally, the more data provided, the more confident the program becomes.
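One common machine-learning approach, swapped in here as an example rather than something the talk specified, is a naive Bayes classifier: count how often each word shows up in tweets labeled positive or negative, then pick the label with the higher probability for new text. The tiny training set is invented; with this little data the model is easy to fool, which is exactly why more data helps.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Work in logs to avoid underflow: log P(label) + sum log P(word | label).
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]  # Counter returns 0 if unseen
                score += math.log((count + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data, for illustration only.
TRAINING = [
    ("love the new phone", "positive"),
    ("great battery life", "positive"),
    ("camera is amazing", "positive"),
    ("battery died again", "negative"),
    ("hate this phone", "negative"),
    ("screen cracked so annoying", "negative"),
]

model = NaiveBayes()
model.train(TRAINING)
```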
13. What to Consider in Terms of the Consumer
Person giving the message (celebrity status, followers, how often they post)
Date (timestamp, how new or old the product is)
The product
The component of the product (if applicable)
The overall sentiment
14. A Little More About Data Sources
Consider online reputation (Klout):
-Number of followers
-Number following
-Frequency of tweets
-Variance of tweets
-“Verified” status, join date
-Number of retweets, favorites
Online reputation helps filter out spambots, fake profiles, and social honeypots.
It's the best you can do without knowing the relationships between people.
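The talk doesn't say how Klout combined these signals, so the sketch below is a made-up heuristic that rolls a few of them into a 0 to 1 trust score and flags accounts that look like spambots or fake profiles. The `Account` fields, every weight, and every cutoff are hypothetical and would need tuning against real, labeled accounts.

```python
from dataclasses import dataclass


@dataclass
class Account:
    followers: int
    following: int
    tweets_per_day: float
    verified: bool
    account_age_days: int
    avg_retweets: float


def reputation_score(a: Account) -> float:
    """Rough 0..1 trust score. All weights and cutoffs are hypothetical."""
    if a.verified:
        return 1.0
    score = 0.0
    # Spam accounts often follow many accounts but are followed by few.
    ratio = a.followers / max(a.following, 1)
    score += 0.3 * min(ratio, 1.0)
    # Newer accounts earn less credit; full credit after about a year.
    score += 0.2 * min(a.account_age_days / 365, 1.0)
    # Posting at machine-like rates earns no credit.
    score += 0.2 if a.tweets_per_day <= 50 else 0.0
    # Getting retweeted suggests real people are reading.
    score += 0.3 * min(a.avg_retweets / 5, 1.0)
    return round(score, 3)


def looks_suspicious(a: Account, threshold: float = 0.4) -> bool:
    return reputation_score(a) < threshold
```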
15. Good Example:
I dislike my iPhone. The battery life is too short.
(Work this out with the group, time permitting)
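One way to work the example through, using the consumer considerations from slide 13: find the product, the component, and the sentiment of each sentence. The product list, component list, and word scores are hypothetical, and carrying the product over from the previous sentence is a crude stand-in for real coreference resolution (figuring out that "the battery" belongs to "my iPhone").

```python
import re

PRODUCTS = {"iphone"}                          # hypothetical product list
COMPONENTS = {"battery", "screen", "camera"}   # hypothetical component list
WORD_SCORES = {"dislike": -2, "like": 2, "short": -1, "great": 3}  # made-up


def analyze(text):
    """Split into sentences and report product, component, and score for each."""
    results = []
    current_product = None
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        product = next((w for w in words if w in PRODUCTS), None)
        if product:
            current_product = product
        component = next((w for w in words if w in COMPONENTS), None)
        score = sum(WORD_SCORES.get(w, 0) for w in words)
        results.append({
            "sentence": sentence,
            # No product named? Assume it's the last one mentioned.
            "product": product or current_product,
            "component": component,
            "score": score,
        })
    return results
```

On the slide's example this yields a negative score for the iPhone overall and a separate negative score for its battery.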
16. Bad Example:
LMAO dat RiFF RAFF album...iceberg simpson off da chain #neonicon #gettinpaid
-internet acronyms
-nonexistent sentence structure
-idiomatic phrase
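A common first step, and only a partial one, is to normalize the text before tagging: expand acronyms, map slang spellings to standard words, and pull hashtags out as topic labels. The lookup table below is hypothetical and tiny; real slang lists need constant upkeep, and an idiom like "off da chain" still needs phrase-level handling after normalization.

```python
import re

# Hypothetical normalization table; a real one would be far larger.
SLANG = {
    "lmao": "laughing my ass off",
    "dat": "that",
    "da": "the",
}


def normalize(tweet):
    """Return (normalized_text, hashtags) for a tweet."""
    hashtags = re.findall(r"#(\w+)", tweet)
    text = re.sub(r"#\w+", "", tweet)   # drop hashtags from the sentence text
    text = text.replace("...", " ")     # treat ellipses as word breaks
    words = [SLANG.get(w.lower(), w) for w in text.split()]
    return " ".join(words), hashtags
```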
17. Analysis Over Time
People are more likely to remember negativity than positivity.
Vengeance breeds vengeance; apologies rebuild trust and counteract vengeance (see Dan Ariely's research)
Emotions are contagious
http://www.scientificamerican.com/article/facebook-emotions-are-contagious/
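To watch sentiment over time, one simple option is to bucket already-scored tweets by day, average them, and flag sharp drops so a wave of negativity stands out early. The sample data is invented, and the `max_drop` cutoff is an assumption; nothing here models how people remember negativity or how emotions spread, it only makes shifts visible.

```python
from collections import defaultdict
from datetime import datetime


def daily_average(scored_tweets):
    """scored_tweets: iterable of (ISO timestamp string, sentiment score)."""
    buckets = defaultdict(list)
    for timestamp, score in scored_tweets:
        day = datetime.fromisoformat(timestamp).date()
        buckets[day].append(score)
    return {day: sum(s) / len(s) for day, s in sorted(buckets.items())}


def flag_drops(averages, max_drop=1.0):
    """Return days whose average fell by more than max_drop vs. the previous day with data."""
    days = list(averages)
    return [today for yesterday, today in zip(days, days[1:])
            if averages[yesterday] - averages[today] > max_drop]


# Invented example data.
SAMPLE = [
    ("2014-07-20T09:00:00", 2),
    ("2014-07-20T15:30:00", 1),
    ("2014-07-21T10:00:00", -2),
    ("2014-07-21T11:00:00", -3),
    ("2014-07-22T08:00:00", -1),
]
```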
18. Putting it Together
Use natural language processing to understand what products the consumer cares about.
Use sentiment analysis to understand how they feel about the product.
Address any negativity about the product (if feasible).
Help people rationalize their purchases with positivity.
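The steps above can be sketched as one small pipeline stage. It assumes the language-processing and sentiment steps have already reduced each tweet to a hypothetical (author, product, score) tuple; the output gives, per product, the average score and the authors of negative mentions that might be worth responding to.

```python
from collections import defaultdict


def summarize_by_product(mentions):
    """mentions: iterable of (author, product, score) tuples.

    Returns {product: {"average": float, "to_address": [authors]}}.
    """
    grouped = defaultdict(list)
    for author, product, score in mentions:
        grouped[product].append((author, score))
    summary = {}
    for product, items in grouped.items():
        scores = [s for _, s in items]
        summary[product] = {
            "average": sum(scores) / len(scores),
            # Negative mentions are candidates for a response, if feasible.
            "to_address": [a for a, s in items if s < 0],
        }
    return summary
```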
19. What I want you to take away
- Profiling in general can be wrong.
- Computers can't understand language the same way people can. They won't get it right 100% of the time.
- Consider the ethics.
- The Internet is public; it's hard to keep things private with caching, spiders, hackers, etc.
- Don't let the Internet replace real life. People can be forgiven; an online reputation can only be hidden.
20. Related Topics
Game Theory
Probabilities and Statistics
Psychology
Sociology
Natural Language Processing
Language Syntax
Image Processing, Facial Recognition (for pictures and Instagram)
21. Helpful Resources
www.lct-master.org/files/MullenSentimentCourseSlides.pdf
Sentiment Analysis Tutorial
http://wordnet.princeton.edu/
WordNet, Lexical Database
http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
AFINN, Sentiment Dictionary
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-
Stanford Research
http://danariely.com/the-books/an-excerpt-from-chapter-5-of-%E2%80%9Cthe-upside
Dan Ariely on vengeance