3. Two assertions: Human communications are inherently subjective. Opinion often masquerades as Fact.
4. Facts and Feelings
The unemployment rate is 9.7%.
Unemployment is WAY TOO HIGH!!
The unemployment rate is higher than it was two years ago (5.1%).
Former U.S. Federal Reserve Chairman Alan Greenspan said on Tuesday that the global recession will "surely be the longest and deepest" since the 1930s, adding that the Obama administration's Troubled Asset Relief Program will be insufficient to plug the yawning financial gap. [Reuters, Feb 18, 2009]
Benjamin Bernanke is doing a better job than Greenspan.
www.google.com/publicdata
5. We have a decision need: monitoring, measurement, and analysis that support action.
"We" = consumers, marketers, managers, competitors, government, politicians.
6. Questions...
- What are people saying? What’s hot?
- What are they saying about {topic|person|product} X?
- ... about X versus {topic|person|product} Y?
- How has opinion about X and Y trended/evolved?
- How has opinion correlated with {our|competitors’|general} {news|marketing|sales|events}?
- What’s behind opinion, the root causes?
- Who are opinion leaders?
- How does sentiment propagate across multiple channels?
8. Methods are. Yet counting term hits in a single source doesn’t take you far. Are the mentions good or bad? What’s behind the posts?
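To make the limitation concrete, here is a minimal Python sketch of plain term-hit counting. The brand name "Acme" and both posts are hypothetical. The count goes up with every mention but says nothing about whether a mention is praise or complaint; in this toy case the unhappy post actually contributes more hits.

```python
import re
from collections import Counter


def count_term_hits(posts, terms):
    """Count case-insensitive whole-word hits for each term across all posts."""
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z']+", post.lower())
        for term in terms:
            counts[term] += words.count(term.lower())
    return counts


# Hypothetical posts: one praises the (hypothetical) brand, one complains.
posts = [
    "Loving my new Acme phone, best purchase this year.",
    "My Acme phone died again. Never buying Acme again.",
]

print(count_term_hits(posts, ["acme"]))  # Counter({'acme': 3})
```

Three hits, two of them from the complaint: the number alone cannot answer "good or bad?"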
9. Beyond counting:
“Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations.”
-- Wilson, Wiebe & Hoffmann, 2005, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”
Ingredients:
- Structured and unstructured sources
- Subjectivity
- Polarity
- Intensity
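The three linguistic ingredients can be illustrated with a toy lexicon lookup, loosely in the spirit of prior-polarity lexicons such as the one used in [WWH 2005]. This is a minimal sketch under stated assumptions: the four-word lexicon and its strength weights are hypothetical, and "subjective" is simplified to "contains at least one lexicon clue."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    polarity: int    # +1 positive, -1 negative
    strength: float  # hypothetical intensity weight: 1.0 strong, 0.5 weak


# Hypothetical toy lexicon; real lexicons hold thousands of entries.
LEXICON = {
    "excellent": Clue(+1, 1.0),
    "good": Clue(+1, 0.5),
    "insufficient": Clue(-1, 0.5),
    "terrible": Clue(-1, 1.0),
}


def analyze(sentence):
    """Return (is_subjective, polarity_label, intensity) for one sentence."""
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:  # no clues: treat as objective (a big simplification)
        return (False, "neutral", 0.0)
    score = sum(c.polarity * c.strength for c in hits)
    label = "positive" if score > 0 else "negative" if score < 0 else "mixed"
    intensity = sum(c.strength for c in hits)
    return (True, label, intensity)


print(analyze("The unemployment rate is 9.7%."))
# (False, 'neutral', 0.0)
print(analyze("TARP will be insufficient, a terrible plan."))
# (True, 'negative', 1.5)
```

Subjectivity, polarity, and intensity come out as three separate values, which is why they are listed as separate ingredients.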
10. There are many complications. Simplified:
- Multiple levels:
  - Corpus / data space, i.e., across multiple sources
  - Document
  - Statement / sentence
  - Entity / topic / concept
- Human language is noisy and chaotic! Jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy, etc.
- Context is key. Discourse analysis comes into play.
- Sentiment holder ≠ object: Greenspan said the recession will…
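A small example of why context is key: the sketch below flips a word's prior polarity when a negation cue appears shortly before it. The word lists and the three-token window are hypothetical choices for illustration; this is a crude rule, not the contextual-polarity method of [WWH 2005].

```python
# Hypothetical prior polarities (+1 positive, -1 negative).
PRIOR = {"good": 1, "great": 1, "bad": -1, "poor": -1}
# Hypothetical negation cues.
NEGATORS = {"not", "no", "never", "hardly"}


def contextual_polarity(sentence, window=3):
    """Sum prior polarities, flipping a clue if a negator appears within
    `window` tokens before it. A crude stand-in for real context handling."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    score = 0
    for i, tok in enumerate(tokens):
        if tok in PRIOR:
            preceding = tokens[max(0, i - window):i]
            negated = any(p in NEGATORS for p in preceding)
            score += -PRIOR[tok] if negated else PRIOR[tok]
    return score


print(contextual_polarity("The plan is good."))                    # 1
print(contextual_polarity("The plan is not good."))                # -1
print(contextual_polarity("This is not just good, it is great."))  # 0
```

The rule handles the first two sentences but scores the third as neutral, even though "not just good" intensifies rather than negates. Fixed keyword windows break down quickly; that is where deeper parsing and discourse analysis come in.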
13. An accuracy aside: [WWH 2005] describes an inter-annotator agreement test on 10 documents with 447 subjective expressions. The two annotators agree on 82% of cases. Excluding uncertain subjective expressions (18%) boosts agreement to 90%.
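The arithmetic behind those figures is simple percent agreement, optionally computed over a filtered subset. The minimal sketch below uses hypothetical toy annotations, not the WWH 2005 data, and a hypothetical set of "uncertain" indices. Raw percent agreement does not correct for chance; chance-corrected measures such as Cohen's kappa are often reported alongside it.

```python
def percent_agreement(labels_a, labels_b, keep=None):
    """Fraction of items on which two annotators chose the same label.
    `keep`, if given, is a predicate on item index used to filter items
    (e.g. to drop expressions marked uncertain)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    items = [i for i in range(len(labels_a)) if keep is None or keep(i)]
    if not items:
        raise ValueError("no items left to compare")
    agreed = sum(labels_a[i] == labels_b[i] for i in items)
    return agreed / len(items)


# Hypothetical toy annotations for 10 expressions (not the WWH 2005 data).
a = ["pos", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "neg", "neu"]
b = ["pos", "neg", "pos", "pos", "neu", "neg", "neg", "pos", "neg", "neu"]
uncertain = {2, 4}  # hypothetical indices flagged as uncertain

print(percent_agreement(a, b))                                     # 0.8
print(percent_agreement(a, b, keep=lambda i: i not in uncertain))  # 0.875
```

Dropping the hard cases raises agreement, which mirrors the direction of the 82% to 90% jump: much of the human disagreement sits in the expressions annotators themselves were unsure about.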
14. Putting aside the benefits of automation, how can machine accuracy approach human sensitivity?
Claim: You fall short with (only)
- document-level analysis
- keyword-based analysis.
You need strong natural language processing (NLP). You can also boost accuracy by, for example, ...
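Why document-level keyword scoring falls short can be shown in a few lines. In this minimal sketch the aspect and opinion word lists are hypothetical, and the "attach each opinion to the last aspect mentioned" rule is a crude heuristic standing in for the real NLP the slide calls for.

```python
import re

# Hypothetical aspect terms and opinion words with prior polarity.
ASPECTS = {"camera", "battery", "screen"}
OPINIONS = {"great": 1, "sharp": 1, "terrible": -1, "dim": -1}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def doc_score(text):
    """Document-level keyword score: one net number for the whole text."""
    return sum(OPINIONS.get(t, 0) for t in tokenize(text))


def aspect_scores(text):
    """Entity-level score: attach each opinion word to the most recent
    aspect mentioned before it (a crude heuristic, not real parsing)."""
    scores = {}
    current = None
    for t in tokenize(text):
        if t in ASPECTS:
            current = t
        elif t in OPINIONS and current is not None:
            scores[current] = scores.get(current, 0) + OPINIONS[t]
    return scores


review = "The camera is great, but the battery is terrible."
print(doc_score(review))      # 0
print(aspect_scores(review))  # {'camera': 1, 'battery': -1}
```

The document-level score reports "neutral" for a review that holds two clear opinions; splitting by entity recovers both.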
15.
Happy: Energetic, Bouncy, Happy, Hyper, Cheerful, Ecstatic, Excited, Jubilant, Giddy, Giggly
Sad: Confused, Crappy, Crushed, Depressed, Distressed, Envious, Gloomy, Guilty, Intimidated, Jealous, Lonely, Rejected, Sad, Scared
Angry: Aggravated, Angry, Bitchy, Enraged, Infuriated, Irate, Pissed off
-----------------------
The three prominent mood groups that emerged from K-Means clustering on the set of LiveJournal mood labels.
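K-Means needs each mood label expressed as a vector, and the slide does not say how the LiveJournal labels were represented. The minimal sketch below therefore assumes hypothetical 2-D (valence, arousal) coordinates, invented purely for illustration, and runs plain Lloyd's k-means with k = 3. It seeds the centroids deterministically from three named labels so the demo is reproducible; in practice, random restarts or k-means++ seeding are common choices.

```python
import math

# Hypothetical 2-D coordinates (valence, arousal) for a few mood labels.
# Invented for illustration; not the features used in the original study.
MOODS = {
    "happy": (0.9, 0.5), "energetic": (0.7, 0.9), "cheerful": (0.8, 0.4),
    "sad": (-0.8, -0.5), "gloomy": (-0.7, -0.6), "lonely": (-0.6, -0.4),
    "angry": (-0.7, 0.8), "irate": (-0.8, 0.9), "enraged": (-0.9, 0.9),
}


def kmeans(points, seeds, max_iters=20):
    """Lloyd's k-means. `points` maps label -> vector; `seeds` names the
    labels whose vectors serve as the k initial centroids."""
    centroids = [points[s] for s in seeds]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each label joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for label, p in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(label)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid in place
                new_centroids.append(centroids[j])
                continue
            vecs = [points[m] for m in members]
            new_centroids.append(tuple(sum(c) / len(vecs) for c in zip(*vecs)))
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return clusters


for group in kmeans(MOODS, seeds=["happy", "sad", "angry"]):
    print(group)
# ['happy', 'energetic', 'cheerful']
# ['sad', 'gloomy', 'lonely']
# ['angry', 'irate', 'enraged']
```

With well-separated toy coordinates the three groups fall out immediately. Real label representations are noisier, and the resulting groups depend on both the representation and the choice of k.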