Although sentiment analysis has a strong history of success on customer feedback and on certain blogs and editorials, accuracy results are mixed for data that lacks an explicit opinion holder. In particular, news data poses some unique accuracy challenges for sentiment analysis because it blends what I will call “objective” polarity with opinion-based polarity. How, for example, should document-level sentiment be determined for an article about the Haitian earthquake that discusses humanitarian aid? Similarly, an article about Bernard Madoff’s jail sentence shows a highly negative “objective polarity” somehow mitigated by a subsequent action. And how can we tease an author’s opinion, where one exists, apart from the semantics of objective polarity in news data? By design, author opinion (often referred to as “bias”) is only subtly indicated in news data. This talk discusses grounding the concept of “sentiment” within the greater context of the Semantics of Opposition.
Sentiment, News, and the Polarity Problem, Leslie Barrett
1. Sentiment, News and the Polarity Problem. Leslie Barrett, www.lbtechconsulting.com, April 13, 2010
2. Sentiment and Opinion. Are sentiment and opinion the same? Are feelings the same as beliefs? Sentiment can be applied to opinion, but not the other way around (Kim and Hovy 2004). The question is: should it apply to anything else? Does it make sense for narrative, exposition, or news data? How much text should we apply it to?
3. Sources. Sentiment analysis has been applied where opinion is the norm: blogs and tweets. It has also been applied where opinion is designed to be subtle, if expressed at all: news data. So either news data is never really objective, or sentiment is really being used as simple polarity, sorting the world into human ideas of positive and negative “buckets” blind to objectivity.
4. Polarity. Polarity is the scale on which sentiment is measured. Sentiment is usually considered to have two “poles”, positive and negative, and these are most often “translated” into “good” and “bad”. Sentiment analysis is mainly considered useful for telling us what is “good” and “bad” in our information stream.
5. The “Machine”. So the sentiment analysis machine takes in some text and tells us whether that text says something “good” or “bad”. OK, but before we unveil our machine, we need to ask some important but often overlooked questions: What text is going in? Where does “good” stop and “bad” begin? What is the text “about”?
6. Why Do We Need Sentiment Analysis, Beavis? So we’ll know what we’re thinking!
7. Let’s Try Feeding the Machine News Data! Given what we’ve just said, news headlines sound like a pretty straightforward text type to apply sentiment to. Even though news is supposed to be “objective”, headlines sell papers and can often be dramatic. Keywords like “crash”, “downturn” and “disaster” are abundant and are strong sentiment indicators. But are headlines enough? We may want document-level sentiment for news. And does it matter what the news is “about”?
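A minimal sketch of the keyword “machine” described above, assuming a tiny hand-built lexicon. The lexicon entries, their weights and the function names (`lexicon_score`, `label`) are hypothetical illustrations, not a real sentiment resource; practical systems use far larger lexicons and handle negation and intensifiers.

```python
import re

# Hypothetical toy lexicon (word -> polarity weight), for illustration only.
LEXICON = {
    "crash": -1.0,
    "downturn": -1.0,
    "disaster": -1.0,
    "rally": 1.0,
    "relief": 1.0,
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexicon_score(text: str) -> float:
    """Sum the polarity weights of every lexicon word found in the text."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokenize(text))

def label(score: float) -> str:
    """Map a numeric score onto the "good"/"bad" buckets."""
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"

headline = "Markets crash as downturn fears spread"
print(lexicon_score(headline), label(lexicon_score(headline)))
```

Because the scorer only counts keywords, it has no notion of what the text is “about” or of how much text it ought to read, which are exactly the questions raised above.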
9. Beware of Headlines in Financial News. Financial news in particular is a genre unto itself. Its polarity is constantly skewed by pundit “benchmarking”: in pundit opinion, beating bad expectations is better than a good quarter that falls short.
10. Can Sentiment Analysis “Beat Expectations”? There are all kinds of negatives here, but the document-level sentiment should be positive; that’s how an analyst would see it. So if you skew your analysis toward this reading, what happens with other news?
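Pundit benchmarking can be sketched as labeling a reported figure by its distance from a consensus expectation rather than by its absolute sign. This hypothetical helper assumes the actual and expected values (for example, earnings per share) have already been extracted from the article; it illustrates the benchmarking logic and is not a text classifier.

```python
def expectation_polarity(actual: float, expected: float, tolerance: float = 0.0) -> str:
    """Label a reported figure relative to the consensus expectation,
    as pundit "benchmarking" does, rather than by its absolute sign."""
    surprise = actual - expected
    if surprise > tolerance:
        return "positive"  # beat expectations
    if surprise < -tolerance:
        return "negative"  # fell short of expectations
    return "neutral"       # in line with expectations

# A loss of 0.10 per share against an expected loss of 0.30 reads as good news...
print(expectation_polarity(actual=-0.10, expected=-0.30))
# ...while a profit of 1.00 per share against an expected 1.20 reads as bad news.
print(expectation_polarity(actual=1.00, expected=1.20))
```

The design choice is the point: the same numbers flip polarity depending on the benchmark, which is why a generic lexicon scorer tuned on other news will disagree with an analyst here.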
11. Objectively “Bad” Events Happen. Some events don’t require an opinion holder. They simply have a generally agreed-upon negative or positive polarity, and we need to get them right because they affect other events (e.g., crop yields).
12. When Bad Things Happen to Positive Sentiment. Objectively bad events have their own problems, even in the absence of “expectations”. The problem with polarity measures when no opinion holder is present is topic drift. An editorial or blog is likely to stick to one sentiment, but coverage of a bad event can have the dreaded “silver lining”.
13. Disaster + Relief Can Spell Trouble. Despite some strong negative polarity indicators such as “traumatized”, “disaster” and “tsunami”, this article has an overall positive theme.
14. Don’t Quote Me! Another problem in news data is “opinion blend”. Often an article carries the author’s opinion alongside other, possibly differing opinions that are cited directly or indirectly. Or the author uses quotes to showcase two different opinions. Coverage of a “debate”, for example, can be very difficult even for a human to judge.
15. Attribution vs. Quoting. The author clearly does not believe the positive topic of the article, but Clinton does. So is this positive sentiment about Clinton?
16. Pundits vs. Authors vs. Topics. How can I be sure that “bad news” about my client is actually about my client? Make sure the named entity in question is a topic of the document; so-called “document mates” don’t matter. Do author names matter? Should I extract them? Yes! Over time, if you classify sentiment by author name against other entities, you might detect bias. Do the same for known “pundits” on a topic, and the same result may emerge.
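One way to sketch the author-bias idea is to average document-level sentiment per (author, entity) pair and flag pairs that deviate strongly from the entity’s overall average. The scores are assumed to come from whatever document-level classifier is already in use; the function names, the 0.5 threshold and the sample records are hypothetical, and a real study would need many more documents per pair before reading anything into a deviation.

```python
from collections import defaultdict
from statistics import mean

def author_entity_means(records):
    """Average document sentiment for each (author, entity) pair.

    records: list of (author, entity, score) tuples, where score is a
    document-level sentiment in [-1, 1] from an existing classifier.
    """
    buckets = defaultdict(list)
    for author, entity, score in records:
        buckets[(author, entity)].append(score)
    return {pair: mean(scores) for pair, scores in buckets.items()}

def bias_flags(records, threshold=0.5):
    """Flag (author, entity) pairs whose average departs from the entity's
    overall average (over all authors, including this one) by more than
    `threshold`. Returns {pair: rounded deviation}."""
    entity_scores = defaultdict(list)
    for _, entity, score in records:
        entity_scores[entity].append(score)
    entity_means = {e: mean(s) for e, s in entity_scores.items()}

    flags = {}
    for (author, entity), m in author_entity_means(records).items():
        deviation = m - entity_means[entity]
        if abs(deviation) > threshold:
            flags[(author, entity)] = round(deviation, 3)
    return flags

# Hypothetical authors and scores, for illustration only.
records = [
    ("Author A", "Acme Corp", -0.8),
    ("Author A", "Acme Corp", -0.6),
    ("Author B", "Acme Corp", 0.4),
    ("Author B", "Acme Corp", 0.6),
    ("Author C", "Acme Corp", 0.2),
]
print(bias_flags(records))
```

Swapping the author field for a pundit name gives the same check for known pundits on a topic.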
17. What’s It All About? Some data just tends to be multi-thematic or non-thematic. Market and financial reports, which often make their way into news feeds, tend to be this way. It is very hard to get a reasonable sentiment reading on either type of document.
18. SEC Reports: Too Big, Too Many Sections. There is the Management Discussion, which can yield appropriate sentiment scores, but there are so many other sections that there is no single theme. Many sections contain boilerplate, such as the accounting review.
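Assuming a plain-text 10-K in which each section starts on its own line with a heading like “Item 7.” (Item 7 of Form 10-K is Management’s Discussion and Analysis), the Management Discussion can be isolated with a short sketch like the following. The heading pattern and the function names are assumptions; real EDGAR filings vary in layout and usually need HTML stripping first.

```python
import re

# Assumed heading layout: "Item 7." (or "Item 7A.") at the start of a line.
ITEM_HEADING = re.compile(r"^\s*item\s+(\d+[a-z]?)\.", re.IGNORECASE | re.MULTILINE)

def split_items(text: str) -> dict[str, str]:
    """Split filing text into {item_number: section_text} using the headings.

    If a heading occurs twice (e.g. once in a table of contents), the later
    occurrence overwrites the earlier one, which usually keeps the real section.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).upper()] = text[m.end():end].strip()
    return sections

def extract_mdna(text: str) -> str:
    """Return only the MD&A section (Item 7), or "" if it is absent."""
    return split_items(text).get("7", "")

filing = """Item 1. Business
We make widgets.
Item 7. Management's Discussion and Analysis
Revenue grew strongly this year.
Item 8. Financial Statements
Boilerplate accounting text."""
print(extract_mdna(filing))
```

Only the extracted section is then passed to the sentiment classifier, so boilerplate sections never dilute the score.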
19. Scraping. Your data is only as good as your news feed. Sometimes a site delivers excess content that creeps into the text field of a feed. That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.
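A cheap spot-check for scraped-in content is to flag feed items that contain typical residue phrases or are far longer than the rest of the batch, and send only those to a human. The marker list, the 3× length factor and the function name are hypothetical and would need tuning per source.

```python
from statistics import median

# Hypothetical markers of scraped-in residue; a real list would be tuned per source.
RESIDUE_MARKERS = ("advertisement", "related articles", "click here", "subscribe now")

def flag_suspect_items(items, length_factor=3.0):
    """Return the ids of feed items that may carry excess scraped content.

    items: non-empty dict mapping item id -> article text.
    An item is flagged if it contains a residue marker, or if it is more than
    `length_factor` times as long as the median item in the batch.
    """
    typical = median(len(text) for text in items.values())
    flagged = []
    for item_id, text in items.items():
        lowered = text.lower()
        too_long = len(text) > length_factor * typical
        has_marker = any(marker in lowered for marker in RESIDUE_MARKERS)
        if too_long or has_marker:
            flagged.append(item_id)
    return flagged
```

Comparing against the batch median rather than a fixed length keeps the check usable across feeds whose typical article length differs.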
21. What to Do? Stop doing sentiment analysis on news data? No! News data is very valuable for reputation management. It can also be valuable for investment firms *if* you can tease out the jargon and pundit-speak. Document-level analysis is still OK!
22. Best Practices. Good topic detection: see what is closely aligned with a theme, and eliminate non-thematic or weakly thematic documents. Good feed maintenance: you or your feed provider need to spot-check for scraping problems.
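A hypothetical topicality test for the “good topic detection” practice: keep a document for an entity only if the entity is mentioned several times and appears in the lead paragraphs. Exact string matching stands in here for real named-entity recognition and coreference, and the thresholds (two mentions, leading 25% of paragraphs) are illustrative assumptions.

```python
import math
import re

def mention_count(entity: str, text: str) -> int:
    """Count case-insensitive mentions of the entity name as a whole phrase."""
    pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def is_on_topic(entity: str, paragraphs: list[str],
                min_mentions: int = 2, lead_fraction: float = 0.25) -> bool:
    """Hypothetical topicality test: the entity must be mentioned at least
    `min_mentions` times in total and at least once in the leading
    `lead_fraction` of paragraphs. Documents that fail are treated as
    non-thematic or weakly thematic for this entity."""
    if not paragraphs:
        return False
    total = sum(mention_count(entity, p) for p in paragraphs)
    n_lead = max(1, math.ceil(len(paragraphs) * lead_fraction))
    in_lead = any(mention_count(entity, p) > 0 for p in paragraphs[:n_lead])
    return total >= min_mentions and in_lead
```

A document where the client appears only once near the end (a “document mate”) fails the test, while one that leads with the client passes.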
23. Tricks & Tips. Data extraction for problem documents: if document sections are identified with tags, use them (this is true for SEC reports) and extract the “good” data (see Pang and Lee 2004 on extracting document portions). Write regular expression libraries to find quoted and cited material, then remove it or analyze it separately. Topic drift is harder, but you can extract the first n paragraphs: the main topical material in news is generally in the top 25% of the document, and secondary topics don’t carry the same weight.
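The quoting and topic-drift tricks above can be sketched with Python’s standard `re` module: separate directly quoted spans from the author’s own text, flag sentences that contain attribution verbs, and keep only the leading quarter of paragraphs. The patterns and function names are hypothetical starting points rather than a complete regular expression library; straight and curly double quotes are handled, while single and nested quotes are not.

```python
import math
import re

# Hypothetical, deliberately small patterns.
QUOTED = re.compile(r'["“][^"“”]*["”]')
ATTRIBUTED = re.compile(r"\b(?:said|says|according to|told|argued|claimed)\b",
                        re.IGNORECASE)

def split_quoted(text: str) -> tuple[str, list[str]]:
    """Separate directly quoted spans from the author's own text."""
    quotes = QUOTED.findall(text)
    remainder = QUOTED.sub(" ", text)
    return " ".join(remainder.split()), quotes  # collapse leftover whitespace

def has_attribution(sentence: str) -> bool:
    """Flag sentences that cite someone else (indirect quotation)."""
    return bool(ATTRIBUTED.search(sentence))

def lead_paragraphs(paragraphs: list[str], fraction: float = 0.25) -> list[str]:
    """Keep the top `fraction` of paragraphs, where news puts its main topic."""
    n = max(1, math.ceil(len(paragraphs) * fraction))
    return paragraphs[:n]

body, quotes = split_quoted('The plan will fail. "This is a historic win," the senator said.')
print(body)
print(quotes)
```

The author’s own text and the quoted material can then be scored separately, which keeps a quoted source’s enthusiasm from being read as the author’s sentiment.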
24. What’s Next for Polarity? Future directions for news-based sentiment analysis involve looking beyond the positive and negative poles. Think about all the “opposites” in the world: sweet/sour, cold/hot, inside/outside, wet/dry, hard/soft.
25. Leverage the Semantics of Opposition. There are many types of opposition to study, and they can be used in different ways: complementary opposites (male, female), reversatives (backwards, forwards), and scalar opposites (tall, short). There is a good deal of semantic research that has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia).
26. Opposites and Opinions. Let’s think of some opinions that fall at poles not definable in terms of “positive” and “negative”: conservative vs. liberal, and government expansion vs. privatization. Can these positions be detected automatically?
27. Appendix/Bibliography.
Kim, Soo-Min and Eduard Hovy. 2004. “Determining the Sentiment of Opinions.” Proceedings of COLING-04, pp. 1367–1373. Geneva, Switzerland.
Mettinger, Arthur. 1994. Aspects of Semantic Opposition in English. Oxford: Clarendon Press.
Pang, Bo and Lillian Lee. 2004. “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts.” Proceedings of the Association for Computational Linguistics (ACL 2004).
Pustejovsky, James. 2000. “Events and the Semantics of Opposition.” In C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects. CSLI Publications.