Sentiment, News and the Polarity Problem
Leslie Barrett, www.lbtechconsulting.com
April 13, 2010
Sentiment and Opinion
- Are sentiment and opinion the same? Are feelings the same as beliefs?
- Sentiment can be applied to opinion, but not the other way around (Kim and Hovy 2004).
- The question is: should it apply to anything else?
- Does it make sense in narrative, exposition, news data?
- How much text should we apply it to?
Sources
- Sentiment analysis has been applied where opinion is the norm: blogs and tweets.
- It has also been applied where opinion is designed to be subtle, if expressed at all: news data.
- So maybe news data is never really objective, or else maybe sentiment is really used as simple polarity, separating the world into human ideas of positive and negative “buckets”, blind to objectivity.
Polarity
- Polarity is the stuff through which sentiment is measured.
- Sentiment is usually considered to have the “poles” positive and negative.
- These are most often “translated” into “good” and “bad”.
- Sentiment analysis is really considered useful for telling us what is “good” and “bad” in our information stream.
The “Machine”
- So the sentiment analysis machine takes in some text and tells us whether that text says something “good” or “bad”.
- OK... but before we unveil our machine, we need to ask some important but often overlooked questions:
  - What text is going in?
  - Where does “good” stop and “bad” begin?
  - What is the text “about”?
Why do we need Sentiment Analysis, Beavis?
- So we’ll know what we’re thinking!
Let’s Try Feeding the Machine News Data!
- News headlines sound like a pretty straightforward text type to apply sentiment to, given what we’ve just said.
- Even though news is supposed to be “objective”, headlines sell papers and can often be dramatic.
- Keywords like “crash”, “downturn” and “disaster” are abundant and are strong sentiment indicators.
  - But are headlines enough?
  - We may want document-level sentiment for news.
  - Does it matter what the news is “about”?
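As a rough illustration of the keyword idea on this slide, here is a minimal sketch in Python. It assumes a tiny hand-made lexicon: only “crash”, “downturn” and “disaster” come from the slide, while the positive word list and the function name headline_polarity are hypothetical. It counts keyword hits and nothing more, which is exactly the kind of scoring the following slides call into question.

```python
import re

# Toy lexicons. Only "crash", "downturn" and "disaster" come from the slide;
# the positive list is a made-up assumption for illustration.
NEGATIVE = {"crash", "downturn", "disaster"}
POSITIVE = {"rally", "recovery", "gain"}

def headline_polarity(headline: str) -> int:
    """Return (#positive hits - #negative hits) for a headline."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives

print(headline_polarity("Market crash deepens downturn"))            # -2
print(headline_polarity("Stocks rally after disaster relief deal"))  # 0
```

The second headline scores 0 because one positive and one negative keyword cancel out, even though a human reader would probably call it good news.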
Some “real” headlines
- Short-lived Coup Disappoints Bears
Beware of Headlines in Financial News
- Financial news especially is really a genre unto itself.
- Its polarity perspective is constantly skewed by pundit “benchmarking”.
- In pundit opinion, beating bad expectations is better than a good quarter that falls short.
Can Sentiment Analysis “beat expectations”?
- All kinds of negatives here, but the document-level sentiment should be positive; that’s how an analyst would see it.
- So if you skew to this, what about other news?
Objectively “bad” Events Happen
- Some events don’t require an opinion holder.
- They simply have a generally agreed-upon negative or positive polarity.
- And we need to get them right because they affect other events (e.g. crop yields).
When Bad Things Happen to Positive Sentiment
- But objectively bad events have their own problems, even in the absence of “expectations”.
- The problem with polarity measures in the absence of an opinion holder is topic drift.
- An editorial or blog is likely to stick to one sentiment, but bad events can have the dreaded “silver lining”.
Disaster+Relief Can Spell Trouble
- Despite some strong negative polarity indicators like “traumatized”, “disaster” and “tsunami”, this article has an overall positive theme.
Don’t Quote Me!
- Another problem in news data is “opinion blend”.
- Often you have the author’s opinion alongside other opinions that may differ, cited directly or indirectly.
- Or an author uses quotes to showcase two different opinions.
- Coverage of a “debate”, for example, can be very difficult for even a human to judge.
Attribution vs. Quoting
- The author clearly does not believe the positive topic of the article.
- But Clinton believes it.
- So is this positive sentiment about Clinton?
Pundits vs. Authors vs. Topics
- How can I be sure that “bad news” about my client is about my client?
  - Make sure the named entity in question is a topic of the document.
  - So-called “document mates” don’t matter.
- Do author names matter? Should I extract them?
  - Yes! Over time, if you classify by author name against other entities, you might detect bias.
  - Do the same for known “pundits” on a topic... the same result may emerge.
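One way to picture the author-versus-entity bookkeeping suggested here is the sketch below. It assumes each article has already been reduced to an (author, entity, score) record by some upstream step; the record layout, the sample names and the function mean_score_by_author_and_entity are all hypothetical. A persistently low or high average for one author on one entity would only be a hint of possible bias, not proof of it.

```python
from collections import defaultdict
from statistics import mean

def mean_score_by_author_and_entity(records):
    """Average sentiment score for every (author, entity) pair.

    records: iterable of (author, entity, score) tuples.
    """
    buckets = defaultdict(list)
    for author, entity, score in records:
        buckets[(author, entity)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}

# Made-up scored articles, for illustration only.
records = [
    ("author_a", "AcmeCorp", -1.0),
    ("author_a", "AcmeCorp", -0.5),
    ("author_b", "AcmeCorp", 0.5),
    ("author_a", "OtherCo", 0.5),
]

averages = mean_score_by_author_and_entity(records)
print(averages[("author_a", "AcmeCorp")])  # -0.75
print(averages[("author_b", "AcmeCorp")])  # 0.5
```

The same grouping would work with a pundit name in place of the author name.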
What’s it all About?
- Some data just tends to be multi-thematic or non-thematic.
- In particular, market and financial reports, which often make their way into news feeds, tend to be this way.
- It is very hard to get a reasonable sentiment reading on either type of document.
SEC Reports: too big, too many sections
- There is the Management Discussion, which can have appropriate sentiment scores.
- But there are so many other sections, with no single theme.
- Many sections have boilerplate, such as the accounting review.
Scraping
- Your data is only as good as your news feed.
- Sometimes a site will deliver excess content that creeps into the text field of a feed.
- That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.
Field Overlap from a Typical News Page
What to Do?
- Stop doing sentiment analysis on news data? NO!
- News data is very valuable for reputation management.
- It can also be valuable for investment firms *if* you can tease out the jargon and pundit-speak.
- Document-level is still OK!
Best Practices
- Good topic detection
  - See what’s closely aligned with a theme and eliminate non-thematic or weakly thematic documents.
- Good feed maintenance
  - You or your feed provider need to spot-check for scraping problems.
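The slide does not say how topic detection should be done. As one crude stand-in, the sketch below scores a document by the share of its tokens that fall in a theme vocabulary and drops documents under a threshold. The vocabulary, the 0.1 threshold and the names theme_score and keep_thematic are assumptions made for illustration, not the author's method.

```python
import re

def theme_score(document: str, theme_terms: set) -> float:
    """Fraction of the document's tokens that belong to the theme vocabulary."""
    tokens = re.findall(r"[a-z]+", document.lower())
    if not tokens:
        return 0.0
    return sum(t in theme_terms for t in tokens) / len(tokens)

def keep_thematic(documents, theme_terms, threshold=0.1):
    """Keep only documents whose theme score reaches the (assumed) threshold."""
    return [d for d in documents if theme_score(d, theme_terms) >= threshold]

# Made-up theme vocabulary and documents.
theme = {"acme", "earnings", "shares"}
docs = [
    "Acme earnings beat forecasts as Acme shares rose",
    "Weather was mild across the region today",
]

print(theme_score(docs[0], theme))       # 0.5
print(len(keep_thematic(docs, theme)))   # 1
```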
Tricks & Tips
- Data extraction for problem documents
  - If document sections are identified with tags, use them (this is true for SEC reports) and extract the “good” data (see Pang and Lee 2004 on extracting document portions).
  - Write regular expression libraries to find quoted and cited material. Remove it or use it separately.
  - Topic drift is harder, but you can extract the first n paragraphs. Main topical material in news is generally in the top 25% of the document.
  - Secondary topics don’t carry the same weight.
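Two of these tips can be sketched in a few lines of Python. The sketch assumes that quoted material sits inside straight or curly double quotes and that paragraphs are separated by blank lines; real feeds will need a broader pattern library, and indirect citations are not handled at all. The function names split_quotes and leading_paragraphs are hypothetical.

```python
import math
import re

# Text inside straight or curly double quotes (no nesting handled).
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')

def split_quotes(text):
    """Return (text with quoted spans removed, list of the quoted spans)."""
    quotes = QUOTED.findall(text)
    stripped = QUOTED.sub(" ", text)
    return re.sub(r"\s+", " ", stripped).strip(), quotes

def leading_paragraphs(text, fraction=0.25):
    """Keep the first `fraction` of the paragraphs, and at least one."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return []
    keep = max(1, math.ceil(len(paragraphs) * fraction))
    return paragraphs[:keep]

body, quotes = split_quotes('The minister said "all is well" on Monday.')
print(body)    # The minister said on Monday.
print(quotes)  # ['"all is well"']

article = "\n\n".join(f"p{i}" for i in range(1, 9))  # eight dummy paragraphs
print(leading_paragraphs(article))  # ['p1', 'p2']
```

The quoted spans are returned rather than discarded so that they can be scored separately, as the slide suggests.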
What’s Next for Polarity?
- Future directions for news-based sentiment analysis are based on looking outside of the Positive and Negative poles.
- Think about all the “opposites” in the world:
  - Sweet/sour
  - Cold/hot
  - Inside/outside
  - Wet/dry
  - Hard/soft
Leverage the Semantics of Opposition
- There are many types of opposition to study and they can be used in different ways:
  - Complementary opposites (male, female)
  - Reversatives (backwards, forwards)
  - Scalar opposites (tall, short)
- There is a good deal of semantic research that has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia).
Opposites and Opinions
- Let’s think of some opinions that fit into poles not definable in terms of “positive” and “negative”:
  - Conservative vs. Liberal
  - Government Expansion vs. Privatization
- Can these positions be detected automatically?
Appendix/Bibliography
- Kim, Soo-Min and Eduard Hovy. 2004. Determining the Sentiment of Opinions. Proceedings of COLING-04, pp. 1367-1373. Geneva, Switzerland.
- Pustejovsky, James. 2000. Events and the Semantics of Opposition. In C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects. CSLI Publications.
- Mettinger, Arthur. 1994. Aspects of Semantic Opposition in English. Clarendon Press, Oxford.
- Pang, Bo and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the Association for Computational Linguistics.

Sentiment, News, and the Polarity Problem, Leslie Barrett

  • 1. Sentiment, News and the Polarity Problem Leslie Barrett www.lbtechconsulting.com April 13, 2010
  • 2. Sentiment and Opinion Are sentiment and opinion the same? Are feelings the same as beliefs? Sentiment can be applied to opinion but not the other way around (Kim And Hovy 2004) The question is – should it apply to anything else? Does it make sense in narrative, exposition, news data? How much text should we apply it to?
  • 3. Sources: Sentiment analysis has been applied where opinion is the norm, such as blogs and tweets. It has also been applied where opinion is designed to be subtle, if expressed at all, as in news data. So maybe news data is never really objective, or maybe sentiment is really used as simple polarity: separating the world into human ideas of positive and negative “buckets”, blind to objectivity.
  • 4. Polarity: Polarity is the stuff through which sentiment is measured. Sentiment is usually considered to have the “poles” positive and negative. These are most often “translated” into “good” and “bad”. Sentiment analysis is really considered useful for telling us what is “good” and “bad” in our information stream.
  • 5. The “Machine”: So the sentiment analysis machine takes in some text and tells us whether that text says something “good” or “bad”. OK... but before we unveil our machine, we need to ask some important but often overlooked questions: What text is going in? Where does “good” stop and “bad” begin? What is the text “about”?
  • 6. Why do we need Sentiment Analysis, Beavis? So we’ll know what we’re thinking!
  • 7. Let’s Try Feeding the Machine News Data! News headlines sound like a pretty straightforward text type to apply sentiment to, given what we’ve just said. Even though news is supposed to be “objective”, headlines sell papers and can often be dramatic. Keywords like “crash”, “downturn” and “disaster” are abundant and are strong sentiment indicators. But are headlines enough? We may want document-level sentiment for news. Does it matter what the news is “about”?
  • 8. Some “real” headlines: Short-lived Coup Disappoints Bears
  • 9. Beware of Headlines in Financial News: Financial news especially is really a genre unto itself. Its polarity perspective is constantly skewed by pundit “benchmarking”. In pundit opinion, beating bad expectations is better than a good quarter that falls short.
  • 10. Can Sentiment Analysis “beat expectations”? There are all kinds of negatives here, but the document-level sentiment should be positive; that’s how an analyst would see it. So if you skew to this, what about other news?
  • 11. Objectively “bad” Events Happen: Some events don’t require an opinion holder. They simply have a generally agreed-upon negative or positive polarity. And we need to get them right because they affect other events (e.g. crop yields).
  • 12. When Bad Things Happen to Positive Sentiment: But objectively bad events have their own problems, even in the absence of “expectations”. The problem with polarity measures in the absence of an opinion holder is topic drift. An editorial or blog is likely to stick to one sentiment, but bad events can have the dreaded “silver lining”.
  • 13. Disaster+Relief Can Spell Trouble: Despite some strong negative polarity indicators like “traumatized”, “disaster” and “tsunami”, this article has an overall positive theme.
  • 14. Don’t Quote Me! Another problem in news data is “opinion blend”. Often you have an author’s opinion alongside other opinions that may differ, cited directly or indirectly, or an author using quotes to showcase two different opinions. Coverage of a “debate”, for example, can be very difficult for even a human to judge.
  • 15. Attribution vs. Quoting: The author clearly does not believe the positive topic of the article. But Clinton believes it. So is this positive sentiment about Clinton?
  • 16. Pundits vs. Authors vs. Topics: How can I be sure that “bad news” about my client is about my client? Make sure the named entity in question is a topic of the document. So-called “document mates” don’t matter. Do author names matter? Should I extract them? Yes! Over time, if you classify by author name against other entities, you might detect bias. Do the same for known “pundits” on a topic; the same result may emerge.
  • 17. What’s it all About? Some data just tends to be multi-thematic or non-thematic. In particular, market and financial reports, which often make their way into news feeds, tend to be this way. It is very hard to get a reasonable sentiment reading on either type of document.
  • 18. SEC Reports: too big, too many sections. There is the Management Discussion, which can have appropriate sentiment scores. But there are so many other sections that there is no single theme. Many sections contain boilerplate, such as the accounting review.
  • 19. Scraping: Your data is only as good as your news feed. Sometimes a site will deliver excess content that creeps into the text field of a feed. That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.
  • 20. Field Overlap from a Typical News Page
  • 21. What to Do? Stop doing sentiment analysis on news data? NO! News data is very valuable for reputation management. It can also be valuable for investment firms *if* you can tease out the jargon and pundit-speak. Document-level is still OK!
  • 22. Best Practices: Good topic detection: see what’s closely aligned with a theme and eliminate non-thematic or weak-thematic documents. Good feed maintenance: you or your feed provider need to spot-check for scraping problems.
  • 23. Tricks & Tips: Data extraction for problem documents. If document sections are identified with tags, use them (this is true for SEC reports) and extract the “good” data (see Pang and Lee 2004 on extracting document portions). Write regular expression libraries to find quoted and cited material, then remove it or use it separately. Topic drift is harder, but you can extract the first n paragraphs: the main topical material in news is generally in the top 25% of the document, and secondary topics don’t carry the same weight.
  • 24. What’s Next for Polarity? Future directions for news-based sentiment analysis are based on looking outside of the Positive and Negative poles. Think about all the “opposites” in the world: sweet/sour, cold/hot, inside/outside, wet/dry, hard/soft.
  • 25. Leverage the Semantics of Opposition: There are many types of opposition to study, and they can be used in different ways: complementary opposites (male, female), reversatives (backwards, forwards), scalar opposites (tall, short). A good deal of semantic research has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia).
  • 26. Opposites and Opinions: Let’s think of some opinions that fit into poles not definable in terms of “positive” and “negative”: Conservative vs. Liberal, Government Expansion vs. Privatization. Can these positions be detected automatically?
  • 27. Appendix/Bibliography: Soo-Min Kim and Eduard Hovy, “Determining the Sentiment of Opinions”, Proceedings of COLING-04, pp. 1367-1373, Geneva, Switzerland, 2004. James Pustejovsky, “Events and the Semantics of Opposition”, in Events as Grammatical Objects, C. Tenny and J. Pustejovsky (eds.), CSLI Publications, 2000. Arthur Mettinger, Aspects of Semantic Opposition in English, Clarendon Press, Oxford, 1994. Bo Pang and Lillian Lee, “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts”, in Proceedings of the Association for Computational Linguistics, 2004.
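
Two of the extraction tricks from slide 23 (finding quoted material with regular expressions, and keeping only the top portion of a news document) could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the names `QUOTED`, `split_quoted` and `lead_paragraphs` are hypothetical, paragraphs are assumed to be separated by blank lines, and only text inside double quotes is treated as quoted (indirectly cited speech is not handled).

```python
import math
import re

# Hypothetical pattern: text inside straight or curly double quotes.
# Indirect citations ("X said that ...") are NOT covered by this sketch.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def split_quoted(text):
    """Return (text with quoted spans removed, list of the quoted spans)."""
    quotes = QUOTED.findall(text)
    remainder = QUOTED.sub(" ", text)
    # Collapse the whitespace left behind by the removed spans.
    remainder = re.sub(r"\s+", " ", remainder).strip()
    return remainder, quotes


def lead_paragraphs(document, fraction=0.25):
    """Keep the first `fraction` of paragraphs (at least one).

    Assumes paragraphs are separated by blank lines; the 0.25 default
    follows the slide's "top 25% of document" rule of thumb.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    if not paragraphs:
        return []
    keep = max(1, math.ceil(len(paragraphs) * fraction))
    return paragraphs[:keep]
```

In use, one might score `lead_paragraphs(article)` for document-level sentiment, and pass each paragraph through `split_quoted` so that the author's own text and the quoted opinions can be scored separately.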