On Empirical Sentiment Accuracy Bounds
     Shawn Rutledge, Chief Scientist
Visible’s Sentiment Approach
Visible was one of the first Social Media Monitoring solutions in the market.

Algorithms
• State of the art
• Beyond overhyped NLP

Features
• Deep experience
• Social NLP & Context

Data
• Massive proprietary data

A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises.

Copyright © 2011 Visible. All rights reserved.
We have tens of millions of human-annotated social media posts.
Basically all breakthroughs in the last two decades have come from better data.
Sentiment, The Accuracy Disconnect
There is a disconnect between the hype and the experience in the marketplace.

• Claims: “We have 97% Accuracy”
• Experience: “The best vendor tested had 50% accuracy at the post level”
• Experience: Sentiment accuracy is the most dissatisfying feature according to Forrester research; only 45% satisfied with vendor sentiment accuracy
Key Findings
After several years of research with the best available data, here are some of the key findings.

1. Solve relevance first, sentiment second.
2. Accuracy is the wrong measure to optimize.
3. Sentiment is more subjective than you think it is.
We won’t have time to cover the first two. The third could be an alternate title for this talk.
Audit Findings, Large Financial Institution
A typical study.

Double Blind, Multi-Reviewer Study:

1. Same posts labeled by both human labeling practice and automation.
2. At least two auditors grade each label. Blind to label source.

No statistically significant difference between human labeled and AI labeled sentiment.

Reviewers can’t tell the difference between Visible’s statistical models and human annotators.
So is Sentiment “solved”?

But…

Auditors agree with each other only 73% of the time [95% CI: 69%-77%].

No. Auditors think people and automation are both poor. And they don’t agree with each other.
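An agreement rate with a confidence interval, like the one quoted for the auditors, could be computed along the following lines. This is a minimal sketch, not the study's actual method: the slides do not say how the interval was derived or how many labels were graded, so the Wilson score interval, the grade values, and the sample size of 500 are all assumptions chosen for illustration.

```python
import math


def agreement_rate(grades_a, grades_b):
    """Fraction of items on which two auditors gave the same grade."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty grade lists of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical grades from two auditors on the same eight labels.
auditor_1 = ["ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad"]
auditor_2 = ["ok", "bad", "bad", "ok", "ok", "ok", "ok", "bad"]
rate = agreement_rate(auditor_1, auditor_2)

# Hypothetical larger audit: 365 agreements out of 500 graded labels (73%).
low, high = wilson_interval(365, 500)
print(f"toy agreement: {rate:.2f}")
print(f"73% of 500 -> 95% interval: {low:.3f} to {high:.3f}")
```

With a few hundred graded labels the interval is several points wide on each side, which is a reminder that a single audit pins the agreement rate down only loosely.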
Key Audit Findings, Large Financial Institution
Social Media Professionals Grading Human Annotations
Another way of looking at the same study.

• Both auditors agree with label only 58% of the time (proxy for “hard” graders)
• At least one auditor agrees with label 91% of the time (proxy for “easy” graders)

58% - 91% is a huge range.
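The two figures above are the strict and lenient ways of scoring the same set of graded labels. The sketch below shows how both could be derived from per-label auditor judgements. The counts are hypothetical, picked only so that the totals match 58% and 91%; the study's raw data is not given in the slides.

```python
def strict_and_lenient_rates(judgements):
    """Return (both-agree rate, at-least-one-agrees rate).

    judgements: one (auditor_1_agrees, auditor_2_agrees) pair of booleans
    per graded label.
    """
    n = len(judgements)
    if n == 0:
        raise ValueError("no judgements supplied")
    both = sum(a and b for a, b in judgements) / n
    at_least_one = sum(a or b for a, b in judgements) / n
    return both, at_least_one


# Hypothetical audit of 100 labels.
judgements = (
    [(True, True)] * 58      # both auditors accept the label
    + [(True, False)] * 17   # only auditor 1 accepts it
    + [(False, True)] * 16   # only auditor 2 accepts it
    + [(False, False)] * 9   # both reject it
)

both, either = strict_and_lenient_rates(judgements)

# By inclusion-exclusion, the two single-auditor acceptance rates sum to
# both + either, so their average is the midpoint of the range.
mean_single = (both + either) / 2
print(f"strict: {both:.2f}, lenient: {either:.2f}, mean single auditor: {mean_single:.3f}")
```

Whatever the split between the two auditors, the average single-auditor acceptance rate sits at the midpoint of the strict and lenient figures, which for 58% and 91% is 74.5%.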
True Across a Wide Variety of Problems
This talk promised bounds and here they are.

Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates.

About 81% Inter-Annotator Agreement [IQR: 78% - 83%]
80% is also consistent with academic research.
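A summary such as "about 81% [IQR: 78% - 83%]" is a median with an interquartile range taken over many audits. A minimal sketch of that summary follows. The per-audit agreement rates are invented for illustration, and the quartile convention used in the talk is not stated, so the defaults of Python's `statistics.quantiles` are an assumption.

```python
import statistics

# Hypothetical inter-annotator agreement rates, one per brand audit.
audit_agreement = [0.76, 0.78, 0.79, 0.80, 0.81, 0.81, 0.82, 0.83, 0.84, 0.85]

median = statistics.median(audit_agreement)

# quantiles(..., n=4) returns the three quartile cut points; the first and
# last bound the interquartile range.
q1, _, q3 = statistics.quantiles(audit_agreement, n=4)

print(f"median agreement: {median:.0%}  [IQR: {q1:.1%} - {q3:.1%}]")
```

Reporting the interquartile range rather than a single average shows how tightly the audits cluster, which is what supports reading the figure as a bound and not as a one-off result.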
Take Aways
1. Yes, your team
2. Evaluating sentiment takes care
3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit)
4. It will take effort to get your team in tight agreement on sentiment definitions
5. Real breakthroughs in sentiment accuracy will come from personalization

We all think we’re better than average drivers. Similarly, although most of us have heard something like the 80% agreement statistic, we don’t think it applies to us. The main thing I want you to take away from this talk is that it does apply to you. People within your department, your team, sitting in the cube next to you, disagree with you about 20% of the time.
The implications are also worth taking to heart. When people claim accuracies much higher than 80% they are either lying or they don’t know what they are doing (overfit to one dataset).
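One way to see why annotator agreement caps measured accuracy is a small simulation. This is an illustrative sketch under simple assumptions that are not from the talk: three sentiment labels, a second annotator who departs from the first on a random 20% of posts, and a model that reproduces the first annotator's labels exactly (the extreme case of fitting one dataset).

```python
import random

LABELS = ["positive", "neutral", "negative"]


def accuracy(predicted, reference):
    """Fraction of positions where the two label lists match."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def simulate(n_posts=10_000, agreement=0.80, seed=0):
    """Score a model that copies annotator A against annotators A and B."""
    rng = random.Random(seed)
    annotator_a = [rng.choice(LABELS) for _ in range(n_posts)]

    # Annotator B keeps A's label with probability `agreement`,
    # otherwise picks one of the other two labels at random.
    annotator_b = []
    for label in annotator_a:
        if rng.random() < agreement:
            annotator_b.append(label)
        else:
            annotator_b.append(rng.choice([x for x in LABELS if x != label]))

    model = list(annotator_a)  # hypothetical model fitted perfectly to A
    return accuracy(model, annotator_a), accuracy(model, annotator_b)


acc_vs_a, acc_vs_b = simulate()
print(f"accuracy against annotator A: {acc_vs_a:.1%}")
print(f"accuracy against annotator B: {acc_vs_b:.1%}")
```

The same model scores perfectly against the annotator it was fitted to and only about as well as the agreement rate against anyone else, so a headline accuracy figure says as much about who graded the test set as about the model.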
Similar to what has happened in Search, real breakthroughs will come through personalization. Deeper linguistics (dealing with sarcasm, humor, contextual knowledge) are interesting but can’t help break the 80% barrier.

If teams put the work into getting tight, consistent sentiment definitions (with >80% agreement), only then do algorithms have a chance to do that well.
@shawnrut
@Visible
VisibleTechnologies.com

Thank You!

Empirical Sentiment Accuracy Bounds

  • 1. On Empirical Sentiment Accuracy Bounds Shawn Rutledge, Chief Scientist
  • 2. Visible’s Sentiment Approach. Visible was one of the first Social Media Monitoring solutions in the market. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; Social NLP & Context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Copyright © 2011 Visible. All rights reserved.
  • 3. Visible’s Sentiment Approach. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; Social NLP & Context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. We have tens of millions of human-annotated social media posts.
  • 4. Visible’s Sentiment Approach. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; Social NLP & Context. Data: massive proprietary data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Basically all breakthroughs in the last two decades have come from better data.
  • 5. Sentiment, The Accuracy Disconnect. There is a disconnect between the hype and the experience in the marketplace. Claims: “We have 97% Accuracy.” Experience: “The best vendor tested had 50% accuracy at the post level.” Experience: sentiment accuracy is the most dissatisfying feature according to Forrester research; only 45% are satisfied with vendor sentiment accuracy.
  • 6. Key Findings. After several years of research with the best available data, here are some of the key findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is.
  • 7. Key Findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is. We won’t have time to cover the first two. The third could be an alternate title for this talk.
  • 8. Audit Findings, Large Financial Institution. A typical study. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label, blind to label source. Result: no statistically significant difference between human-labeled and AI-labeled sentiment. Reviewers can’t tell the difference between Visible’s statistical models and human annotators.
  • 9. Audit Findings, Large Financial Institution. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label, blind to label source. Result: no statistically significant difference between human-labeled and AI-labeled sentiment. So is Sentiment “solved”? No. Auditors think people and automation are both poor. And they don’t agree with each other: auditors agree with each other only 73% of the time [95% CI: 69%-77%].
  • 10. Key Audit Findings, Large Financial Institution. Social Media Professionals Grading Human Annotations. Another way of looking at the same study: both auditors agree with the label only 58% of the time (a proxy for “hard” graders); at least one auditor agrees with the label 91% of the time (a proxy for “easy” graders). 58% - 91% is a huge range.
  • 11. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates: about 81% Inter-Annotator Agreement [IQR: 78% - 83%]. This talk promised bounds and here they are.
  • 12. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates: about 81% Inter-Annotator Agreement [IQR: 78% - 83%]. 80% is also consistent with academic research.
  • 13. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. We all think we’re better than average drivers. Similarly, although most of us have heard something like the 80% agreement statistic, we don’t think it applies to us. The main thing I want you to take away from this talk is that it does apply to you. People within your department, on your team, sitting in the cube next to you, disagree with you about 20% of the time.
  • 14. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. The implications are also worth taking to heart. When people claim accuracies much higher than 80% they are either lying or they don’t know what they are doing (overfit to one dataset).
  • 15. Take Aways. 1. Yes, your team. 2. Evaluating sentiment takes care. 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit). 4. It will take effort to get your team in tight agreement on sentiment definitions. 5. Real breakthroughs in sentiment accuracy will come from personalization. Similar to what has happened in Search, real breakthroughs will come through personalization. Deeper linguistics (dealing with sarcasm, humor, contextual knowledge) are interesting but can’t help break the 80% barrier. If teams put the work into getting tight, consistent sentiment definitions (with >80% agreement), only then do algorithms have a chance to do that well.
  • 16. @shawnrut @Visible VisibleTechnologies.com Thank You!
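
Slides 9 and 10 report three figures from a two-auditor study: how often the auditors agree with each other (with a 95% interval), how often both accept a label, and how often at least one does. A minimal sketch of how such figures could be computed is below. The slides do not say which interval method was used, so the Wilson score interval here is an assumption, and `wilson_interval`, `audit_summary`, and the toy grades are hypothetical names and data, not the study's.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%).

    Assumption: the slides do not name their interval method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def audit_summary(grades):
    """Summarize a two-auditor audit.

    grades: list of (auditor_a_accepts, auditor_b_accepts) booleans,
    one pair per graded label.
    """
    n = len(grades)
    same = sum(a == b for a, b in grades)    # auditors gave the same grade
    both = sum(a and b for a, b in grades)   # "hard" grader proxy
    either = sum(a or b for a, b in grades)  # "easy" grader proxy
    return {
        "auditor_agreement": same / n,
        "auditor_agreement_ci": wilson_interval(same, n),
        "both_accept": both / n,
        "at_least_one_accepts": either / n,
    }


# Hypothetical toy grades for ten labels, not data from the study.
toy_grades = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 1
)
summary = audit_summary(toy_grades)
print(summary)
```

The gap between `both_accept` and `at_least_one_accepts` is the "hard" versus "easy" grader range from slide 10. On a sample this small the interval is wide, which is one reason an agreement rate is worth quoting with its interval.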
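
Slides 11 through 14 summarize inter-annotator agreement across many audits as a median with an interquartile range, then treat that band as a practical ceiling on credible accuracy claims. The sketch below shows one way to compute such a summary and flag a claim that sits above it. The per-audit rates are hypothetical illustration values, `agreement_band` and `exceeds_agreement_band` are made-up names, and the quartile method (Python's default "exclusive" method) is an assumption, since the slides do not specify one.

```python
import statistics


def agreement_band(rates):
    """Return (q1, median, q3) of per-audit agreement rates.

    Uses statistics.quantiles with its default 'exclusive' method
    (an assumption; the slides do not state how the IQR was computed).
    """
    q1, median, q3 = statistics.quantiles(rates, n=4)
    return q1, median, q3


def exceeds_agreement_band(claimed_accuracy, rates):
    """True if a claimed accuracy is above the upper quartile of
    human inter-annotator agreement, i.e. a claim worth questioning."""
    _, _, q3 = agreement_band(rates)
    return claimed_accuracy > q3


# Hypothetical per-audit agreement rates, not the audits from the talk.
toy_rates = [0.76, 0.78, 0.80, 0.81, 0.82, 0.83, 0.85]

q1, median, q3 = agreement_band(toy_rates)
print(f"median {median:.2f}, IQR {q1:.2f} - {q3:.2f}")
print(exceeds_agreement_band(0.97, toy_rates))
```

The check is deliberately crude: it only says a claim is higher than humans typically agree with each other, which is the situation takeaway 3 warns about.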