Adventures in Crowdsourcing:
Research at UT Austin & Beyond




Matt Lease
School of Information, University of Texas at Austin
ml@ischool.utexas.edu · @mattlease
Outline
     • Foundations
     • Work at UT Austin
     • A Few Roadblocks
           – Workflow Design
           – Sensitive data
           – Regulation
           – Fraud
           – Ethics

August 23, 2012          Matt Lease - ml@ischool.utexas.edu   2
Amazon Mechanical Turk (MTurk)




     • Marketplace for crowd labor (microtasks)
     • Created in 2005 (still in “beta”)
     • On-demand, scalable, 24/7 global workforce

Labeling Data (“Gold Rush”)
Snow et al. (EMNLP 2008)
   • MTurk annotation for 5 Tasks
         – Affect recognition
         – Word similarity
         – Recognizing textual entailment
         – Event temporal ordering
         – Word sense disambiguation
   • 22K labels for US $26
   • High agreement between
     consensus labels and
     gold-standard labels
Sorokin & Forsyth (CVPR 2008)
     • MTurk for Computer Vision
     • 4K labels for US $60




Kittur, Chi, & Suh (CHI 2008)

 • MTurk for User Studies

 • “…make creating believable invalid responses as
   effortful as completing the task in good faith.”




Alonso et al. (SIGIR Forum 2008)
     • MTurk for Information Retrieval (IR)
           – Judge relevance of search engine results
     • Various follow-on studies (design, quality, cost)




Social & Behavioral Sciences
  • A Guide to Behavioral Experiments
    on Mechanical Turk
        – W. Mason and S. Suri (2010). SSRN online.
  • Crowdsourcing for Human Subjects Research
        – L. Schmidt (CrowdConf 2010)
  • Crowdsourcing Content Analysis for Behavioral Research:
    Insights from Mechanical Turk
        – Conley & Tosti-Kharas (2010). Academy of Management
  • Amazon's Mechanical Turk : A New Source of
    Inexpensive, Yet High-Quality, Data?
        – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
What about data quality?
• Many CS papers on statistical methods
  – Online vs. offline, feature-based vs. content-agnostic
  – Worker calibration, noise vs. bias, weighted voting
  – Work in my lab by Jung, Kumar, Ryu, & Tang
• Human factors also matter!
  – Instructions, design, interface, interaction
  – Names, relationship, reputation
  – Fair pay, hourly vs. per-task, recognition, advancement
  – For contrast with MTurk, consider Kochhar (2010)
• See Lease, HComp’11
Kovashka & Lease, CrowdConf’10




Grady & Lease, 2010 (Search Eval.)




Noisy Supervised Classification
       Kumar & Lease, 2011a

  Our 1st study of aggregation (Fall’10)
     Simple idea, simulated workers
  Highlights concepts & open questions
Problem
 • Crowd labels tend to be noisy
 • Can reduce uncertainty via wisdom of crowds
       – Collect & aggregate multiple labels per example
 • How do we maximize learning per unit of labeling effort?
       – Label a new example?
       – Get another label for an already-labeled example?


 See: Sheng, Provost & Ipeirotis, KDD’08

Setup
   • Task: Binary classification
   • Learner: C4.5 decision tree
   • Given
         – An initial seed set of single-labeled examples (64)
         – An unlimited pool of unlabeled examples
   • Cost model
         – Fixed unit cost for labeling any example
         – Unlabeled examples are freely obtained
   • Goal: Maximize learning rate (for labeling effort)
Compare 3 methods: SL, MV, & NB
  • Single labeling (SL): label a new example

  • Multi-Labeling: get another label for an already-labeled example
        – Majority Vote (MV): consensus by simple vote

        – Naïve Bayes (NB): weight vote by annotator accuracy




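The two aggregation rules compared above can be sketched in a few lines of Python. This is a minimal illustration for binary labels, not the study's actual implementation: MV takes a simple plurality, while NB weights each vote by the log-odds of that worker's accuracy.

```python
import math
from collections import Counter

def majority_vote(labels):
    """Consensus on a binary example by simple plurality (MV)."""
    return Counter(labels).most_common(1)[0][0]

def naive_bayes_vote(labels, accuracies):
    """Weight each worker's binary vote by the log-odds of their
    accuracy (NB); accuracies must lie strictly in (0, 1)."""
    score = 0.0  # positive favors label 1, negative favors label 0
    for y, p in zip(labels, accuracies):
        w = math.log(p / (1.0 - p))
        score += w if y == 1 else -w
    return 1 if score > 0 else 0
```

With one accurate worker against two near-random ones, NB can overrule the majority: `naive_bayes_vote([1, 0, 0], [0.9, 0.55, 0.55])` returns 1 where `majority_vote([1, 0, 0])` returns 0.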
Assumptions
     • Example selection: random
           – From pool for SL, from seed set for multi-labeling

     • Fixed commitment to a single method a priori
     • Balanced classes (accuracy, uniform prior)
     • Annotator accuracies are known to system
          – In practice, must estimate these from gold data (Snow et al. ’08) or via EM (Dawid & Skene ’79)

Simulation
     • Each annotator
           – Has parameter p (prob. of producing correct label)
           – Generates exactly one label
     • Uniform distribution of accuracies U(min,max)
     • Generative model for simulation
           – Pick an example x (with true label y*) at random
           – Draw annotator accuracy p ~ U(min,max)
           – Generate label y ~ P(y | p, y*)

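The generative model above is small enough to state directly as code; this sketch follows the slide's three steps (draw a worker accuracy p ~ U(min, max), then emit the true binary label with probability p):

```python
import random

def simulated_label(y_true, acc_min, acc_max, rng=random):
    """One simulated annotation of a binary example:
    draw annotator accuracy p ~ U(acc_min, acc_max), then
    return the true label with probability p, else its flip."""
    p = rng.uniform(acc_min, acc_max)
    return y_true if rng.random() < p else 1 - y_true
```

At the extremes the behavior is deterministic: with accuracy range (1.0, 1.0) the true label always comes back, and with range (0.0, 0.0) it is always flipped.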
Evaluation
  • Data: four datasets from the UCI ML Repository
        (http://archive.ics.uci.edu/ml/datasets.html)
        – Mushroom
        – Spambase
        – Tic-Tac-Toe
        – Chess: King-Rook vs. King-Pawn
  • Same trends across all 4, so we report first 2
  • Random 70 / 30 split of data for seed+pool / test
  • Repeat each run 10 times and average results

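The data split above can be sketched as follows (an illustrative helper, not the study's code; function name is my own):

```python
import random

def seed_pool_test_split(examples, test_frac=0.3, rng=random):
    """Random 70/30 split into (seed + pool) vs. held-out test,
    mirroring the evaluation setup described above."""
    data = list(examples)
    rng.shuffle(data)
    cut = round(len(data) * (1.0 - test_frac))
    return data[:cut], data[cut:]
```

Each run would redo this split (and the simulated labeling), with results averaged over 10 repetitions as the slide states.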
p ~ U(0.6, 1.0)
  • Fairly accurate annotators (mean = 0.8)
  • Little uncertainty -> little gain from multi-labeling




p ~ U(0.4, 0.6)
     • Very noisy (mean = 0.5, random coin flip)
     • SL and MV learning rates are flat
     • NB wins by weighting more accurate workers




p ~ U(0.1, 0.7)
     • Worsen accuracies further (mean = 0.4)
     • NB virtually unchanged
     • SL and MV predictions become anti-correlated
           – We should actually flip their predictions…




Label flipping
   • Is NB doing better due to how it uses accuracy,
     or simply because it’s using more information?
   • Average accuracy < 50% --> label usually wrong
          – NB implicitly captures this; SL and MV do not
    • Label flipping: puts all methods on an even footing
   • Simple case of bias vs. noise
         – Issue is not whether correlated or anti-correlated
         – Issue is strength of correlation

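Label flipping can be sketched as a one-line pre-processing step (an assumed formulation for binary labels: flip a worker's labels whenever their estimated accuracy is below chance):

```python
def flip_labels(labels, accuracies):
    """A binary-label worker with accuracy below 0.5 is
    anti-correlated with truth, so flipping their labels recovers
    the signal; what matters is the strength of correlation
    (distance from 0.5), not its sign."""
    return [y if p >= 0.5 else 1 - y
            for y, p in zip(labels, accuracies)]
```

For example, `flip_labels([1, 0, 1], [0.9, 0.2, 0.4])` keeps the first label and flips the last two, yielding `[1, 1, 0]`.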
p ~ U(0.1, 0.7)
[Figure: learning curves on the Mushroom and Spambase datasets, plotting SL, MV, and NB accuracy (%) against the number of labels (64–4096), without and with label flipping.]
Summary of study
     • Detecting anti-correlated (bad) workers
       more important than the model used
     • Open Questions
           – When accuracies are estimated (noisy)?
           – With actual error distribution (real data)?
           – With different learners or tasks (e.g. ranking)?
           – With dynamic choice of new example or re-label?
           – With active learning example selection?
           – With imbalanced classes?
Snapshots




Noisy Learning to Rank
Kumar & Lease, 2011b
Semi-Supervised Repeated Labeling
     Tang & Lease, CIR’11




Smart Crowd Filter
     • Ryu & Lease, ASIS&T’11
     • Active Learning
           – Train Multi-class SVM to estimate P(Y|X)
           – Estimate average P(Y|X) for each worker
           – Filter out workers below threshold
     • Explore/Exploit (unexpected/expected labels)



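The filtering step above can be sketched as follows. This is a simplified reading of the slide, not the paper's code: assume `prob_model(x, y)` returns the trained model's estimate of P(y | x), and drop workers whose submitted labels the model finds implausible on average.

```python
def filter_workers(worker_labels, prob_model, threshold=0.5):
    """worker_labels maps worker id -> list of (example, label) pairs;
    prob_model(x, y) is the trained model's estimated P(y | x).
    Keep workers whose mean label probability meets the threshold."""
    kept = {}
    for worker, pairs in worker_labels.items():
        mean_p = sum(prob_model(x, y) for x, y in pairs) / len(pairs)
        if mean_p >= threshold:
            kept[worker] = pairs
    return kept
```

The explore/exploit tension on the slide shows up in the threshold: a worker giving unexpected labels may be noisy (filter them) or may be correcting the model (keep them).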
Z-score Weighted Filtering & Voting




     Jung & Lease, HComp’11

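One plausible reading of the slide title, offered purely as an illustrative assumption rather than the paper's exact method: standardize each worker's agreement rate with the consensus into a z-score, filter workers far below the mean, and weight the remaining votes by how far above the cutoff they sit.

```python
import statistics

def zscore_weights(agreement_rates, cutoff=-1.0):
    """agreement_rates maps worker id -> rate of agreement with
    consensus. Workers more than |cutoff| standard deviations
    below the mean get weight 0; others vote with weight equal
    to their z-score's distance above the cutoff."""
    values = list(agreement_rates.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against zero spread
    return {w: max((r - mu) / sd - cutoff, 0.0)
            for w, r in agreement_rates.items()}
```

With rates {0.9, 0.8, 0.3}, the third worker falls below one standard deviation under the mean and is zeroed out, while the other two keep positive weights ordered by agreement.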
Inferring Missing Judgments

  Jung & Lease, 2012




Jung & Lease, HComp’12




Social Network + Crowdsourcing
     • Klinger & Lease, ASIS&T’11




Website Usability (Liu et al., 2012)




Designing & Optimizing Workflows




Workflow Management
     • How should we balance automation vs.
       human computation? Who does what?

     • Who’s the right person for the job?

     • Juggling constraints on budget, scheduling,
       quality, effort …


What about sensitive data?
• Not all data can be publicly disclosed
  – User data (e.g. AOL query log, Netflix ratings)
  – Intellectual property
  – Legal confidentiality
• Need to restrict who is in your crowd
  – Separate channel (workforce) from technology
  – Hot question for adoption at enterprise level



What about regulation?
• Wolfson & Lease (ASIS&T’11)
• As usual, technology is ahead of the law
  – employment law
  – patent inventorship
  – data security and the Federal Trade Commission
  – copyright ownership
  – securities regulation of crowdfunding
• Take-away: don’t panic, but be mindful
  – Understand risks of “just-in-time compliance”

What about fraud?
• Some reports of robot “workers” on MTurk
  – Artificial Artificial Artificial Intelligence
  – Violates terms of service
• Why not just use a captcha?




Captcha Fraud




• Severity?




Requester Fraud on MTurk
“Do not do any HITs that involve: filling in
CAPTCHAs; secret shopping; test our web page;
test zip code; free trial; click my link; surveys or
quizzes (unless the requester is listed with a
smiley in the Hall of Fame/Shame); anything
that involves sending a text message; or
basically anything that asks for any personal
information at all—even your zip code. If you
feel in your gut it’s not on the level, IT’S NOT.
Why? Because they are scams...”
Fraud via Crowds
Wang et al., WWW’12
• “…not only do malicious crowd-sourcing
  systems exist, but they are rapidly growing…”




Robert Sim, MSR Summit’12




Identifying Workers (Uniquely)
   • Need for identifiable workers
         – Repeated labeling
         – Recognizing “Master Workers”
   • Today
         – Platforms assign IDs intended to be unique
         – Problem in practice, esp. with multiple platforms
         – Sybil attacks
   • Identity value
         – If workers interchangeable, identities are disposable
          – If workers are distinguished, identities become valuable
         – Reduce some types of attacks, increase others
What about ethics?
Fort, Adda, and Cohen (2011)
• “…opportunities for our community to deliberately
  value ethics above cost savings.”
• Suggest we focus on unpaid games; narrow solution

Silberman, Irani, and Ross (2010)
• “How should we… conceptualize the role of these
   people who we ask to power our computing?”
• Power dynamics between parties
• “Abstraction hides detail”
Davis et al. (2010) The HPU.








HPU: “Abstraction hides detail”




Digital Dirty Jobs
•   The Googler who Looked at the Worst of the Internet
•   Policing the Web’s Lurid Precincts
•   Facebook content moderation
•   The dirty job of keeping Facebook clean




• Even linguistic annotators report stress &
  nightmares from reading news articles!
What about freedom?
• Vision: empowering worker freedom:
  – work whenever you want for whomever you want

• Risk: people being compelled to perform work
  – As crowdsourcing grows, greater $$$ at stake
  – Digital sweat shops? Digital slaves?
  – Prisoners used for gold farming
  – We really don’t know much today
  – Traction? Human Trafficking at MSR Summit’12

Thank You!
Students: Past & Present
 –   Catherine Grady (iSchool)
 –   Hyunjoon Jung (iSchool)
 –   Jorn Klinger (Linguistics)
 –   Adriana Kovashka (CS)
 –   Abhimanu Kumar (CS)
                                         ir.ischool.utexas.edu/crowd
 –   Hohyon Ryu (iSchool)
 –   Wei Tang (CS)
 –   Stephen Wolfson (iSchool)
Support
 – John P. Commons Fellowship
 – Temple Fellowship

TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

Adventures in Crowdsourcing: Research at UT Austin & Beyond

  • 1. Adventures in Crowdsourcing: Research at UT Austin & Beyond Matt Lease School of Information @mattlease University of Texas at Austin ml@ischool.utexas.edu
  • 2. Outline • Foundations • Work at UT Austin • A Few Roadblocks – Workflow Design – Sensitive data – Regulation – Fraud – Ethics August 23, 2012 Matt Lease - ml@ischool.utexas.edu 2
  • 3. Amazon Mechanical Turk (MTurk) • Marketplace for crowd labor (microtasks) • Created in 2005 (still in “beta”) • On-demand, scalable, 24/7 global workforce
  • 5. Snow et al. (EMNLP 2008) • MTurk annotation for 5 Tasks – Affect recognition – Word similarity – Recognizing textual entailment – Event temporal ordering – Word sense disambiguation • 22K labels for US $26 • High agreement between consensus labels and gold-standard labels
  • 6. Sorokin & Forsyth (CVPR 2008) • MTurk for Computer Vision • 4K labels for US $60
  • 7. Kittur, Chi, & Suh (CHI 2008) • MTurk for User Studies • “…make creating believable invalid responses as effortful as completing the task in good faith.”
  • 8. Alonso et al. (SIGIR Forum 2008) • MTurk for Information Retrieval (IR) – Judge relevance of search engine results • Various follow-on studies (design, quality, cost)
  • 9. Social & Behavioral Sciences • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
  • 10.
  • 11. What about data quality? • Many CS papers on statistical methods – Online vs. offline, feature-based vs. content-agnostic – Worker calibration, noise vs. bias, weighted voting – Work in my lab by Jung, Kumar, Ryu, & Tang • Human factors also matter! – Instructions, design, interface, interaction – Names, relationship, reputation – Fair pay, hourly vs. per-task, recognition, advancement – For contrast with MTurk, consider Kochhar (2010) • See Lease, HComp’11
  • 12.
  • 13. Kovashka & Lease, CrowdConf’10
  • 14. Grady & Lease, 2010 (Search Eval.)
  • 15. Noisy Supervised Classification Kumar and Lease, 2011(a) Our 1st study of aggregation (Fall’10) Simple idea, simulated workers Highlights concepts & open questions
  • 16. Problem • Crowd labels tend to be noisy • Can reduce uncertainty via wisdom of crowds – Collect & aggregate multiple labels per example • How do we maximize learning (labeling effort)? – Label a new example? – Get another label for an already-labeled example? See: Sheng, Provost & Ipeirotis, KDD’08
  • 17. Setup • Task: Binary classification • Learner: C4.5 decision tree • Given – An initial seed set of single-labeled examples (64) – An unlimited pool of unlabeled examples • Cost model – Fixed unit cost for labeling any example – Unlabeled examples are freely obtained • Goal: Maximize learning rate (for labeling effort)
  • 18. Compare 3 methods: SL, MV, & NB • Single labeling (SL): label a new example • Multi-Labeling: get another label for pool – Majority Vote (MV): consensus by simple vote – Naïve Bayes (NB): weight vote by annotator accuracy
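The two aggregation rules compared on slide 18 (simple majority vote vs. accuracy-weighted Naïve Bayes voting) can be sketched as follows for binary labels. This is a minimal illustration, not the paper's implementation; the function names are my own, and worker accuracies are assumed known (as stated on slide 19):

```python
import math

def majority_vote(labels):
    """MV: consensus of binary labels (0/1) by simple vote."""
    return int(sum(labels) >= len(labels) / 2)

def naive_bayes_vote(labels, accuracies):
    """NB: weight each worker's binary vote by the log-odds of their accuracy.
    Workers with accuracy p < 0.5 get a negative weight, so their votes
    are implicitly flipped."""
    score = 0.0
    for y, p in zip(labels, accuracies):
        w = math.log(p / (1 - p))     # accurate workers get large positive weight
        score += w if y == 1 else -w
    return int(score >= 0)

labels = [1, 1, 0]
print(majority_vote(labels))                       # 1: two of three say "1"
print(naive_bayes_vote(labels, [0.55, 0.6, 0.9]))  # 0: the accurate worker outvotes the noisy pair
```

Note how NB's negative weights explain the next slides: a worker whose accuracy is below 0.5 contributes evidence *against* their own vote.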
  • 19. Assumptions • Example selection: random – From pool for SL, from seed set for multi-labeling • Fixed commitment to a single method a priori • Balanced classes (accuracy, uniform prior) • Annotator accuracies are known to system – In practice, must estimate these: from gold data (Snow et al. ’08) or EM (Dawid & Skene ’79)
  • 20. Simulation • Each annotator – Has parameter p (prob. of producing correct label) – Generates exactly one label • Uniform distribution of accuracies U(min,max) • Generative model for simulation – Pick an example x (with true label y*) at random – Draw annotator accuracy p ~ U(min,max) – Generate label y ~ P(y | p, y*)
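The generative model on slide 20 is straightforward to reproduce; a minimal sketch (function name illustrative), assuming binary labels in {0, 1}:

```python
import random

def simulate_label(true_label, acc_min=0.6, acc_max=1.0, rng=random):
    """One simulated annotator: draw accuracy p ~ U(acc_min, acc_max),
    then emit the true label with probability p, else the opposite label."""
    p = rng.uniform(acc_min, acc_max)
    correct = rng.random() < p
    return true_label if correct else 1 - true_label

rng = random.Random(0)
labels = [simulate_label(1, 0.6, 1.0, rng) for _ in range(1000)]
print(sum(labels) / len(labels))  # close to 0.8, the mean annotator accuracy
```

Varying (acc_min, acc_max) reproduces the three regimes on the following slides: U(0.6, 1.0), U(0.4, 0.6), and U(0.1, 0.7).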
  • 21. Evaluation • Data: datasets from UCI ML Repository – Mushroom – Spambase http://archive.ics.uci.edu/ml/datasets.html – Tic-Tac-Toe – Chess: King-Rook vs. King-Pawn • Same trends across all 4, so we report first 2 • Random 70 / 30 split of data for seed+pool / test • Repeat each run 10 times and average results
  • 22. p ~ U(0.6, 1.0) • Fairly accurate annotators (mean = 0.8) • Little uncertainty -> little gain from multi-labeling
  • 23. p ~ U(0.4, 0.6) • Very noisy (mean = 0.5, random coin flip) • SL and MV learning rates are flat • NB wins by weighting more accurate workers
  • 24. p ~ U(0.1, 0.7) • Worsen accuracies further (mean = 0.4) • NB virtually unchanged • SL and MV predictions become anti-correlated – We should actually flip their predictions…
  • 25. Label flipping • Is NB doing better due to how it uses accuracy, or simply because it’s using more information? • Average accuracy < 50% --> label usually wrong – NB implicitly captures; SL and MV do not • Label flipping: put all methods on even-footing • Simple case of bias vs. noise – Issue is not whether correlated or anti-correlated – Issue is strength of correlation
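Label flipping as described on slide 25 amounts to inverting the aggregate prediction whenever the crowd's mean accuracy is below 0.5, since the consensus label is then anti-correlated with truth. A minimal sketch for majority voting (names illustrative):

```python
def flipped_majority_vote(labels, mean_accuracy):
    """If average annotator accuracy < 0.5, the majority label is
    anti-correlated with the truth, so invert it (binary labels 0/1)."""
    vote = int(sum(labels) >= len(labels) / 2)
    return 1 - vote if mean_accuracy < 0.5 else vote

print(flipped_majority_vote([0, 0, 1], mean_accuracy=0.4))  # 1: majority said 0, but flipped
print(flipped_majority_vote([0, 0, 1], mean_accuracy=0.8))  # 0: accurate crowd, vote kept
```

This puts MV (and SL) on an even footing with NB, whose negative log-odds weights perform the flip implicitly.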
  • 26. p ~ U(0.1, 0.7), without vs. with label flipping [charts: SL, MV, and NB accuracy (%) vs. number of labels (64–4096) on the Mushroom and Spambase datasets]
  • 27. Summary of study • Detecting anti-correlated (bad) workers more important than the model used • Open Questions – When accuracies are estimated (noisy)? – With actual error distribution (real data)? – With different learners or tasks (e.g. ranking)? – With dynamic choice of new example or re-label? – With active learning example selection? – With imbalanced classes?
  • 28. Snapshots
  • 29. Noisy Learning to Rank Kumar & Lease 2011b
  • 30. Semi-Supervised Repeated Labeling Tang & Lease, CIR’11
  • 31. Smart Crowd Filter • Ryu & Lease, ASIS&T’11 • Active Learning – Train Multi-class SVM to estimate P(Y|X) – Estimate average P(Y|X) for each worker – Filter out workers below threshold • Explore/Exploit (unexpected/expected labels)
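The filtering step on slide 31 might look like the sketch below, assuming a trained classifier has already supplied probability estimates P(y|x); the data structures and names here are illustrative, not from Ryu & Lease:

```python
def filter_workers(worker_labels, prob_estimates, threshold=0.5):
    """Keep workers whose labels agree, on average, with the model's
    estimated class probabilities; filter out the rest.

    worker_labels: {worker_id: [(example_id, label), ...]}
    prob_estimates: {(example_id, label): estimated P(label | example)}
    """
    kept = set()
    for worker, votes in worker_labels.items():
        avg = sum(prob_estimates[(x, y)] for x, y in votes) / len(votes)
        if avg >= threshold:
            kept.add(worker)
    return kept

probs = {("x1", "A"): 0.9, ("x1", "B"): 0.1, ("x2", "A"): 0.3, ("x2", "B"): 0.7}
votes = {"w1": [("x1", "A"), ("x2", "B")],   # agrees with the model: avg 0.8
         "w2": [("x1", "B"), ("x2", "A")]}   # disagrees with the model: avg 0.2
print(filter_workers(votes, probs))  # {'w1'}
```

The explore/exploit point on the slide concerns which labels to solicit: unexpected labels (low P(y|x)) are informative for the model, while expected labels confirm worker reliability.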
  • 32. Z-score Weighted Filtering & Voting Jung & Lease, HComp’11
  • 33. Inferring Missing Judgments Jung & Lease, 2012
  • 34. Jung & Lease, HComp’12
  • 35. Social Network + Crowdsourcing • Klinger & Lease, ASIS&T’11
  • 36. Website Usability (Liu et al., 2012)
  • 37.
  • 38. Designing & Optimizing Workflows
  • 39. Workflow Management • How should we balance automation vs. human computation? Who does what? • Who’s the right person for the job? • Juggling constraints on budget, scheduling, quality, effort …
  • 40. What about sensitive data? • Not all data can be publicly disclosed – User data (e.g. AOL query log, Netflix ratings) – Intellectual property – Legal confidentiality • Need to restrict who is in your crowd – Separate channel (workforce) from technology – Hot question for adoption at enterprise level
  • 41. What about regulation? • Wolfson & Lease (ASIS&T’11) • As usual, technology is ahead of the law – employment law – patent inventorship – data security and the Federal Trade Commission – copyright ownership – securities regulation of crowdfunding • Take-away: don’t panic, but be mindful – Understand risks of “just-in-time compliance”
  • 42. What about fraud? • Some reports of robot “workers” on MTurk – Artificial Artificial Artificial Intelligence – Violates terms of service • Why not just use a captcha?
  • 44. Requester Fraud on MTurk “Do not do any HITs that involve: filling in CAPTCHAs; secret shopping; test our web page; test zip code; free trial; click my link; surveys or quizzes (unless the requester is listed with a smiley in the Hall of Fame/Shame); anything that involves sending a text message; or basically anything that asks for any personal information at all—even your zip code. If you feel in your gut it’s not on the level, IT’S NOT. Why? Because they are scams...”
  • 46. Wang et al., WWW’12 • “…not only do malicious crowd-sourcing systems exist, but they are rapidly growing…”
  • 47. Robert Sim, MSR Summit’12
  • 48. Identifying Workers (Uniquely) • Need for identifiable workers – Repeated labeling – Recognizing “Master Workers” • Today – Platforms assign IDs intended to be unique – Problem in practice, esp. with multiple platforms – Sybil attacks • Identity value – If workers interchangeable, identities are disposable – If workers are distinguished, identities become valuable – Reduce some types of attacks, increase others
  • 49. What about ethics? Fort, Adda, and Cohen (2011) • “…opportunities for our community to deliberately value ethics above cost savings.” • Suggest we focus on unpaid games; narrow solution Silberman, Irani, and Ross (2010) • “How should we… conceptualize the role of these people who we ask to power our computing?” • Power dynamics between parties • “Abstraction hides detail”
  • 50. Davis et al. (2010). The HPU.
  • 52. Digital Dirty Jobs • The Googler who Looked at the Worst of the Internet • Policing the Web’s Lurid Precincts • Facebook content moderation • The dirty job of keeping Facebook clean • Even linguistic annotators report stress & nightmares from reading news articles!
  • 53. What about freedom? • Vision: empowering worker freedom: – work whenever you want for whomever you want • Risk: people being compelled to perform work – As crowdsourcing grows, greater $$$ at stake – Digital sweat shops? Digital slaves? – Prisoners used for gold farming – We really don’t know much today – Traction? Human Trafficking at MSR Summit’12
  • 54. Thank You! Students: Past & Present – Catherine Grady (iSchool) – Hyunjoon Jung (iSchool) – Jorn Klinger (Linguistics) – Adriana Kovashka (CS) – Abhimanu Kumar (CS) ir.ischool.utexas.edu/crowd – Hohyon Ryu (iSchool) – Wei Tang (CS) – Stephen Wolfson (iSchool) Support – John P. Commons Fellowship – Temple Fellowship Matt Lease - ml@ischool.utexas.edu - @mattlease