data processing with
mechanical turk
Kelly O'Brien @klm427; github.com/kellyob
Michael Becker @beckerfuffle; github.com/mdbecker




                                                    #ptw2013
Mechanical Turk




"The Turk"




Let's focus on the crowdsourcing...




 Relatively cheap means of getting random
  samples of input for small, tedious tasks



          "Crowdsourced labor can cost companies less than half as much as typical outsourcing"
               -- Panagiotis G. Ipeirotis, an associate professor at NYU's Stern School of Business

"Nothing is a waste of time if you use the experience wisely."
                                                      ~Auguste Rodin




The business challenge....




The solution....




Let's start with the basics


[Diagram: the Requester's Data is combined with a Template to produce HITs, which go to Workers (Turkers)]
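
The data-plus-template idea can be sketched in a few lines of Python. This is only an illustration: the template text, the `company` and `url` field names, and the sample rows are all made up, and real HIT questions are usually XML or HTML rather than one sentence.

```python
import csv
import io
from string import Template

# Hypothetical question template; $company and $url are made-up field names.
QUESTION_TEMPLATE = Template(
    "Visit $url and classify the business '$company' into one category."
)


def render_hits(rows, template=QUESTION_TEMPLATE):
    """Merge each data row into the template, giving one question per HIT."""
    return [template.substitute(row) for row in rows]


# Requester's data, inlined as CSV so the sketch is self-contained.
SAMPLE_CSV = (
    "company,url\n"
    "Acme,http://example.com/acme\n"
    "Globex,http://example.com/globex\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
questions = render_hits(rows)

for question in questions:
    print(question)
```

Each row of data becomes one HIT, so the cost of a batch scales with the number of rows times the number of Workers asked per HIT.
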
Use cases


● Classification

● Transcription

● Content Generation

● Surveys

Do people actually use this?




AOL




Twitter




CardMunch @LinkedIn




The Sheep Market




Development Tools
● Requester user interface

● Amazon offers four official APIs
  ○ Ruby, .NET, Perl, and Java

● AWS API

● Boto mturk
  ○ Python

● Houdini, Clockwork Raven, CrowdFlower, QuikTurkit



Create a HIT

● A title
● A description
● Keywords, used to help Workers find the HITs with a search
● The amount of the reward
● An amount of time in which the Worker must complete the HIT
● An amount of time after which the HIT will no longer be available
  to Workers
● The number of Workers needed to submit results for the HIT
  before the HIT is considered complete
● Qualification requirements
● All of the information required to answer the question


Process Results

●   Assignment id
●   Worker id
●   HIT id
●   Assignment status
●   Auto approval time
●   Accept time
●   Submit time
●   Approval time
●   Rejection time
●   Deadline
●   Answer
●   Requester feedback
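
The Answer field arrives as an XML document, so processing results mostly means pulling question/answer pairs out of it. The sketch below assumes the dictionary keys returned by boto3's assignment listing and a free-text answer; the sample assignment is invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_answers(answer_xml):
    """Map each QuestionIdentifier to its FreeText value."""
    answers = {}
    for element in ET.fromstring(answer_xml).iter():
        if local_name(element.tag) != "Answer":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in element}
        answers[fields.get("QuestionIdentifier", "")] = fields.get("FreeText", "")
    return answers


def summarize(assignment):
    """Keep the fields most often needed when reviewing a result."""
    return {
        "assignment_id": assignment["AssignmentId"],
        "worker_id": assignment["WorkerId"],
        "status": assignment["AssignmentStatus"],
        "answers": parse_answers(assignment["Answer"]),
    }


# Invented sample; real IDs are opaque strings issued by Mechanical Turk.
SAMPLE_ASSIGNMENT = {
    "AssignmentId": "EXAMPLE_ASSIGNMENT",
    "WorkerId": "EXAMPLE_WORKER",
    "HITId": "EXAMPLE_HIT",
    "AssignmentStatus": "Submitted",
    "Answer": (
        '<QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd">'
        "<Answer>"
        "<QuestionIdentifier>category</QuestionIdentifier>"
        "<FreeText>restaurant</FreeText>"
        "</Answer>"
        "</QuestionFormAnswers>"
    ),
}
summary = summarize(SAMPLE_ASSIGNMENT)
print(summary["answers"])
```

Matching on the local tag name keeps the parser working even if the schema namespace differs from the one in the sample.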

What was the question?

● Question forms

● External questions

● HTML questions
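
Of the three, an external question is the simplest to build by hand: it is a small XML wrapper around a URL that Mechanical Turk shows to the Worker in a frame. A minimal sketch, assuming the 2006-07-14 ExternalQuestion schema and a made-up task URL:

```python
from xml.sax.saxutils import escape

EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def external_question(url, frame_height=600):
    """Wrap a task URL in ExternalQuestion XML (frame height in pixels)."""
    return (
        '<ExternalQuestion xmlns="%s">'
        "<ExternalURL>%s</ExternalURL>"
        "<FrameHeight>%d</FrameHeight>"
        "</ExternalQuestion>"
    ) % (EXTERNAL_QUESTION_XMLNS, escape(url), frame_height)


# Hypothetical URL; the '&' must be escaped to keep the XML well-formed.
question_xml = external_question("https://example.com/task?id=1&lang=en")
print(question_xml)
```

Escaping the URL matters because query strings with `&` would otherwise produce XML that is rejected.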




Formatting HITs


●   Compact

●   Coherent

●   Cost-effective


Bad Actors




"Unfortunately, since manually verifying the quality of the submitted results is hard, malicious workers often take advantage of the verification difficulty and submit answers of low quality." [1]


Quality Control

● Manually spot check
● Qualifications
● Multiple agreement
● Gold HITs
● Calculate worker error


Quality Control: Manually Check
Look through the results of some workers and manually
reject/ban those which look bad




Quality Control: Multiple Agreement
1. Submit HITs to multiple turks (3-10)
2. Reject/throw out all HITs below some
   agreement threshold
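
The two steps above can be sketched as a majority vote with a cutoff. The 0.6 threshold and the sample labels are arbitrary choices for illustration, not values from the talk.

```python
from collections import Counter


def consensus(labels, threshold=0.6):
    """Return (label, agreement); label is None when agreement is too low."""
    if not labels:
        return None, 0.0
    label, votes = Counter(labels).most_common(1)[0]
    agreement = votes / len(labels)
    return (label if agreement >= threshold else None), agreement


# Made-up answers from three Workers per HIT.
results = {
    "hit-1": ["spam", "spam", "ham"],
    "hit-2": ["spam", "ham", "other"],
}

accepted = {}
for hit_id, labels in results.items():
    label, agreement = consensus(labels)
    if label is not None:
        accepted[hit_id] = label

print(accepted)
```

HITs that fall below the threshold can be thrown out or reposted to more Workers.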




Quality Control: Qualifications




  ● Pay extra for "superior" turks
  ● Build your own custom qualification
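
Once a custom qualification exists, a HIT can demand a minimum score on it. The sketch below assumes the requirement structure used by boto3's `QualificationRequirements` parameter; the qualification ID is a placeholder, since a real one is only issued when the qualification type is created.

```python
def qualification_requirement(qualification_type_id, min_score):
    """Require a score of at least min_score on the given qualification."""
    return {
        "QualificationTypeId": qualification_type_id,
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [min_score],
    }


# Placeholder ID for illustration only; it is not a real qualification type.
CUSTOM_QUALIFICATION_ID = "EXAMPLE_QUALIFICATION_TYPE_ID"

requirements = [qualification_requirement(CUSTOM_QUALIFICATION_ID, 80)]
print(requirements)
```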




"Thought Masters was just bad for non-blessed workers? It's even worse for requesters" [1]
Quality Control: Gold HITs
1. Give turks HITs which we know the correct answer to
2. Reject/Ban turks with high error rates
This technique is used by CrowdFlower
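
A minimal sketch of the bookkeeping, assuming results arrive as (worker, HIT, answer) triples and that a 30% error cutoff is acceptable; both the data layout and the cutoff are assumptions made for the example.

```python
def gold_error_rates(submissions, gold):
    """Per-worker error rate on gold HITs.

    submissions: iterable of (worker_id, hit_id, answer)
    gold: {hit_id: known correct answer}
    """
    seen, wrong = {}, {}
    for worker_id, hit_id, answer in submissions:
        if hit_id not in gold:
            continue  # ordinary HIT, nothing to check against
        seen[worker_id] = seen.get(worker_id, 0) + 1
        if answer != gold[hit_id]:
            wrong[worker_id] = wrong.get(worker_id, 0) + 1
    return {w: wrong.get(w, 0) / n for w, n in seen.items()}


def workers_to_reject(error_rates, max_error=0.3):
    """Workers whose gold error rate exceeds the cutoff."""
    return sorted(w for w, rate in error_rates.items() if rate > max_error)


# Made-up gold answers and submissions.
gold = {"g1": "cat", "g2": "dog"}
submissions = [
    ("w1", "g1", "cat"), ("w1", "g2", "dog"),
    ("w2", "g1", "dog"), ("w2", "g2", "dog"),
    ("w2", "h9", "cat"),  # not a gold HIT, ignored
]
rates = gold_error_rates(submissions, gold)
print(rates, workers_to_reject(rates))
```

Gold HITs cost money without producing new data, so they are usually a small fraction of the batch.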




Quality Control: Calculate Error
Calculate each worker's error rate based solely on their agreement with other
workers. Use an expectation-maximization algorithm as described by Dawid
and Skene.




Lots of math; consider using a third-party service like Project Troia
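
For a feel of what the math does, here is a compact pure-Python sketch in the spirit of the Dawid and Skene scheme. It is not Project Troia's implementation: the initialization from vote fractions, the small smoothing constant, the fixed iteration count, and the sample votes are all assumptions of this sketch. Every label must be one of `classes`, and `iterations` must be at least 1.

```python
def dawid_skene(votes, classes, iterations=20, smoothing=0.01):
    """Estimate true labels and worker error rates from agreement alone.

    votes: {item: {worker: label}}
    Returns (posteriors, error_rates).
    """
    workers = sorted({w for by_worker in votes.values() for w in by_worker})
    # Start from each item's vote fractions.
    post = {
        item: {c: sum(1 for l in by_worker.values() if l == c) / len(by_worker)
               for c in classes}
        for item, by_worker in votes.items()
    }
    for _ in range(iterations):
        # M-step: class priors and one confusion matrix per worker.
        prior = {c: sum(p[c] for p in post.values()) / len(post)
                 for c in classes}
        confusion = {}
        for w in workers:
            confusion[w] = {}
            for true in classes:
                counts = {c: smoothing for c in classes}
                for item, by_worker in votes.items():
                    if w in by_worker:
                        counts[by_worker[w]] += post[item][true]
                total = sum(counts.values())
                confusion[w][true] = {c: counts[c] / total for c in classes}
        # E-step: posterior over each item's true class.
        for item, by_worker in votes.items():
            scores = {}
            for true in classes:
                score = prior[true]
                for w, label in by_worker.items():
                    score *= confusion[w][true][label]
                scores[true] = score
            norm = sum(scores.values())
            post[item] = {c: scores[c] / norm for c in classes}
    # Error rate: probability that a worker's label differs from the truth.
    error = {w: 1.0 - sum(prior[c] * confusion[w][c][c] for c in classes)
             for w in workers}
    return post, error


# Made-up votes: w1 and w2 always agree, w3 usually disagrees with them.
votes = {
    "i1": {"w1": "a", "w2": "a", "w3": "b"},
    "i2": {"w1": "b", "w2": "b", "w3": "a"},
    "i3": {"w1": "a", "w2": "a", "w3": "a"},
    "i4": {"w1": "b", "w2": "b", "w3": "a"},
    "i5": {"w1": "a", "w2": "a", "w3": "b"},
}
posteriors, error_rates = dawid_skene(votes, classes=["a", "b"])
print(error_rates)
```

Unlike a plain majority vote, this weights each worker by how reliable they appear to be, so a consistent minority of good workers can outvote sloppy ones.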


Auto-approval




"Quick approval is important, too. Watching that money pile up is a serious motivator; I’ll sometimes choose a lower-paying task that approves in close to real time over a higher-paying one that won’t pay out for several days."
-worker [1]
Turkopticon
"Turkopticon lets you REPORT and AVOID shady
employers"




Turkernation
"If you want to make a living on Amazon
Mechanical Turk, this is the forum for you"




Do's and Don'ts




What exactly do I do with this?




A demo in Python




Requirements




Data Details




Question template




Build a custom qualification




Post HITs....




Success.




Let the work begin.




To get results...




AWeber




We're hiring.
aweber.jobs

...and we have slides.




           aweberopenhouse.eventbrite.com