Consumer Analytics in Real Time:
How InfoScout Tracks Purchase Behavior with Mechanical Turk
Jon Brelig, CTO, InfoScout
Sharon Chiarella, Vice President, Amazon Mechanical Turk
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Overview

– Receipt workflow
– Quality control
– Analytics
Wish I knew who that shopper was!
Helping brands answer…
• Who’s buying my product?
• Who’s the end consumer?
• Why did they buy?
• When and where?
• How many?
• At what price?
• With what else?

Who’s the shopper? What’s their motive?
How do we build
a better panel?
Capture receipts through mobile
Our mobile apps
Receipt Hog

Put $ in your pocket!

Shoparoo

Fundraise for a cause!
Architecture

1. Capture receipt
2. Convert to structured data
Computer vision + OCR + MTurk
3. Link to masterdata (MySQL)
Scraping + classification models + human training
Example: GAT G2 LMN LIME = UPC 052000209648
4. Data warehouse & prematerialize (Tlog in Redshift)
MySQL, Amazon Redshift, Hadoop (Amazon EMR)
5. Build cool stuff on top of it!
Analytics, data firehose, hacks, etc.

Diagram labels: target.com, Masterdata (MySQL), Tlog (Redshift)
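Step 3 (linking a receipt line to masterdata) can be pictured as a lookup from an abbreviated receipt description to a UPC. The sketch below is only an illustration under stated assumptions: the `Masterdata` class, the `learn` and `link` methods, and the idea of keying by retailer are hypothetical and not taken from the talk, which describes scraping, classification models, and human training rather than a plain dictionary.

```python
from typing import Dict, Optional, Tuple


def normalize(description: str) -> str:
    """Uppercase and collapse whitespace so OCR spacing differences do not matter."""
    return " ".join(description.upper().split())


class Masterdata:
    """Hypothetical in-memory stand-in for the masterdata store."""

    def __init__(self) -> None:
        self._upc_by_key: Dict[Tuple[str, str], str] = {}

    def learn(self, retailer: str, description: str, upc: str) -> None:
        # Record a confirmed mapping, e.g. one verified by a human.
        self._upc_by_key[(retailer, normalize(description))] = upc

    def link(self, retailer: str, description: str) -> Optional[str]:
        # Return the UPC if known, else None so the line can go to a model or a person.
        return self._upc_by_key.get((retailer, normalize(description)))


md = Masterdata()
# The retailer key "example-retailer" is made up; the description and UPC are from the slide.
md.learn("example-retailer", "GAT G2 LMN LIME", "052000209648")
known = md.link("example-retailer", "gat  g2 lmn lime")
unknown = md.link("example-retailer", "UNKNOWN ITM")
```

Returning `None` for an unknown description keeps the hand-off to classification or human review explicit.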
Digitizing Receipts
Task is to convert image(s) of receipts => structured data
Amazon Mechanical Turk
Transcribing Receipts
• Isn’t OCR good enough?
– It is a solved problem… for books
– Low recognition on wrinkled receipts from mobile
• Hybrid of computer + human
– Leverage OCR & computer vision, fill gaps with humans
• Human = MTurk + small audit staff
– We leverage a 6-person team to act as the top audit layer of the system

Workflow diagram stages:
– Auto Extract (OpenCV, OCR, Regex)
– Can we skip?
– Summary Extraction (Mechanical Turk)
– Itemized Extraction (Mechanical Turk)
– Score & Audit (Staff / Mechanical Turk)
– Complete
– User marks or staff rejects HIT
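The "Auto Extract" stage (OCR plus regex, with humans filling the gaps) might look roughly like the sketch below. The patterns, the field names, and the `auto_extract` function are assumptions made for illustration; the talk does not show the real expressions. Any field the regex cannot find is reported as missing so that it can be routed to a Worker.

```python
import re
from typing import Dict, List, Optional

# Hypothetical patterns; real receipts vary widely by retailer.
TOTAL_RE = re.compile(r"\bTOTAL\b\s*:?\s*\$?\s*(\d+\.\d{2})", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b")


def auto_extract(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull summary fields out of raw OCR text; a field is None when not found."""
    total = TOTAL_RE.search(ocr_text)
    date = DATE_RE.search(ocr_text)
    return {
        "date": date.group(1) if date else None,
        "total": total.group(1) if total else None,
    }


def fields_needing_human(fields: Dict[str, Optional[str]]) -> List[str]:
    """List the fields that OCR plus regex could not fill."""
    return [name for name, value in fields.items() if value is None]


clean = auto_extract("STORE\n11/02/2013\nSUBTOTAL 10.00\nTAX 0.80\nTOTAL $10.80\n")
wrinkled = auto_extract("ST0RE\nT0TAL 1O.8O\n")
```

The word boundary before `TOTAL` keeps the pattern from matching the `SUBTOTAL` line, and the garbled second receipt leaves both fields empty, which is the case where a HIT would be published.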
Summary Transcription

[Chart: Receipts by Month, y-axis 0 to 1,200,000 receipts]

How do we scale quality control with growing volume?
Known Answers
• Publish HIT with at least one known answer to audit Worker accuracy
• Additional support provided by Amazon API
• Most effective when there is a concrete, expected answer
– e.g., multiple-choice answers
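A minimal sketch of the Known Answers idea: score each Worker only on the questions whose answer is already known. The tuple layout, the `worker_accuracy` function, and the sample data are assumptions for illustration and do not reflect InfoScout's code or the Mechanical Turk API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def worker_accuracy(
    submissions: Iterable[Tuple[str, str, str]],
    known_answers: Dict[str, str],
) -> Dict[str, float]:
    """Return each Worker's accuracy on known-answer questions only.

    submissions: (worker_id, question_id, answer) tuples.
    known_answers: question_id -> expected answer.
    """
    seen = defaultdict(int)
    correct = defaultdict(int)
    for worker_id, question_id, answer in submissions:
        if question_id not in known_answers:
            continue  # not an audit question, nothing to compare against
        seen[worker_id] += 1
        if answer == known_answers[question_id]:
            correct[worker_id] += 1
    return {worker: correct[worker] / seen[worker] for worker in seen}


known = {"q1": "Target", "q2": "Walmart"}
subs = [
    ("w1", "q1", "Target"),
    ("w1", "q2", "Walmart"),
    ("w2", "q1", "Costco"),
    ("w2", "q2", "Walmart"),
    ("w2", "q9", "anything"),  # q9 has no known answer and is ignored
]
accuracy = worker_accuracy(subs, known)
```

Exact string comparison is why this works best with concrete expected answers such as multiple choice.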
Known Answers
[Chart: Net Cost per Receipt, $0 to $0.03, split into InfoScout Review Cost and MTurk Cost; annotations: "Developed more efficient review process" and "Transitioned to Known Answers"]

Known Answers lowered our net cost per receipt from 2 cents to 1 cent
Itemized Extraction

• Transcribe every item on receipt
• HITs audited by review team, priority scored by:
– Comparing output to known OCR extraction
– Comparison to master data (e.g., did they “fat finger” a price or UPC?)
– Worker approval history
– Worker tenure (for InfoScout HITs)
– Additional features
• Not a great candidate for Known Answers…
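One way to picture the audit priority score is a weighted sum over the signals listed above. Everything specific in this sketch is an assumption: the `HitFeatures` fields, the weights, and the tenure cap are invented for illustration, and the talk does not say how the real score is computed.

```python
from dataclasses import dataclass


@dataclass
class HitFeatures:
    ocr_agreement: float         # 0..1, share of fields matching the OCR extraction
    masterdata_mismatches: int   # items whose price or UPC disagrees with masterdata
    item_count: int              # items transcribed in this HIT
    worker_approval_rate: float  # 0..1, Worker's historical approval rate
    worker_hits_completed: int   # tenure, measured in completed HITs


def audit_priority(f: HitFeatures, tenure_cap: int = 500) -> float:
    """Return a 0..1 risk score; higher means review this HIT sooner."""
    mismatch_rate = f.masterdata_mismatches / max(f.item_count, 1)
    tenure = min(f.worker_hits_completed, tenure_cap) / tenure_cap
    # Hypothetical weights that sum to 1.0.
    return (
        0.35 * (1.0 - f.ocr_agreement)
        + 0.30 * mismatch_rate
        + 0.20 * (1.0 - f.worker_approval_rate)
        + 0.15 * (1.0 - tenure)
    )


trusted = audit_priority(HitFeatures(1.0, 0, 10, 1.0, 500))
risky = audit_priority(HitFeatures(0.0, 10, 10, 0.0, 0))
```

Sorting pending HITs by this score in descending order would let a small review team spend its time on the submissions most likely to contain errors.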
How do we scale quality control for itemized extraction?
Plurality

• HIT completed by >1 Worker
– InfoScout only sends HITs with low confidence to multiple Workers
• Higher quality, higher cost
– Limit costs by scientifically selecting HITs to send to a second Worker
• Multiple strategies when an answer discrepancy is found
– Ask a third Worker
– Leverage internal auditors

Diagram: Publish HIT → Worker 1 submits, Worker 2 submits → Match? → YES → Accept
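The plurality flow can be sketched as a small decision function: accept when two Workers agree, otherwise escalate. The function names, the confidence threshold, and the choice to try a third Worker before internal auditors are assumptions for illustration; the talk lists both escalation options without ordering them.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def needs_second_worker(confidence: float, threshold: float = 0.8) -> bool:
    """Only low-confidence HITs are sent to more than one Worker (threshold is hypothetical)."""
    return confidence < threshold


def resolve(answers: Sequence[Hashable]) -> Tuple[str, Optional[Hashable]]:
    """Decide what to do with the answers collected so far for one HIT."""
    if len(answers) < 2:
        return ("need_more", None)
    answer, votes = Counter(answers).most_common(1)[0]
    if votes >= 2:
        return ("accept", answer)          # at least two Workers match
    if len(answers) == 2:
        return ("ask_third_worker", None)  # first discrepancy
    return ("internal_audit", None)        # still no agreement


a = (("MILK", "2.99"),)
b = (("MILK", "2.49"),)
c = (("MLK", "2.99"),)
```

Answers are represented as tuples so they can be compared and counted; in practice some normalization (case, whitespace, price formatting) would likely be needed before comparing.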
HIT Acceptance Latency
[Chart: Minutes to Accept (0 to 700), 12/22/12 through 6/22/13; annotation: "Changed Template"]
• Measures HIT demand
• Template change decreased demand temporarily, but Workers acclimated
Worker Retention
[Chart: Total HITs Completed (0 to 700,000), split into HITs Complete (New Workers) and HITs Complete (Retained Workers), with % Completed by retained Workers (0% to 100%)]

Within two months, 80% of HITs were completed by returning Workers
Pareto of Worker Volume
[Chart: % of all HITs completed (0% to 90%) by Worker Percentile: Top 5%, 6-10%, 10-20%, 21-50%, 51-100%]

Our top 5% (~500) active Workers account for >80% of all HITs completed
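The Pareto figure above is a share-of-volume calculation. A minimal sketch, assuming a hypothetical mapping from Worker ID to completed HIT count (the sample numbers are made up and are not InfoScout's data):

```python
from typing import Dict


def top_share(hits_per_worker: Dict[str, int], top_percent: int = 5) -> float:
    """Fraction of all HITs completed by the top `top_percent` percent of Workers."""
    counts = sorted(hits_per_worker.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Integer ceiling of len(counts) * top_percent / 100, with at least one Worker.
    k = max(1, -(-len(counts) * top_percent // 100))
    return sum(counts[:k]) / total


# Made-up example: one heavy Worker and nineteen occasional ones.
sample = {"w0": 810}
sample.update({f"w{i}": 10 for i in range(1, 20)})
share = top_share(sample)
```

With 20 Workers, the top 5% is a single Worker, so the share here is 810 of 1,000 HITs.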
Analytics Demo
Please give us your feedback on this
presentation

BDT206
As a thank you, we will select prize
winners daily for completed surveys!
Appendix
Quality Control Strategies
• Filter incoming Workers
– Qualifications
• Increase quality during completion
– Template validation
– Template instructions
• Post submission
– Plurality (multiple HITs per task)
– Known Answers
– Workers audit Workers

Diagram labels: Enhance, HIT, Approve/Reject?

Multiple strategies can yield high accuracy
HIT templates
• Clear & concise instructions
– Each Worker sees detailed instructions the first time and can hide them once they’re comfortable
• Keyboard shortcuts
• Maximize Validation
– Client-side and/or AJAX validation
• Bonus Rewards
– Nice option for rewarding Workers, especially when HITs are variable in length & time

Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Consumer Analytics in Real Time: How InfoScout Tracks Purchase Behavior with Mechanical Turk

  • 1. Consumer Analytics in Real Time: How InfoScout Tracks Purchase Behavior with Mechanical Turk Jon Brelig, CTO, InfoScout Sharon Chiarella, Vice President, Amazon Mechanical Turk November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Overview – Receipt workflow – Quality control – Analytics
  • 3.
  • 4. Wish I knew who that shopper was!
  • 5. Helping brands answer… Who’s buying my product? Who’s the end consumer? Why did they buy? When and where? How many? At what price? With what else? Who’s the shopper? What’s their motive?
  • 6. How do we build a better panel? Capture receipts through mobile
  • 7. Our mobile apps Receipt Hog Put $ in your pocket! Shoparoo Fundraise for a cause!
  • 8. Architecture 1. Capture receipt 2. Convert to structured data (computer vision + OCR + MTurk) 3. Link to masterdata (scraping + classification models + human training), e.g. GAT G2 LMN LIME = UPC 052000209648 4. Data warehouse & prematerialize (MySQL, Amazon Redshift, Hadoop on Amazon EMR) 5. Build cool stuff on top of it! (analytics, data firehose, hacks, etc.) Diagram labels: target.com; Masterdata (MySQL); Tlog (Redshift)
  • 9. Digitizing Receipts Task is to convert image(s) of receipts => structured data
  • 11. Transcribing Receipts • Isn’t OCR good enough? – It is a solved problem… for books – Low recognition on wrinkled receipts from mobile • Hybrid of computer + human – Leverage OCR & computer vision, fill gaps with humans • Human = MTurk + small audit staff – We leverage a 6-person team to act as the top audit layer of the system. Workflow diagram stages: Auto Extract (OpenCV, OCR, Regex); Summary Extraction (Mechanical Turk); Itemized Extraction (Mechanical Turk); Score & Audit (Staff / Mechanical Turk); Complete. Diagram labels: “Can we skip?”; “User marks or staff rejects HIT”
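The "Auto Extract" stage on slide 11 names OpenCV, OCR, and Regex but shows no code. The following is a minimal sketch of the regex step only, assuming the OCR output arrives as plain text with one receipt line per text line. The pattern and the function name `extract_total` are illustrative assumptions, not InfoScout's implementation. When the pattern finds nothing, the function returns `None`, which is the case where a human would fill the gap.

```python
import re

# Hypothetical pattern: a line that starts with "TOTAL" (optionally "GRAND TOTAL")
# and ends with an amount such as 12.34. "SUBTOTAL" lines do not match.
TOTAL_RE = re.compile(
    r"^\s*(?:grand\s+)?total\b[^\d\n]*(\d+\.\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_total(ocr_text):
    """Return the receipt total as a float, or None if OCR text has no clean match."""
    matches = TOTAL_RE.findall(ocr_text)
    # Use the last match: totals normally sit near the bottom of a receipt.
    return float(matches[-1]) if matches else None


clean = "TARGET\nSUBTOTAL 11.50\nTAX 0.84\nTOTAL $12.34\n"
wrinkled = "TARGET\nSUBT0TAL 11.50\nT0TAL 12.34\n"  # OCR misread O as 0

print(extract_total(clean))     # a float total is recovered
print(extract_total(wrinkled))  # None: route this receipt to a human task
```

A `None` result is one possible signal for routing the receipt to a Mechanical Turk HIT instead of skipping the human step.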
  • 12. Summary Transcription. Workflow diagram stages: Auto Extract (OpenCV, OCR, Regex); Summary Extraction (Mechanical Turk); Itemized Extraction (Mechanical Turk); Score & Audit (Staff / Mechanical Turk); Complete. Diagram labels: “Can we skip?”; “User marks or staff rejects HIT”
  • 13. Summary Transcription [Chart: Receipts by Month, y-axis 0 to 1,200,000] How do we scale quality control with growing volume?
  • 14. Known Answers • Publish HIT with at least one known answer to audit Worker accuracy • Additional support provided by Amazon API • Most effective when there is a concrete, expected answer – i.e. Multiple choice answers Known Answer
  • 15. Known Answers [Chart: Net Cost per Receipt, y-axis $0 to $0.03; series: InfoScout Review Cost, MTurk Cost; annotations: “Developed more efficient review process”, “Transitioned to Known Answers”] Known Answers lowered our net cost per receipt from 2 cents to 1 cent
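The Known Answers approach on slides 14 and 15 can be sketched as a small scoring routine: each Worker's submissions are compared against questions whose answers are already known, and Workers below an accuracy threshold are flagged for review. The data layout, the case-insensitive comparison, and the 0.9 threshold are assumptions for illustration. The slides do not describe how InfoScout scores Workers, and this sketch does not call the Mechanical Turk API.

```python
from collections import defaultdict


def worker_accuracy(submissions, known_answers):
    """Return {worker_id: share of known-answer questions answered correctly}.

    submissions: iterable of (worker_id, question_id, answer) tuples.
    known_answers: {question_id: expected answer} for the audit questions only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker_id, question_id, answer in submissions:
        if question_id not in known_answers:
            continue  # ordinary question, no known answer to audit against
        total[worker_id] += 1
        expected = known_answers[question_id]
        if answer.strip().lower() == expected.strip().lower():
            correct[worker_id] += 1
    return {worker: correct[worker] / total[worker] for worker in total}


def flag_for_review(accuracy, threshold=0.9):
    """Return the Workers whose known-answer accuracy falls below the threshold."""
    return sorted(worker for worker, score in accuracy.items() if score < threshold)


known = {"q1": "Target", "q2": "12.99"}
subs = [
    ("w1", "q1", "target"), ("w1", "q2", "12.99"),
    ("w2", "q1", "Walmart"), ("w2", "q2", "12.99"), ("w2", "q3", "anything"),
]
scores = worker_accuracy(subs, known)
print(scores)
print(flag_for_review(scores))
```

This works best for the concrete, expected answers the slide mentions (for example multiple choice), because exact comparison is meaningful there.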
  • 16. Itemized Extraction. Workflow diagram stages: Auto Extract (OpenCV, OCR, Regex); Summary Extraction (Mechanical Turk); Itemized Extraction (Mechanical Turk); Score & Audit (Staff / Mechanical Turk); Complete. Diagram labels: “Can we skip?”; “User marks or staff rejects HIT”
  • 17. Itemized Extraction • Transcribe every item on receipt • HITs audited by review team, priority scored by: – Comparing output to known OCR extraction – Comparison to master data (i.e. did they “fat finger” a price or UPC?) – Worker approval history – Worker tenure (for InfoScout HITs) – Additional features • Not a great candidate for Known Answers… How do we scale quality control for itemized extraction?
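The audit priority score on slide 17 could take the form of a weighted risk score over the four named signals. The sketch below is only an illustration: the weights, the 0-to-1 feature scaling, the tenure cap, and all names are assumptions, since the slides list the inputs but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class HitFeatures:
    ocr_agreement: float     # 0..1 share of fields agreeing with the OCR extraction
    masterdata_match: float  # 0..1 share of items whose price/UPC match master data
    approval_rate: float     # 0..1 Worker's historical approval rate
    tenure_hits: int         # number of InfoScout HITs the Worker has completed


# Hypothetical weights, chosen only so that they sum to 1.
WEIGHTS = {"ocr": 0.35, "masterdata": 0.30, "approval": 0.25, "tenure": 0.10}


def audit_priority(features, tenure_cap=500):
    """Return a risk score in [0, 1]; higher means audit this HIT sooner."""
    tenure = min(features.tenure_hits, tenure_cap) / tenure_cap
    return (
        WEIGHTS["ocr"] * (1 - features.ocr_agreement)
        + WEIGHTS["masterdata"] * (1 - features.masterdata_match)
        + WEIGHTS["approval"] * (1 - features.approval_rate)
        + WEIGHTS["tenure"] * (1 - tenure)
    )


hits = {
    "hit-a": HitFeatures(0.95, 1.0, 0.99, 800),  # trusted Worker, clean output
    "hit-b": HitFeatures(0.40, 0.5, 0.80, 10),   # new Worker, disagrees with OCR
}
# Review queue: riskiest HITs first.
queue = sorted(hits, key=lambda hit_id: audit_priority(hits[hit_id]), reverse=True)
print(queue)
```

Sorting by this score lets a small review team spend its time on the HITs most likely to contain errors.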
  • 18. Plurality • HIT completed by >1 Worker – InfoScout only sends HITs with low confidence to multiple Workers • Higher quality, higher cost – Limit costs by scientifically selecting HITs to send to a second Worker • Multiple strategies when an answer discrepancy is found – Ask a third Worker – Leverage internal auditors. Diagram labels: Publish HIT; Worker 1 Submits; Worker 2 Submits; Match? YES: Accept
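The plurality flow on slide 18 (two Workers, accept on a match, otherwise escalate) can be sketched as below. The function names, the 0.8 confidence threshold, and the order of escalation (third Worker first, then an internal auditor) are assumptions for illustration. The slide lists both strategies without saying how InfoScout chooses between them.

```python
def needs_second_worker(confidence, threshold=0.8):
    """Send a HIT to a second Worker only when confidence in the first answer is low."""
    return confidence < threshold


def resolve(first, second, ask_third=None, audit=None):
    """Return (answer, status) for two Worker answers to the same HIT.

    ask_third and audit are optional callables that fetch a third Worker's
    answer or an internal auditor's answer; both are hypothetical hooks.
    """
    if first == second:
        return first, "accepted"
    if ask_third is not None:
        third = ask_third()
        if third in (first, second):
            return third, "majority"  # two of three Workers agree
    if audit is not None:
        return audit(), "audited"     # fall back to internal staff
    return None, "unresolved"


print(resolve("4.99", "4.99"))
print(resolve("4.99", "4.49", ask_third=lambda: "4.49"))
print(resolve("4.99", "4.49", ask_third=lambda: "4.29", audit=lambda: "4.99"))
```

Gating the second assignment with `needs_second_worker` reflects the cost point on the slide: the extra Worker is only paid for when the first answer looks uncertain.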
  • 19. HIT Acceptance Latency [Chart: Minutes to Accept (0 to 700) by date, 12/22/12 to 6/22/13; annotation: “Changed Template”] • Measures HIT demand • Template change decreased demand temporarily, but Workers acclimated
  • 20. Worker Retention [Chart: Total HITs Completed (0 to 700,000), split into HITs Complete (New Workers) and HITs Complete (Retained Workers), with % Completed by retained Workers (0% to 100%)] Within two months, 80% of HITs were completed by returning Workers
  • 21. Pareto of Worker Volume [Chart: % of all HITs completed (0% to 90%) by Worker Percentile: Top 5%, 6-10%, 10-20%, 21-50%, 51-100%] Our top 5% (~500) active Workers account for >80% of all HITs completed
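A concentration figure like the one on slide 21 can be computed from per-Worker HIT counts. The sketch below assumes a simple mapping of Worker ID to completed HITs; the function name and the sample numbers are made up for illustration and are not InfoScout data.

```python
import math


def top_share(hits_per_worker, top_fraction=0.05):
    """Return the share of all HITs completed by the top `top_fraction` of Workers."""
    counts = sorted(hits_per_worker.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0  # no Workers or no completed HITs
    # Always include at least one Worker in the top group.
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / total


# Made-up counts: two heavy Workers and two occasional ones.
sample = {"w1": 70, "w2": 20, "w3": 5, "w4": 5}
print(top_share(sample, top_fraction=0.5))
```

Tracking this value over time would show whether the Worker pool is becoming more or less dependent on a small group of returning Workers.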
  • 23. Please give us your feedback on this presentation BDT206 As a thank you, we will select prize winners daily for completed surveys!
  • 25. Quality Control Strategies • Filter incoming Workers – Qualifications • Increase quality during completion – Template validation – Template instructions • Post submission – Plurality (multiple HITs per task) – Known Answers – Workers audit Workers. Diagram labels: Enhance HIT; Approve/Reject? Multiple strategies can yield high accuracy
  • 26. HIT templates • Clear & concise instructions – The first time each Worker sees a template, show detailed instructions, with the ability to hide them once they’re comfortable • Keyboard shortcuts • Maximize validation – Client-side and/or AJAX validation • Bonus rewards – Nice option for rewarding Workers, especially when HITs are variable in length & time