AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Transformational Impact of Cloud Labor
John Hoskins & Daniel Gray
jhoskins@amazon.com
djgray@amazon.com
How Is Mechanical Turk Impacting Business?
The U.S. Forest Service wants to provide real-time online campsite booking
• 350,000 individual campsites, whose exact locations are unknown
• Thousands of campgrounds with little or no point-of-interest (POI) data (bathroom? shower? boat ramp?)
• No concierge to resolve a double booking
The U.S. Copyright Office would like to provide internet access to its copyright records (CR data)
• Current data exists only on cards and microfilm
• A scanning project is underway
• There is no taxonomy to support discovery
“What would the internet be without a search engine?”
Business Need
The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.
Why
• Over 2 MILLION serious adverse drug reactions (ADRs) yearly
• 100,000 DEATHS yearly
• ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents, and automobile deaths
Business Problem
Reports of interactions arrive at random, and the current process for extracting data from thousands of forms causes a significant lag before the data becomes available.
Challenge
• Data can arrive in multiple formats: handwritten and typed forms, email, electronic submissions, and more
• Data is subject to HIPAA privacy regulations
• Accuracy and response time are critical, and the budget is an obvious constraint
Solution
• Technology shreds each form into field-level (or smaller) pieces
• OCR makes a first pass at recognizing the data
• Workers correct the OCR output
• The workers' corrected data is reassembled into digital input for the database
• The data is made available through the openFDA API
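A minimal sketch of this shred, OCR, correct, and reassemble flow follows. It assumes a simple per-field data model and treats the OCR engine and the crowd worker as plain callables; `FieldSnippet` and `digitize` are hypothetical names, not part of openFDA or the Mechanical Turk API. Working at field level means each worker sees only one cropped field, which is presumably how the design limits exposure of HIPAA-protected data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FieldSnippet:
    """One cropped field from a scanned report (hypothetical data model)."""
    form_id: str
    field_name: str
    image_ref: str  # pointer to the cropped image a worker would see


def digitize(
    snippets: list[FieldSnippet],
    ocr: Callable[[FieldSnippet], str],
    correct: Callable[[FieldSnippet, str], str],
) -> dict[str, dict[str, str]]:
    """OCR each field, have a worker correct the guess, then reassemble per form."""
    records: dict[str, dict[str, str]] = {}
    for snip in snippets:
        guess = ocr(snip)             # machine first pass
        fixed = correct(snip, guess)  # worker sees a single field, not the whole form
        records.setdefault(snip.form_id, {})[snip.field_name] = fixed
    return records


if __name__ == "__main__":
    snippets = [
        FieldSnippet("report-1", "drug_name", "scans/r1_f1.png"),
        FieldSnippet("report-1", "reaction", "scans/r1_f2.png"),
    ]
    # Stand-ins for an OCR engine and a crowd worker.
    fake_ocr = {"scans/r1_f1.png": "Asp1rin", "scans/r1_f2.png": "rash"}
    record = digitize(
        snippets,
        ocr=lambda s: fake_ocr[s.image_ref],
        correct=lambda s, guess: guess.replace("1", "i"),
    )
    print(record)  # {'report-1': {'drug_name': 'Aspirin', 'reaction': 'rash'}}
```

In a real deployment the `correct` step would be a posted task and the reassembled records would be loaded into the database behind the API; here both are reduced to function calls so the data flow is visible.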
Business Need
A government defense contractor needs to update its natural language processing (NLP) system to accommodate “internet speak”.
Business Problem
Internet comments, such as posts and tweets, more closely resemble spoken language, while NLP is predicated on written language.
Challenge
• NLP is part of a mission-critical defense system and is missing significant data because of inaccuracies
• Cross-referencing spoken language to written language in Arabic is uniquely complex
• Training requires millions of ground-truth data points
Solution
• An internet crawler scrapes posts containing key words and phrases of interest
• Each phrase is translated by 5 different native Arabic speakers (covering 5 dialects) who speak English as a second language
• Each of the 5 translations is corrected by English grammar experts
• The 5 corrected translations are voted on by a panel of 5 additional workers
• The best translation (highest score with the fewest corrections) is sent to 5 native English speakers who speak Arabic as a second language, for translation back into Arabic
• Each result is corrected by Arabic grammar experts and then voted on
• The best result is fed into the NLP system, together with the original phrase, for learning
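The two-stage workflow above can be sketched as below. This is a minimal illustration under stated assumptions: workers are modeled as plain Python callables rather than real Mechanical Turk tasks, each panel member votes for one favorite, and "highest score with the fewest corrections" is read as most votes first, with fewer expert edits as the tie-breaker. All names (`Candidate`, `crowd_round`, `build_training_pair`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical worker interfaces standing in for posted tasks.
Translator = Callable[[str], str]             # one bilingual worker
Corrector = Callable[[str], tuple[str, int]]  # returns (corrected text, edit count)
Voter = Callable[[Sequence[str]], int]        # returns index of preferred candidate


@dataclass
class Candidate:
    text: str
    corrections: int = 0  # edits made by the grammar expert
    score: int = 0        # votes received from the review panel


def crowd_round(source: str, translators: Sequence[Translator],
                corrector: Corrector, voters: Sequence[Voter]) -> Candidate:
    """Translate with every worker, correct each result, vote, and pick a winner."""
    candidates = []
    for translate in translators:
        fixed, edits = corrector(translate(source))
        candidates.append(Candidate(fixed, corrections=edits))
    texts = [c.text for c in candidates]
    for vote in voters:
        candidates[vote(texts)].score += 1
    # Most votes wins; ties go to the candidate that needed fewer edits.
    return max(candidates, key=lambda c: (c.score, -c.corrections))


def build_training_pair(phrase: str,
                        ar_to_en: Sequence[Translator], en_fix: Corrector,
                        en_panel: Sequence[Voter],
                        en_to_ar: Sequence[Translator], ar_fix: Corrector,
                        ar_panel: Sequence[Voter]) -> tuple[str, str, str]:
    """Arabic post -> best English translation -> best Arabic back-translation."""
    english = crowd_round(phrase, ar_to_en, en_fix, en_panel)
    arabic = crowd_round(english.text, en_to_ar, ar_fix, ar_panel)
    # The original phrase is kept alongside the results for NLP training.
    return phrase, english.text, arabic.text
```

Running the same `crowd_round` in both directions mirrors the slide's symmetric design: the only things that change between stages are which workers translate, which experts correct, and which panel votes.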
Business Need
The Army Research Laboratory needed to annotate verbs, across many permutations, against videos of actual human actions in order to train robots to recognize those actions.
Business Problem
The volume of data required caused significant delays to the project, yet accuracy was paramount to the results.
Challenge
• The data set consisted of 100 different samples of 10 permutations of 35 verbs: 350,000 videos
• At 20 seconds each, that is almost 2,000 hours, or roughly a person-year
• The project needed to be completed within 60 days
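The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, `viewing_hours`, and counts only raw viewing time, ignoring task overhead and the gold-standard videos mixed into each batch. It confirms that 350,000 videos at 20 seconds each come to about 1,944 hours, and shows that meeting the 60-day deadline takes roughly 32 hours of viewing per calendar day even before a 2-worker vote at least doubles the work. Note that 100 × 10 × 35 multiplies to 35,000 rather than 350,000, so one of the counts in the breakdown appears to be off by a factor of ten; the hour estimate matches the 350,000 total.

```python
def viewing_hours(videos: int, seconds_each: float, passes: int = 1) -> float:
    """Total human viewing time in hours (ignores task overhead and breaks)."""
    return videos * seconds_each * passes / 3600


total = viewing_hours(350_000, 20)         # single pass over every video
with_vote = viewing_hours(350_000, 20, 2)  # at least two judgments per video
per_day = total / 60                       # spread over the 60-day deadline

print(f"{total:.0f} h single pass, {with_vote:.0f} h with 2 votes, {per_day:.1f} h/day")
print(100 * 10 * 35)                       # the breakdown as written on the slide
```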
Solution
• Workers were given 50 videos per task and asked whether each video represented a given verb permutation
• Gold-standard videos were included in each batch of 50
• A vote consisted of 2 workers with 100% gold-standard accuracy agreeing
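The acceptance rule on this slide can be sketched as below, assuming each worker's answers are yes/no judgments keyed by video ID and that gold-standard videos carry known answers. A worker's votes count only if they answered every gold video they saw correctly, and a label is accepted once two such workers agree. The function names and data shapes are hypothetical; videos that never reach agreement are simply left out (presumably to be re-posted).

```python
from collections import defaultdict

# worker_id -> {video_id: True if the worker says the video shows the verb}
Answers = dict[str, dict[str, bool]]


def qualified_workers(answers: Answers, gold: dict[str, bool]) -> set[str]:
    """Workers who answered every gold-standard video they were shown correctly."""
    trusted = set()
    for worker, responses in answers.items():
        seen_gold = [v for v in responses if v in gold]
        if seen_gold and all(responses[v] == gold[v] for v in seen_gold):
            trusted.add(worker)
    return trusted


def consensus_labels(answers: Answers, gold: dict[str, bool],
                     needed: int = 2) -> dict[str, bool]:
    """Accept a label once `needed` trusted workers agree on it."""
    trusted = qualified_workers(answers, gold)
    votes: dict[str, dict[bool, int]] = defaultdict(lambda: defaultdict(int))
    for worker in trusted:
        for video, label in answers[worker].items():
            if video not in gold:  # gold videos are quality checks, not data
                votes[video][label] += 1
    accepted = {}
    for video, counts in votes.items():
        winners = [label for label, n in counts.items() if n >= needed]
        if len(winners) == 1:  # skip videos that are unresolved or contested
            accepted[video] = winners[0]
    return accepted
```

Requiring a perfect gold-standard record before any vote counts is strict, but it matches the slide's emphasis on accuracy: a single missed gold video removes that worker's answers from the whole batch.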
Thank You
http://www.mturk.com
John Hoskins, Amazon Mechanical Turk
hoskins@amazon.com

How the Public Sector Is Using Mechanical Turk


Editor's Notes

  1. We knew it should be less expensive, and certainly less capital-intensive. We knew it should be more scalable. What we didn’t know was how it would compare on the other fronts.
  2. The existing technology solution imposed limitations that affected our ability to support the business needs.