How Public Sector is using Mechanical Turk
1. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Transformational Impact of Cloud Labor
John Hoskins & Daniel Gray
jhoskins@amazon.com
djgray@amazon.com
2. How is Mechanical Turk impacting business?
3. Forestry Service wants to provide real-time online campsite booking
• 350,000 individual campsites whose exact locations are unknown
• Thousands of campgrounds with little or no point-of-interest (POI) data: bathroom? shower? boat ramp?
• No concierge to handle a double booking
4. US Copyright Office would like to provide internet access to CR data
• Current data is contained exclusively on cards and microfilm
• Scanning project is underway
• No taxonomy for discovery
“What would the internet be without a search engine?”
5. Business Need
The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.
6. Why
• Over 2 MILLION serious adverse drug reactions (ADRs) yearly
• 100,000 DEATHS yearly
• ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents and automobile deaths
7. Business Problem
Reports of interactions are delivered randomly, and the current process to extract data from thousands of forms causes significant lag in its availability.
8. Challenge
• Data can be received in multiple formats: forms, written and typed, email, electronic ...
• Data is subject to HIPAA privacy regulations
• Accuracy and response time are critical, and the budget constraint is obvious
9. Solution
• Technology can shred the form down to field level or below
• OCR makes a first pass at recognizing the data
• Workers correct the OCR output
• Data from workers is reconstructed into digital input for the database
• Data is made available through the openFDA API
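The reconstruction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FieldSnippet` structure, the field names, and the rule that a worker correction overrides the OCR guess are all hypothetical and are not taken from the FDA workflow itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSnippet:
    """One field-level piece of a shredded form (hypothetical structure)."""
    form_id: str
    field_name: str
    ocr_text: str                      # first-pass OCR guess
    worker_text: Optional[str] = None  # worker correction, if one was made


def reconstruct_records(snippets):
    """Reassemble field snippets into one record (dict) per form.

    Assumption: a worker correction, when present, overrides the OCR guess.
    """
    records = {}
    for s in snippets:
        value = s.worker_text if s.worker_text is not None else s.ocr_text
        records.setdefault(s.form_id, {})[s.field_name] = value
    return records


# Made-up sample data, for illustration only.
snippets = [
    FieldSnippet("form-1", "drug_name", "asp1rin", worker_text="aspirin"),
    FieldSnippet("form-1", "reaction", "nausea"),
    FieldSnippet("form-2", "drug_name", "ibuprofen"),
]
records = reconstruct_records(snippets)
print(records["form-1"])
```

Because each worker handles a single field rather than a whole form, the form identifier is what ties the pieces back together before the record is loaded into the database.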
10. Business Need
A Government Defense contractor needs to update its natural language processing system to accommodate “internet speak”.
11. Business Problem
Comments from the internet in the form of posts and tweets more closely resemble spoken language, while NLP is predicated on written language.
12. Challenge
• NLP is involved in a mission-critical defense system and is missing significant data due to inaccuracies
• Cross-referencing spoken language to written language in Arabic is uniquely complex
• Training requires millions of data points of ground truth
13. Solution
• An internet crawler scrapes posts containing interesting keywords and phrases
• Each phrase is translated by 5 unique native Arabic speakers (5 dialects) with English as their second language
• Each of the 5 translated phrases is corrected by English grammar experts
• The five corrected phrases are voted on by a panel of 5 additional workers
• The best phrase (highest score with fewest corrections) is sent to 5 native English speakers with Arabic as their second language for translation
• Each result is corrected by Arabic grammar experts and then voted on
• The best result is fed into the NLP system with the original phrase for learning
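The "highest score with fewest corrections" selection can be sketched as follows. The slide does not say how the two criteria are combined, so this sketch assumes that panel votes are counted first and that ties are broken in favor of the phrase that needed fewer corrections. The function name `pick_best` and the sample data are hypothetical.

```python
from collections import Counter


def pick_best(corrections, votes):
    """Pick the winning candidate phrase.

    corrections: dict mapping candidate id -> number of grammar corrections
    votes:       list of candidate ids, one entry per panel worker
    Assumption: most votes wins; ties go to the candidate with fewer corrections.
    """
    tally = Counter(votes)
    return max(corrections, key=lambda cid: (tally[cid], -corrections[cid]))


# Five candidate translations and a five-worker panel (made-up data).
corrections = {"t1": 3, "t2": 1, "t3": 0, "t4": 2, "t5": 4}
votes = ["t2", "t3", "t2", "t3", "t1"]
best = pick_best(corrections, votes)
print(best)
```

Here `t2` and `t3` each receive two votes, and `t3` wins because it needed no corrections.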
14. Business Need
Army Research Labs needed to annotate verbs across many permutations against actual human actions to train robots to recognize
15. Business Problem
The volume of data required placed significant delays on the project, yet accuracy was paramount to the results.
16. Challenge
• Sample consisted of 100 different samples of 10 permutations of 35 verbs, 350,000 videos in all
• At 20 seconds each, that’s almost 2,000 hours: a person-year
• Project needed completion within 60 days
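The workload figures can be checked with a short calculation. The factors listed on the slide (100 × 10 × 35) multiply to 35,000, so this sketch takes the stated total of 350,000 videos as given, since that total is the one that matches the "almost 2,000 hours" figure. The 2,000 working hours per person-year (50 weeks of 40 hours) is an assumption, not a number from the slide.

```python
VIDEOS = 350_000              # total stated on the slide
SECONDS_PER_VIDEO = 20        # stated on the slide
DEADLINE_DAYS = 60            # stated on the slide
WORK_HOURS_PER_YEAR = 2_000   # assumption: 50 weeks x 40 hours

# One viewing pass over every video, in hours.
total_hours = VIDEOS * SECONDS_PER_VIDEO / 3600

# How that compares with one full-time person working for a year.
person_years = total_hours / WORK_HOURS_PER_YEAR

# Viewing hours needed per calendar day to finish one pass by the deadline.
hours_per_day = total_hours / DEADLINE_DAYS

print(f"{total_hours:.0f} hours, {person_years:.2f} person-years, "
      f"{hours_per_day:.1f} viewing hours per day")
```

A single pass already needs about 32 viewing hours per calendar day, more than one person can supply, and any scheme that has several workers judge each video multiplies that figure accordingly.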
17. Solution
• Workers were given 50 videos per task and asked whether each video represented a given verb permutation
• Gold standard videos were included in each batch of 50
• A vote consisted of 2 workers with 100% gold standard accuracy agreeing
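The acceptance rule above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the data layout (one dict of video id to yes/no answer per worker batch) and the function names are hypothetical, and the sketch assumes a label is accepted only when exactly one answer has been given by at least two workers with perfect gold-standard accuracy.

```python
from collections import Counter


def gold_accuracy(answers, gold):
    """Fraction of the gold-standard videos this worker answered correctly."""
    correct = sum(answers.get(vid) == truth for vid, truth in gold.items())
    return correct / len(gold)


def accepted_label(video_id, submissions, gold):
    """Return the agreed label for a video, or None if there is no valid vote.

    submissions: list of dicts (video id -> bool), one per worker batch
    gold:        dict (gold video id -> known correct bool)
    """
    trusted = [s[video_id] for s in submissions
               if video_id in s and gold_accuracy(s, gold) == 1.0]
    counts = Counter(trusted)
    winners = [label for label, n in counts.items() if n >= 2]
    return winners[0] if len(winners) == 1 else None


# Made-up data: two gold videos and one real video, "v1".
gold = {"g1": True, "g2": False}
w1 = {"g1": True, "g2": False, "v1": True}
w2 = {"g1": True, "g2": True, "v1": False}   # misses a gold video, ignored
w3 = {"g1": True, "g2": False, "v1": True}

print(accepted_label("v1", [w1, w2, w3], gold))
```

With only `w1` and `w2` submitted, the function returns `None`, because just one trusted answer exists and the video would need to go to another worker.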
18. Thank You
http://www.mturk.com
John Hoskins, Amazon Mechanical Turk
hoskins@amazon.com
Editor's Notes
We knew it should be less expensive, and certainly less capital intensive. We knew it should be more scalable. What we didn’t know was how it would compare on the other fronts.
The current technology solution produced limitations that impacted our ability to support the business needs.