The document discusses crowdsourcing and provides examples of how it can be used. It covers crowd and client motivations, quality management, and using machine learning to scale tasks. Complex tasks can be broken into simpler human intelligence tasks using workflows. Reputation systems and intelligent design are important to address quality issues. Crowdsourcing can enable new collaboration and business models through social production.
1. Crowdsourcing Techniques and Best Practices
• Introduction
• Crowd Motivation
• Client Motivations and Types of tasks
• Scale up with Machine Learning
• Quality Management
• Workflows for Complex tasks
• Reputation Systems
• Economic shift
PWI - September 29, 2011
corina@waterloohills.com
http://bitsofknowledge.waterloohills.com
2. Crowdsourcing Crowd or Community
(online audience)
3. Ex: “Adult Websites” Classification
• Large number of sites to label
• Get people to look at sites and classify them as:
– G (general audience)
– PG (parental guidance)
– R (restricted)
– X (porn)
[Panos Ipeirotis. WWW2011 tutorial]
4. Ex: “Adult Websites” Classification
• Large number of hand‐labeled sites
• Get people to look at sites and classify them as:
– G (general audience)
– PG (parental guidance)
– R (restricted)
– X (porn)
Cost/Speed Statistics:
• Undergrad intern: 200 websites/hr, cost: $15/hr
• MTurk: 2500 websites/hr, cost: $12/hr
[Panos Ipeirotis. WWW2011 tutorial]
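The per-site cost implied by these figures can be worked out directly. The sketch below uses only the two rates quoted on the slide; the function name is illustrative.

```python
def cost_per_site(sites_per_hour: float, dollars_per_hour: float) -> float:
    """Dollars spent to label one website."""
    return dollars_per_hour / sites_per_hour

intern = cost_per_site(200, 15.0)    # undergrad intern
mturk = cost_per_site(2500, 12.0)    # Mechanical Turk crowd

print(f"Intern: ${intern:.4f} per site")
print(f"MTurk:  ${mturk:.4f} per site")
print(f"Speed-up: {2500 / 200:.1f}x, cost reduction: {intern / mturk:.1f}x")
```

On these numbers the crowd is 12.5 times faster and roughly 15 times cheaper per labeled site, before any quality-control overhead is counted.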
5. Crowd Motivation
• €,$ = Money!
• Self-serving purpose (learning new skills,
get recognition, avoid boredom, enjoyment,
create a network with other professionals)
• Socializing, feeling of belonging to a
community, friendship
• Altruism (public good, help others)
7. Crowd Demography
(background defines motivation)
• The 2008 survey at iStockphoto indicates that
the crowd is quite homogenous and elite.
• Amazon’s Mechanical Turk workers come
mainly from 2 countries:
a) USA
b) India
9. Client motivation
• Need Suppliers:
Mass work, Distributed work, or just tedious work
Creative work
Look for specific talent
Testing
Support
To offload peak demands
Tackle problems that need specific communities
or human variety
Any work that can be done cheaper this way.
10. Client motivation
• Need customers!
• Need Funding
• Need to be Backed up
• Crowdsourcing is your business!
12. Client Task Goals
3 main goals for a task to be done:
1. Minimize Cost (cheap)
2. Minimize Completion Time (fast)
3. Maximize Quality (good)
Remember Crowd Motivation!
(ex.: Game-ify your task,
explain the final purpose)
15. Pros
• Quicker: Parallelism reduces time
• Cheap
• Creativity, Innovation
• Quality (*depends)
• Access to scarce resources: The ‘long tail’
• Multiple feedback
• Lets you build a community (followers)
• Business Agility
• Scales up! (*up to a level)
16. Cons
• Lack of professionalism: Unverified quality
• Too many answers
• No standards
• Not always cheap: Added costs to bring a
project to conclusion
• Too few participants if task or pay is not
attractive
• If worker is not motivated, lower quality of work
17. Scale Up with Machine Learning
Build an ‘Adult Website’ Classifier
• Crowdsourcing is cheap but not free
- Workers cannot do more than xx hours/day
- Cannot scale to the web without help
• Build automatic classification models using
examples from crowdsourced data
18. Integration with Machine Learning
• Humans label training data
• Use training data to build model
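As a rough illustration of the two steps above, the sketch below trains a tiny Naive Bayes text classifier on crowd-supplied labels. The choice of Naive Bayes, the toy page texts, and the function names are all assumptions made for the example; the slides do not say which model was used.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label) pairs produced by crowd workers."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        # Laplace smoothing: add one count per vocabulary word
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up training data standing in for crowd-labeled page text.
crowd_labeled = [
    ("kids cartoons games homework", "G"),
    ("family recipes garden games", "G"),
    ("explicit adult videos", "X"),
    ("adult explicit content videos", "X"),
]
model = train(crowd_labeled)
print(predict(model, "garden games for kids"))
print(predict(model, "explicit videos"))
```

Once trained, the model labels new pages at machine speed, and human effort can be reserved for the cases it gets wrong.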
19. Quality Management
Ex: “Adult Website” Classification
• Bad news: Spammers!
• Worker ATAMRO447HWJQ labeled
X (porn) sites as G (general audience)
[Panos Ipeirotis. WWW2011 tutorial]
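One simple way to surface a worker like this is to compare each worker's answers against a small set of sites whose labels are already trusted. The sketch below assumes such a gold set exists; the worker IDs, site names, and the 0.5 threshold are invented for illustration.

```python
from collections import defaultdict

def error_rates(labels, gold):
    """labels: (worker_id, site, label) triples; gold: site -> trusted label."""
    wrong = defaultdict(int)
    seen = defaultdict(int)
    for worker, site, label in labels:
        if site in gold:
            seen[worker] += 1
            wrong[worker] += label != gold[site]
    return {w: wrong[w] / seen[w] for w in seen}

gold = {"a.example": "X", "b.example": "G", "c.example": "X"}
labels = [
    ("worker_1", "a.example", "X"), ("worker_1", "b.example", "G"),
    ("worker_1", "c.example", "X"),
    ("worker_2", "a.example", "G"), ("worker_2", "b.example", "G"),
    ("worker_2", "c.example", "G"),   # always answers G
]
rates = error_rates(labels, gold)
suspects = [w for w, r in rates.items() if r > 0.5]  # threshold is arbitrary
print(rates, suspects)
```

A worker who always answers G still scores correctly on genuinely G sites, so the gold set needs examples from every category to expose this pattern.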
20. Quality Management
Majority Voting and Label Quality
• Spammers try to go undetected
• Well-meaning workers may be biased, and bias is
difficult to tell apart from spam.
1. Ask multiple labelers
2. Keep the majority label as the
“true” label
Use the probability of
being correct as the
quality indicator
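The two steps can be sketched as follows. The share of labelers who agree with the majority is used here as a crude stand-in for the probability of being correct; that substitution is an assumption of the example, and the site names and votes are made up.

```python
from collections import Counter

def majority_label(votes):
    """Return (label, share of votes agreeing with it)."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

votes_by_site = {
    "site-1": ["G", "G", "PG", "G", "G"],
    "site-2": ["X", "R", "X", "G", "R"],
}
for site, votes in votes_by_site.items():
    label, agreement = majority_label(votes)
    print(site, label, f"{agreement:.0%}")
```

The second site shows the weakness of a plain vote: X and R tie, the first-seen label wins, and the low agreement share is the only signal that the label should not be trusted.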
21. Complex tasks
Handle answers through workflow
• Q: “My task does not have discrete answers….”
• A: Break into two Human Intelligence Tasks (HITs):
– “Create” HIT
– “Vote” HIT
• Voting controls the quality of the “Create” HIT
• Redundancy controls the quality of the “Vote” HIT
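A minimal sketch of this two-HIT workflow, with plain Python callables standing in for the tasks that would be posted to workers. The names `create_hit` and `vote_hit` and the default worker counts are hypothetical, not part of any crowdsourcing API.

```python
from collections import Counter
from typing import Callable, List

def create_then_vote(
    create_hit: Callable[[], str],
    vote_hit: Callable[[List[str]], int],
    n_creators: int = 3,
    n_voters: int = 5,
) -> str:
    """Collect free-form answers, then let redundant voters pick the best."""
    candidates = [create_hit() for _ in range(n_creators)]
    # Each voter returns the index of the candidate they prefer.
    ballots = Counter(vote_hit(candidates) for _ in range(n_voters))
    winner_index, _ = ballots.most_common(1)[0]
    return candidates[winner_index]

# Stand-ins for real workers: canned answers and a "longest is best" voter.
answers = iter(["a calculator", "a calculator, a pen and coins", "stuff"])
best = create_then_vote(
    create_hit=lambda: next(answers),
    vote_hit=lambda cands: max(range(len(cands)), key=lambda i: len(cands[i])),
)
print(best)
```

Asking several voters rather than one is what makes the vote itself reliable: a single careless voter is outvoted.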
22. Collaboration: Photo description
But the free-form
answer can be more
complex, not just right or
wrong…
TurKit toolkit [Little et al., UIST 2010]: http://groups.csail.mit.edu/uid/turkit/
23. Collaboration: Description Versions
1. A partial view of a pocket calculator
together with some coins and a pen.
2. ...
3. A close‐up photograph of the following
items: A CASIO multi‐function
calculator. A ball point pen, uncapped.
Various coins, apparently European,
both copper and gold. Seems to be a
theme illustration for a brochure or
document cover treating finance,
probably personal finance.
4. …
8. A close‐up photograph of the following items: A CASIO
multi‐function, solar powered scientific calculator. A blue ball
point pen with a blue rubber grip and the tip extended. Six
British coins; two of £1 value, three of 20p value and one of 1p
value. Seems to be a theme illustration for a brochure or
document cover treating finance ‐ probably personal finance.
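The successive versions above suggest an improve-then-vote loop. The sketch below assumes each proposed revision is put to a vote before it replaces the current text; `improve_hit` and `prefers_new` are hypothetical stand-ins for worker tasks and are not TurKit's actual API.

```python
from typing import Callable

def iterate_description(
    improve_hit: Callable[[str], str],
    prefers_new: Callable[[str, str], bool],
    rounds: int = 8,
    n_voters: int = 3,
) -> str:
    """Improve a text over several rounds, keeping a change only if voters prefer it."""
    current = ""
    for _ in range(rounds):
        proposal = improve_hit(current)
        yes = sum(prefers_new(current, proposal) for _ in range(n_voters))
        if yes > n_voters / 2:  # strict majority accepts the revision
            current = proposal
    return current

# Simulated workers: canned revisions, and voters who prefer the longer text.
revisions = iter([
    "A calculator",
    "A calculator and a pen",
    "pen",                       # a poor revision that should be rejected
    "A calculator, a pen and coins",
])
result = iterate_description(
    improve_hit=lambda current: next(revisions),
    prefers_new=lambda old, new: len(new) > len(old),
    rounds=4,
)
print(result)
```

The vote step is what stops a bad edit from overwriting earlier work, at the cost of extra tasks per round.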
24. Collaboration
• Exploration / exploitation tradeoff
(Independence/or not)
– Can accelerate learning, by sharing good
solutions
– But can lead to premature convergence on
suboptimal solution
[Mason and Watts, submitted to Science, 2011]
25. Collaboration: Positive
• Building iteratively allows better outcomes
for the image description task.
• In the FoldIt puzzles, workers built on each
other’s results. They recently found in 10
days the molecular structure of a protein-
cutting enzyme from an AIDS-like virus.
26. Collaboration: Negative
Groupthink Effect
• Individual search strategies affect group success:
Players who copy each other
explore less, which gives a
lower probability of finding
the peak in a round
27. Workflow Patterns
Creation
• Generate / Create
• Find
• Improve / Edit / Fix
Quality Control
• Vote for accept‐reject
• Vote up, vote down, to generate rank
• Vote for best / select top‐k
Flow Control
• Split task
• Aggregate
• Iterate
30. AdSafe Crowdsourcing Experience
• Detect pages that discuss swine flu
– A pharmaceutical firm had a drug “treating” (off-label) swine flu
– The FDA prohibited the firm from displaying the drug ad on
pages about swine flu
Two days to comply!
• Big fast-food chain does not want ad to appear:
– In pages that discuss the brand (99% negative sentiment)
– In pages discussing obesity
31. AdSafe Crowdsourcing Experience
Workflow to classify URLs
• Find URLs for a given topic (hate speech, gambling, alcohol
abuse, guns, bombs, celebrity gossip, etc.)
http://url‐collector.appspot.com/allTopics.jsp
• Classify URLs into appropriate categories
http://url‐annotator.appspot.com/AdminFiles/Categories.jsp
• Measure quality of the labelers and remove spammers
http://qmturk.appspot.com/
• Get humans to “beat” the classifier by providing cases where
the classifier fails
http://adsafe‐beatthemachine.appspot.com/
32. Crowdsourcing Aggregators
Act as Portals
• Create a crowd or community.
• Create a site to connect a client to the crowd
• Deal with workflow of complex tasks, like
decomposition into simpler tasks and answer
recomposition
• Work as broker, bank, and mediator
Allow anonymity
Consumers can benefit from a crowd without
the need to create it.
33. Market Design:
Crude vs Intelligent Crowdsourcing
• Intelligent Crowdsourcing uses an
organized workflow to tackle CONS of
crude crowdsourcing.
Complex task is divided by experts,
Given to relevant crowds, and not to
everyone
Individual answers are recomposed by
experts into general answer
34. Lack of Reputation and
Market for Lemons
“When quality of sold good is uncertain and hidden before
transaction, price goes to value of lowest valued good”
[Akerlof, 1970; Nobel prize winner]
• Market evolution steps:
1. Employer pays $10 to a good worker, $0.10 to a bad worker
2. 50% good workers, 50% bad; indistinguishable from
each other
3. Employer offers a price in the middle: about $5
4. Some good workers leave the market (pay too low)
5. Employer revises the price downward as the share of bad
workers increases
6. More good workers leave the market… death spiral
http://en.wikipedia.org/wiki/The_Market_for_Lemons
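The steps above can be simulated in a few lines. The slide does not say how low an offer a good worker will accept, so the sketch assumes ten good workers with reservation wages spread evenly from $0.50 to $9.50, and ten bad workers who accept any price; both are illustrative assumptions.

```python
def lemons_spiral(good_reservations, n_bad, good_value=10.0, bad_value=0.10):
    """Return [(good workers remaining, price offered)] for each round."""
    good = sorted(good_reservations)
    history = []
    while True:
        share_good = len(good) / (len(good) + n_bad)
        # Employer cannot tell workers apart, so offers the expected value.
        offer = share_good * good_value + (1 - share_good) * bad_value
        history.append((len(good), round(offer, 2)))
        staying = [r for r in good if r <= offer]
        if len(staying) == len(good):  # nobody left this round: market settles
            return history
        good = staying

# Assumed reservation wages for ten good workers; ten bad workers accept anything.
history = lemons_spiral([0.5 + i for i in range(10)], n_bad=10)
for n_good, offer in history:
    print(f"{n_good:2d} good workers, offer ${offer:.2f}")
```

Under these assumptions the offer starts at $5.05 (the slide rounds this to $5) and falls each round as good workers leave, settling with only the two good workers who have the lowest reservation wages.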
35. Reputation systems
• Challenges:
- Insufficient participation
- Overwhelmingly positive feedback
+ Hoping to get a positive ranking in return
- Negative feedback avoided for fear of retaliation
- Dishonest reports
+ « Riddle for a PENNY! No shipping-Positive Feedback »
- « Bad-mouth » reports
• Incentive mechanisms to get honest feedback
- pay the rater if their report matches the next rater's report;
- delay next transaction over time
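The first incentive can be sketched as a payment rule that rewards a rater only when the following rater files the same report. This is a deliberately simplified matching rule written for illustration; the reward amount and report values are made up, and published mechanisms of this kind use more careful scoring.

```python
def agreement_payments(reports, reward=1.0):
    """Pay each rater `reward` if the next rater filed the same report."""
    payments = []
    for current, following in zip(reports, reports[1:]):
        payments.append(reward if current == following else 0.0)
    return payments  # the last rater has no successor yet, so is not paid here

reports = ["positive", "positive", "negative", "negative", "positive"]
print(agreement_payments(reports))
```

Because a rater cannot know what the next rater will say, reporting what they actually observed is their best guess at a match, which is the intuition behind the incentive.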
36. Reputation systems
• “Cheap pseudonyms”: easy to disappear and
reregister under a new identity with almost no cost.
[Friedman and Resnick 2001]
Introduce opportunities to misbehave without
paying reputational consequences.
Increase the difficulty of online identity changes
Impose upfront costs to new entrants: allow new
identities (forget the past) but make it costly.
• 2-sided Reputation Mechanisms
– Crowd: To ensure worker quality
– Employer: To ensure their trustworthiness
37. Economic Shift
• From Social Networking to Social Production
through Collaborative Innovation
Mass-Collaboration changes how Products &
Services are Designed, Manufactured, Marketed
• Classical geo-political and economic organisations
do not correspond to the new economy
Realignment of competitive advantages
Move towards Collaborative Enterprises based
on Open Infrastructure
38. Societal Shift
Moral values Reinforcement
• Open data access makes actions Transparent
• Transparency makes people Accountable
• Accountability forces/fosters Integrity
• Integrity breeds Community Support
Link between Ethical values and ROI
39. References
• Wikipedia, 2011.
• Dion Hinchcliffe. Crowdsourcing: 5 Reasons It's Not Just For Start Ups
Anymore, 2009.
• Tomoko A. Hosaka, MSNBC. "Facebook asks users to translate for
free", 2008.
• Daren C. Brabham. "Moving the Crowd at iStockphoto: The Composition of
the Crowd and Motivations for Participation in a Crowdsourcing Application",
First Monday, 13(6), 2008.
• Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse & Jill A. Panetta. The
value of openness in scientific problem solving (Harvard Business School
Working Paper No. 07-050), 2007.
• Klaus-Peter Speidel. How to Do Intelligent Crowdsourcing, 2011.
• Panos Ipeirotis. Managing Crowdsourced Human Computation,
WWW2011 tutorial, 2011.
• Omar Alonso & Matthew Lease. Crowdsourcing 101: Putting the WSDM of
Crowds to Work for You, WSDM Hong Kong, 2011.
• Sanjoy Dasgupta,
http://videolectures.net/icml09_dasgupta_langford_actl/, 2009.
• Don Tapscott, Anthony Williams. Macrowikinomics, 2010.
40. Call For Ideas:
If you have a large set of examples
or just an idea of application
for a program to classify or predict,
I would love to hear from you!
Questions?
corina@waterloohills.com