CrowdED: Guideline for optimal Crowdsourcing Experimental Design
1. CrowdED: Guideline for Optimal Crowdsourcing Experimental Design
Amrapali Zaveri, Pedro Hernandez Serrano, Manisha Desai, Michel Dumontier
HumL@WWW2018 @AmrapaliZ 24 April, 2018
2. Crowdsourcing Tasks
❖ Tasks based on human skills not yet replicable by machines
❖ Highly parallelizable tasks
❖ Every human (worker) must be provided with a monetary reward for an answer
❖ Consolidated answers solve scientific problems
8. Related Studies
• Adaptive Model
• Active Learning
• KB Test Questions
• Self Assessment
• Cost-Time & Cost-Quality Optimization
CrowdED offers a two-staged statistical model to estimate, a priori, the worker and task assignment that achieves maximum accuracy.
9. 2 Stages
Stage 1:
• Train all workers on a proportion of tasks
• Identify best workers & hard tasks
Stage 2:
• Assign best workers to hard tasks & remaining tasks
• Calculate overall accuracy
11. Assign Tasks to Workers
Stage 1
Workers: Good, Poor
Tasks: Easy, Hard
Simulate:
• Odd no. of workers per task
• Proportion of tasks to train
Workerview
      Task Label  Truth
1  1  hard_task   age
1  2  hard_task   age
1  3  hard_task   age
1  4  easy_task   age
1  5  easy_task   age
Taskview
      Worker Label  Truth
1  1  good_worker   age
2  1  poor_worker   age
3  1  good_worker   age
4  1  good_worker   age
5  1  poor_worker   age
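The Stage 1 setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CrowdED implementation: the function name `assign_stage1`, its parameters, and the uniform random sampling of tasks and workers are all hypothetical, and the two leading columns of the tables are read as worker and task ids.

```python
import random

def assign_stage1(worker_ids, task_ids, train_proportion=0.5,
                  workers_per_task=3, seed=0):
    """Pick a training subset of tasks and give each one an odd-sized worker panel.

    Returns a list of (worker_id, task_id) pairs. All names are hypothetical.
    """
    if workers_per_task % 2 == 0:
        # An odd panel avoids two-way ties when answers are later compared.
        raise ValueError("workers_per_task should be odd")
    workers = list(worker_ids)
    tasks = list(task_ids)
    rng = random.Random(seed)

    # Assumption: the training tasks are a uniform random sample of all tasks.
    n_train = max(1, round(train_proportion * len(tasks)))
    train_tasks = rng.sample(tasks, n_train)

    pairs = []
    for task in sorted(train_tasks):
        # Assumption: each task gets its own random panel of distinct workers.
        for worker in sorted(rng.sample(workers, workers_per_task)):
            pairs.append((worker, task))
    return pairs

# 10 workers, 20 tasks, train on half of the tasks with 5 workers per task.
pairs = assign_stage1(range(1, 11), range(1, 21),
                      train_proportion=0.5, workers_per_task=5)
```

Each returned pair corresponds to one row of the Workerview or Taskview tables, before labels and answers are attached.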
12. Calculate Worker Accuracy & Task Difficulty
Workerview
      Task Label  Truth  Task Difficulty
1  1  hard_task   age    0.54
1  2  hard_task   age    0.42
1  3  hard_task   age    0.45
1  4  easy_task   age    0.80
1  5  easy_task   age    0.70
Taskview
      Worker Label  Truth  Worker Accuracy
1  1  good_worker   age    0.75
2  1  poor_worker   age    0.58
3  1  good_worker   age    0.78
4  1  good_worker   age    0.95
5  1  poor_worker   age    0.54
13. Simulate Worker Answer
Workerview
      Task Label  Truth  Task Difficulty  Worker Answer
1  1  hard_task   age    0.54             age
1  2  hard_task   age    0.42             tissue
1  3  hard_task   age    0.45             disease
1  4  easy_task   age    0.80             age
1  5  easy_task   age    0.70             age
Taskview
      Worker Label  Truth  Worker Accuracy  Worker Answer
1  1  good_worker   age    0.75             age
2  1  poor_worker   age    0.58             age
3  1  good_worker   age    0.78             age
4  1  good_worker   age    0.95             tissue
5  1  poor_worker   age    0.54             age
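The slides do not spell out how an answer is drawn from these scores. One simple reading, shown here purely as an assumption, is that a worker returns the true label with a given probability and a random wrong label otherwise. The helper `simulate_answer`, the `LABELS` list, and the use of worker accuracy alone as that probability are hypothetical choices.

```python
import random

# Answer labels that appear in the deck's example tables.
LABELS = ["age", "tissue", "disease", "treatment"]

def simulate_answer(truth, p_correct, rng, labels=LABELS):
    """Return `truth` with probability `p_correct`, otherwise a random wrong label.

    Assumption: a single probability drives the draw; how CrowdED combines
    worker accuracy and task difficulty is not stated on the slide.
    """
    if rng.random() < p_correct:
        return truth
    return rng.choice([label for label in labels if label != truth])

rng = random.Random(42)
# Worker accuracies for task 1, copied from the Taskview table above.
accuracy = {1: 0.75, 2: 0.58, 3: 0.78, 4: 0.95, 5: 0.54}
answers = {worker: simulate_answer("age", p, rng)
           for worker, p in accuracy.items()}
```

Because the draw is random, even a highly accurate worker can occasionally answer wrongly, as worker 4 does in the table.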
14. Calculate Worker Performance
Avg. proportion of times a worker is in agreement with other workers for a given task vs. all tasks performed by the worker
Range: [0…1]
Threshold: identify Easy vs. Hard tasks and Good vs. Poor workers
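A minimal sketch of this agreement score follows, assuming it is the mean, over a worker's tasks, of the fraction of other workers on the same task who gave the same answer. The function name `worker_performance`, the input layout, and the 0.5 threshold are illustrative assumptions rather than values from the slides.

```python
from collections import defaultdict

def worker_performance(answers):
    """answers: dict mapping (worker, task) -> answer label.

    For each worker, average over their tasks the fraction of *other*
    workers on the same task who gave the same answer. Result is in [0, 1].
    """
    by_task = defaultdict(dict)
    for (worker, task), answer in answers.items():
        by_task[task][worker] = answer

    per_worker = defaultdict(list)
    for votes in by_task.values():
        for worker, answer in votes.items():
            others = [a for w, a in votes.items() if w != worker]
            if others:  # skip tasks answered by a single worker
                agreement = sum(a == answer for a in others) / len(others)
                per_worker[worker].append(agreement)
    return {w: sum(v) / len(v) for w, v in per_worker.items()}

# Task 1 answers from the Taskview table.
answers = {(1, 1): "age", (2, 1): "age", (3, 1): "age",
           (4, 1): "tissue", (5, 1): "age"}
performance = worker_performance(answers)

# Hypothetical threshold: keep workers who agree with at least half of their peers.
best_workers = sorted(w for w, p in performance.items() if p >= 0.5)
```

On this single task, worker 1 agrees with three of four peers (0.75) and worker 4 with none (0.0), so the threshold keeps workers 1, 2, 3 and 5.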
15. Easy Tasks
Taskview
      Worker Label  Truth  Worker Accuracy  Worker Answer
1  1  good_worker   age    0.75             age
2  1  poor_worker   age    0.58             age
3  1  good_worker   age    0.78             age
4  1  good_worker   age    0.95             tissue
5  1  poor_worker   age    0.54             age
Hard Tasks
Taskview
       Worker Label  Truth  Worker Accuracy  Worker Answer
2   2  good_worker   age    0.75             treatment
3   2  poor_worker   age    0.58             disease
15  2  good_worker   age    0.78             age
17  2  poor_worker   age    0.95             tissue
20  2  poor_worker   age    0.54
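The contrast between the two panels can be expressed as a consensus share per task. The sketch below assumes a task counts as easy when more than half of its answers agree; the helper names and the 0.5 threshold are hypothetical, not taken from the slides.

```python
from collections import Counter

def consensus_share(votes):
    """Fraction of answers that match the most common answer for one task."""
    counts = Counter(votes)
    return counts.most_common(1)[0][1] / len(votes)

def label_task(votes, threshold=0.5):
    """Assumed rule: a task is easy when its consensus share exceeds the threshold."""
    return "easy_task" if consensus_share(votes) > threshold else "hard_task"

# Answers from the two Taskview panels above.
easy_panel_votes = ["age", "age", "age", "tissue", "age"]
# The fifth answer is missing in the second table, so only four votes are used.
hard_panel_votes = ["treatment", "disease", "age", "tissue"]

easy_label = label_task(easy_panel_votes)
hard_label = label_task(hard_panel_votes)
```

The first panel has four of five answers in agreement, while in the second no two answers match, which is what makes it a hard task under this rule.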
16. 2 Stages
Stage 1:
• Train all workers on a proportion of tasks
• Identify best workers & hard tasks
Stage 2:
• Assign best workers to hard tasks & remaining tasks
• Calculate overall accuracy
18. Simulate Worker Answer
Stage 2
Simulate: Good workers on Hard tasks and Remaining Tasks
Workerview
      Task Label  Truth  Task Difficulty  Worker Answer
1  1  hard_task   age    0.54             age
1  2  hard_task   age    0.42             tissue
1  3  hard_task   age    0.45             disease
1  4  easy_task   age    0.80             age
1  5  easy_task   age    0.70             age
19. Merge Stage 1 and 2 & Assign Answers
Taskview
      Worker Label  Truth  Worker Accuracy  Worker Answer
1  1  good_worker   age    0.75             age
2  1  poor_worker   age    0.58             age
3  1  good_worker   age    0.78             age
4  1  good_worker   age    0.95             tissue
5  1  poor_worker   age    0.54             age
Answer = age
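The merge-and-assign step can be sketched as a concatenation followed by a per-task majority vote. This is an assumption about how the final answer is chosen: the slide only shows the outcome "Answer = age". The function `merge_and_vote`, the row layout, and the Stage 2 rows in the example are hypothetical.

```python
from collections import Counter, defaultdict

def merge_and_vote(stage1, stage2):
    """Combine two lists of (worker, task, answer) rows and majority-vote per task.

    Returns {task: answer}, with None where the two most common answers tie.
    """
    votes = defaultdict(list)
    for _worker, task, answer in list(stage1) + list(stage2):
        votes[task].append(answer)

    result = {}
    for task, answers in votes.items():
        ranked = Counter(answers).most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        result[task] = None if tied else ranked[0][0]
    return result

# Stage 1 rows for task 1, taken from the Taskview table.
stage1 = [(1, 1, "age"), (2, 1, "age"), (3, 1, "age"),
          (4, 1, "tissue"), (5, 1, "age")]
# Made-up Stage 2 rows for a second task, for illustration only.
stage2 = [(2, 2, "tissue"), (3, 2, "tissue"), (15, 2, "age")]

consensus = merge_and_vote(stage1, stage2)
```

For task 1, four of the five answers are "age", so the vote reproduces the answer shown on the slide.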
20. Assessing Design
Merged Dataset: calculate Overall Accuracy
Overall Accuracy: avg. of all the tasks which had consensus
Taskview
      Worker Label  Truth  Worker Accuracy  Worker Answer
1  1  good_worker   age    0.75             age
2  1  poor_worker   age    0.58             age
3  1  good_worker   age    0.78             age
4  1  good_worker   age    0.95             tissue
5  1  poor_worker   age    0.54             age
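One way to read this metric, offered as an assumption since the slide gives only a short description, is the share of consensus-reaching tasks whose consensus answer matches the truth. The function name `overall_accuracy`, the input layout, and tasks 2 and 3 in the example are hypothetical.

```python
def overall_accuracy(consensus, truth):
    """consensus: {task: answer or None}; truth: {task: true label}.

    Assumed definition: among tasks that reached consensus, the fraction
    whose consensus answer equals the true label.
    """
    decided = {t: a for t, a in consensus.items() if a is not None}
    if not decided:
        return 0.0
    return sum(a == truth[t] for t, a in decided.items()) / len(decided)

# Task 1 comes from the table above; tasks 2 and 3 are made up for illustration.
consensus = {1: "age", 2: "tissue", 3: None}
truth = {1: "age", 2: "age", 3: "age"}
accuracy = overall_accuracy(consensus, truth)
```

Task 3 has no consensus and is left out, so the score is computed over tasks 1 and 2 only.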
22. • Results support the intuition that reduced task difficulty (10%) results in higher accuracy
23. • Calculating a worker's performance, in combination with whether she was a good worker (from the beginning), ensures that she is the best worker
• Adopting the two-staged algorithm ensures that only the best workers are chosen to perform all the tasks
25. CrowdED recommendations
• no. of workers should be 40-60% of the total number of tasks
• train workers on 40-60% of the tasks in Stage 1
• set the number of workers per task to be either 3, 5 or 7 (fewer than 9)
• reduce the number of hard tasks
• adopt the two-staged algorithm to identify the best workers
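The numeric recommendations above translate directly into parameter ranges for a given number of tasks. The helper below is a hypothetical convenience, not part of CrowdED; it uses integer arithmetic to turn the 40-60% bounds into whole counts.

```python
def recommended_design(n_tasks, workers_per_task=5):
    """Return ranges implied by the recommendations for `n_tasks` tasks.

    Hypothetical helper: the name and the returned dictionary layout are
    illustrative only.
    """
    if workers_per_task not in (3, 5, 7):
        raise ValueError("workers_per_task should be 3, 5 or 7")
    low = (40 * n_tasks + 99) // 100   # ceiling of 40% of the tasks
    high = (60 * n_tasks) // 100       # floor of 60% of the tasks
    return {
        "n_workers": (low, high),
        "n_training_tasks": (low, high),
        "workers_per_task": workers_per_task,
    }

design = recommended_design(100, workers_per_task=5)
```

For 100 tasks this gives between 40 and 60 workers, trained on between 40 and 60 tasks in Stage 1.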
27. Conclusion & Future Work
• Two-staged statistical model for designing optimal crowdsourcing experiments
• A priori estimate of the optimal assignment of workers and tasks to obtain maximum accuracy on all tasks
• Implemented in Python; open source, available as a Jupyter notebook
• Future work
    • Training the workers vs. not training
    • Real-world experiments and comparison with baseline approaches
    • Include budgetary constraints
    • Extend the interface to allow the user to vary parameters and observe how sensitive the design is to various assumptions