WebSci2013 Harnessing Disagreement in Crowdsourcing
1. Chris Welty Crowd Truth for Cognitive Computing Lora Aroyo
gathering gold standard annotations for relation extraction
Crowd Truth
Harnessing Disagreement in Crowdsourcing
Gold Standard Assumption
Typical assumptions in cognitive systems:
• for each annotated instance there is a single right answer
• gold standard quality can be measured by inter-annotator agreement
Let them disagree?
Hypothesis
Annotator disagreement is not noise, but signal.
It is not a problem to overcome but a source of information for machines.
Artificially restricting humans does not help machines learn; they learn better from diversity.
Position
disagreement is a sign of
intrinsic vagueness & ambiguity in human understanding
Approach Principles
1. Tolerate, capture & exploit disagreement
2. Understand it by a space of possibilities (frequencies & similarities)
3. Score the machine output based on where it falls in this space
4. Adapt to new annotation tasks
Relation Extraction
crowdsourcing gold standard data
Relations overlap in meaning
Sentences are vague and ambiguous
Experts have different interpretations
Example 1
Sentence: Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid.
Question: Is CHEST related to PALPATION?
Relation choices shown: diagnose, location, associated with, is_a, part_of, other
Worker votes per relation: 0 0 02 3 0 0 0 1 0 0 44 1

Example 2
Sentence: Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.
Question: Is HYPERAEMIA related to CONJUNCTIVITIS?
Relation choices shown: symptom, cause
Worker votes per relation: 0 0 0 1 0 0 0 013 0 0 0 0 0
Harnessing Disagreement
• Sentence-relation score: the core crowd truth metric for relation extraction, measured for each relation on each sentence as the cosine between the unit vector for the relation and the sentence vector
• Sentence clarity: for each sentence, the maximum sentence-relation score for that sentence. If all the workers selected the same relation for a sentence, the maximum score is 1, indicating a clear sentence
• Relation similarity: the pairwise conditional probability that if relation Ri is annotated in a sentence, Rj is as well. Indicates how confusable the linguistic expressions of two relations are
• Relation ambiguity: the maximum relation similarity for a relation. A clear relation has a low score
• Relation clarity: the maximum sentence-relation score for a relation over all sentences. A high clarity score means that it is at least possible to express the relation clearly
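The five metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each sentence is represented by a sentence vector of worker vote counts, one entry per relation, and that relation similarity is estimated at the sentence level (a relation counts as annotated in a sentence if at least one worker chose it). The function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sentence_relation_score(sent_vec, r):
    """Cosine between the unit vector for relation r and the sentence vector."""
    unit = [0] * len(sent_vec)
    unit[r] = 1
    return cosine(sent_vec, unit)

def sentence_clarity(sent_vec):
    """Maximum sentence-relation score for one sentence."""
    return max(sentence_relation_score(sent_vec, r) for r in range(len(sent_vec)))

def relation_similarity(sent_vecs, i, j):
    """Estimate of P(Rj annotated | Ri annotated), counted over sentences.

    Assumption: 'annotated' means at least one worker vote for the relation.
    """
    with_i = [v for v in sent_vecs if v[i] > 0]
    if not with_i:
        return 0.0
    return sum(1 for v in with_i if v[j] > 0) / len(with_i)

def relation_ambiguity(sent_vecs, i):
    """Maximum similarity of relation i to any other relation."""
    n = len(sent_vecs[0])
    return max(relation_similarity(sent_vecs, i, j) for j in range(n) if j != i)

def relation_clarity(sent_vecs, i):
    """Maximum sentence-relation score for relation i over all sentences."""
    return max(sentence_relation_score(v, i) for v in sent_vecs)

# Hypothetical vote counts for three sentences over three relations (R0, R1, R2).
SENTENCES = [
    [3, 0, 0],  # every worker chose R0: a clear sentence
    [2, 2, 0],  # workers split between R0 and R1
    [0, 1, 3],  # mostly R2, with one vote for R1
]

if __name__ == "__main__":
    for vec in SENTENCES:
        print(vec, "clarity:", round(sentence_clarity(vec), 3))
    for r in range(3):
        print("R%d" % r,
              "ambiguity:", round(relation_ambiguity(SENTENCES, r), 3),
              "clarity:", round(relation_clarity(SENTENCES, r), 3))
```

In this toy data the first sentence gets clarity 1 because all votes fall on one relation, while the evenly split second sentence scores lower.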
The Dark Side of Crowdsourcing Disagreement
• spammers generate disagreement for the wrong reasons
• most spam detection requires a gold standard
• Worker-sentence disagreement: the average of all the cosines between each worker’s sentence vector and the full sentence vector (minus that worker). Indicates how much a worker disagrees with the crowd on a sentence basis
• Worker-worker disagreement: a pairwise confusion matrix between workers and the average agreement across the matrix for each worker. Indicates whether there are consistently like-minded workers
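The two worker metrics above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code. Annotations are stored as a hypothetical nested dict (worker, then sentence, then a 0/1 vector over relations). The worker-sentence score is the plain average cosine described in the bullet, so a low value signals disagreement with the crowd. The slides do not specify how pairwise agreement is computed, so the average cosine over shared sentences is used here as a stand-in.

```python
import math

def cosine(u, v):
    """Cosine of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def worker_sentence_score(annotations, worker):
    """Average cosine between the worker's vector and the rest of the crowd.

    For each sentence, the crowd vector is the sum of all other workers'
    vectors (the full sentence vector minus this worker).
    """
    scores = []
    for sent, vec in annotations[worker].items():
        others = [a[sent] for w, a in annotations.items()
                  if w != worker and sent in a]
        if not others:
            continue  # nobody else annotated this sentence
        rest = [sum(col) for col in zip(*others)]
        scores.append(cosine(vec, rest))
    return sum(scores) / len(scores) if scores else 0.0

def pairwise_agreement(annotations, w1, w2):
    """Assumed agreement measure: mean cosine over sentences both annotated."""
    shared = annotations[w1].keys() & annotations[w2].keys()
    if not shared:
        return None
    total = sum(cosine(annotations[w1][s], annotations[w2][s]) for s in shared)
    return total / len(shared)

def worker_worker_score(annotations, worker):
    """Average of one worker's row in the pairwise agreement matrix."""
    row = [pairwise_agreement(annotations, worker, other)
           for other in annotations if other != worker]
    row = [x for x in row if x is not None]
    return sum(row) / len(row) if row else 0.0

# Hypothetical annotations: w1 and w2 agree, w3 always picks a third relation.
ANNOTATIONS = {
    "w1": {"s1": [1, 0, 0], "s2": [0, 1, 0]},
    "w2": {"s1": [1, 0, 0], "s2": [0, 1, 0]},
    "w3": {"s1": [0, 0, 1], "s2": [0, 0, 1]},
}

if __name__ == "__main__":
    for w in ANNOTATIONS:
        print(w,
              round(worker_sentence_score(ANNOTATIONS, w), 3),
              round(worker_worker_score(ANNOTATIONS, w), 3))
```

A worker who scores low on both measures, like w3 in this toy data, disagrees with the crowd and has no like-minded peers, which is the pattern the slide associates with possible spam. No gold standard is needed to compute either score.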
Questions?