Crowdsourcing has emerged as a powerful paradigm for quality assessment and improvement of Linked Data. A major challenge of employing crowdsourcing for quality assessment in Linked Data is the cold-start problem: how can we estimate the reliability of crowd workers and assign the most reliable workers to tasks? We address this challenge by proposing a novel approach for generating test questions from DBpedia based on the topics associated with quality assessment tasks. These test questions are used to estimate the reliability of new workers. Subsequently, tasks are dynamically assigned to reliable workers to improve the accuracy of the collected responses. Our proposed approach, ACRyLIQ, is evaluated on two real-world Linked Data datasets using workers hired from Amazon Mechanical Turk. We validate the proposed approach in terms of accuracy and compare it against a baseline that estimates reliability using gold-standard tasks. The results demonstrate that our approach achieves high accuracy without using gold-standard tasks.
1. EKAW 2016
ACRyLIQ: Leveraging DBpedia for Adaptive
Crowdsourcing in Linked Data Quality Assessment
Umair ul Hassan, Amrapali Zaveri, Edgard Marx, Edward Curry, Jens Lehmann
2. Background
• Linked Data Quality Assessment (LDQA)
  – Incomplete, inaccurate, and inconsistent data in LOD
• Crowdsourcing LDQA (a minimal sketch of this loop follows the figure below)
  1. Generate micro-tasks to assess the quality of a Linked Data dataset
  2. Recruit crowd workers to perform the LDQA tasks
  3. Update the dataset based on the crowd's answers
Zaveri, Amrapali, et al. "Quality assessment for linked data: A survey." Semantic Web 7.1 (2015): 63-93.
Acosta, Maribel, et al. "Crowdsourcing linked data quality assessment." International Semantic Web Conference. Springer Berlin Heidelberg, 2013.
[Figure: crowdsourcing LDQA workflow, in which the Linked Dataset yields LDQA tasks for Crowd Workers, whose Answers drive Updates back to the dataset]
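The three-step loop above can be made concrete with a short sketch. This is a minimal illustration, assuming boolean correct/incorrect judgements and simple majority voting as the aggregation rule; the class and function names are hypothetical, not the authors' implementation.

```python
# Minimal sketch of the crowdsourcing LDQA loop (steps 1-3 above).
# All names are illustrative; majority voting is an assumed aggregation rule.
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroTask:
    triple: tuple  # (subject, predicate, object) under assessment
    question: str  # human-readable quality question shown to workers

def generate_microtasks(triples):
    """Step 1: turn each candidate triple into a quality-assessment task."""
    return [MicroTask(t, f"Is the fact {t} correct?") for t in triples]

def collect_answers(tasks, workers):
    """Step 2: gather boolean judgements; a worker is any callable
    mapping a task to True/False (in practice, an AMT response)."""
    return {task: [worker(task) for worker in workers] for task in tasks}

def update_dataset(answers):
    """Step 3: keep a triple only if a majority of workers confirmed it."""
    return [task.triple for task, votes in answers.items()
            if sum(votes) > len(votes) / 2]
```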
3. Research Challenge
• Workers have varying reliability and expertise depending on the domain and topics of a dataset
[Figure: the Linked Dataset feeding crowdsourced LDQA tasks]
How can we estimate the reliability of crowd workers to achieve high accuracy on LDQA tasks through adaptive task assignment? (A sketch of such an assignment policy follows below.)
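One way to act on this question is to route each task to the workers with the highest estimated reliability for the task's topic. The sketch below illustrates that policy under assumed data structures (task/topic pairs and a reliability lookup); it is not the paper's implementation.

```python
# Illustrative adaptive task assignment: route each task to the k workers
# with the highest estimated reliability on the task's topic.
def assign_tasks(tasks, workers, reliability, k=3):
    """tasks: iterable of (task_id, topic) pairs.
    reliability: maps (worker_id, topic) -> estimated accuracy in [0, 1]."""
    assignment = {}
    for task_id, topic in tasks:
        ranked = sorted(workers,
                        key=lambda w: reliability.get((w, topic), 0.0),
                        reverse=True)
        assignment[task_id] = ranked[:k]  # top-k most reliable workers
    return assignment
```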
6. Evaluation Methodology
• Languages dataset
  – LDQA tasks: verify language tags for entities in the LinkedSpending dataset
  – Topics: Chinese, English, French, Japanese, Russian
  – KBQs: verify the language of DBpedia facts (see the sketch below)
  – No. of tasks: 25; No. of KBQs: 10
• Interlinks dataset
  – LDQA tasks: verify relationships between entities as generated by OAEI
  – Topics: Anatomy, Books, Economics, Geography, Nature
  – KBQs: verify DBpedia facts based on SKOS relationships
  – No. of tasks: 25; No. of KBQs: 10
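As an illustration of how such KBQs could be drawn from DBpedia, the sketch below samples labelled facts carrying a given language tag from the public SPARQL endpoint and turns each into a test question with a known answer. The query template and helper name are assumptions for exposition; the slides do not show ACRyLIQ's exact templates.

```python
# Hedged sketch: sampling language-verification KBQs from DBpedia.
# The query template is an assumption, not the paper's exact template.
from SPARQLWrapper import SPARQLWrapper, JSON

DBPEDIA = "https://dbpedia.org/sparql"

def sample_language_kbqs(lang="fr", limit=10):
    sparql = SPARQLWrapper(DBPEDIA)
    sparql.setQuery(f"""
        SELECT ?s ?label WHERE {{
            ?s rdfs:label ?label .
            FILTER (lang(?label) = "{lang}")
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    # Each row yields one test question with a known (DBpedia-derived) answer.
    return [f'Is the label "{r["label"]["value"]}" written in "{lang}"?'
            for r in rows]
```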
7. Evaluation Methodology
• Crowd Workers
  – 60 workers from Amazon Mechanical Turk
  – Paid $1.50 for 30 minutes
  – Each provided answers to 10 KBQs and 25 tasks for both datasets
  – Diverse reliability on Languages tasks
  – Low reliability on Interlinks tasks (a simple reliability estimator is sketched below)
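Behind reliability figures like these, a natural estimator, assumed here for illustration rather than taken from the paper, is the fraction of a worker's KBQ answers that match the DBpedia-derived gold answers:

```python
# Assumed per-worker reliability estimator: fraction of KBQ answers that
# match the DBpedia-derived gold answers.
def kbq_reliability(worker_answers, gold_answers):
    """Both arguments map a KBQ id to an answer; returns accuracy in [0, 1]."""
    correct = sum(worker_answers.get(q) == a for q, a in gold_answers.items())
    return correct / len(gold_answers)
```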
10. Summary
• Strengths
  – KBQs provide a quick and inexpensive method of estimating the reliability and expertise of workers
  – Our approach is particularly suited for complex and knowledge-intensive tasks
• Limitations
  – Assumption that LDQA tasks and KBQs are partitioned according to the same set of topics
  – Assumption that all facts in DBpedia are correct
  – Assumption that dataset topics are mutually exclusive
• Future work
  – Scalability of the proposed approach needs to be validated
  – Evaluation on a wider range of tasks and datasets
11. Thank you
Umair ul Hassan, Amrapali Zaveri, Edgard Marx, Edward Curry, and Jens Lehmann. "ACRyLIQ: Leveraging DBpedia for Adaptive Crowdsourcing in Linked Data Quality Assessment." In: 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2016). Springer International Publishing, 2016.
Questions:
umair.ulhassan@insight-centre.org