TripleCheckMate
1. TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
Dimitris Kontokostas, Amrapali Zaveri,
Sören Auer and Jens Lehmann
KESW 2013 Oct 08, 2013
2. Outline
❏ Data Quality
❏ Data Quality Assessment Methodology
❏ Evaluation Methodology - Manual
❏ Phase I: Quality Problem Taxonomy
❏ Phase II: Crowdsourcing Quality Assessment
❏ TripleCheckMate
❏ Architecture
❏ Demo
❏ Conclusion & Future Work
3. Data Quality
● Data Quality (DQ) is defined as:
○ fitness for a certain use case*
● On the Data Web - varying quality of information
covering various domains
● High quality datasets
○ curated over decades - life science domain
○ crowdsourcing process - extracted from unstructured
and semi-structured information, e.g. DBpedia
* J. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
4. Data Quality Assessment Methodology
4 Step Methodology:
❏ Step 1: Resource selection
❏ Per Class
❏ Completely random
❏ Manual
❏ Step 2: Evaluation mode selection
❏ Manual
❏ Semi-automatic
❏ Automatic
❏ Step 3: Resource evaluation
❏ Step 4: DQ improvement
❏ Direct
❏ Indirect
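The three resource selection modes in Step 1 can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `select_resource`, the mode strings, and the in-memory `resources_by_class` mapping are all assumptions made for the example.

```python
import random
from typing import Dict, List, Optional


def select_resource(
    resources_by_class: Dict[str, List[str]],
    mode: str,
    rng: random.Random,
    class_name: Optional[str] = None,
    manual_uri: Optional[str] = None,
) -> str:
    """Pick one resource to evaluate (hypothetical helper, for illustration only)."""
    if mode == "manual":
        # Manual: the evaluator names the resource directly.
        if not manual_uri:
            raise ValueError("manual mode needs a resource URI")
        return manual_uri
    if mode == "per_class":
        # Per class: random pick restricted to one ontology class.
        if class_name not in resources_by_class:
            raise ValueError(f"unknown class: {class_name!r}")
        return rng.choice(resources_by_class[class_name])
    if mode == "random":
        # Completely random: pick from all resources regardless of class.
        pool = [uri for uris in resources_by_class.values() for uri in uris]
        return rng.choice(pool)
    raise ValueError(f"unknown selection mode: {mode!r}")
```

Passing the random generator in explicitly keeps the selection reproducible when a seed is fixed, which can help when comparing evaluation runs.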
5. Evaluating Methodology - Manual
❏ Phase I: Creation of quality problem taxonomy
❏ Phase II: Crowdsourcing quality assessment
6. Phase I: Quality Problem Taxonomy
A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies
for Linked Open Data: A Review. Under review, available at
http://www.semantic-webjournal.net/content/quality-assessment-methodologieslinked-open-data.
7. Phase II: Crowdsourcing Quality Assessment
Aspect | Crowdsourcing | Our Approach
Type | Human Intelligence Tasks (HITs) | Contest-based
Participants | Labor market | Linked Data (LD) experts
Task | Detect quality issues in triples | Detect & classify quality issues in resources
Reward | Per task/triple | Highest number of resources evaluated
Tool | Amazon Mechanical Turk, CrowdFlower, etc. | TripleCheckMate
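The contest-based reward in the comparison above (the participant who evaluates the most resources wins) could be computed roughly as below. This is a sketch under assumptions: the `(evaluator, resource)` input format and the `leaderboard` function are hypothetical, and counting each resource once per evaluator is a design choice of this example, not something the slides specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def leaderboard(evaluations: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank evaluators by the number of distinct resources they evaluated."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for evaluator, resource in evaluations:
        # A set ignores repeated evaluations of the same resource.
        seen[evaluator].add(resource)
    # Highest count first; ties are broken alphabetically by evaluator name.
    return sorted(
        ((evaluator, len(resources)) for evaluator, resources in seen.items()),
        key=lambda entry: (-entry[1], entry[0]),
    )
```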
9. TripleCheckMate - Architecture (2/2)
● Built on Java / GWT
○ GWT compiles to native cross-browser HTML/JS
● Tomcat / Jetty & MySQL as minimal backend
○ store/retrieve evaluation data only
● Application logic is built on the client
○ SPARQL executed on client
○ Portable
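To illustrate what "SPARQL executed on client" involves, the sketch below builds a query that fetches all triples of one resource and the GET URL that would carry it to an endpoint. It is written in Python for brevity, whereas the tool itself is Java/GWT; the query shape, the function names, and the example endpoint are assumptions, not taken from TripleCheckMate. The code only constructs strings and sends nothing over the network.

```python
from urllib.parse import urlencode

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def build_resource_query(resource_uri: str) -> str:
    """Return a SPARQL query listing every predicate/object of a resource."""
    if any(ch in _FORBIDDEN_IRI_CHARS or ch <= " " for ch in resource_uri):
        raise ValueError("resource URI contains characters not allowed in an IRI")
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }}"


def build_request_url(endpoint: str, resource_uri: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return f"{endpoint}?{urlencode({'query': build_resource_query(resource_uri)})}"
```

For example, `build_request_url("https://dbpedia.org/sparql", "http://dbpedia.org/resource/Berlin")` yields a URL a browser client could request directly, which is why the backend only needs to store and retrieve evaluation data.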
10. Evaluation storage schema
● Designed to support multiple campaigns and
different ontologies
● Quality taxonomy is stored in the database
which makes it easy to adapt
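A storage layout with these two properties might look like the sketch below. Every table and column name here is hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for the MySQL backend only so that the example is self-contained. The point it shows is that campaigns and taxonomy categories are rows rather than code, so adding a campaign or adapting the taxonomy needs no schema change.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per campaign, per taxonomy category, per judgement.
SCHEMA = """
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    sparql_endpoint TEXT NOT NULL
);
CREATE TABLE quality_category (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES quality_category(id),
    label TEXT NOT NULL
);
CREATE TABLE evaluation (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    evaluator TEXT NOT NULL,
    resource TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES quality_category(id)
);
"""


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_category(conn: sqlite3.Connection, label: str,
                 parent_id: Optional[int] = None) -> int:
    """Insert a taxonomy node; parent_id links it under a broader category."""
    cur = conn.execute(
        "INSERT INTO quality_category (label, parent_id) VALUES (?, ?)",
        (label, parent_id),
    )
    return cur.lastrowid
```

The self-referencing `parent_id` column is one simple way to keep a hierarchical taxonomy in a relational table; the real schema may differ.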
12. Conclusion & Future Work
● TripleCheckMate
○ Tool for crowdsourcing quality assessment
○ Linked Data quality assessment
○ Supports inter-rater agreement
○ Can be used with any Linked Dataset
● Future Work
○ Directly integrating semi-automatic methods
○ Improve efficiency of quality assessment
○ Include support for Patch Ontology* as output format
* M. Knuth, J. Hercher, and H. Sack. Collaboratively patching linked data. CoRR, 2012.
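As an illustration of the inter-rater agreement mentioned in the conclusion, the sketch below computes Cohen's kappa for two evaluators who judged the same items. The slides do not say which agreement measure the tool uses, so the choice of Cohen's kappa, the function name, and the label values are assumptions made for this example.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled at random with their own frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. With more than two evaluators per resource, a multi-rater measure such as Fleiss' kappa would be the usual alternative.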