Microtask Crowdsourcing
Applications for Linked Data
Architecture of
Linked Data Applications
[Figure: three-tier architecture of a Linked Data application]
• Presentation Tier
• Logic Tier
• Data Tier
– Integrated Dataset
– Data Access Component
– Republication Component
– Data Integration Component: Vocabulary Mapping, Interlinking, Cleansing
– Wrappers and transformations: SPARQL Wrapper, Physical Wrapper, R2R Transformation, LD Wrapper
• Formats and sources: RDF/XML, Web Data accessed via APIs, SPARQL Endpoints, Relational Data, Linked Data

EUCLID – Microtask crowdsourcing
applications for Linked Data
Data Tier
Data Integration Component
[Figure: Data Access Component and Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)]
• Consolidates the data retrieved from heterogeneous sources.
• This component may operate at:
– Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms. (CH 2)
– Instance level: Performs entity linking, e.g., entity resolution via owl:sameAs links. (CH 3)
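
The instance-level step can be illustrated with a minimal sketch, assuming that owl:sameAs links are available as pairs of identifiers. It groups identifiers connected by such links into clusters using a union-find structure. The function name `same_as_clusters` and the identifiers `fbase:Facebook` and `ex:Instagram` are hypothetical and only serve the illustration; this is not the method of any particular tool.

```python
def same_as_clusters(links):
    """Group identifiers that are connected by owl:sameAs links."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Illustrative links; fbase:Facebook and ex:Instagram are hypothetical.
links = [
    ("dbpedia:Instagram", "fbase:Instagram"),
    ("fbase:Instagram", "ex:Instagram"),
    ("dbpedia:Facebook", "fbase:Facebook"),
]
clusters = same_as_clusters(links)
```

Each resulting cluster can be treated as one real-world entity when the integrated dataset is built.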
Data Tier (2)
Data Integration Component
The data integration component can be enhanced with
microtask crowdsourcing approaches:
• Cleansing or data assessments: Assessment of DBpedia triples
• Vocabulary mapping: CrowdMAP
• Interlinking: ZenCrowd
Other Crowdsourcing-based
Solutions for Linked Data Tasks
• Query understanding: CrowdQ

• Ontology population: OntoGame
• Linked Data curation: Urbanopoly
• …

DBPEDIA QUALITY ASSESSMENT

Assessing DBpedia Triples
[Figure: each triple {s p o .} in the dataset is classified as Correct or as Incorrect + Quality issue]

1. Selecting LD quality issues that are generated by erroneous extraction
mechanisms and can be detected by the crowd
2. Selecting the appropriate crowdsourcing approaches
3. Designing and generating the interfaces to present the data to the
crowd
Selecting LD Quality
Issues to Crowdsource
Three categories of quality problems occur
pervasively in DBpedia [Zaveri2013]
and can be crowdsourced:
• Incorrect object
 Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3”.

• Incorrect data type
 Example: dbpedia:Torishima_Izu_Islands foaf:name “鳥島”@en.

• Incorrect link to “external Web pages”
 Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink
<http://cedarlakedvd.com/>
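
As an illustration of the second category, the sketch below flags literals that are tagged as English but contain no ASCII letters, as in the 鳥島 example. This heuristic is an assumption made for illustration and is not part of the approach described in the slides; it could at most pre-select candidate triples before they are shown to the crowd. The function name and the second candidate literal are hypothetical.

```python
def has_suspicious_english_tag(text, lang):
    """Flag literals tagged 'en' that contain letters but no ASCII letters."""
    if lang != "en":
        return False
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and not any(ch.isascii() for ch in letters)


# (subject, literal, language tag); the second entry is a hypothetical value.
candidates = [
    ("dbpedia:Torishima_Izu_Islands", "鳥島", "en"),
    ("dbpedia:Torishima_Izu_Islands", "Torishima", "en"),
]
flagged = [
    (subject, text)
    for subject, text, lang in candidates
    if has_suspicious_english_tag(text, lang)
]
```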

Selecting Appropriate
Crowdsourcing Approaches
• Find stage: Contest
– LD Experts
– Difficult task
– Final prize
– Tool: TripleCheckMate [Kontokostas2013]
• Verify stage: Microtasks
– Workers
– Easy task
– Micropayments
– Platform: MTurk

Adapted from [Bernstein2010]
Presenting the Data
to the Crowd
Microtask interfaces: MTurk tasks

Incorrect object

• Selection of foaf:name or
rdfs:label to extract human-readable descriptions
• Real object values extracted
automatically from Wikipedia
infoboxes

Incorrect data type

• Link to the Wikipedia article via
foaf:isPrimaryTopicOf

Incorrect outlink

• Preview of external pages by
implementing HTML iframe
Results
                          Object values   Data types   Interlinks
Linked Data experts       0.7151          0.8270       0.1525
MTurk (majority voting)   0.8977          0.4752       0.9412

• Both forms of crowdsourcing can be applied to detect
certain LD quality issues
• The effort of LD experts should be reserved for tasks
that demand domain-specific skills
• The MTurk crowd is exceptionally good at
comparing data entries
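
The MTurk results above are aggregated by majority voting. A minimal sketch of such an aggregation, assuming each triple receives a list of worker answers and that ties are left undecided (the tie rule, the triple identifiers and the answers are assumptions for illustration):

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if there is no clear majority."""
    ranked = Counter(answers).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent answers
    return ranked[0][0]


# Hypothetical worker answers for two triples.
votes = {
    "triple-1": ["incorrect", "incorrect", "correct"],
    "triple-2": ["correct", "incorrect"],
}
decisions = {triple: majority_vote(answers) for triple, answers in votes.items()}
```

Assigning an odd number of workers per task avoids ties for binary questions.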
ZENCROWD

ZenCrowd: Entity Linking by
the Crowd

• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a
probabilistic reasoning framework
[Figure: Crowd, Machines, Algorithms]
HTML:
<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p>
<p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>

Entities: http://dbpedia.org/resource/Facebook, http://dbpedia.org/resource/Instagram (owl:sameAs fbase:Instagram)
Other labels in the figure: Google, Android

RDFa enrichment:
<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p>
<p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>

ZenCrowd Architecture
[Figure: ZenCrowd architecture]
• Input: HTML Pages
• Components: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity), Micro Matching Tasks, Micro-Task Manager, Crowdsourcing Platform (Workers Decisions), Decision Engine (Probabilistic Network)
• Output: HTML + RDFa Pages
• Data: LOD Open Data Cloud


Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic
Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on
World Wide Web (WWW 2012).
Entity Factor Graphs
• Graph components
– Workers, links, clicks
– Prior probabilities
– Link factors
– Constraints
• Probabilistic inference
– Select all links with posterior prob > τ

[Figure: example factor graph with 2 workers, 6 clicks, 3 candidate links]
– Workers w1, w2 with worker priors pw1( ), pw2( )
– Observed variables (clicks): c11, c12, c13, c21, c22, c23
– Link factors: lf1( ), lf2( ), lf3( )
– Candidate links l1, l2, l3 with link priors pl1( ), pl2( ), pl3( )
– SameAs constraints: sa1-2( )
– Dataset unicity constraints: u2-3( )
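
The selection rule "posterior prob > τ" can be sketched with a strongly simplified model. The code below is not the ZenCrowd factor graph: it assumes independent workers, each with a fixed reliability (the probability that their click is right), combines their clicks with the link prior using Bayes' rule, and ignores the SameAs and unicity constraints. All numbers and names are hypothetical.

```python
def link_posterior(prior, clicks):
    """Posterior probability that a candidate link is valid.

    clicks is a list of (worker_reliability, clicked_valid) pairs.
    """
    p_valid, p_invalid = prior, 1.0 - prior
    for reliability, clicked_valid in clicks:
        if clicked_valid:
            p_valid *= reliability
            p_invalid *= 1.0 - reliability
        else:
            p_valid *= 1.0 - reliability
            p_invalid *= reliability
    total = p_valid + p_invalid
    return p_valid / total if total > 0 else prior


# Hypothetical setup: 2 workers, 3 candidate links, 6 clicks.
reliability = {"w1": 0.8, "w2": 0.6}
clicks = {
    "l1": [("w1", True), ("w2", True)],
    "l2": [("w1", True), ("w2", False)],
    "l3": [("w1", False), ("w2", False)],
}
tau = 0.75  # assumed threshold
posteriors = {
    link: link_posterior(0.5, [(reliability[w], c) for w, c in votes])
    for link, votes in clicks.items()
}
selected = [link for link, p in posteriors.items() if p > tau]
```

In this toy setting only the link that both workers accepted passes the threshold. A full factor graph would additionally update the worker priors and propagate the constraints between links.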

Lessons Learnt
• Crowdsourcing + probabilistic reasoning works!
• But
– Different worker communities perform differently
– Many low quality workers
– Completion time may vary (based on reward)

• Need to find the right workers for your task
(see WWW13 paper)

ZenCrowd Summary
• ZenCrowd: Probabilistic reasoning over automatic and
crowdsourcing methods for entity linking
• Standard crowdsourcing improves 6% over automatic
• 4% - 35% improvement over standard crowdsourcing
• 14% average improvement over automatic approaches

http://exascale.info/zencrowd/
• Follow-up work (VLDBJ):
– Also used for instance matching across datasets
– 3-way blocking with the crowd
CROWDQ – CROWD-POWERED
QUERY UNDERSTANDING
Motivation
• Web Search Engines can answer simple factual
queries directly on the result page
• Users with complex information needs are
often unsatisfied
• Purely automatic techniques are not enough
• We want to solve it with Crowdsourcing!

CrowdQ
• CrowdQ is the first system that uses
crowdsourcing to
– Understand the intended meaning
– Build a structured query template
– Answer the query over Linked Open Data

Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ:
Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems
Research (CIDR 2013).
CrowdQ Architecture
Off-line: query template generation with the help of the crowd
On-line: query template matching using NLP and search over open data
[Figure: CrowdQ architecture]
• On-line Complex Query Processing: User, Keyword Query, Complex query classifier (Y/N), POS + NER tagging, Match with existing query templates, Structured Query, Structured LOD Search, Vertical selection, Unstructured Search, Result Joiner, Answer Composition, SERP
• Off-line Complex Query Decomposition: Query Log, Crowd Manager, Crowdsourcing Platform, Template Generation, Query Templates + Answer Types (t1, t2, t3), Query Template Index
• Data: LOD Open Data Cloud
Hybrid Human-Machine
Pipeline
Q = birthdate of actors of forrest gump
• Query annotation: birthdate (Noun), actors (Noun), forrest gump (Named entity)
• Verification: Is forrest gump this entity in the query?
• Entity relations: Which is the relation between actors and forrest gump? (starring)
• Verification: Is the relation between
– Indiana Jones – Harrison Ford
– Back to the Future – Michael J. Fox
of the same type as Forrest Gump – actors?
• Schema element: starring, i.e., <dbpedia-owl:starring>


Structured query generation
Q= birthdate of actors of forrest gump
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE { ?y dbpedia-owl:birthdate ?x .
        ?z dbpedia-owl:starring ?y .
        ?z rdfs:label 'Forrest Gump' }
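
The query above can be seen as one instance of a reusable template in which the attribute, the relation and the entity label are slots. The following is a minimal sketch under that assumption; the template layout, the slot names and `build_query` are hypothetical and do not reproduce CrowdQ's internal template format.

```python
from string import Template

# Hypothetical template with three slots: attribute, relation, entity label.
QUERY_TEMPLATE = Template(
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT ?y ?x\n"
    "WHERE { ?y dbpedia-owl:$attribute ?x .\n"
    "        ?z dbpedia-owl:$relation ?y .\n"
    "        ?z rdfs:label '$entity' }"
)


def build_query(attribute, relation, entity):
    """Fill the template; escape backslashes and single quotes in the label."""
    safe_entity = entity.replace("\\", "\\\\").replace("'", "\\'")
    return QUERY_TEMPLATE.substitute(
        attribute=attribute, relation=relation, entity=safe_entity
    )


query = build_query("birthdate", "starring", "Forrest Gump")
```

Once the crowd has confirmed the annotations and the relation, a new keyword query of the same shape only needs different slot values.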

Results from BTC09:

CROWDMAP & OTHERS

CrowdMAP
• Experiments using MTurk, CrowdFlower and established benchmarks
• Enhancing the results of automatic techniques
• Fast, accurate, cost-effective
[Sarasua, Simperl, Noy, ISWC2012]

Experiment            PRECISION   RECALL
CartP 301-304         0.53        1.0
100R50P Edas-Iasted   0.8         0.42
100R50P Ekaw-Iasted   1.0         0.7
100R50P Cmt-Ekaw      1.0         0.75
100R50P ConfOf-Ekaw   0.93        0.65
Imp 301-304           0.73        1.0

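
For reference, precision and recall of a set of mappings against a reference alignment follow the standard definitions, sketched below. The mapping pairs are hypothetical and unrelated to the benchmark figures in the table.

```python
def precision_recall(found, reference):
    """Standard precision and recall of found mappings against a reference set."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical alignment: 4 reference mappings, 3 found, 2 of them correct.
reference = {("Paper", "Article"), ("Author", "Writer"),
             ("Review", "Evaluation"), ("Chair", "Chairman")}
found = {("Paper", "Article"), ("Author", "Writer"), ("Review", "Chair")}
precision, recall = precision_recall(found, reference)
```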
Taste IT! Try IT!
• Restaurant review Android app developed in the Insemtives project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study
– 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts

[Figure: bar chart by venue type (CAFE, FASTFOOD, PUB, RESTAURANT), axis from 0 to 2500; series: Number of reviews, Number of semantic annotations (type of cuisine), Number of semantic annotations (dishes)]

https://play.google.com/store/apps/details?id=insemtives.android&hl=en
11/11/2013

LODrefine

http://research.zemanta.com/crowds-to-the-rescue/
Ontology Population

Linked Data Curation

Problems and Challenges
• What is feasible and how can tasks be optimally translated into microtasks?
– Examples: data quality assessment for technical and contextual features; subjective vs objective tasks (also in modeling); open-ended questions
• What to show to users
– Natural language descriptions of Linked Data/SPARQL
– How much context
– What form of rendering
– How about links?
• How to combine with automatic tools
– Which results to validate
  • Low precision (no fun for gamers...)
  • Low recall (vs all possible questions)
• How to embed it into an existing application
– Tasks are fine-grained and perceived as an additional burden on top of the actual functionality
• What to do with the resulting data?
– Integration into existing practices
– Vocabularies!

Web site:
https://sites.google.com/site/microtasktutorial/
SLIDES and EXERCISES:
https://github.com/maribelacosta/crowdsourcingtutorial

Full-day tutorial ISWC2013
Sydney Australia
11/11/2013

For exercises, quiz and further material visit our website:

http://www.euclid-project.eu

Course

eBook

Other channels:

@euclid_project

euclidproject
euclidproject
34

Mais conteúdo relacionado

Mais procurados

euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
Marin Dimitrov
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
National Information Standards Organization (NISO)
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
eswcsummerschool
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Cory Lampert
 

Mais procurados (20)

Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
Semantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business IntelligenceSemantic Technologies and Triplestores for Business Intelligence
Semantic Technologies and Triplestores for Business Intelligence
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
 
Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research DataNISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
 
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
April 24, 2013 NISO/DCMI Webinar: Deployment of RDA (Resource Description and...
 
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
Introduction to W3C Linked Data Platform
Introduction to W3C Linked Data PlatformIntroduction to W3C Linked Data Platform
Introduction to W3C Linked Data Platform
 
Learning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesLearning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examples
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 

Destaque

Online Learning and Linked Data: An Introduction
Online Learning and Linked Data: An IntroductionOnline Learning and Linked Data: An Introduction
Online Learning and Linked Data: An Introduction
EUCLID project
 
Best Practices for Linked Data Education
Best Practices for Linked Data EducationBest Practices for Linked Data Education
Best Practices for Linked Data Education
EUCLID project
 
Speech Technology and Big Data
Speech Technology and Big DataSpeech Technology and Big Data
Speech Technology and Big Data
EUCLID project
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
Maribel Acosta Deibe
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
SlideShare
 

Destaque (17)

Querying Linked Data on Android
Querying Linked Data on AndroidQuerying Linked Data on Android
Querying Linked Data on Android
 
Conference Live: Accessible and Sociable Conference Semantic Data
Conference Live: Accessible and Sociable Conference Semantic DataConference Live: Accessible and Sociable Conference Semantic Data
Conference Live: Accessible and Sociable Conference Semantic Data
 
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via CrowdsourcingHARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
 
CrowdSem 2013 Workshop @ISWC2013
CrowdSem 2013 Workshop @ISWC2013CrowdSem 2013 Workshop @ISWC2013
CrowdSem 2013 Workshop @ISWC2013
 
Online Learning and Linked Data: An Introduction
Online Learning and Linked Data: An IntroductionOnline Learning and Linked Data: An Introduction
Online Learning and Linked Data: An Introduction
 
Best Practices for Linked Data Education
Best Practices for Linked Data EducationBest Practices for Linked Data Education
Best Practices for Linked Data Education
 
Speech Technology and Big Data
Speech Technology and Big DataSpeech Technology and Big Data
Speech Technology and Big Data
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
Data Science Curriculum for Professionals
Data Science Curriculum for ProfessionalsData Science Curriculum for Professionals
Data Science Curriculum for Professionals
 
Mapping Relational Databases to Linked Data
Mapping Relational Databases to Linked DataMapping Relational Databases to Linked Data
Mapping Relational Databases to Linked Data
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)
 
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 TutorialSemantic Data Management in Graph Databases: ESWC 2014 Tutorial
Semantic Data Management in Graph Databases: ESWC 2014 Tutorial
 
Comment manager des geeks - Devoxx 2015
Comment manager des geeks - Devoxx 2015Comment manager des geeks - Devoxx 2015
Comment manager des geeks - Devoxx 2015
 
Annotation Processor, trésor caché de la JVM
Annotation Processor, trésor caché de la JVMAnnotation Processor, trésor caché de la JVM
Annotation Processor, trésor caché de la JVM
 
Building and managing a research team %281%29
Building and managing a research team %281%29Building and managing a research team %281%29
Building and managing a research team %281%29
 
Conférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUG
Conférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUGConférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUG
Conférence sur les annotations Java par Olivier Croisier (Zenika) au Paris JUG
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
 

Semelhante a Microtask Crowdsourcing Applications for Linked Data

Entity centric data_management_2013
Entity centric data_management_2013Entity centric data_management_2013
Entity centric data_management_2013
eXascale Infolab
 
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
eswcsummerschool
 
2011 07 14_fractalperspective
2011 07 14_fractalperspective2011 07 14_fractalperspective
2011 07 14_fractalperspective
Curran Kelleher
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 

Semelhante a Microtask Crowdsourcing Applications for Linked Data (20)

Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
Enabling Citizen-empowered Apps over Linked Data
Enabling Citizen-empowered Apps over Linked DataEnabling Citizen-empowered Apps over Linked Data
Enabling Citizen-empowered Apps over Linked Data
 
Entity centric data_management_2013
Entity centric data_management_2013Entity centric data_management_2013
Entity centric data_management_2013
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
Human Computation for Big Data
Human Computation for Big DataHuman Computation for Big Data
Human Computation for Big Data
 
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Intro to Spark development
 Intro to Spark development  Intro to Spark development
Intro to Spark development
 
Introduction to Spark Training
Introduction to Spark TrainingIntroduction to Spark Training
Introduction to Spark Training
 
Poster
PosterPoster
Poster
 
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
ESWC SS 2013 - Tuesday Tutorial 2 Maribel Acosta and Barry Norton: Interactio...
 
2011 07 14_fractalperspective
2011 07 14_fractalperspective2011 07 14_fractalperspective
2011 07 14_fractalperspective
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
 
LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz LOD2 Webinar Series: CubeViz
LOD2 Webinar Series: CubeViz
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process Scenario
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
 
Koneksys - Offering Services to Connect Data using the Data Web
Koneksys - Offering Services to Connect Data using the Data WebKoneksys - Offering Services to Connect Data using the Data Web
Koneksys - Offering Services to Connect Data using the Data Web
 

Último

Último (20)

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Microtask Crowdsourcing Applications for Linked Data

  • 2. Architecture of Linked Data Applications. Diagram: Presentation Tier, Logic Tier, and Data Tier. Components shown: Integrated Dataset; Republication Component; Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing); Data Access Component; SPARQL Wr., Physical Wrapper, LD Wrapper, R2R Transf. Sources shown: RDF/XML, Web data accessed via APIs, SPARQL endpoints, relational data, Linked Data. EUCLID – Microtask crowdsourcing applications for Linked Data
  • 3. Data Tier: Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)
    – Consolidates the data retrieved from heterogeneous sources.
    – Schema level: performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms (CH 2).
    – Instance level: performs entity linking, e.g., entity resolution via owl:sameAs links (CH 3).
  • 4. Data Tier (2): Data Integration Component. The data integration component can be enhanced with microtask crowdsourcing approaches:
    – Cleansing or data assessment: assessment of DBpedia triples
    – Vocabulary mapping: CrowdMAP
    – Interlinking: ZenCrowd
  • 5. Other Crowdsourcing-based Solutions for Linked Data Tasks
    – Query understanding: CrowdQ
    – Ontology population: OntoGame
    – Linked Data curation: Urbanopoly
    – …
  • 6. DBPEDIA QUALITY ASSESSMENT
  • 7. Assessing DBpedia Triples. Diagram: triples {s p o .} from the dataset are classified as either correct or incorrect + quality issue. Steps:
    1. Selecting LD quality issues that are generated by erroneous extraction mechanisms and that can be detected by the crowd
    2. Selecting the appropriate crowdsourcing approaches
    3. Designing and generating the interfaces to present the data to the crowd
  • 8. Selecting LD Quality Issues to Crowdsource. Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced:
    – Incorrect object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3".
    – Incorrect data type. Example: dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en.
    – Incorrect link to "external Web pages". Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/>
  • 9. Selecting Appropriate Crowdsourcing Approaches (adapted from [Bernstein2010])
    – Find: contest; LD experts; difficult task; final prize; TripleCheckMate [Kontokostas2013]
    – Verify: microtasks; workers; easy task; micropayments; MTurk
  • 10. Presenting the Data to the Crowd. Microtask interfaces (MTurk tasks):
    – Incorrect object: foaf:name or rdfs:label is selected to extract human-readable descriptions; real object values are extracted automatically from Wikipedia infoboxes
    – Incorrect data type: link to the Wikipedia article via foaf:isPrimaryTopicOf
    – Incorrect outlink: preview of external pages through an HTML iframe
  • 11. Results
                                  Object values   Data types   Interlinks
      Linked Data experts         0.7151          0.8270       0.1525
      MTurk (majority voting)     0.8977          0.4752       0.9412
    – Both forms of crowdsourcing can be applied to detect certain LD quality issues
    – The effort of LD experts should be spent on tasks that demand domain-specific skills
    – The MTurk crowd is exceptionally good at comparing data entries
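The MTurk results above are aggregated by majority voting. A minimal Python sketch of that aggregation follows, assuming each triple was judged by several workers. The worker answers are made up for illustration, and leaving ties undecided is an assumption: the slides do not say how ties were handled.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, or None if empty or tied for first place."""
    ranked = Counter(answers).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no majority
    return ranked[0][0]


# Hypothetical worker answers for the example triples of slide 8.
worker_answers = {
    'dbpedia:Dave_Dobbyn dbprop:dateOfBirth "3"':
        ["incorrect", "incorrect", "correct"],
    'dbpedia:Torishima_Izu_Islands foaf:name "鳥島"@en':
        ["incorrect", "correct", "incorrect"],
    "dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink <http://cedarlakedvd.com/>":
        ["correct", "incorrect"],
}

verdicts = {triple: majority_vote(votes) for triple, votes in worker_answers.items()}

for triple, verdict in verdicts.items():
    print(triple, "->", verdict if verdict is not None else "undecided")
```

Using an odd number of workers per triple avoids most ties when the answer is binary.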
  • 12. ZENCROWD
  • 13. ZenCrowd: Entity Linking by the Crowd. Diagram: crowd, machines, algorithms.
    – Combine both algorithmic and manual linking
    – Automate manual linking via crowdsourcing
    – Dynamically assess human workers with a probabilistic reasoning framework
  • 14. RDFa enrichment
    – Input HTML: <p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
    – Entities in the figure: http://dbpedia.org/resource/Facebook; http://dbpedia.org/resource/Instagram (owl:sameAs fbase:Instagram); Google; Android
    – Output HTML+RDFa: <p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
  • 15. ZenCrowd Architecture. Diagram: HTML pages (input); ZenCrowd with Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity), Micro Matching Tasks, MicroTask Manager, Decision Engine with Probabilistic Network; Crowdsourcing Platform (workers' decisions); LOD Open Data Cloud; HTML+RDFa pages (output). Source: Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
  • 16. Entity Factor Graphs
    – Graph components: workers, links, clicks; prior probabilities; link factors; constraints
    – Probabilistic inference: select all links with posterior probability > τ
    – Figure: factor graph with 2 workers, 6 clicks, 3 candidate links. Workers w1, w2 with worker priors pw1(), pw2(); observed click variables c11, c12, c13, c21, c22, c23; link factors lf1(), lf2(), lf3(); links l1, l2, l3 with link priors pl1(), pl2(), pl3(); SameAs constraint sa1-2(); dataset unicity constraint u2-3()
  • 17. Lessons Learnt
    – Crowdsourcing + probabilistic reasoning works!
    – But: different worker communities perform differently; there are many low-quality workers; completion time may vary (based on reward)
    – Need to find the right workers for your task (see WWW13 paper)
  • 18. ZenCrowd Summary (http://exascale.info/zencrowd/)
    – ZenCrowd: probabilistic reasoning over automatic and crowdsourcing methods for entity linking
    – Standard crowdsourcing improves 6% over automatic
    – 4% - 35% improvement over standard crowdsourcing
    – 14% average improvement over automatic approaches
    – Follow-up work (VLDBJ): also used for instance matching across datasets; 3-way blocking with the crowd
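The decision rule from the factor-graph slide (keep links whose posterior probability exceeds τ) can be illustrated with a much simpler model. The sketch below is not ZenCrowd's factor graph: it assumes independent workers, treats a single reliability number per worker as given, and ignores the SameAs and unicity constraints. All names and numbers are hypothetical.

```python
def link_posterior(prior, votes):
    """Posterior probability that a candidate link is correct.

    prior: matcher confidence, strictly between 0 and 1.
    votes: list of (worker_reliability, says_correct) pairs, with each
           reliability strictly between 0 and 1.
    """
    odds = prior / (1.0 - prior)
    for reliability, says_correct in votes:
        # A reliable worker's click shifts the odds more than an unreliable one's.
        likelihood_ratio = reliability / (1.0 - reliability)
        odds *= likelihood_ratio if says_correct else 1.0 / likelihood_ratio
    return odds / (1.0 + odds)


def select_links(candidates, tau):
    """Keep the candidate links whose posterior exceeds the threshold tau."""
    return [
        link
        for link, (prior, votes) in candidates.items()
        if link_posterior(prior, votes) > tau
    ]


# Hypothetical numbers: 2 workers, 3 candidate links, 6 clicks.
w1, w2 = 0.9, 0.6  # assumed worker reliabilities
candidates = {
    "l1": (0.5, [(w1, True), (w2, True)]),
    "l2": (0.5, [(w1, False), (w2, True)]),
    "l3": (0.2, [(w1, False), (w2, False)]),
}
selected = select_links(candidates, tau=0.7)
print(selected)
```

In this toy setup the disagreement on l2 is resolved in favor of the more reliable worker, which is the intuition behind weighting workers instead of counting votes.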
  • 19. CROWDQ – CROWD-POWERED QUERY UNDERSTANDING
  • 20. Motivation
    – Web search engines can answer simple factual queries directly on the result page
    – Users with complex information needs are often unsatisfied
    – Purely automatic techniques are not enough
    – We want to solve it with crowdsourcing!
  • 21. CrowdQ. CrowdQ is the first system that uses crowdsourcing to understand the intended meaning, build a structured query template, and answer the query over Linked Open Data. Source: Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
  • 23. CrowdQ Architecture
    – Off-line: query template generation with the help of the crowd
    – On-line: query template matching using NLP and search over open data
    – Diagram, on-line complex query processing: user; keyword query; complex query classifier (Y/N); vertical selection, unstructured search, ...; POS + NER tagging; match with existing query templates; structured query; structured LOD search; result joiner; answer composition; SERP
    – Diagram, off-line complex query decomposition: query log; crowd manager; crowdsourcing platform; template generation (queries, templates t1, t2, t3 + answer types); query template index; LOD Open Data Cloud
  • 24. Hybrid Human-Machine Pipeline. Q = birthdate of actors of forrest gump
    – Query annotation: noun, noun, named entity
    – Verification: is forrest gump this entity in the query?
    – Entity relations: which is the relation between actors and forrest gump?
    – Schema element: starring (dbpedia-owl:starring)
    – Verification: is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?
  • 25. Structured query generation. Q = birthdate of actors of forrest gump. Query: SELECT ?y ?x WHERE { ?y dbpedia-owl:birthdate ?x . ?z dbpedia-owl:starring ?y . ?z rdfs:label "Forrest Gump" } Results from BTC09.
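The query on this slide can be read as a template with three slots: the attribute asked for, the relation verified by the crowd, and the entity label. A minimal Python sketch of filling such a template is given below. The function names are hypothetical and not part of CrowdQ, and PREFIX declarations are omitted, as on the slide.

```python
from string import Template

# Template with three slots; the variable names follow the slide's query.
QUERY_TEMPLATE = Template(
    "SELECT ?y ?x WHERE {\n"
    "  ?y $attribute ?x .\n"
    "  ?z $relation ?y .\n"
    "  ?z rdfs:label $label\n"
    "}"
)


def escape_literal(text):
    """Quote a string as a SPARQL literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(attribute, relation, entity_label):
    """Fill the template with prefixed property names and an entity label."""
    return QUERY_TEMPLATE.substitute(
        attribute=attribute,
        relation=relation,
        label=escape_literal(entity_label),
    )


query = build_query("dbpedia-owl:birthdate", "dbpedia-owl:starring", "Forrest Gump")
print(query)
```

Keeping the entity label as the only free-text slot means a stored template can be reused for other queries of the same shape by swapping the label.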
  • 26. CROWDMAP & OTHERS
  • 27. CrowdMAP [Sarasua, Simperl, Noy, ISWC2012]
    – Experiments using MTurk, CrowdFlower and established benchmarks
    – Enhancing the results of automatic techniques
    – Fast, accurate, cost-effective
                                PRECISION   RECALL
      CartP 301-304             0.53        1.0
      100R50P Edas-Iasted       0.8         0.42
      100R50P Ekaw-Iasted       1.0         0.7
      100R50P Cmt-Ekaw          1.0         0.75
      100R50P ConfOf-Ekaw       0.93        0.65
      Imp 301-304               0.73        1.0
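Precision and recall on the CrowdMAP slide can be combined into a single F1 score (their harmonic mean) to compare the settings. The sketch below is an added illustration, not a result reported on the slides, and it assumes that the precision and recall values pair with the setting labels in the order they appear on the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per setting, paired in slide order (an assumption).
results = {
    "CartP 301-304": (0.53, 1.0),
    "100R50P Edas-Iasted": (0.8, 0.42),
    "100R50P Ekaw-Iasted": (1.0, 0.7),
    "100R50P Cmt-Ekaw": (1.0, 0.75),
    "100R50P ConfOf-Ekaw": (0.93, 0.65),
    "Imp 301-304": (0.73, 1.0),
}

for setting, (precision, recall) in results.items():
    print(f"{setting}: F1 = {f1_score(precision, recall):.2f}")
```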
  • 28. Taste IT! Try IT! (https://play.google.com/store/apps/details?id=insemtives.android&hl=en) 11/11/2013
    – Restaurant review Android app developed in the Insemtives project
    – Uses DBpedia concepts to generate structured reviews
    – Uses mechanism design/gamification to configure incentives
    – User study: 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
    – Chart (y-axis 0 to 2500) by venue type (CAFE, FASTFOOD, PUB, RESTAURANT): number of reviews; number of semantic annotations (type of cuisine); number of semantic annotations (dishes)
  • 30. Ontology Population
  • 31. Linked Data Curation
  • 32. Problems and Challenges
    – What is feasible, and how can tasks be optimally translated into microtasks? Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
    – What to show to users: natural language descriptions of Linked Data/SPARQL; how much context; what form of rendering; how about links?
    – How to combine with automatic tools: which results to validate; low precision (no fun for gamers...); low recall (vs. all possible questions)
    – How to embed it into an existing application: tasks are fine-grained and perceived as an additional burden on top of the actual functionality
    – What to do with the resulting data? Integration into existing practices; vocabularies!
  • 33. Full-day tutorial, ISWC2013, Sydney, Australia, 11/11/2013. Web site: https://sites.google.com/site/microtasktutorial/ Slides and exercises: https://github.com/maribelacosta/crowdsourcingtutorial
  • 34. For exercises, quiz and further material visit our website: http://www.euclid-project.eu (course, eBook). Other channels: @euclid_project, euclidproject

Editor's Notes

  1. What quality issues are humans able to detect?
  2. embarrassingly parallelizable