Efficient, Scalable, and Provenance-Aware Management of Linked Data
Marcin Wylot
University of Fribourg, June 19, 2015
Doctoral advisor
Prof. Dr Philippe Cudré-Mauroux
Quiz
➢ Linked Data
➢ Big Data (3Vs)
➢ Semantic Web
➢ RDF
➢ Data Provenance
Life is about Links
Web of Documents
[Figure: HTML documents (plus a few API and XML sources) connected by hyperlinks]
➢ links between documents
➢ no structure, no semantics
➢ for human consumption
➢ cannot be processed by computers
silos gonna silo
What’s the Problem?
Web of Data
Web of Linked Data
[Figure: things connected to each other by links]
links define relationships between things
➢ a global database
➢ designed for machines
➢ links between things
➢ explicit semantics
How Does it Change our Life?
➢ a company is moving to another city
○ aggregated information on taxes, prices, salaries,
unemployment, climate
➢ I’m moving to another country
○ mobile providers, bank accounts
➢ you’re buying a house
○ crime statistics, weather, house prices, neighbourhood,
traffic information
➢ media annotation
○ videos annotated with Linked Data that always retrieve an
up-to-date bio of the speaker
Linked Geo Data
Clean Air Status and Trends - Ozone
LSM: Live Linked Data
swissbib
Linked Open Data
9960 data collections from independent contributors
Linked Data
12-05-30
➢ describes a method of publishing semi-structured data on
the Web
➢ data can be interlinked
➢ builds upon standard Web technologies
➢ extends these standards to share information in a way that
can be read automatically by computers
Big Data
“Big data is high-volume, high-velocity and
high-variety information assets that demand
cost-effective, innovative forms of information
processing for enhanced insight and decision
making.” -- Gartner
Linked Data is Big Data
➢ Volume: data size growing exponentially
➢ Velocity: streams of data from the Internet
of Things and the Cloud
➢ Variety
○ semi-structured data
○ heterogeneous linked collections of data
Data Integration
➢ Integrated and summarized data
➢ Need to establish transparency
Data Provenance
“Provenance is information about
entities, activities, and people involved
in producing a piece of data or thing, which can be used to form
assessments about its quality, reliability or trustworthiness.” -- W3C PROV
Which pieces of data were used, and how were they
combined to produce the results?
New Challenges
unstructuredness AND heterogeneity
How to efficiently store and query vast amounts
of Linked Data in the cloud?
➢ a new physiological data partitioning algorithm to
efficiently and effectively partition the graph and
co-locate related instances in partitions
➢ a new system architecture for handling fine-grained
partitions at scale
➢ novel data placement techniques to co-locate
semantically related pieces of data
➢ new data loading and query execution strategies
taking advantage of our system’s data partitions and
indices
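The partitioning algorithm itself is not spelled out on this slide. As a much simpler stand-in, the sketch below hashes each triple's subject so that everything describing one entity lands in the same partition, which is the basic co-location property the bullets aim for. Hash partitioning on subjects, the partition count and the sample triples (borrowed from the student example later in the talk) are assumptions for illustration, not the thesis's physiological partitioning.

```python
import zlib
from collections import defaultdict

# Sample triples borrowed from the student example; the layout is hypothetical.
TRIPLES = [
    ("Student787", "is_a", "Student"),
    ("Student787", "takes", "Course18"),
    ("Student787", "StudentID", "4826"),
    ("Student67", "is_a", "Student"),
    ("Student67", "takes", "Course5"),
    ("Student67", "takes", "Course6"),
]

def partition_by_subject(triples, num_partitions):
    """Assign each triple to a partition chosen by a stable hash of its subject."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's salted hash() for str.
        pid = zlib.crc32(s.encode("utf-8")) % num_partitions
        partitions[pid].append((s, p, o))
    return dict(partitions)

parts = partition_by_subject(TRIPLES, 4)
for pid, triples in sorted(parts.items()):
    print(pid, triples)
```

Subject hashing keeps each entity together but does nothing for links between entities (a student and the courses it takes may still end up on different nodes), which is the gap the data placement techniques listed above target.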
How to store and track provenance in Linked
Data processing?
➢ a new way to express the provenance of query
results at two different granularity levels by leveraging
the concept of provenance polynomials
➢ two new storage models to represent provenance
data compactly in a data store
➢ query execution strategies to derive the
provenance polynomials while executing the queries
How can we efficiently support queries tailored
with provenance information?
➢ a characterization of provenance-enabled queries,
that is, queries tailored with provenance data
➢ five provenance-oriented query execution strategies
➢ storage model and indexing techniques to handle
provenance-aware query execution strategies
Contributions
➢ new advanced data co-location and partitioning
techniques for efficient and scalable query
processing in the Cloud
➢ first efficient provenance-aware database for
Linked Data
Linked Data Technology Stack
➢ URI
➢ RDF
➢ SPARQL
URI: A Uniform Resource Identifier
“A Uniform Resource Identifier (URI) provides a
simple and extensible means for identifying a
resource.” -- RFC 3986
Some URIs for “real world” things:
https://www.linkedin.com/in/mwylot
http://dbpedia.org/page/Fribourg
http://www.geonames.org/2657895
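RFC 3986 splits a URI into components such as scheme, authority and path. A minimal sketch using Python's standard `urllib.parse.urlsplit` on the URIs listed above; the helper name `components` is only for illustration.

```python
from urllib.parse import urlsplit

# URIs from the slide above.
URIS = [
    "https://www.linkedin.com/in/mwylot",
    "http://dbpedia.org/page/Fribourg",
    "http://www.geonames.org/2657895",
]

def components(uri):
    """Return (scheme, authority, path) for an absolute URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

for uri in URIS:
    print(components(uri))
```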
RDF Data Model
➢ Standard model for data interchange on the
Web
➢ Statements about resources/things (triples)
○ Subject(URI) Predicate(URI) Object(URI or literal) .
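A minimal sketch of triples as plain (subject, predicate, object) tuples, serialized as N-Triples lines of the form `<s> <p> <o> .`. The example.org IRIs are placeholders, and the serializer is simplified: it treats any string starting with http:// or https:// as an IRI and everything else as a plain literal, and it ignores blank nodes, datatypes and language tags.

```python
# Placeholder namespace; these IRIs are hypothetical, not a real vocabulary.
EX = "http://example.org/"

TRIPLES = [
    (EX + "Fribourg", EX + "inCountry", EX + "Switzerland"),  # object is an IRI
    (EX + "Fribourg", EX + "name", "Fribourg"),               # object is a literal
]

def term(value):
    """Render an IRI as <...> and anything else as an escaped, quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

print(to_ntriples(TRIPLES))
```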
SPARQL Query Language
➢ query and manipulate RDF graph content
➢ required and optional graph patterns along with their
conjunctions and disjunctions
➢ aggregation, subqueries, negation, creating values by
expressions, extensible value testing, and constraining
queries by source graph
➢ results can be result sets or graphs
SELECT ?t WHERE {
?a <type> <article> .
?a <tag> <Obama> .
?a <title> ?t . }
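The query above is a basic graph pattern: three triple patterns that must all match with a consistent binding for ?a. The sketch below evaluates it with a naive nested-loop join over a tiny, hypothetical data set in which short names stand in for full IRIs; it only illustrates the semantics and is not how Diplodocus executes queries.

```python
# Hypothetical data; short names stand in for full IRIs.
TRIPLES = {
    ("a1", "type", "article"), ("a1", "tag", "Obama"), ("a1", "title", "title-1"),
    ("a2", "type", "article"), ("a2", "tag", "Merkel"), ("a2", "title", "title-2"),
    ("a3", "type", "blogpost"), ("a3", "tag", "Obama"), ("a3", "title", "title-3"),
    ("a4", "type", "article"), ("a4", "tag", "Obama"), ("a4", "title", "title-4"),
}

# The query's triple patterns; terms starting with "?" are variables.
QUERY = [("?a", "type", "article"), ("?a", "tag", "Obama"), ("?a", "title", "?t")]

def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None if impossible."""
    result = dict(binding)
    for p_term, value in zip(pattern, triple):
        if p_term.startswith("?"):
            if result.setdefault(p_term, value) != value:
                return None  # variable already bound to a different value
        elif p_term != value:
            return None      # constant does not match
    return result

def evaluate(patterns, triples):
    """Nested-loop evaluation of a conjunction of triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [extended
                     for binding in solutions
                     for triple in triples
                     if (extended := match(pattern, triple, binding)) is not None]
    return solutions

titles = sorted(s["?t"] for s in evaluate(QUERY, TRIPLES))
print(titles)
```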
Outline
➢ Linked Data Management System
➢ Storing and Tracing Provenance
➢ Querying Provenance Information
Diplodocus
A new distributed Linked Data
management system implementing
a novel hybrid storage model
based on flexible RDF templates.
System Architecture
Hybrid Storage
[Figure: three Student molecules (Student787, Student67, Student28), each with is_a Student, one or two takes links to courses, First Name / Last Name (John Doe) and a StudentID, stored together under template T01 with its list of types]
➢ Hybrid model
➢ Two perspectives (horizontal and vertical)
➢ Horizontal -> graph
➢ Vertical -> analytic
➢ Co-location
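To make the horizontal/vertical distinction concrete, here is a rough sketch under simplifying assumptions: triples are grouped by subject into molecules, molecules with the same set of predicates are treated as sharing a template (like T01 in the figure), the horizontal view returns one row per molecule, and the vertical view returns one column per predicate. Grouping templates by predicate signature and the simplified predicate names are illustrative choices, not the system's actual template discovery or storage layout.

```python
from collections import defaultdict

# Triples loosely transcribed from the figure; predicate names are simplified.
TRIPLES = [
    ("Student787", "is_a", "Student"), ("Student787", "takes", "Course18"),
    ("Student787", "FirstName", "John"), ("Student787", "LastName", "Doe"),
    ("Student787", "StudentID", "4826"),
    ("Student67", "is_a", "Student"), ("Student67", "takes", "Course5"),
    ("Student67", "takes", "Course6"), ("Student67", "FirstName", "John"),
    ("Student67", "LastName", "Doe"), ("Student67", "StudentID", "37347"),
    ("Student28", "is_a", "Student"), ("Student28", "takes", "Course3"),
    ("Student28", "takes", "Course8"), ("Student28", "FirstName", "John"),
    ("Student28", "LastName", "Doe"), ("Student28", "StudentID", "28821"),
]

def build_molecules(triples):
    """Group triples by subject: subject -> predicate -> list of objects."""
    molecules = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        molecules[s][p].append(o)
    return molecules

def horizontal(molecules):
    """Template (sorted predicate tuple) -> one row per molecule (graph-style access)."""
    templates = defaultdict(list)
    for subject, props in molecules.items():
        templates[tuple(sorted(props))].append((subject, dict(props)))
    return templates

def vertical(molecules, predicate):
    """All values of one predicate across molecules (analytic, column-style access)."""
    return [o for props in molecules.values() for o in props.get(predicate, [])]

molecules = build_molecules(TRIPLES)
print(len(horizontal(molecules)))  # number of distinct templates
print(sorted(vertical(molecules, "StudentID")))
```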
Outline
➢ Linked Data Management System
➢ Storing and Tracing Provenance
➢ Querying Provenance Information
Physical Storage Models
Differences:
➢ ease of implementation
➢ memory consumption
➢ query execution
➢ interference with the original concept of molecule
1) SPOL 2) LSPO 3) SLPO 4) SPLO
S - Subject
P - Predicate
O - Object
L - graph label, context value
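One way to read the four names is as the order in which subject, predicate, object and label are used as nested keys. The toy sketch below builds such nested indexes over hypothetical quads. It does not reproduce how the models are laid out inside molecules in the actual system, but it hints at the memory trade-off: under SPOL a triple asserted by two sources is stored once with two labels, while under LSPO it is stored once per label.

```python
# Hypothetical quads: (subject, predicate, object, label); the label names the source graph.
QUADS = [
    ("ex:s1", "ex:p", "ex:o", "g1"),
    ("ex:s1", "ex:p", "ex:o", "g2"),  # same triple asserted by a second source
    ("ex:s2", "ex:p", "ex:o", "g1"),
]

POSITION = {"S": 0, "P": 1, "O": 2, "L": 3}

def build_index(quads, order):
    """Nest quads by key order, e.g. 'SPOL' gives subject -> predicate -> object -> {labels}."""
    root = {}
    for quad in quads:
        keys = [quad[POSITION[c]] for c in order]
        node = root
        for key in keys[:-2]:
            node = node.setdefault(key, {})
        # The last two keys become a mapping from the third key to a set of fourth keys.
        node.setdefault(keys[-2], set()).add(keys[-1])
    return root

for order in ("SPOL", "LSPO", "SLPO", "SPLO"):
    print(order, build_index(QUADS, order))
```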
Provenance Polynomials
➢ Ability to characterize the ways in which each source contributed
➢ Pinpoint the exact source of each result
➢ Trace back the list of sources and the way they were combined
to deliver a result
Floris Geerts et al. "Algebraic structures for capturing the provenance
of SPARQL queries." Proceedings of the 16th International Conference
on Database Theory (ICDT). ACM, 2013.
Example Polynomial
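SELECT ?lat ?long WHERE {
?a ?p "Eiffel Tower" .
?a <inCountry> <FR> .
?a <lat> ?lat .
?a <long> ?long .
}
(l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
where l1 to l9 label the sources of the triples matching each pattern: ⊕ combines alternative sources for the same triple pattern (union) and ⊗ combines the joined patterns (join).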
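A minimal sketch of how such a polynomial can be assembled for the example query, assuming hypothetical quads whose labels l1 to l9 mirror the slide (three sources name the tower, two give its country, two its latitude, two its longitude). It enumerates every derivation with a naive join, then, for each projected result, combines the labels seen for each pattern with ⊕ and the patterns with ⊗. This factored form is exact here because every combination of alternatives yields the same result; in general it over-approximates the full sum of products. It is not the thesis's own query execution strategy.

```python
from collections import defaultdict

# Hypothetical quads (subject, predicate, object, source label); l1..l9 mirror the slide.
QUADS = [
    ("ex:eiffel", "ex:name", "Eiffel Tower", "l1"),
    ("ex:eiffel", "ex:label", "Eiffel Tower", "l2"),
    ("ex:eiffel", "ex:title", "Eiffel Tower", "l3"),
    ("ex:eiffel", "ex:inCountry", "ex:FR", "l4"),
    ("ex:eiffel", "ex:inCountry", "ex:FR", "l5"),
    ("ex:eiffel", "ex:lat", "48.8584", "l6"),
    ("ex:eiffel", "ex:lat", "48.8584", "l7"),
    ("ex:eiffel", "ex:long", "2.2945", "l8"),
    ("ex:eiffel", "ex:long", "2.2945", "l9"),
]

# The example query as triple patterns; terms starting with "?" are variables.
QUERY = [
    ("?a", "?p", "Eiffel Tower"),
    ("?a", "ex:inCountry", "ex:FR"),
    ("?a", "ex:lat", "?lat"),
    ("?a", "ex:long", "?long"),
]

def match(pattern, quad, binding):
    """Extend binding so that pattern matches the quad's triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, quad[:3]):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate(patterns, quads):
    """Nested-loop join; each solution records the source label used for every pattern."""
    solutions = [({}, [])]
    for pattern in patterns:
        extended = []
        for binding, labels in solutions:
            for quad in quads:
                new = match(pattern, quad, binding)
                if new is not None:
                    extended.append((new, labels + [quad[3]]))
        solutions = extended
    return solutions

def provenance_polynomials(patterns, quads, projection):
    """Per projected result: ⊕ over the labels seen for each pattern, ⊗ across patterns."""
    groups = defaultdict(lambda: [set() for _ in patterns])
    for binding, labels in evaluate(patterns, quads):
        key = tuple(binding[v] for v in projection)
        for i, label in enumerate(labels):
            groups[key][i].add(label)
    return {key: " ⊗ ".join("(" + " ⊕ ".join(sorted(s)) + ")" for s in sets)
            for key, sets in groups.items()}

print(provenance_polynomials(QUERY, QUADS, ["?lat", "?long"]))
```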
Findings
➢ the overhead of tracing provenance is considerable
but acceptable: about 60-70% on average
➢ the most suitable storage model depends on the
characteristics of the data and workload
Outline
➢ Linked Data Management System
➢ Storing and Tracing Provenance
➢ Querying Provenance Information
Provenance-Enabled Query
A Workload Query is a query producing results a user is
interested in. These results are referred to as workload
query results.
A Provenance Query is a query that selects a set of data
from which the workload query results should originate.
A Provenance-Enabled Query is a pair consisting of a
Workload Query and a Provenance Query, producing results
a user is interested in (as specified by the Workload Query)
and originating only from data pre-selected by the
Provenance Query.
Provenance-Enabled Query: Example
SELECT ?t WHERE {
?a <type> <article> .
?a <tag> <Obama> .
?a <title> ?t . }
➢ ensure that the articles come from sources attributed to the government
SELECT ?ctx WHERE {
?ctx prov:wasAttributedTo <government> . }
➢ ensure that the data used to produce the answer was associated with a
“SeniorEditor” and a “Manager”
SELECT ?ctx WHERE {
?ctx prov:wasGeneratedBy <articleProd> .
<articleProd> prov:wasAssociatedWith ?ed .
?ed rdf:type <SeniorEditor> .
<articleProd> prov:wasAssociatedWith ?m .
?m rdf:type <Manager> . }
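A minimal sketch of one way to answer the first provenance-enabled query above: evaluate the provenance query first to find the admissible contexts, then run the workload query only over quads from those contexts. Both queries are hard-coded as Python functions, the data is hypothetical, and this pre-filtering approach is just one possible strategy, not a reproduction of the five strategies presented in the talk.

```python
# Hypothetical workload data: (subject, predicate, object, context) quads.
QUADS = [
    ("a1", "type", "article", "ctx1"), ("a1", "tag", "Obama", "ctx1"),
    ("a1", "title", "title-1", "ctx1"),
    ("a2", "type", "article", "ctx2"), ("a2", "tag", "Obama", "ctx2"),
    ("a2", "title", "title-2", "ctx2"),
]

# Hypothetical provenance metadata describing the contexts themselves.
PROVENANCE = [
    ("ctx1", "prov:wasAttributedTo", "government"),
    ("ctx2", "prov:wasAttributedTo", "newsAgency"),
]

def provenance_query(prov_triples, agent):
    """Hard-coded: SELECT ?ctx WHERE { ?ctx prov:wasAttributedTo <agent> . }"""
    return {s for s, p, o in prov_triples
            if p == "prov:wasAttributedTo" and o == agent}

def workload_query(quads, contexts):
    """Hard-coded workload query: titles of articles tagged Obama, from allowed contexts only."""
    allowed = [(s, p, o) for s, p, o, c in quads if c in contexts]
    articles = {s for s, p, o in allowed if p == "type" and o == "article"}
    tagged = {s for s, p, o in allowed if p == "tag" and o == "Obama"}
    matches = articles & tagged
    return sorted(o for s, p, o in allowed if p == "title" and s in matches)

contexts = provenance_query(PROVENANCE, "government")
print(workload_query(QUADS, contexts))
```

Because the provenance query runs first, a selective provenance constraint shrinks the data the workload query has to touch.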
Executing Provenance-Enabled Queries
TripleProv: Query Execution Pipeline
Five query execution strategies for
provenance-enabled queries.
Results
Queries tailored with provenance
information can be executed faster due
to the selectivity of provenance
information.
Lessons Learnt
➢ there is room for further improvement in Linked Data
management
➢ co-locating related entities is the right approach
➢ provenance overhead does not have to be high
➢ we can leverage provenance information to improve
performance
The Future
Integrated and machine processable
data providing transparent results.
Summary
➢ new advanced data co-location and partitioning
techniques for efficient query processing
➢ cloud support for scalable query processing
➢ first efficient provenance-aware Linked Data
management system
➢ source code available online

 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxuniversity
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 

Último (20)

Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 

Efficient, Scalable, and Provenance-Aware Management of Linked Data

  • 1. Efficient, Scalable, and Provenance-Aware Management of Linked Data. Marcin Wylot, University of Fribourg, June 19, 2015. Doctoral advisor: Prof. Dr Philippe Cudré-Mauroux
  • 2. Quiz ➢ Linked Data ➢ Big Data (3Vs) ➢ Semantic Web ➢ RDF ➢ Data Provenance
  • 3. Life is about Links
  • 4. Web of Documents [Figure: HTML and XML documents and an API, linked to one another] ➢ links between documents ➢ no structure, no semantics ➢ for human consumption ➢ meaning cannot be processed by computers
  • 8. Web of Linked Data [Figure: interlinked things; links define relationships between things] ➢ a global database ➢ designed for machines ➢ links between things ➢ explicit semantics
  • 9. How Does It Change Our Lives? ➢ a company is moving to another city ○ aggregated information on taxes, prices, salaries, unemployment, climate ➢ I’m moving to another country ○ mobile providers, bank accounts ➢ you’re buying a house ○ crime statistics, weather, house prices, neighbourhood, traffic information ➢ media annotation ○ video annotated with Linked Data, always retrieving an up-to-date bio of the speaker
  • 11. Clean Air Status and Trends - Ozone
  • 12. LSM: Live Linked Data
  • 16. Linked Open Data: 9960 data collections from independent contributors
  • 17. Linked Data ➢ describes a method of publishing semi-structured data on the Web ➢ data can be interlinked ➢ builds upon standard Web technologies ➢ extends these standards to share information in a way that can be read automatically by computers
  • 18. Big Data: big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
  • 19. Linked Data is Big Data ➢ Volume: data size growing exponentially ➢ Velocity: streams of data from the Internet of Things Cloud ➢ Variety ○ semi-structured data ○ heterogeneous linked collections of data
  • 20. Data Integration ➢ Integrated and summarized data ➢ Need to establish transparency
  • 21. Data Provenance “Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.” Which pieces of data were used, and how were they combined to produce the results?
  • 23. How to efficiently store and query vast amounts of Linked Data in the cloud? ➢ a new physiological data partitioning algorithm to efficiently and effectively partition the graph and co-locate related instances in partitions ➢ a new system architecture for handling fine-grained partitions at scale ➢ novel data placement techniques to co-locate semantically related pieces of data ➢ new data loading and query execution strategies taking advantage of our system’s data partitions and indices
  • 24. How to store and track provenance in Linked Data processing? ➢ a new way to express the provenance of query results at two different granularity levels by leveraging the concept of provenance polynomials ➢ two new storage models to compactly represent provenance data in a data store ➢ query execution strategies to derive the provenance polynomials while executing the queries
  • 25. How can we efficiently support queries tailored with provenance information? ➢ a characterization of provenance-enabled queries, that is, queries tailored with provenance data ➢ five provenance-oriented query execution strategies ➢ a storage model and indexing techniques to handle provenance-aware query execution strategies
  • 26. Contributions ➢ new advanced data co-location and partitioning techniques for efficient and scalable query processing in the cloud ➢ the first efficient provenance-aware database for Linked Data
  • 27. Linked Data Technology Stack ➢ URI ➢ RDF ➢ SPARQL
  • 28. URI: A Uniform Resource Identifier “A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource.” (RFC 3986) Some URIs for “real world” things: https://www.linkedin.com/in/mwylot, http://dbpedia.org/page/Fribourg, http://www.geonames.org/2657895
  • 29. RDF Data Model ➢ Standard model for data interchange on the Web ➢ Statements about resources/things (triples) ○ Subject(URI) Predicate(URI) Object(URI or literal) .
  • 30. SPARQL Query Language ➢ query and manipulate RDF graph content ➢ required and optional graph patterns, along with their conjunctions and disjunctions ➢ aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source graph ➢ results can be result sets or graphs ➢ example: SELECT ?t WHERE { ?a <type> <article> . ?a <tag> <Obama> . ?a <title> ?t . }
  • 31. Outline ➢ Linked Data Management System ➢ Storing and Tracing Provenance ➢ Querying Provenance Information
  • 32. Diplodocus: a new distributed Linked Data management system implementing a novel hybrid storage model based on flexible RDF templates.
  • 34. Hybrid Storage [Figure: three student molecules (Student787, Student67, Student28), each with is_a Student, First Name, Last Name, StudentID and takes links to courses, stored under template T01 with a list of types] ➢ Hybrid model ➢ Two perspectives (horizontal and vertical) ➢ Horizontal -> graph ➢ Vertical -> analytic ➢ Co-location
  • 35. Outline ➢ Linked Data Management System ➢ Storing and Tracing Provenance ➢ Querying Provenance Information
  • 36. Physical Storage Models: 1) SPOL 2) LSPO 3) SLPO 4) SPLO, where S is the subject, P the predicate, O the object, and L the graph label (context value). Differences: ➢ ease of implementation ➢ memory consumption ➢ query execution ➢ interference with the original concept of a molecule
  • 37. Provenance Polynomials ➢ ability to characterize the ways each source contributed ➢ pinpoint the exact source of each result ➢ trace back the list of sources and the way they were combined to deliver a result. Geerts, Floris, et al. "Algebraic structures for capturing the provenance of SPARQL queries." Proceedings of the 16th International Conference on Database Theory. ACM, 2013.
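  • 38. Example Polynomial: select ?lat ?long where { ?a [] "Eiffel Tower" . ?a inCountry FR . ?a lat ?lat . ?a long ?long . } yields (l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9), where l1 to l9 label the sources of the matching triples, ⊕ combines alternative sources for the same triple pattern, and ⊗ combines sources used jointly across the joined triple patterns.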
  • 39. Findings ➢ the overhead of tracing provenance is considerable but acceptable: about 60-70% on average ➢ the most suitable storage model depends on data and workload characteristics
  • 40. Outline ➢ Linked Data Management System ➢ Storing and Tracing Provenance ➢ Querying Provenance Information
  • 41. Provenance-Enabled Query: A Workload Query is a query producing results a user is interested in. These results are referred to as workload query results. A Provenance Query is a query that selects a set of data from which the workload query results should originate. A Provenance-Enabled Query is a pair consisting of a Workload Query and a Provenance Query, producing results a user is interested in (as specified by the Workload Query) and originating only from data pre-selected by the Provenance Query.
  • 42. Provenance-Enabled Query: Example. Workload query: SELECT ?t WHERE { ?a <type> <article> . ?a <tag> <Obama> . ?a <title> ?t . } ➢ to ensure that the articles come from sources attributed to the government: SELECT ?ctx WHERE { ?ctx prov:wasAttributedTo <government> . } ➢ to ensure that the data used to produce the answer was associated with a “SeniorEditor” and a “Manager”: SELECT ?ctx WHERE { ?ctx prov:wasGeneratedBy <articleProd> . <articleProd> prov:wasAssociatedWith ?ed . ?ed rdf:type <SeniorEditor> . <articleProd> prov:wasAssociatedWith ?m . ?m rdf:type <Manager> . }
  • 44. TripleProv: Query Execution Pipeline. Five query execution strategies for provenance-enabled queries.
  • 45. Results: queries tailored with provenance information can be executed faster due to the selectivity of provenance information.
  • 46. Lessons Learnt ➢ there is room for further improvement in Linked Data management ➢ co-location of related entities is the right approach ➢ provenance overhead does not have to be high ➢ we can leverage provenance information to improve performance
  • 47. The Future: integrated, machine-processable data providing transparent results.
  • 48. Summary ➢ new advanced data co-location and partitioning techniques for efficient query processing ➢ cloud support for scalable query processing ➢ the first efficient provenance-aware Linked Data management system ➢ source code available online
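The RDF Data Model and SPARQL Query Language slides can be made concrete with a minimal Python sketch. It is not the system's query engine. Triples are plain tuples and terms starting with `?` are variables. The slide's example query is evaluated as a basic graph pattern by joining one triple pattern at a time. The toy data, the shortened predicate names, and the helpers `match` and `evaluate_bgp` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Binding = Dict[str, str]       # variable name -> RDF term


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable such as ?a or ?t
            if extended.get(term, value) != value:
                return None  # conflicts with an earlier binding (join condition)
            extended[term] = value
        elif term != value:  # a constant that does not match
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern with a naive nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical toy data; titles are literals, everything else stands for a URI.
triples = [
    ("article1", "type", "article"),
    ("article1", "tag", "Obama"),
    ("article1", "title", "Title of article 1"),
    ("article2", "type", "article"),
    ("article2", "tag", "Sports"),
    ("article2", "title", "Title of article 2"),
]

# SELECT ?t WHERE { ?a <type> <article> . ?a <tag> <Obama> . ?a <title> ?t . }
query = [("?a", "type", "article"), ("?a", "tag", "Obama"), ("?a", "title", "?t")]
results = evaluate_bgp(query, triples)
print([b["?t"] for b in results])  # ['Title of article 1']
```

Scanning every triple for every pattern is only for illustration. A real store would answer each pattern from an index.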
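The Hybrid Storage slide can be approximated with the Python sketch below. For illustration only, it assumes that a molecule is a root subject together with its outgoing (predicate, object) pairs. It also assumes that a template is simply the sorted set of predicates a molecule uses. The slides do not specify how Diplodocus actually builds molecules and templates. Grouping molecules by template gives the horizontal, graph-oriented view, with related data co-located. The `column` helper gives a vertical, analytic view of one attribute. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Molecule = List[Tuple[str, str]]  # (predicate, object) pairs of one root subject


def build_molecules(triples: List[Triple]) -> Dict[str, Molecule]:
    """Horizontal view: co-locate every subject with its outgoing edges."""
    molecules: Dict[str, Molecule] = defaultdict(list)
    for s, p, o in triples:
        molecules[s].append((p, o))
    return dict(molecules)


def template_of(molecule: Molecule) -> Tuple[str, ...]:
    """Hypothetical template: the sorted set of predicates used by a molecule."""
    return tuple(sorted({p for p, _ in molecule}))


def group_by_template(molecules: Dict[str, Molecule]) -> Dict[Tuple[str, ...], List[str]]:
    """Molecules sharing a template are stored together (co-location)."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for root, molecule in molecules.items():
        groups[template_of(molecule)].append(root)
    return dict(groups)


def column(molecules: Dict[str, Molecule], predicate: str) -> List[str]:
    """Vertical view: every value of one predicate, in sorted root order."""
    return [o for root in sorted(molecules) for p, o in molecules[root] if p == predicate]


# Student molecules from the slide's figure (first and last names omitted).
triples = [
    ("Student787", "is_a", "Student"), ("Student787", "takes", "Course18"),
    ("Student787", "StudentID", "4826"),
    ("Student67", "is_a", "Student"), ("Student67", "takes", "Course5"),
    ("Student67", "takes", "Course6"), ("Student67", "StudentID", "37347"),
    ("Student28", "is_a", "Student"), ("Student28", "takes", "Course3"),
    ("Student28", "takes", "Course8"), ("Student28", "StudentID", "28821"),
    ("Course5", "is_a", "Course"),  # hypothetical extra triple
]

molecules = build_molecules(triples)
groups = group_by_template(molecules)  # the three students share one template
print(column(molecules, "StudentID"))  # ['28821', '37347', '4826']
```

Because all three student molecules share one template, they land in the same group. An analytic scan over `StudentID` then reads one attribute without traversing the graph.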
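The four orders on the Physical Storage Models slide differ in how subject, predicate, object, and graph label are nested. The Python sketch below builds a nested index for any of the four orders from hypothetical quads. It is a simplification and does not reproduce the system's actual data structures. It shows two of the listed trade-offs. SPOL keeps all data about a subject together and stores a triple asserted by two sources only once, with two labels. LSPO groups data by source first and repeats that triple under each label. The names `build_index` and `POSITION` are hypothetical.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph label)
POSITION: Dict[str, int] = {"S": 0, "P": 1, "O": 2, "L": 3}


def build_index(quads: List[Quad], order: str) -> dict:
    """Nest quads by the given key order, e.g. "SPOL", "LSPO", "SLPO" or "SPLO".

    The first three components become nested dictionary keys; the last one
    is collected in a set at the innermost level.
    """
    index: dict = {}
    for quad in quads:
        keys = [quad[POSITION[c]] for c in order]
        node = index
        for key in keys[:-2]:
            node = node.setdefault(key, {})
        node.setdefault(keys[-2], set()).add(keys[-1])
    return index


# Hypothetical quads: the same triple is asserted by two sources (l4 and l5).
quads = [
    ("tower", "inCountry", "FR", "l4"),
    ("tower", "inCountry", "FR", "l5"),
    ("tower", "lat", "lat_value", "l6"),
]

spol = build_index(quads, "SPOL")
lspo = build_index(quads, "LSPO")
print(spol["tower"]["inCountry"]["FR"])  # all sources of one triple, one lookup
print(lspo["l6"])  # all data from source l6, one lookup
```

Which order pays off depends on whether queries mostly start from entities or from sources. This fits the Findings slide's conclusion that the most suitable model depends on data and workload characteristics.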
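The polynomial on the Example Polynomial slide can be reproduced with the Python sketch below. Each hypothetical quad carries a source label. The query is evaluated as a basic graph pattern, and the sketch records which label each triple pattern used. For every result, labels of alternative matches for the same pattern are combined with ⊕, and the patterns are combined with ⊗. In general, the provenance of a basic graph pattern result is a ⊕-sum of ⊗-products, one product per derivation. The factored form printed here is exact only when every combination of matching triples yields the same result, as in the slide's example. The slide's `[]` predicate is modelled as the variable `?p`. The data and the helper names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, source label)
Pattern = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Pattern, quad: Quad, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches the triple part of `quad`."""
    extended = dict(binding)
    for term, value in zip(pattern, quad[:3]):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def provenance_polynomials(patterns: List[Pattern], quads: List[Quad],
                           projection: List[str]) -> Dict[Tuple[str, ...], str]:
    """Return, per result, a factored polynomial with one ⊕-sum per pattern.

    Assumes every combination of matches yields the same result, so the
    sum of products factors into a product of sums.
    """
    # Each derivation: (bindings so far, label used by each pattern so far).
    derivations: List[Tuple[Binding, List[str]]] = [({}, [])]
    for pattern in patterns:
        step = []
        for binding, labels in derivations:
            for quad in quads:
                extended = match(pattern, quad, binding)
                if extended is not None:
                    step.append((extended, labels + [quad[3]]))
        derivations = step

    # Per result and per pattern, collect the alternative source labels.
    per_result: Dict[Tuple[str, ...], List[Set[str]]] = defaultdict(
        lambda: [set() for _ in patterns])
    for binding, labels in derivations:
        result = tuple(binding[v] for v in projection)
        for i, label in enumerate(labels):
            per_result[result][i].add(label)

    return {result: " ⊗ ".join("(" + " ⊕ ".join(sorted(s)) + ")" for s in sums)
            for result, sums in per_result.items()}


# Hypothetical quads: l1-l3 name the tower, l4-l5 give its country, and so on.
quads = [
    ("t", "name", "Eiffel Tower", "l1"), ("t", "label", "Eiffel Tower", "l2"),
    ("t", "name", "Eiffel Tower", "l3"), ("t", "inCountry", "FR", "l4"),
    ("t", "inCountry", "FR", "l5"), ("t", "lat", "lat_value", "l6"),
    ("t", "lat", "lat_value", "l7"), ("t", "long", "long_value", "l8"),
    ("t", "long", "long_value", "l9"),
]
query = [("?a", "?p", "Eiffel Tower"), ("?a", "inCountry", "FR"),
         ("?a", "lat", "?lat"), ("?a", "long", "?long")]
polynomials = provenance_polynomials(query, quads, ["?lat", "?long"])
print(polynomials[("lat_value", "long_value")])
# (l1 ⊕ l2 ⊕ l3) ⊗ (l4 ⊕ l5) ⊗ (l6 ⊕ l7) ⊗ (l8 ⊕ l9)
```

Here the 3 × 2 × 2 × 2 = 24 derivations all produce the same (lat, long) pair. They therefore collapse into the single factored polynomial shown on the slide.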
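The definitions on the Provenance-Enabled Query slides, together with the government example, can be sketched in Python as follows. A workload query result is kept only if it can be derived from triples whose context (graph label) the provenance query selected. The sketch shows two simple, hypothetical strategies. Post-filtering runs the workload query on all data and then discards derivations that use other contexts. Pre-filtering restricts the data to the selected contexts first. These are illustrations only, not the five TripleProv strategies, which the slides do not detail. The quads, the `meta` context that holds provenance statements, and all helper names are assumptions.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, context)
Pattern = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Pattern, quad: Quad, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches the triple part of `quad`."""
    extended = dict(binding)
    for term, value in zip(pattern, quad[:3]):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def derivations(patterns: List[Pattern],
                quads: List[Quad]) -> List[Tuple[Binding, FrozenSet[str]]]:
    """Evaluate a basic graph pattern, recording the contexts each derivation uses."""
    results: List[Tuple[Binding, FrozenSet[str]]] = [({}, frozenset())]
    for pattern in patterns:
        step = []
        for binding, contexts in results:
            for quad in quads:
                extended = match(pattern, quad, binding)
                if extended is not None:
                    step.append((extended, contexts | {quad[3]}))
        results = step
    return results


def post_filtering(workload: List[Pattern], quads: List[Quad],
                   allowed: Set[str], var: str) -> List[str]:
    """Run the workload on all data, then keep derivations using allowed contexts only."""
    return sorted({b[var] for b, ctxs in derivations(workload, quads) if ctxs <= allowed})


def pre_filtering(workload: List[Pattern], quads: List[Quad],
                  allowed: Set[str], var: str) -> List[str]:
    """Restrict the data to allowed contexts first, then run the workload."""
    selected = [q for q in quads if q[3] in allowed]
    return sorted({b[var] for b, _ in derivations(workload, selected)})


# Hypothetical quads; provenance statements live in a separate "meta" context.
quads = [
    ("article1", "type", "article", "ctx1"), ("article1", "tag", "Obama", "ctx1"),
    ("article1", "title", "Title 1", "ctx1"),
    ("article2", "type", "article", "ctx2"), ("article2", "tag", "Obama", "ctx2"),
    ("article2", "title", "Title 2", "ctx2"),
    ("ctx1", "prov:wasAttributedTo", "government", "meta"),
    ("ctx2", "prov:wasAttributedTo", "blog", "meta"),
]

# Provenance query: SELECT ?ctx WHERE { ?ctx prov:wasAttributedTo <government> . }
provenance = [("?ctx", "prov:wasAttributedTo", "government")]
allowed = {b["?ctx"] for b, _ in derivations(provenance, quads)}

# Workload query: SELECT ?t WHERE { ?a <type> <article> . ?a <tag> <Obama> . ?a <title> ?t . }
workload = [("?a", "type", "article"), ("?a", "tag", "Obama"), ("?a", "title", "?t")]

print(post_filtering(workload, quads, allowed, "?t"))  # ['Title 1']
print(pre_filtering(workload, quads, allowed, "?t"))   # ['Title 1']
```

Both strategies return the same answer. Pre-filtering, however, touches less data when the provenance query is selective, which matches the intuition behind the Results slide.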