Decentralized Evolution and Consolidation of RDF Graphs
Natanael Arndt and Michael Martin
June 7, 2017
ICWE 2017, Rome
LEDS (Linked Enterprise Data Services)
Introduction
Introduction: Pfarrerbuch
[Figure: Pfarrerbuch architecture. Labels: content editor (project team); protected zone; OntoWiki [stable] with HTML GUI, SPARQL endpoint, and persistency layer; backup model; interactions "query, search" and "add, edit, maintain".]
3 / 45
Introduction: Catalogus Professorum Lipsiensium
[Figure: Catalogus Professorum Lipsiensium architecture. Labels: protected zone and public zone; content editor (project team), experienced web user, general web user; OntoWiki [stable] and OntoWiki [experimental], each with HTML GUI, SPARQL endpoint, and persistency layer; CPL Frontend [stable] with HTML GUI and persistency layer; backup model; synchronization of model data (SPARUL); Linked Data; partial and full RDF export; interactions "configure", "getData", "query, search", "add, edit, maintain", "browse, annotate, discuss", and "browse, search".]
4 / 45
Introduction
• Central SPARQL endpoints
• Single Point of Failure, Unavailability
• One consolidated status of the data
• Only trusted access allowed
• Asynchronous collaboration leads to inconsistency
5 / 45
Introduction: From Software Engineering to Data Engineering
• In Software Engineering the term Software Crisis was coined
• Systems and problems became more and more complex
• Software Engineering methods made the process of creating software more controllable
• Configuration Management brought Source Code Management
• CVS and SVN: central systems
• Darcs, Mercurial, Git: decentralized systems
• Git is widely used (even Microsoft switched the Windows development to Git)
6 / 45
Introduction: From Software Engineering to Data Engineering
• The subject of collaboration is RDF graphs rather than source code files
• Consistency checks in the DSCM ecosystem are made using Continuous Integration
7 / 45
Related Work/State of the Art
Approach       storage  quad support  bnodes    branches  merge     push/pull
TailR [4]      hybrid   no^a          yes       no^f      no        (yes)^h
Eccrev [2]     delta    yes           yes       no^f      no        no
R43ples [3]    delta    no^{b,c}      (yes)^d   yes       no        no
R&W base [5]   delta    no^c          (yes)^e   yes       (yes)^g   no
dat            chunks   n/a           n/a       no        no        yes

a: The granularity of versioning is the repository
b: Only single graphs are put under version control
c: The context is used to encode revisions
d: Blank nodes are skolemized
e: Blank nodes are addressed by internal identifiers
f: Only linear change tracking is supported
g: Naive merge implementation
h: No pull requests, but history replication via the Memento API
9 / 45
Preliminaries
• Atomic Graph
• Atomic Partition
• Difference
• Change
• Application of a Change
10 / 45
Preliminaries: Atomic Graph
[Figure: example graph over the nodes A, B, C, D, E]
11 / 45
Preliminaries: Atomic Partition
[Figure: the example graph over the nodes A, B, C, D, E and the atomic graphs of its atomic partition]
13 / 45
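The slides show the atomic partition only as a picture. The following is a minimal sketch of how such a partition could be computed, under assumptions that are not stated in the slides: a graph is modeled as a set of (subject, predicate, object) string tuples, blank nodes are the terms starting with `_:`, and a graph counts as atomic when it cannot be split into two non-empty parts that share no blank node. The function names are illustrative and not taken from the Quit Store code base.

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption: blank nodes are written with the usual "_:" prefix.
    return term.startswith("_:")


def atomic_partition(graph):
    """Split a set of triples into atomic graphs (frozensets of triples).

    Triples without blank nodes form singleton atomic graphs; triples that
    are connected through shared blank nodes end up in the same atomic graph.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    ground = []
    for triple in graph:
        bnodes = [t for t in triple if is_bnode(t)]
        if not bnodes:
            ground.append(triple)
            continue
        for b in bnodes:
            parent.setdefault(b, b)
        for b in bnodes[1:]:
            union(bnodes[0], b)

    # Group every blank-node triple under the representative of its bnodes.
    groups = defaultdict(set)
    for triple in graph:
        bnodes = [t for t in triple if is_bnode(t)]
        if bnodes:
            groups[find(bnodes[0])].add(triple)

    parts = {frozenset([t]) for t in ground}
    parts |= {frozenset(g) for g in groups.values()}
    return parts


example = {
    ("ex:A", "ex:p", "ex:B"),
    ("ex:A", "ex:q", "_:x"),
    ("_:x", "ex:r", "ex:C"),
    ("_:y", "ex:r", "ex:D"),
}
partition = atomic_partition(example)
```

The sketch compares blank nodes by label only. Comparing atomic graphs across two versions of a graph would additionally need a blank-node isomorphism check, which is left out here.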
Preliminaries: Difference
$$\Delta(G, G') := (C^+, C^-)$$
[Figure: the difference Δ between two versions of the example graph over the nodes A, B, C, D, E]
15 / 45
Preliminaries: Difference
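$$C^- := \dot{\bigcup}\Bigl(\breve{P}\bigl(P(G) \setminus P(G')\bigr)\Bigr)$$
[Figure: the atomic graphs of P(G) that are not in P(G') are united to form C^-]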
16 / 45
Preliminaries: Difference
$$C^+ := \dot{\bigcup}\Bigl(\breve{P}\bigl(P(G') \setminus P(G)\bigr)\Bigr)$$
[Figure: the atomic graphs of P(G') that are not in P(G) are united to form C^+]
18 / 45
Preliminaries: Difference resp. Change
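$$C^+ := \dot{\bigcup}\Bigl(\breve{P}\bigl(P(G') \setminus P(G)\bigr)\Bigr)$$
$$C^- := \dot{\bigcup}\Bigl(\breve{P}\bigl(P(G) \setminus P(G')\bigr)\Bigr)$$
$$\Delta(G, G') := (C^+, C^-)$$
[Figure: the difference Δ of the two example graphs yields the change (C^+, C^-)]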
19 / 45
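The definition of the difference can be tried out on small graphs. The sketch below assumes graphs without blank nodes, so that every triple is its own atomic graph and the atomic partition is a set of singletons. The helper `partition` and the tuple encoding of triples are assumptions made for illustration, and the operator written as P with a breve in the formulas is not modeled.

```python
def partition(graph):
    # Assumption: no blank nodes, so each triple is its own atomic graph.
    return {frozenset([triple]) for triple in graph}


def difference(old, new):
    """Return the change (c_plus, c_minus) that leads from old to new."""
    p_old, p_new = partition(old), partition(new)
    # Union of the atomic graphs that only exist in the new version.
    c_plus = set().union(*(p_new - p_old))
    # Union of the atomic graphs that only exist in the old version.
    c_minus = set().union(*(p_old - p_new))
    return c_plus, c_minus


g = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:C"), ("ex:B", "ex:p", "ex:D")}
g_prime = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:C"), ("ex:B", "ex:p", "ex:E")}
c_plus, c_minus = difference(g, g_prime)
```

With blank nodes the same structure applies, but `partition` would have to group triples that share blank nodes and compare the groups up to isomorphism.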
Preliminaries: Application of a Change
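$$\mathrm{Apl}\bigl(G, (C^+_G, C^-_G)\bigr) := \dot{\bigcup}\Bigl(\breve{P}\bigl((P(G) \setminus P(C^-_G)) \cup P(C^+_G)\bigr)\Bigr)$$
[Figure: applying the change (C^+_G, C^-_G) to the example graph over the nodes A, B, C, D, E]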
20 / 45
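As a hedged illustration of Apl, the sketch below removes the atomic graphs of C^- from the partition of G, adds those of C^+, and unites the result. It again assumes graphs without blank nodes (each triple is its own atomic graph). The names `partition` and `apply_change` are hypothetical, and the operator written as P with a breve in the formula is not modeled.

```python
def partition(graph):
    # Assumption: no blank nodes, so each triple is its own atomic graph.
    return {frozenset([triple]) for triple in graph}


def apply_change(graph, change):
    """Apply a change (c_plus, c_minus) to a graph and return the new graph."""
    c_plus, c_minus = change
    parts = (partition(graph) - partition(c_minus)) | partition(c_plus)
    return set().union(*parts)


g = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:C"), ("ex:B", "ex:p", "ex:D")}
change = ({("ex:B", "ex:p", "ex:E")}, {("ex:B", "ex:p", "ex:D")})
g_prime = apply_change(g, change)
```

Because removal happens before addition in the formula, a triple that appears in both C^+ and C^- ends up in the result.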
Operations
• Commit
• Distributed Evolution
• Merge of Two Evolved Graphs
• Revert a Commit
24 / 45
Operations: Commit
[Figure: chain of commits A, B, C]
$$A(\{G^0\})$$
$$\mathrm{Apl}\bigl(G^0, (C^+_{G^0}, C^-_{G^0})\bigr) = G$$
$$B_{\{A\}}(\{G\})$$
$$C_{\{B_{\{A\}}\}}(\{G'\})$$
25 / 45
Operations: Distributed Evolution
[Figure 1: Two branches evolved from a common commit (commits A, B, C on one branch; commit D branching off after B)]
$$D_{\{B_{\{A\}}\}}(\{G''\})$$
26 / 45
Operations: Merge of Two Evolved Graphs
$$\mathrm{Merge}\bigl(C(\{G'\}), D(\{G''\})\bigr) = E_{\{C,D\}}(\{G'''\})$$
[Figure 2: Merging commits from two branches into a common version of the graph (commits A, B, C, D and the merge commit E)]
27 / 45
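The commit notation records the parents of each commit as a subscript, for example E with parents C and D. A small sketch of such a commit structure, together with a lookup of the nearest common ancestor of two branches, could look as follows. The class `Commit` and the functions `ancestors` and `merge_base` are illustrative names and not part of the Quit Store; graph contents are omitted to keep the example short.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parents: tuple = ()


def ancestors(commit):
    """Return the names of the commit itself and of all its ancestors."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current.name in seen:
            continue
        seen.add(current.name)
        queue.extend(current.parents)
    return seen


def merge_base(left, right):
    """Breadth-first search from right for the first commit also reachable from left."""
    left_names = ancestors(left)
    visited = set()
    queue = deque([right])
    while queue:
        current = queue.popleft()
        if current.name in left_names:
            return current
        if current.name in visited:
            continue
        visited.add(current.name)
        queue.extend(current.parents)
    return None


# The history from Figures 1 and 2: A, B, C on one branch, D on another, E merges C and D.
a = Commit("A")
b = Commit("B", (a,))
c = Commit("C", (b,))
d = Commit("D", (b,))
e = Commit("E", (c, d))
base = merge_base(c, d)
```

In this history the common ancestor of C and D is B, which is the version a three-way merge would use as its base.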
Operations: Revert a Commit
$$\Delta^{-1}(G^0, G) = \Delta(G, G^0)$$
[Figure 3: A commit reverting the previous commit (commits A, B, B^{-1})]
28 / 45
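The revert identity can be checked on a toy example. The sketch assumes graphs without blank nodes, in which case the partition-based difference and application reduce to plain set operations on triples. The function names are chosen for illustration only.

```python
def diff(old, new):
    """Change (c_plus, c_minus) leading from old to new, for blank-node-free graphs."""
    return new - old, old - new


def invert(change):
    """Swap additions and deletions to obtain the inverse change."""
    c_plus, c_minus = change
    return c_minus, c_plus


def apply_change(graph, change):
    c_plus, c_minus = change
    return (graph - c_minus) | c_plus


g0 = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:C")}
g = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:D")}

forward = diff(g0, g)
backward = invert(forward)
restored = apply_change(g, backward)
```

Applying `backward` to `g` yields `g0` again, which corresponds to the commit B^{-1} in Figure 3.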
Merge Strategies
• Union Merge
• All Ours/All Theirs
• Three-Way Merge
29 / 45
Merge Strategies: Union Merge
[Figure: Union Merge. The atomic graphs of the two evolved example graphs are united (∪) into the merged graph over the nodes A, B, C, D, E]
30 / 45
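Read as a set operation, a union merge keeps every atomic graph that occurs in either branch. The sketch below shows this reading under the assumption that each branch is given as a set of atomic graphs (frozensets of triples); the function name is hypothetical. It also illustrates a consequence of this reading: a deletion made on only one branch does not survive the merge.

```python
def union_merge(partition_ours, partition_theirs):
    """Unite the atomic graphs of both branches into one merged graph."""
    return set().union(*(partition_ours | partition_theirs))


def singletons(graph):
    # Assumption: no blank nodes, so each triple is its own atomic graph.
    return {frozenset([triple]) for triple in graph}


base = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:C")}
# "ours" deletes the triple about ex:C and adds one about ex:D.
ours = {("ex:A", "ex:p", "ex:B"), ("ex:A", "ex:p", "ex:D")}
# "theirs" keeps everything from base and adds a triple about ex:E.
theirs = base | {("ex:A", "ex:p", "ex:E")}

merged = union_merge(singletons(ours), singletons(theirs))
```

Here `merged` still contains the triple about `ex:C`, although it was deleted on the `ours` branch.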
Merge Strategies: All Ours/All Theirs
[Figure: All Ours/All Theirs. Of the two evolved versions of the example graph, one is kept and the other is discarded (marked X)]
31 / 45
Merge Strategies: Three-Way Merge
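[Figure: Three-Way Merge. The common base version and the two evolved versions of the example graph over the nodes A, B, C, D, E, with the changes C^+ and C^- of the branches and the merged result]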
32 / 45
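The slides present the three-way merge only graphically. One common set-based formulation, shown here as an assumption rather than as the authors' exact definition, keeps an atomic graph if both branches still contain it or if one branch added it relative to the common base. All three inputs are sets of atomic graphs; for blank-node-free graphs these can simply be sets of triples.

```python
def three_way_merge(base, ours, theirs):
    """Merge two branches relative to their common base.

    An element survives if it is present in both branches, or if it was
    added by either branch (present there but absent from the base).
    Elements removed by at least one branch are dropped.
    """
    kept = ours & theirs
    added = (ours - base) | (theirs - base)
    return kept | added


a = ("ex:A", "ex:p", "ex:B")
b = ("ex:A", "ex:p", "ex:C")
c = ("ex:B", "ex:p", "ex:D")
d = ("ex:B", "ex:p", "ex:E")
e = ("ex:C", "ex:p", "ex:E")

base = {a, b, c}
ours = {a, b, d}      # removed c, added d
theirs = {a, c, e}    # removed b, added e
merged = three_way_merge(base, ours, theirs)
```

In contrast to a plain union, the deletions of `b` and `c` are both honored, and both additions are kept.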
Evaluation
• We have a prototypical implementation of our concepts: QuitStore (https://github.com/AKSW/QuitStore)
• We use the Berlin SPARQL Benchmark (BSBM) to evaluate our system
• We use the Explore and Update use case since it provides SPARQL query and update operations
• All scripts used are available at https://github.com/AKSW/QuitEval
35 / 45
Evaluation: Correctness of Version Tracking
• We take the Git repository, the initial data set, and the query execution log (run.log) produced by BSBM
• We load the data into a store and execute all queries stored in run.log
• We were able to verify that the state of the store was always similar to the content of the respective Git commit
36 / 45
Evaluation: Correctness of Merge Method
• We take the graph generated by BSBM
• We generate branches with two randomly different sets of added and
deleted statements
• We generate a graph containing the expected result of the merge operation
• We execute the merge operation and compare the resulting graph to the
expected result
• This process was repeated 1000 times
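The procedure above can be mirrored in a small randomized test. The sketch below is not the authors' evaluation script (those are in the QuitEval repository). It assumes a set-based three-way merge on blank-node-free graphs, a synthetic base graph instead of BSBM data, and a fixed random seed. All function names are hypothetical.

```python
import random


def three_way_merge(base, ours, theirs):
    # Assumed merge under test: keep what both branches share plus all additions.
    return (ours & theirs) | (ours - base) | (theirs - base)


def random_branch(base, fresh, rng):
    """Create a branch by deleting some base triples and adding some fresh ones."""
    deleted = set(rng.sample(sorted(base), k=rng.randint(0, 3)))
    added = set(rng.sample(sorted(fresh), k=rng.randint(0, 3)))
    return (base - deleted) | added, added, deleted


def run_trials(n=1000, seed=0):
    """Return the number of trials in which the merge differs from the expectation."""
    rng = random.Random(seed)
    base = {("ex:s%d" % i, "ex:p", "ex:o") for i in range(20)}
    fresh = {("ex:n%d" % i, "ex:p", "ex:o") for i in range(20)}
    failures = 0
    for _ in range(n):
        ours, add_a, del_a = random_branch(base, fresh, rng)
        theirs, add_b, del_b = random_branch(base, fresh, rng)
        # Expected result, built independently from the recorded changes.
        expected = (base - del_a - del_b) | add_a | add_b
        if three_way_merge(base, ours, theirs) != expected:
            failures += 1
    return failures


failures = run_trials()
```

The expected graph is derived from the recorded additions and deletions rather than from the merge function itself, so the comparison is not circular.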
37 / 45
Evaluation: Performance
[Figure: repository size (MiB) and memory consumption (MiB) over the number of commits (#commits, 0 to 4500). Series: Quit repo size, Quit with gc repo size, Quit memory, Quit with gc memory.]
38 / 45
Evaluation: Performance
[Figure: queries per second (qps, axis from 0.01 to 1000) for the query types INSERT DATA, DELETE WHERE, Explore 1 to 5, and Explore 7 to 12. Series: quit versioning, quit versioning with gc, no versioning (baseline).]
39 / 45
Conclusion
• Presented a formal framework for the distributed evolution of RDF
knowledge bases
• Atomic operations on RDF graphs
• Formalized definitions of the versioning operations: commit, branch, merge
and revert
• Quad aware, handles blank nodes, supports branches, supports merging with conflict resolution, allows distributed collaboration with push and pull
• Merge strategies were transferred to operate on atomic graphs so that they can be used on RDF datasets
40 / 45
Future Work
• Improve our Quit Store implementation to support the complete framework
• Explore provenance tracked by Git through an RDF interface ✓ [1]
• Implement the Quit architecture for real world problems
41 / 45
Future Work
[Figure: envisioned architecture. Labels: protected zone with content editor (project team) and OntoWiki [stable] (HTML GUI, SPARQL endpoint, persistency layer) for "query, search" and "add, edit, maintain"; public + private data exchanged via clone/fetch/push; data transformation tasks (ETL) that add new data from legacy data sources; public zone with experienced and general web users, any RDF editor, a commenting interface ("comment"), and a browsing interface ("query, search").]
42 / 45
References
[1] N. Arndt, P. Naumann, and E. Marx. Exploring the evolution and provenance of Git versioned RDF data. In J. D. Fernández, J. Debattista, and J. Umbrich, editors, 3rd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW) co-located with 14th European Semantic Web Conference (ESWC 2017), Portoroz, Slovenia, May 2017.
[2] M. Frommhold, R. N. Piris, N. Arndt, S. Tramp, N. Petersen, and M. Martin. Towards Versioning of Arbitrary RDF Data. In 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016), SEMANTiCS '16, Leipzig, Germany, Sept. 2016.
[3] M. Graube, S. Hensel, and L. Urbas. Open semantic revision control with R43ples: Extending SPARQL to access revisions of named graphs. In Proceedings of the 12th International Conference on Semantic Systems, SEMANTiCS 2016, pages 49–56, New York, NY, USA, 2016. ACM.
[4] P. Meinhardt, M. Knuth, and H. Sack. TailR: A platform for preserving history on the web of data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 57–64, New York, NY, USA, 2015. ACM.
[5] M. V. Sande, P. Colpaert, R. Verborgh, S. Coppens, E. Mannens, and R. V. de Walle. R&Wbase: Git for triples. In C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas, and S. Auer, editors, LDOW, volume 996 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.

Mais conteúdo relacionado

Mais procurados

Parallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationParallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationDBOnto
 
The internals of gporca optimizer
The internals of gporca optimizerThe internals of gporca optimizer
The internals of gporca optimizerXin Zhang
 
GPDB Meetup GPORCA OSS 101
GPDB Meetup GPORCA OSS 101GPDB Meetup GPORCA OSS 101
GPDB Meetup GPORCA OSS 101Xin Zhang
 
Anaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange InstallationAnaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange InstallationGirinath Pillai
 
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMS
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMSDYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMS
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMSPraveen Penumathsa
 
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...Neo4j
 
Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines
Heaven: A Framework for Systematic Comparative Research Approach for RSP EnginesHeaven: A Framework for Systematic Comparative Research Approach for RSP Engines
Heaven: A Framework for Systematic Comparative Research Approach for RSP EnginesRiccardo Tommasini
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Kevin Xu
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsMonica Beckwith
 
openCypher: Naming and Addressing Multiple Graphs
openCypher: Naming and Addressing Multiple GraphsopenCypher: Naming and Addressing Multiple Graphs
openCypher: Naming and Addressing Multiple GraphsopenCypher
 
Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallelmfolk
 
Filelist
FilelistFilelist
FilelistNeelBca
 
RVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolRVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolMDC_UNICA
 

Mais procurados (17)

Sprint 79
Sprint 79Sprint 79
Sprint 79
 
Parallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox PresentationParallel Datalog Reasoning in RDFox Presentation
Parallel Datalog Reasoning in RDFox Presentation
 
The internals of gporca optimizer
The internals of gporca optimizerThe internals of gporca optimizer
The internals of gporca optimizer
 
GPDB Meetup GPORCA OSS 101
GPDB Meetup GPORCA OSS 101GPDB Meetup GPORCA OSS 101
GPDB Meetup GPORCA OSS 101
 
HDF4 Mapping Project Update
HDF4 Mapping Project UpdateHDF4 Mapping Project Update
HDF4 Mapping Project Update
 
Anaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange InstallationAnaconda Python KNIME & Orange Installation
Anaconda Python KNIME & Orange Installation
 
Formula excel
Formula excelFormula excel
Formula excel
 
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMS
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMSDYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMS
DYNAMIC SLICING OF ASPECT-ORIENTED PROGRAMS
 
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...
Graphing Enterprise IT – Representing IT Infrastructure and Business Processe...
 
Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines
Heaven: A Framework for Systematic Comparative Research Approach for RSP EnginesHeaven: A Framework for Systematic Comparative Research Approach for RSP Engines
Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
 
OpenJDK Concurrent Collectors
OpenJDK Concurrent CollectorsOpenJDK Concurrent Collectors
OpenJDK Concurrent Collectors
 
Performance Co-Pilot
Performance Co-PilotPerformance Co-Pilot
Performance Co-Pilot
 
openCypher: Naming and Addressing Multiple Graphs
openCypher: Naming and Addressing Multiple GraphsopenCypher: Naming and Addressing Multiple Graphs
openCypher: Naming and Addressing Multiple Graphs
 
Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallel
 
Filelist
FilelistFilelist
Filelist
 
RVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer ToolRVC: A Multi-Decoder CAL Composer Tool
RVC: A Multi-Decoder CAL Composer Tool
 

Semelhante a Decentralized Evolution and Consolidation of RDF Graphs

Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 reviewManageIQ
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collectionsBIOVIA
 
Managing large scale projects in R with R Suite
Managing large scale projects in R with R SuiteManaging large scale projects in R with R Suite
Managing large scale projects in R with R SuiteWLOG Solutions
 
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...Ganesan Narayanasamy
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big DataPingCAP
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQLPingCAP
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 
Gearpump akka streams
Gearpump akka streamsGearpump akka streams
Gearpump akka streamsKam Kasravi
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
ACL-2022_tutorial_part_AB_V8 (1).pdf
ACL-2022_tutorial_part_AB_V8 (1).pdfACL-2022_tutorial_part_AB_V8 (1).pdf
ACL-2022_tutorial_part_AB_V8 (1).pdftuxinhui1
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)Christian Catalan
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 

Semelhante a Decentralized Evolution and Consolidation of RDF Graphs (20)

Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections
 
Managing large scale projects in R with R Suite
Managing large scale projects in R with R SuiteManaging large scale projects in R with R Suite
Managing large scale projects in R with R Suite
 
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
GraphQL + relay
GraphQL + relayGraphQL + relay
GraphQL + relay
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
Free GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOpsFree GitOps Workshop + Intro to Kubernetes & GitOps
Free GitOps Workshop + Intro to Kubernetes & GitOps
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQL
 
Move from C to Go
Move from C to GoMove from C to Go
Move from C to Go
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 
Gearpump akka streams
Gearpump akka streamsGearpump akka streams
Gearpump akka streams
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
slides
slidesslides
slides
 
ACL-2022_tutorial_part_AB_V8 (1).pdf
ACL-2022_tutorial_part_AB_V8 (1).pdfACL-2022_tutorial_part_AB_V8 (1).pdf
ACL-2022_tutorial_part_AB_V8 (1).pdf
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)
Testing Vue Apps with Cypress.io (STLJS Meetup April 2018)
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 

Último

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 

Último (20)

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Decentralized Evolution and Consolidation of RDF Graphs

  • 1. Decentralized Evolution and Consolidation of RDF Graphs Natanael Arndt and Michael Martin June 7, 2017 ICWE 2017, Rome LEDS INKED NTERPRISE ATA ERVICESL E D S
  • 5. Introduction: Pfarrerbuch content editor (Project Team) [protected zone] SPARQL Endpoint HTML GUI [stable] OntoWiki Persistency Layer Backup Model query, search add, edit, maintain 3 / 45
  • 6. Introduction: Catalogus Professorum Lipsiensium synchronize Model Data (SPARUL) Linked Data Linked Data Partial RDF export Full RDF export Backup Model experienced web user content editor (Project Team) general web user SPARQL Endpoint HTML GUI [stable] OntoWiki Persistency Layer SPARQL Endpoint HTML GUI [experimental] OntoWiki Persistency Layer HTML GUI [stable] CPL Frontend Persistency Layer OCPY TOWEL configure configure query, search add, edit, maintain getData query, search browse, annotate, discuss synchronize Model Data synchronize Model Data browse, search [protected zone] [public zone] 4 / 45
  • 7. Introduction • Central SPARQL endpoints • Single Point of Failure, Unavailability • One consolidated status of the data • Only trusted access allowed • Asynchronous collaboration leads to inconsistency
  • 8. Introduction: From Software Engineer To Data Engineering • In Software Engineering the term Software Crisis was coined • Systems and problems became more and more complex • Software Engineering methods made the process of creating software more controllable • Configuration Management brought Source Code Management • CVS and SVN: central systems • Darcs, Mercurial, Git: decentralized • Git is widely used (even Microsoft switched the Windows development to Git)
  • 9. Introduction: From Software Engineer To Data Engineering • The subjects of collaboration are RDF graphs rather than source code files • Consistency checks in the DSCM ecosystem are made using continuous integration
  • 10. Introduction: From Software Engineer To Data Engineering
  • 12. Related Work/State of the Art
      Approach       storage   quad support   bnodes    branches   merge     push/pull
      TailR [4]      hybrid    no^a           yes       no^f       no        (yes)^h
      Eccrev [2]     delta     yes            yes       no^f       no        no
      R43ples [3]    delta     no^b,c         (yes)^d   yes        no        no
      R&W base [5]   delta     no^c           (yes)^e   yes        (yes)^g   no
      dat            chunks    n/a            n/a       no         no        yes
      a: The granularity of versioning is the repository
      b: Only single graphs are put under version control
      c: The context is used to encode revisions
      d: Blank nodes are skolemized
      e: Blank nodes are addressed by internal identifiers
      f: Only linear change tracking is supported
      g: Naive merge implementation
      h: No pull requests, but history replication via the Memento API
  • 14. Preliminaries • Atomic Graph • Atomic Partition • Difference • Change • Application of a Change
  • 21. Preliminaries: Atomic Partition [Figure: an example graph over the nodes A to E and its atomic partition]
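The atomic partition is shown on the slides only as a figure. As a rough illustration, the following sketch assumes the usual intuition that statements sharing a blank node cannot be separated and therefore belong to one atomic graph, while a statement without blank nodes is atomic by itself. The `_:` prefix convention, the function names and the example triples are assumptions made for this sketch and do not come from the talk.

```python
from collections import defaultdict


def is_bnode(term):
    """Assumed convention for this sketch: blank nodes start with '_:'."""
    return term.startswith("_:")


def atomic_partition(graph):
    """Split a set of (s, p, o) triples into atomic graphs.

    Triples that share a blank node end up in the same part; a triple
    without blank nodes forms a part of its own.
    """
    parent = {triple: triple for triple in graph}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Collect, for every blank node, the triples that mention it.
    by_bnode = defaultdict(list)
    for triple in graph:
        subject, _, obj = triple
        for term in (subject, obj):
            if is_bnode(term):
                by_bnode[term].append(triple)

    # Join all triples that mention the same blank node.
    for triples in by_bnode.values():
        for other in triples[1:]:
            parent[find(other)] = find(triples[0])

    parts = defaultdict(set)
    for triple in graph:
        parts[find(triple)].add(triple)
    return {frozenset(part) for part in parts.values()}


EXAMPLE = {
    ("ex:A", "ex:p", "ex:B"),
    ("ex:A", "ex:q", "_:b1"),
    ("_:b1", "ex:r", "ex:C"),
    ("ex:D", "ex:p", "ex:E"),
}

for part in atomic_partition(EXAMPLE):
    print(sorted(part))
```

In the example, the two statements that mention `_:b1` stay together, and the two remaining statements each form their own part.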
  • 23. Preliminaries: Difference $\Delta(G, G') := (C^{+}, C^{-})$ [Figure: two versions of an example graph over the nodes A to E, related by Δ]
  • 24. Preliminaries: Difference $C^{-} := \dot{\bigcup}\left(\breve{P}\left(P(G) \setminus P(G')\right)\right)$ [Figure: the atomic partitions of both example graphs and the resulting graph $C^{-}$]
  • 26. Preliminaries: Difference $C^{+} := \dot{\bigcup}\left(\breve{P}\left(P(G') \setminus P(G)\right)\right)$ [Figure: the atomic partitions of both example graphs and the resulting graph $C^{+}$]
  • 27. Preliminaries: Difference resp. Change $C^{+} := \dot{\bigcup}\left(\breve{P}\left(P(G') \setminus P(G)\right)\right)$, $C^{-} := \dot{\bigcup}\left(\breve{P}\left(P(G) \setminus P(G')\right)\right)$, $\Delta(G, G') := (C^{+}, C^{-})$ [Figure: both example graphs, related by Δ, together with $C^{+}$ and $C^{-}$]
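To make the difference concrete, here is a minimal sketch under a strong simplification: the graphs contain no blank nodes, so every statement is its own atomic graph and comparing partitions reduces to comparing sets. The slides do not define the $\breve{P}$ step, so it is left out here. All names and the example data are assumptions for illustration.

```python
def partition(graph):
    """Atomic partition of a blank-node-free graph: one part per triple."""
    return {frozenset([triple]) for triple in graph}


def difference(g, g_new):
    """Return (c_plus, c_minus) describing the step from g to g_new.

    c_plus unites the atomic graphs found only in g_new,
    c_minus unites the atomic graphs found only in g.
    """
    p_old, p_new = partition(g), partition(g_new)
    c_plus = set().union(*(p_new - p_old))
    c_minus = set().union(*(p_old - p_new))
    return c_plus, c_minus


G = {("ex:A", "ex:p", "ex:B"), ("ex:C", "ex:p", "ex:D")}
G_NEW = {("ex:A", "ex:p", "ex:B"), ("ex:B", "ex:p", "ex:E")}

c_plus, c_minus = difference(G, G_NEW)
print("added:  ", c_plus)
print("removed:", c_minus)
```

With blank nodes, the set difference on partitions would have to compare atomic graphs up to isomorphism instead of by plain equality.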
  • 28. Preliminaries: Application of a Change $\mathrm{Apl}(G, (C^{+}_{G}, C^{-}_{G})) := \dot{\bigcup}\left(\breve{P}\left(\left(P(G) \setminus P(C^{-}_{G})\right) \cup P(C^{+}_{G})\right)\right)$ [Figure: a change is applied to an example graph over the nodes A to E; the atomic graphs of $C^{-}_{G}$ are removed from the partition of $G$, those of $C^{+}_{G}$ are added, and the result is united into the new graph]
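The application of a change can be sketched in the same simplified setting: no blank nodes, so each statement is an atomic graph of its own, and the $\breve{P}$ step (not defined on the slides) is omitted. The function names and example triples are assumptions for illustration only.

```python
def partition(graph):
    """Atomic partition of a blank-node-free graph: one part per triple."""
    return {frozenset([triple]) for triple in graph}


def apply_change(g, change):
    """Apply a change (c_plus, c_minus) to graph g.

    Removes the atomic graphs of c_minus from the partition of g,
    adds the atomic graphs of c_plus and unites the result.
    """
    c_plus, c_minus = change
    parts = (partition(g) - partition(c_minus)) | partition(c_plus)
    return set().union(*parts)


G0 = {("ex:A", "ex:p", "ex:B"), ("ex:C", "ex:p", "ex:D")}
CHANGE = (
    {("ex:B", "ex:p", "ex:E")},  # c_plus: statements to add
    {("ex:C", "ex:p", "ex:D")},  # c_minus: statements to remove
)

print(apply_change(G0, CHANGE))
```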
  • 33. Operations • Commit • Distributed Evolution • Merge of Two Evolved Graphs • Revert a Commit
  • 36. Operations: Commit $A(\{G_0\})$, $\mathrm{Apl}(G_0, (C^{+}_{G_0}, C^{-}_{G_0})) = G$, $B_{\{A\}}(\{G\})$ [Figure: commit A followed by commit B]
  • 37. Operations: Commit $A(\{G_0\})$, $\mathrm{Apl}(G_0, (C^{+}_{G_0}, C^{-}_{G_0})) = G$, $B_{\{A\}}(\{G\})$, $C_{\{B_{\{A\}}\}}(\{G'\})$ [Figure: commits A, B and C in sequence]
  • 38. Operations: Distributed Evolution $D_{\{B_{\{A\}}\}}(\{G''\})$ Figure 1: Two branches evolved from a common commit [commits A, B, C and D]
  • 39. Operations: Merge of Two Evolved Graphs $\mathrm{Merge}(C(\{G'\}), D(\{G''\})) = E_{\{C,D\}}(\{G'''\})$ Figure 2: Merging commits from two branches into a common version of the graph [commits A, B, C, D and E]
  • 40. Operations: Revert a Commit $\Delta^{-1}(G_0, G) = \Delta(G, G_0)$ Figure 3: A commit reverting the previous commit [commits A, B and $B^{-1}$]
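The revert identity $\Delta^{-1}(G_0, G) = \Delta(G, G_0)$ can be checked on a small example. The sketch below treats graphs as plain sets of statements (an assumption that holds only without blank nodes), so inverting a change simply swaps the added and the removed part. All names are chosen for this illustration.

```python
def diff(g, g_new):
    """Change (c_plus, c_minus) leading from g to g_new."""
    return g_new - g, g - g_new


def apply_change(g, change):
    """Remove c_minus from g, then add c_plus."""
    c_plus, c_minus = change
    return (g - c_minus) | c_plus


def invert(change):
    """Inverse change: what was added is removed and vice versa."""
    c_plus, c_minus = change
    return c_minus, c_plus


G0 = {"s1", "s2"}
G = {"s1", "s3"}

commit_b = diff(G0, G)       # the change recorded by commit B
revert_b = invert(commit_b)  # the change recorded by the reverting commit

print(revert_b == diff(G, G0))
print(apply_change(G, revert_b) == G0)
```

Applying the inverted change to G restores G0, which is what a commit reverting the previous commit is expected to do.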
  • 42. Merge Strategies • Union Merge • All Ours/All Theirs • Three-Way Merge
  • 43. Merge Strategies: Union Merge [Figure: the atomic partitions of two example graphs over the nodes A to E are united (∪) and the result is combined (⋃) into the merged graph]
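A union merge can be sketched as a plain set union, assuming blank-node-free graphs in which every statement is its own atomic graph. The example data is hypothetical; it is chosen to show a known property of this strategy: a statement deleted in one branch comes back if the other branch still contains it.

```python
def union_merge(ours, theirs):
    """Merge two graph versions by taking all statements of both."""
    return ours | theirs


BASE = {"A-B", "C-D"}
OURS = {"A-B", "D-E"}           # deleted C-D, added D-E
THEIRS = {"A-B", "C-D", "B-E"}  # kept C-D, added B-E

merged = union_merge(OURS, THEIRS)
print(sorted(merged))
print("C-D" in merged)  # the deletion made in OURS is not preserved
```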
  • 44. Merge Strategies: All Ours/All Theirs [Figure: three versions of an example graph over the nodes A to E, one of them marked with an X]
  • 45. Merge Strategies: Three-Way Merge [Figure: three versions of an example graph over the nodes A to E]
  • 46. Merge Strategies: Three-Way Merge [Figure: the versions of the example graph over the nodes A to E, annotated with the change sets $C^{+}$ and $C^{-}$ of the branches, and the merged graph]
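The slides present the three-way merge only graphically. One common set-based formulation, given here as an assumption rather than as the authors' definition, computes the additions and deletions of both branches relative to the merge base and applies all of them to the base. The sketch ignores blank nodes and performs no conflict detection.

```python
def three_way_merge(base, ours, theirs):
    """Apply the additions and deletions of both branches to the base."""
    added = (ours - base) | (theirs - base)      # C+ of both branches
    removed = (base - ours) | (base - theirs)    # C- of both branches
    return (base - removed) | added


BASE = {"A-B", "C-D"}
OURS = {"A-B", "D-E"}           # deleted C-D, added D-E
THEIRS = {"A-B", "C-D", "B-E"}  # added B-E

print(sorted(three_way_merge(BASE, OURS, THEIRS)))
```

Unlike a plain union of both branches, the statement `C-D` deleted in one branch stays deleted, because the deletion is visible relative to the common base.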
  • 49. Evaluation • We have a prototypical implementation of our concepts: QuitStore (https://github.com/AKSW/QuitStore) • We use the Berlin SPARQL Benchmark (BSBM) to evaluate our system • We use the Explore and Update use case, since it provides SPARQL query and update operations • All scripts used are available at https://github.com/AKSW/QuitEval
  • 50. Evaluation: Correctness of Version Tracking • We take the Git repository, the initial data set and the query execution log (run.log) produced by BSBM • We load the data into a store and execute all queries stored in run.log • We could verify that the state of the store was always similar to the content of the respective Git commit
  • 51. Evaluation: Correctness of Merge Method • We take the graph generated by BSBM • We generate branches with two different, randomly generated sets of added and deleted statements • We generate a graph containing the expected result of the merge operation • We execute the merge operation and compare the resulting graph to the expected result • This process was repeated 1000 times
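The merge check described above can be imitated in a few lines. This is not the code from the QuitEval repository: integers stand in for statements, the base graph is synthetic instead of BSBM data, and the merge function is an assumed set-based three-way merge. The sketch only illustrates the shape of the procedure: generate two random branches, derive the expected result from the known additions and deletions, merge, and compare.

```python
import random


def three_way_merge(base, ours, theirs):
    """Assumed set-based three-way merge used as the system under test."""
    return (ours & theirs) | (ours - base) | (theirs - base)


def random_branch(rng, base, pool):
    """Derive a branch from base with random deletions and additions."""
    deleted = set(rng.sample(sorted(base), k=rng.randint(0, 5)))
    added = set(rng.sample(sorted(pool), k=rng.randint(0, 5)))
    return (base - deleted) | added, added, deleted


def run_trials(trials=1000, seed=0):
    """Return True if every merge result equals the expected graph."""
    rng = random.Random(seed)
    base = set(range(100))       # stand-in for the generated graph
    pool = set(range(100, 200))  # statements that may be added
    for _ in range(trials):
        ours, add_a, del_a = random_branch(rng, base, pool)
        theirs, add_b, del_b = random_branch(rng, base, pool)
        expected = (base - del_a - del_b) | add_a | add_b
        if three_way_merge(base, ours, theirs) != expected:
            return False
    return True


print(run_trials())
```

The expected graph is built directly from the recorded additions and deletions, while the merge function only sees the three graph versions, so the two computations are independent of each other.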
  • 52. Evaluation: Performance [Figure: repository size (MiB) and memory consumption (MiB) plotted against the number of commits, for Quit and for Quit with gc]
  • 55. Conclusion • Presented a formal framework for the distributed evolution of RDF knowledge bases • Atomic operations on RDF graphs • Formalized definitions of the versioning operations: commit, branch, merge and revert • Quad-aware, handles blank nodes, supports branches, supports merging with conflict resolution, allows distributed collaboration with push and pull • Merge strategies were transferred to atomic graphs so that they can be used on RDF datasets
  • 57. Future Work • Improve our Quit Store implementation to support the complete framework • Explore provenance tracked by Git through an RDF interface ✓ [1] • Implement the Quit architecture for real world problems
  • 58. Future Work [Figure: architecture. Protected zone: content editors (project team) query, search, add, edit and maintain data through a stable OntoWiki instance (SPARQL endpoint, HTML GUI, persistency layer). Public and private data are exchanged via clone/fetch/push. Data transformation tasks (ETL) add new data from legacy data sources. Public zone: experienced and general web users query, search and comment through any RDF editor, a commenting interface and a browsing interface.]
  • 59. References I
      [1] N. Arndt, P. Naumann, and E. Marx. Exploring the evolution and provenance of Git versioned RDF data. In J. D. Fernández, J. Debattista, and J. Umbrich, editors, 3rd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW) co-located with 14th European Semantic Web Conference (ESWC 2017), Portoroz, Slovenia, May 2017.
      [2] M. Frommhold, R. N. Piris, N. Arndt, S. Tramp, N. Petersen, and M. Martin. Towards Versioning of Arbitrary RDF Data. In 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016), SEMANTiCS '16, Leipzig, Germany, Sept. 2016.
  • 60. References II
      [3] M. Graube, S. Hensel, and L. Urbas. Open semantic revision control with R43ples: Extending SPARQL to access revisions of named graphs. In Proceedings of the 12th International Conference on Semantic Systems, SEMANTiCS 2016, pages 49–56, New York, NY, USA, 2016. ACM.
      [4] P. Meinhardt, M. Knuth, and H. Sack. TailR: A platform for preserving history on the web of data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTiCS '15, pages 57–64, New York, NY, USA, 2015. ACM.
  • 61. References III
      [5] M. V. Sande, P. Colpaert, R. Verborgh, S. Coppens, E. Mannens, and R. V. de Walle. R&Wbase: git for triples. In C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas, and S. Auer, editors, LDOW, volume 996 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.