PARALLEL AND INCREMENTAL MATERIALISATION OF
RDF/DATALOG IN RDFOX
Boris Motik
University of Oxford
March 20, 2015
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
Features:
RAM-based storage of RDF data
Currently centralised, but a distributed system is in the works
Datalog reasoning via materialisation
Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL
Very effective parallelisation
Efficient reasoning with owl:sameAs via rewriting
Known and widely used technique, but ensuring correctness is not trivial
The Backward/Forward (B/F) incremental maintenance algorithm
Considerably improves on DRed
Compatible with rewriting of owl:sameAs
SPARQL query answering
Most of SPARQL 1.0 and some of SPARQL 1.1
Parallel Materialisation of Datalog
MAIN CHALLENGES
1 Assign workload to threads evenly
Rules are generally not independent due to recursion
Static assignment of rule instances can lead to unbalanced load due to data skew
⇒ Dynamic assignment with low overhead needed
2 Efficiently interleave querying (during evaluation of rule bodies) with updates (when derived facts are added)
3 Provide indexes for efficient rule body evaluation
Crucial for elimination of duplicate triples ⇒ ensures termination
Usually sorted (and clustered) to allow for merge joins
Hash indexes can also be used
Individual (i.e., not bulk) index updates are inefficient
B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in
Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137
SOLUTION PART I: ALGORITHM
Rule: A(x) ∧ R(x, y) → A(y)
Fact table (explicit facts first, derived facts appended in order):
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
A(e)
A(f)
A(g)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Trace (⇒ marks the fact being processed):
⇒ R(a,b), R(a,c): current subquery A(a); no earlier match
⇒ R(b,d), R(b,e): current subquery A(b); no earlier match
⇒ A(a): current subquery R(a,y); matches R(a,b) and R(a,c), so add A(b), A(c)
⇒ R(c,f), R(c,g): current subquery A(c); A(c) appears later in the table, so these rule instances fire only when A(c) is processed
⇒ A(b): current subquery R(b,y); add A(d), A(e)
⇒ A(c): current subquery R(c,y); add A(f), A(g)
⇒ A(d), A(e), A(f), A(g): current subqueries R(d,y), R(e,y), R(f,y), R(g,y); no matches, so the materialisation is complete
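The per-fact loop on this slide can be sketched in Python for the single rule A(x) ∧ R(x, y) → A(y). This is a minimal single-threaded illustration, not RDFox code: the tuple encoding of facts, the function name `materialise`, and evaluating subqueries over strictly earlier table positions (safe here because the two body atoms use different predicates) are our assumptions.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, ...]  # hypothetical encoding: ("R", "a", "b") or ("A", "a")


def materialise(explicit: List[Fact]) -> List[Fact]:
    """Apply A(x) ∧ R(x, y) → A(y) by scanning a growing fact table once."""
    table: List[Fact] = list(explicit)  # facts in insertion order
    seen: Set[Fact] = set(table)        # duplicate elimination ensures termination
    i = 0
    while i < len(table):               # derived facts are appended and scanned later
        fact = table[i]
        derived: List[Fact] = []
        if fact[0] == "R":
            # Match R(x, y); subquery A(x) over the facts before position i.
            x, y = fact[1], fact[2]
            if any(table[j] == ("A", x) for j in range(i)):
                derived.append(("A", y))
        elif fact[0] == "A":
            # Match A(x); subquery R(x, y) over the facts before position i.
            x = fact[1]
            for j in range(i):
                if table[j][0] == "R" and table[j][1] == x:
                    derived.append(("A", table[j][2]))
        for new_fact in derived:
            if new_fact not in seen:
                seen.add(new_fact)
                table.append(new_fact)
        i += 1
    return table


facts = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
         ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
print(materialise(facts)[len(facts):])
# [('A', 'b'), ('A', 'c'), ('A', 'd'), ('A', 'e'), ('A', 'f'), ('A', 'g')]
```

Each rule instance is found only when the later of its two body facts is processed, which mirrors the trace: R(c,f) and R(c,g) derive nothing until A(c) is reached.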
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
Each fact yields its own small set of subqueries, so the number of work units grows with the number of facts
⇒ in practice, threads end up equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
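A hedged sketch of this dynamic work distribution, reusing the single rule from the previous slide: worker threads repeatedly claim the next unprocessed position of a shared fact table. Unlike RDFox, which the next slide describes as using mostly lock-free structures, this sketch guards the table with one `threading.Condition` for simplicity, and CPython's global interpreter lock means it illustrates the scheduling idea rather than real speedup. The class and method names are hypothetical.

```python
import threading
from typing import List, Set, Tuple

Fact = Tuple[str, ...]  # hypothetical encoding: ("R", "a", "b") or ("A", "a")


class ParallelMaterialiser:
    """Threads claim facts from a shared table for the rule A(x) ∧ R(x, y) → A(y)."""

    def __init__(self, explicit: List[Fact]) -> None:
        self.table: List[Fact] = list(explicit)
        self.seen: Set[Fact] = set(self.table)
        self.next_index = 0  # next table position to hand out
        self.busy = 0        # threads currently evaluating subqueries
        self.cond = threading.Condition()

    def _claim(self) -> int:
        """Return the next unprocessed position, or -1 when no more work can appear."""
        with self.cond:
            while True:
                if self.next_index < len(self.table):
                    self.next_index += 1
                    self.busy += 1
                    return self.next_index - 1
                if self.busy == 0:  # nobody can append further facts
                    self.cond.notify_all()
                    return -1
                self.cond.wait()    # another thread may still derive facts

    def _derive(self, i: int) -> List[Fact]:
        with self.cond:  # snapshot; positions before i never change
            fact, previous = self.table[i], self.table[:i]
        if fact[0] == "R":  # subquery A(x)
            return [("A", fact[2])] if ("A", fact[1]) in previous else []
        if fact[0] == "A":  # subquery R(x, y)
            return [("A", f[2]) for f in previous if f[0] == "R" and f[1] == fact[1]]
        return []

    def _publish(self, derived: List[Fact]) -> None:
        with self.cond:
            for f in derived:
                if f not in self.seen:
                    self.seen.add(f)
                    self.table.append(f)
            self.busy -= 1
            self.cond.notify_all()  # wake threads waiting for new facts

    def _worker(self) -> None:
        while (i := self._claim()) >= 0:
            self._publish(self._derive(i))

    def run(self, n_threads: int = 4) -> List[Fact]:
        threads = [threading.Thread(target=self._worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.table


facts = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
         ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
result = ParallelMaterialiser(facts).run(n_threads=4)
print(sorted(result[len(facts):]))
```

Because a fact at position q is only evaluated against positions before q, every rule instance is still found exactly once (by whichever thread claims its later body fact), regardless of which thread runs when. The `busy` counter is what lets idle threads tell "no work yet" apart from "no work ever".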
SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The algorithm critically depends on:
matching atoms ⟨t1, t2, t3⟩, where each ti is a constant or a variable
continuous concurrent updates
Our RDF storage data structure:
Hash-based indexes ⇒ naturally parallel data structure
‘Mostly’ lock-free: most of the time, at least one thread is guaranteed to make progress
compare: with locks, if a thread acquires a lock and dies (or is descheduled), other threads are blocked
main benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
When thread A writes to a memory location cached by thread B, B's cached copy is invalidated
Our updates ensure that threads (typically) write to different locations
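To illustrate matching an atom ⟨t1, t2, t3⟩ in which each ti is a constant or a variable, here is a minimal single-threaded Python sketch of hash indexes over triples. It is not RDFox's data structure: the class name, the choice of one hash index per triple position, and the use of `None` for variables are assumptions, and the lock-free concurrent updates described above are not modelled.

```python
from collections import defaultdict
from typing import DefaultDict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable


class TripleIndex:
    """Hypothetical hash-indexed triple table supporting pattern matching."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.present: Set[Triple] = set()  # duplicate elimination
        # One hash index per position: term -> table positions of triples with that term.
        self.by_pos: List[DefaultDict[str, List[int]]] = [defaultdict(list) for _ in range(3)]

    def add(self, t: Triple) -> bool:
        """Insert t; return False if it was already present."""
        if t in self.present:
            return False
        self.present.add(t)
        self.triples.append(t)
        for k in range(3):
            self.by_pos[k][t[k]].append(len(self.triples) - 1)
        return True

    def match(self, pattern: Pattern) -> Iterator[Triple]:
        """Yield triples matching the pattern, scanning the most selective bound position."""
        bound = [k for k in range(3) if pattern[k] is not None]
        if len(bound) == 3:
            if pattern in self.present:
                yield pattern
            return
        if bound:
            k = min(bound, key=lambda pos: len(self.by_pos[pos].get(pattern[pos], [])))
            candidates = self.by_pos[k].get(pattern[k], [])
        else:
            candidates = range(len(self.triples))
        for idx in candidates:
            t = self.triples[idx]
            if all(pattern[j] is None or pattern[j] == t[j] for j in range(3)):
                yield t


idx = TripleIndex()
for t in [("a", "R", "b"), ("a", "R", "c"), ("b", "R", "d")]:
    idx.add(t)
print(idx.add(("a", "R", "b")))          # False: duplicate
print(list(idx.match(("a", "R", None))))  # [('a', 'R', 'b'), ('a', 'R', 'c')]
```

Hash indexes need no sorted order, which is one reason they parallelise more naturally than sorted indexes: an insert only touches the buckets for its own terms.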
EVALUATION I: PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
[Figure: speedup (y-axis, 2–20) vs. number of threads (x-axis, 8–32). Left panel: ClarosL, ClarosLE, DBpediaL, DBpediaLE, LUBML 01K, LUBMU 01K. Right panel: UOBML 01K, UOBMU 010, LUBMLE 01K, LUBML 05K, LUBMLE 05K, LUBMU 05K.]
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
⇒ hyperthreading and parallelism can compensate for CPU cache misses
EVALUATION II: ORACLE’S SPARC T5
Machine specification:
4 TB of RAM
128 physical cores that support 1024 virtual cores via hyperthreading
Name        Triples (initial)  Triples (resulting)  Time, 1 thread (s)  Time, 1024 threads (s)  Speedup  Inference rate (triples/s)
ClarosLE    18.8 M             533.7 M              7484                74                      101      7.2 M
LUBML-1K    133.6 M            182.4 M              511                 10                      51       4.9 M
LUBMLE-1K   133.6 M            332.6 M              5267                37                      142      5.4 M
LUBML-140k: 8 G triples, materialised to 10.9 G triples
20 threads: 2000 s, inference rate 1.45 M triples/s
128 threads: 599 s, inference rate 4.84 M triples/s
Materialised dataset used about half of RAM (2 TB)
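The speedup and inference-rate figures on this slide can be checked with a few lines of Python. Speedup is the ratio of single-threaded to parallel time; for the inference rate we assume it is the number of newly derived triples (resulting minus initial) divided by wall-clock time. That definition is our assumption, but it reproduces the LUBM figures above.

```python
def inference_rate(initial: float, resulting: float, seconds: float) -> float:
    """Newly derived triples per second (assumed definition)."""
    return (resulting - initial) / seconds


def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-threaded to parallel wall-clock time."""
    return t_single / t_parallel


# LUBML-140k: 8 G triples materialised to 10.9 G triples.
print(round(inference_rate(8.0e9, 10.9e9, 2000) / 1e6, 2))  # 20 threads: 1.45
print(round(inference_rate(8.0e9, 10.9e9, 599) / 1e6, 2))   # 128 threads: 4.84
# LUBML-1K: 1 thread vs. 1024 threads.
print(round(inference_rate(133.6e6, 182.4e6, 10) / 1e6, 1), round(speedup(511, 10)))  # 4.9 51
```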
Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle
Handling owl:sameAs via Rewriting
HANDLING owl:sameAs VIA REWRITING
Rewriting: replace all equal constants with one representative
Well-known approach; used in GraphDB, Oracle, WebPIE, . . .
Much more efficient than direct materialisation
Open question I: Effective parallelisation
Lock-free maintenance of representatives
Care needed to ensure correctness and nonrepetition of derivations
Open question II: Query evaluation
Could expand rewritten data before query evaluation, but that is inefficient
Better: evaluate queries on rewritten data and expand the answer
However, such a straightforward approach is incorrect:
Result cardinalities might be wrong
The presence of FILTER and BIND can make results incorrect
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015
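A minimal sketch of the rewriting idea on explicit data only: a union-find structure picks one representative per equivalence class, triples are rewritten to representatives, and answers can be expanded back to all class members. It deliberately ignores the hard parts named above (owl:sameAs facts derived by rules during parallel materialisation, and correct SPARQL semantics); the class and function names, and choosing the lexicographically smallest constant as representative, are assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


class Representatives:
    """Union-find over constants; find() returns the representative of a class."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, c: str) -> str:
        self.parent.setdefault(c, c)
        root = c
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[c] != root:  # path compression
            nxt = self.parent[c]
            self.parent[c] = root
            c = nxt
        return root

    def merge(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Hypothetical policy: the smallest name represents the merged class.
            self.parent[max(ra, rb)] = min(ra, rb)

    def members(self, rep: str) -> List[str]:
        """Expand a representative back to every constant in its class."""
        return sorted(c for c in list(self.parent) if self.find(c) == rep)


def rewrite(triples: List[Triple]) -> Tuple[Set[Triple], Representatives]:
    reps = Representatives()
    for s, p, o in triples:
        if p == SAME_AS:
            reps.merge(s, o)
    rewritten = {(reps.find(s), p, reps.find(o)) for s, p, o in triples if p != SAME_AS}
    return rewritten, reps


data = [("a", ":knows", "c"), ("b", ":knows", "c"), ("a", SAME_AS, "b")]
rewritten, reps = rewrite(data)
print(rewritten)                     # {('a', ':knows', 'c')}
print(reps.members(reps.find("b")))  # ['a', 'b']
```

Expanding the answer ?x = a to a and b gives the right answers for ?x :knows c. Projecting ?y from ?x :knows ?y, however, returns c twice on the fully materialised data (once for a, once for b) but only once on the rewritten data, which is the cardinality problem listed above.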
EVALUATION OF REWRITING
UOBM 279 4 2.2M
AX 36M 1.2 332M 16,152M
REW 9.4M 9.7M 0.4 33.8M 4,256M 686
factor 3.2x 3.2x 9.9x 3.8x
Table 3: Materialisation Times with Axiomatisation and Rewriting
(sec = time in seconds; spd = speedup over one thread; AX/REW = ratio of AX time to REW time)
Threads | Claros AX (sec, spd) | Claros REW (sec, spd) | AX/REW | DBpedia AX (sec, spd) | DBpedia REW (sec, spd) | AX/REW | OpenCyc AX (sec, spd) | OpenCyc REW (sec, spd) | AX/REW
1  | 2042.9, 1.0 | 65.8, 1.0 | 31.1 | 219.8, 1.0 | 31.7, 1.0 | 6.9 | 2093.7, 1.0 | 119.9, 1.0 | 17.5
2  | 969.7, 2.1  | 35.2, 1.9 | 27.6 | 114.6, 1.9 | 17.6, 1.8 | 6.5 | 1326.5, 1.6 | 78.3, 1.5  | 16.9
4  | 462.0, 4.4  | 18.1, 3.6 | 25.5 | 66.3, 3.3  | 10.7, 3.0 | 6.2 | 692.6, 3.0  | 40.5, 3.0  | 17.1
8  | 237.2, 8.6  | 9.9, 6.7  | 24.1 | 36.1, 6.1  | 5.2, 6.0  | 6.9 | 351.3, 6.0  | 23.0, 5.2  | 15.2
12 | 184.9, 11.1 | 7.9, 8.3  | 23.3 | 31.9, 6.9  | 4.1, 7.7  | 7.7 | 291.8, 7.2  | 56.2, 2.1  | 5.5
16 | 153.4, 13.3 | 6.9, 9.6  | 22.3 | 27.5, 8.0  | 3.6, 8.8  | 7.7 | 254.0, 8.2  | 52.3, 2.3  | 4.9

Threads | UniProt AX (sec, spd) | UniProt REW (sec, spd) | AX/REW | UOBM AX (sec, spd) | UOBM REW (sec, spd) | AX/REW
1  | 370.6, 1.0 | 143.4, 1.0 | 2.6 | 2696.7, 1.0 | 1152.7, 1.0 | 2.3
2  | 232.3, 1.6 | 86.7, 1.7  | 2.7 | 1524.6, 1.8 | 599.6, 1.9  | 2.5
4  | 129.2, 2.9 | 46.5, 3.1  | 2.8 | 813.3, 3.3  | 318.3, 3.6  | 2.6
8  | 74.7, 5.0  | 25.1, 5.7  | 3.0 | 439.9, 6.1  | 177.7, 6.5  | 2.5
12 | 61.0, 6.1  | 19.9, 7.2  | 3.1 | 348.9, 7.7  | 152.7, 7.6  | 2.3
16 | 61.9, 6.0  | 17.1, 8.4  | 3.6 | 314.4, 8.6  | 137.9, 8.4  | 2.3
. . . mode takes less than ten seconds, these results are difficult to measure and are susceptible to skew.
Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the reduction in the number of triples is negligible. This is due to the reduction in the number of derivations, mainly involving rules (⇡1)–(⇡5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than . . .
. . . connected by the :hasSameHomeTownWith property. This property is also symmetric and transitive so, for each pair of connected resources, the number of times each triple is derived by the transitivity rule is quadratic in the number of connected resources. This leads to a large number of duplicate derivations that do not involve equality. Thus, although it is helpful, rewriting does not reduce the number of derivations in the same way as, for example, on Claros, which explains the relatively modest speedup of REW over AX.
Speedup is bigger than the reduction in the number of triples
The number of derivations is the determining factor
Contrary to popular belief!
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state-of-the-art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x)
C0(x) ← B(x)
Ci(x) ← Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Delete A(a) using DRed (X^D denotes facts marked for deletion):
1 Delete all facts with a derivation from A(a)
C0^D(x) ← A^D(x)
C0^D(x) ← B^D(x)
Ci^D(x) ← Ci−1^D(x) for 1 ≤ i ≤ n
C0^D(x) ← Cn^D(x)
2 Rederive facts that have an alternative derivation
C0(x) ← C0^D(x) ∧ A(x)
C0(x) ← C0^D(x) ∧ B(x)
Ci(x) ← Ci^D(x) ∧ Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← C0^D(x) ∧ Cn(x)
Step 1 deletes A(a) and C0(a), C1(a), . . . , Cn(a); step 2 then rederives all of C0(a), . . . , Cn(a) from B(a)
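The two DRed steps on this example can be replayed with a small Python sketch restricted to rules with a single unary body atom, as in the example. The function names and the encoding of facts as (predicate, constant) pairs are hypothetical; rederivation follows the slide's H(x) ← H^D(x) ∧ body(x) pattern against the remaining facts and then propagates forward from the rederived facts.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str]  # (predicate, constant), e.g. ("C0", "a")
Rule = Tuple[str, str]  # (head, body) encodes head(x) <- body(x)


def chain_program(n: int) -> List[Rule]:
    """C0 <- A, C0 <- B, Ci <- Ci-1 for 1 <= i <= n, C0 <- Cn."""
    rules = [("C0", "A"), ("C0", "B")]
    rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
    rules.append(("C0", f"C{n}"))
    return rules


def propagate(base: Set[Fact], seeds: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Forward-chain from seeds, adding all consequences to base."""
    result, todo = base | seeds, list(seeds)
    while todo:
        pred, c = todo.pop()
        for head, body in rules:
            if body == pred and (head, c) not in result:
                result.add((head, c))
                todo.append((head, c))
    return result


def dred(explicit: Set[Fact], mat: Set[Fact], rules: List[Rule], removed: Set[Fact]):
    # Step 1: overdelete every fact with a derivation from a removed fact.
    overdeleted: Set[Fact] = set()
    todo = list(removed)
    while todo:
        f = todo.pop()
        if f in overdeleted:
            continue
        overdeleted.add(f)
        todo.extend((h, f[1]) for h, b in rules if b == f[0] and (h, f[1]) in mat)
    remaining = mat - overdeleted
    still_explicit = explicit - removed
    # Step 2: rederive H(x) <- H^D(x) ∧ body(x) using only the remaining facts ...
    seeds = {f for f in overdeleted
             if f in still_explicit
             or any(h == f[0] and (b, f[1]) in remaining for h, b in rules)}
    # ... then propagate forward from the rederived facts.
    result = propagate(remaining, seeds, rules)
    return result, overdeleted, result & overdeleted


rules = chain_program(3)
explicit = {("A", "a"), ("B", "a")}
mat = propagate(set(), explicit, rules)
result, over, rederived = dred(explicit, mat, rules, {("A", "a")})
print(len(mat), len(over), len(rederived))  # 6 5 4
```

With n = 3, DRed overdeletes all five facts A(a), C0(a), . . . , C3(a) and then rederives four of them, even though B(a) supported them throughout.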
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ DRed deletes many facts in its first step, only to rederive them in the second
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit, so it is derivable
6 So C0(a) is derivable too and is kept
7 Stop propagation and terminate: C1(a), . . . , Cn(a) are never examined
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
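A simplified Python sketch of the backward check on the same example. It is not the published B/F algorithm, which records checked and proved facts to avoid repeated work; here each candidate for deletion is tested with a plain depth-first backward search that refuses cyclic proofs, and propagation stops as soon as a fact is proved. Function names and the (predicate, constant) fact encoding are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]  # (predicate, constant)
Rule = Tuple[str, str]  # (head, body) encodes head(x) <- body(x)


def chain_program(n: int) -> List[Rule]:
    rules = [("C0", "A"), ("C0", "B")]
    rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
    rules.append(("C0", f"C{n}"))
    return rules


def materialise(explicit: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    result, todo = set(explicit), list(explicit)
    while todo:
        pred, c = todo.pop()
        for head, body in rules:
            if body == pred and (head, c) not in result:
                result.add((head, c))
                todo.append((head, c))
    return result


def bf_delete(explicit: Set[Fact], mat: Set[Fact], rules: List[Rule], removed: Set[Fact]):
    still_explicit = explicit - removed
    deleted: Set[Fact] = set()
    checked: List[Fact] = []

    def provable(f: Fact, path: FrozenSet[Fact]) -> bool:
        """Backward search for a non-cyclic derivation from still-explicit facts."""
        if f in still_explicit:
            return True
        if f in path:
            return False
        return any(h == f[0] and (b, f[1]) in mat and (b, f[1]) not in deleted
                   and provable((b, f[1]), path | {f})
                   for h, b in rules)

    todo = list(removed)
    while todo:
        f = todo.pop()
        if f in deleted:
            continue
        checked.append(f)
        if provable(f, frozenset()):
            continue  # alternative derivation found: stop propagating from f
        deleted.add(f)
        todo.extend((h, f[1]) for h, b in rules if b == f[0] and (h, f[1]) in mat)
    return mat - deleted, deleted, checked


rules = chain_program(3)
explicit = {("A", "a"), ("B", "a")}
mat = materialise(explicit, rules)
result, deleted, checked = bf_delete(explicit, mat, rules, {("A", "a")})
print(deleted, checked)  # {('A', 'a')} [('A', 'a'), ('C0', 'a')]
```

Only A(a) and C0(a) are examined; C1(a), . . . , Cn(a) are never touched, whereas DRed deletes and rederives all of them.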
EVALUATION OF THE B/F ALGORITHM
Table 2: Experimental results
Columns: Dataset | |E−| | |I \ I′| | Rematerialise: Time (s), Derivations Fwd | DRed: Time (s), Derivations |D| DR2 DR4 DR5 | B/F: Time (s), Derivations |C| Bwd Sat Del Prop
LUBM-1k-L (|E| = 133.6M, |I| = 182.4M, Mt = 121.5s, Md = 212.5M)
100 113 139.4 212.5M 0.0 1.0k 1.1k 0.8k 1.0k 0.0 0.5k 0.2k 0.3k 0.2k
5.0k 5.5k 101.8 212.5M 0.2 55.5k 67.2k 46.9k 59.8k 0.2 23.0k 9.3k 13.7k 7.4k
2.5M 2.7M 138.5 208.8M 39.4 10.3M 15.2M 6.6M 11.5M 32.8 10.0M 4.1M 5.6M 3.7M
5.0M 5.5M 91.8 205.0M 54.8 17.8M 26.3M 10.5M 18.9M 62.3 18.8M 7.8M 10.1M 7.5M
7.5M 8.3M 89.2 201.3M 71.5 24.3M 35.5M 13.6M 24.3M 85.4 26.7M 11.0M 14.0M 11.2M
10.0M 11.0M 99.5 197.5M 127.9 30.0M 43.1M 15.9M 28.1M 102.2 34.1M 14.0M 17.4M 15.0M
UOBM-1k-Uo (|E| = 254.8M, |I| = 2.2G, Mt = 5034.0s, Md = 3.6G)
100 160 3482.0 3.6G 8797.6 1.8G 2.6G 53.2M 2.6G 5.4 0.8k 0.5k 1.3k 0.5k
5.0k 85.2k 3417.8 3.6G 9539.3 1.8G 2.6G 53.2M 2.6G 28.2 105.9k 17.9k 42.1k 104.1k
17.0M 130.9M 3903.1 3.4G 8934.3 1.8G 2.7G 63.7M 2.5G 988.8 175.8M 47.6M 104.0M 196.7M
34.0M 269.0M 4084.1 3.2G 9492.5 1.9G 2.8G 68.4M 2.4G 1877.2 340.7M 87.5M 182.3M 401.1M
51.0M 422.8M 4010.0 3.0G 10659.3 1.9G 2.9G 71.5M 2.2G 2772.7 513.7M 125.2M 246.8M 622.0M
68.0M 581.4M 3981.9 2.8G 11351.6 1.9G 2.9G 73.3M 2.1G 3737.3 687.0M 162.5M 289.5M 848.6M
Claros-L (|E| = 18.8M, |I| = 74.2M, Mt = 78.9s, Md = 128.6M)
100 212 62.9 128.6M 0.0 0.8k 1.0k 0.2k 0.5k 0.0 0.6k 0.3k 0.7k 0.5k
5.0k 11.3k 62.8 128.6M 0.4 37.8k 50.7k 10.9k 23.9k 0.4 29.1k 18.8k 35.3k 26.8k
0.6M 1.3M 62.3 125.6M 32.3 4.1M 5.5M 1.1M 2.5M 14.9 3.1M 2.0M 3.6M 3.0M
1.2M 2.6M 61.2 122.6M 53.2 7.8M 10.8M 2.0M 4.8M 33.6 6.1M 3.8M 6.7M 6.0M
1.7M 4.0M 60.5 119.5M 73.6 11.4M 15.9M 2.8M 6.8M 47.8 8.9M 5.6M 9.5M 9.1M
2.3M 5.5M 60.0 116.3M 91.0 14.8M 20.9M 3.6M 8.6M 60.6 11.7M 7.3M 12.0M 12.3M
Claros-LE (|E| = 18.8M, |I| = 533.7M, Mt = 4024.5s, Md = 12.9G)
100 0.5k 3992.8 12.6G 0.0 1.3k 2.0k 0.3k 0.9k 0.0 1.0k 0.7k 1.0k 1.1k
2.5k 178.9k 5235.1 12.6G 8077.4 5.5M 11.7G 176.6k 11.7G 10.3 216.4k 161.2k 8.8M 320.0k
5.0k 427.5k 4985.1 12.6G 7628.2 6.0M 11.7G 186.0k 11.7G 16.5 485.6k 369.0k 8.9M 769.3k
7.5k 609.6k 4855.0 12.6G 7419.1 6.5M 11.7G 193.9k 11.7G 19.5 683.4k 516.8k 9.0M 1.1M
10.0k 780.8k 5621.3 12.6G 7557.9 6.8M 11.7G 207.6k 11.7G 3907.2 6.0M 723.0M 11.7G 16.9M
Test Datasets Table 2 summarises the properties of the four datasets we used in our tests. LUBM (Guo, Pan, and Heflin 2005) is a well-known RDF benchmark. We extracted the datalog fragment of the LUBM . . .
. . . a congruence relation: derivations with owl:sameAs tend to proliferate so an efficient incremental algorithm would have to treat this property directly; we leave developing such an extension to our future work. Please note that omitting the . . .
Conclusion
RESEARCH DIRECTIONS
Add a data/query/reasoning distribution layer:
Initial results very promising
Implementation in progress
Future work:
Investigate potential for data compression
Improve join cardinality estimation
Improve query planning
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 13/13

Mais conteúdo relacionado

Mais procurados

Big Data - Conceptos, herramientas y patrones
Big Data - Conceptos, herramientas y patronesBig Data - Conceptos, herramientas y patrones
Big Data - Conceptos, herramientas y patronesJuan José Domenech
 
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)Leinylson Fontinele
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information RetrievalHarsh Thakkar
 
Chapter 2: Data Management Overviews
Chapter 2: Data Management OverviewsChapter 2: Data Management Overviews
Chapter 2: Data Management OverviewsAhmed Alorage
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentalsrjain51
 
Chapter 4: Data Architecture Management
Chapter 4: Data Architecture ManagementChapter 4: Data Architecture Management
Chapter 4: Data Architecture ManagementAhmed Alorage
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaSkillspeed
 
Banco de Dados - Part01
Banco de Dados - Part01Banco de Dados - Part01
Banco de Dados - Part01Rangel Javier
 
Chapter 5: Data Development
Chapter 5: Data Development Chapter 5: Data Development
Chapter 5: Data Development Ahmed Alorage
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBMarco Segato
 
MicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best PracticesMicroStrategy Design Challenges - Tips and Best Practices
MicroStrategy Design Challenges - Tips and Best PracticesBiBoard.Org
 
Chapter 6: Data Operations Management
Chapter 6: Data Operations ManagementChapter 6: Data Operations Management
Chapter 6: Data Operations ManagementAhmed Alorage
 
지식그래프 개념과 활용방안 (Knowledge Graph - Introduction and Use Cases)
지식그래프 개념과 활용방안 (Knowledge Graph - Introduction and Use Cases)지식그래프 개념과 활용방안 (Knowledge Graph - Introduction and Use Cases)
지식그래프 개념과 활용방안 (Knowledge Graph - Introduction and Use Cases)Myungjin Lee
 
A Business Intelligence requirement gathering checklist
A Business Intelligence requirement gathering checklistA Business Intelligence requirement gathering checklist
A Business Intelligence requirement gathering checklistMadhumita Mantri
 
Data-centric design and the knowledge graph
Data-centric design and the knowledge graphData-centric design and the knowledge graph
Data-centric design and the knowledge graphAlan Morrison
 
Parquet and impala overview external
Parquet and impala overview externalParquet and impala overview external
Parquet and impala overview externalmattlieber
 

Mais procurados (20)

Système d'Information Géographique
Système d'Information GéographiqueSystème d'Information Géographique
Système d'Information Géographique
 
Big Data - Conceptos, herramientas y patrones
Big Data - Conceptos, herramientas y patronesBig Data - Conceptos, herramientas y patrones
Big Data - Conceptos, herramientas y patrones
 
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)
Banco de Dados I - Aula 11 - Linguagem de Consulta SQL (Comandos DDL)
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Document Database
Document DatabaseDocument Database
Document Database
 
Chapter 2: Data Management Overviews
Parallel and incremental materialisation of RDF/DATALOG in RDFOX

  • 1. PARALLEL AND INCREMENTAL MATERIALISATION OF RDF/DATALOG IN RDFOX Boris Motik University of Oxford March 20, 2015
  • 2. TABLE OF CONTENTS
    1 INTRODUCTION
    2 PARALLEL MATERIALISATION OF DATALOG
    3 HANDLING owl:sameAs VIA REWRITING
    4 INCREMENTAL MATERIALISATION MAINTENANCE
    5 CONCLUSION
    Boris Motik, Parallel and Incremental Materialisation of RDF/Datalog in RDFox, 0/13
  • 3. Introduction
  • 4. Introduction: RDFOX SUMMARY
    RDFox: a new RDF store and reasoner
    http://www.cs.ox.ac.uk/isg/tools/RDFox/
    Features:
      RAM-based storage of RDF data
        Currently centralised, but a distributed system is in the works
      Datalog reasoning via materialisation
        Can handle arbitrary (recursive) Datalog rules, not just OWL 2 RL
        Very effective parallelisation
      Efficient reasoning with owl:sameAs via rewriting
        A known and widely used technique, but its correctness is not trivial
      The Backward-Forward (B/F) incremental maintenance algorithm
        Considerably improves on DRed (Delete/Rederive)
        Compatible with rewriting of owl:sameAs
      SPARQL query answering
        Most of SPARQL 1.0 and some of SPARQL 1.1
  • 5. Parallel Materialisation of Datalog
  • 6. Parallel Materialisation of Datalog: MAIN CHALLENGES
    1 Assign workload to threads evenly
      Rules are generally not independent due to recursion
      Static assignment of rule instances can be unbalanced by data skew
      ⇒ Dynamic assignment with low overhead is needed
    2 Efficiently interleave querying (during evaluation of rule bodies) with updates (when derived facts are added)
    3 Provide indexes for efficient rule body evaluation
      Crucial for eliminating duplicate triples ⇒ ensures termination
      Usually sorted (and clustered) to allow for merge joins
      Hash indexes can also be used
      Individual (i.e., not bulk) index updates are inefficient
    B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137.
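Two of these challenges, duplicate elimination and dynamic work assignment, can be illustrated with a small sketch. It assumes nothing about how RDFox actually stores or indexes facts: a shared, append-only fact table uses a hash set to reject duplicates (so a fixpoint loop cannot keep re-adding the same fact), and a shared cursor lets whichever thread is idle claim the next unprocessed fact instead of working through a fixed, statically assigned partition. The names `FactTable`, `add`, `claim_next` and `demo` are hypothetical, and a single coarse Python lock stands in for the low-overhead synchronisation a real system would need. A real reasoner must also detect that all threads are idle before stopping, because another thread may still be adding facts; this sketch omits that.

```python
import threading
from typing import Hashable, List, Optional, Set, Tuple


class FactTable:
    """Append-only fact table shared by worker threads (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()        # one coarse lock, purely for simplicity
        self._facts: List[Hashable] = []     # facts in insertion order
        self._index: Set[Hashable] = set()   # hash index used for duplicate checks
        self._next = 0                       # position of the first unclaimed fact

    def add(self, fact: Hashable) -> bool:
        """Append fact unless it is already present; return True if it was new."""
        with self._lock:
            if fact in self._index:
                return False
            self._index.add(fact)
            self._facts.append(fact)
            return True

    def claim_next(self) -> Optional[Hashable]:
        """Give the next unprocessed fact to whichever thread asks first."""
        with self._lock:
            if self._next == len(self._facts):
                return None
            fact = self._facts[self._next]
            self._next += 1
            return fact

    def size(self) -> int:
        with self._lock:
            return len(self._facts)


def demo(num_threads: int = 4, num_facts: int = 1000) -> Tuple[int, int, int]:
    table = FactTable()

    def insert_all() -> None:
        # Every thread tries to insert the same facts; duplicates are rejected.
        for i in range(num_facts):
            table.add(("R", i, i + 1))

    claimed: List[Hashable] = []
    claimed_lock = threading.Lock()

    def worker() -> None:
        # Threads keep claiming facts until none are left: no fixed partition.
        while (fact := table.claim_next()) is not None:
            with claimed_lock:
                claimed.append(fact)

    # Run the insertion phase to completion, then the claiming phase.
    for target in (insert_all, worker):
        threads = [threading.Thread(target=target) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # (facts stored, facts claimed, distinct facts claimed)
    return table.size(), len(claimed), len(set(claimed))


if __name__ == "__main__":
    print(demo())  # (1000, 1000, 1000)
```

Each fact is stored once even though four threads insert it, and each stored fact is handed out exactly once regardless of which thread happens to be free.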
  • 7. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 8. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM ⇒ R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(a) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 9. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) ⇒ R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(a) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 10. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM. Rule: A(x) ∧ R(x, y) → A(y). For each fact: match the fact to all body atoms to obtain subqueries; evaluate the subqueries w.r.t. all previous facts; add the results to the table. Table (⇒ marks the current fact): R(a,b), R(a,c), ⇒ R(b,d), R(b,e), A(a), R(c,f), R(c,g). Current subquery: A(b).
  • 11. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), ⇒ R(b,e), A(a), R(c,f), R(c,g). Current subquery: A(b).
  • 12. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), ⇒ A(a), R(c,f), R(c,g), A(b), A(c). Current subquery: R(a,y); A(b) and A(c) are added to the table.
  • 13. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), ⇒ R(c,f), R(c,g), A(b), A(c). Current subquery: A(c); A(c) does not precede R(c,f) in the table, so nothing is derived yet.
  • 14. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), ⇒ R(c,g), A(b), A(c). Current subquery: A(c).
  • 15. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), ⇒ A(b), A(c), A(d), A(e). Current subquery: R(b,y); A(d) and A(e) are added.
  • 16. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), ⇒ A(c), A(d), A(e), A(f), A(g). Current subquery: R(c,y); A(f) and A(g) are added.
  • 17. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), A(c), ⇒ A(d), A(e), A(f), A(g). Current subquery: R(d,y).
  • 18. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), A(c), A(d), ⇒ A(e), A(f), A(g). Current subquery: R(e,y).
  • 19. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), A(c), A(d), A(e), ⇒ A(f), A(g). Current subquery: R(f,y).
  • 20. Parallel Materialisation of Datalog. SOLUTION PART I: ALGORITHM (continued). Table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), A(c), A(d), A(e), A(f), ⇒ A(g). Current subquery: R(g,y).
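The trace in slides 10 to 20 can be reproduced with a minimal, single-threaded Python sketch. It hard-codes the slide's single rule, represents facts as tuples, and uses a hypothetical `materialise` function; it illustrates the fact-at-a-time idea only and is not RDFox's implementation, which handles arbitrary rules over its own indexes. A subquery sees only facts that precede the current one, so every rule instance is found exactly once, by whichever of its two body facts comes later in the table.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, ...]  # ("A", "a") or ("R", "a", "b")


def materialise(facts: List[Fact]) -> List[Fact]:
    """Fact-at-a-time evaluation of the single rule A(x) ∧ R(x, y) → A(y)."""
    table: List[Fact] = list(facts)
    position: Dict[Fact, int] = {f: k for k, f in enumerate(table)}
    i = 0
    while i < len(table):  # the table grows while we scan it
        fact = table[i]
        derived: List[Fact] = []
        if fact[0] == "R":
            # Match R(x, y) to the second body atom; subquery A(x).
            x, y = fact[1], fact[2]
            if position.get(("A", x), len(table)) < i:  # previous facts only
                derived.append(("A", y))
        elif fact[0] == "A":
            # Match A(x) to the first body atom; subquery R(x, y).
            x = fact[1]
            for k in range(i):  # previous facts only
                other = table[k]
                if other[0] == "R" and other[1] == x:
                    derived.append(("A", other[2]))
        for new_fact in derived:  # add results at the end of the table
            if new_fact not in position:
                position[new_fact] = len(table)
                table.append(new_fact)
        i += 1
    return table


example = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
           ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
result = materialise(example)
print(result[len(example):])
# → [('A', 'b'), ('A', 'c'), ('A', 'd'), ('A', 'e'), ('A', 'f'), ('A', 'g')]
```

The derived facts appear in the same order as on the slides: R(c,f) derives nothing when it is processed, and A(f) is found later, when A(c) is processed.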
  • 21. Parallel Materialisation of Datalog. PARALLELISING COMPUTATION. Each thread extracts facts and evaluates subqueries independently. The number of subqueries is determined by the number of facts, which in practice ensures that threads are equally loaded. Requires no thread synchronisation. ⇒ We partition rule instances dynamically and with little overhead.
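A minimal sketch of the dynamic work distribution, assuming Python threads and the same single rule as above. The names `SharedTable`, `claim`, `finish` and `materialise_parallel` are hypothetical. Unlike RDFox, this sketch guards the table with a single `threading.Condition`, and Python's global interpreter lock means it shows how facts are handed out, not any speedup. Each thread repeatedly claims the next unprocessed fact and evaluates its subqueries against the facts that precede it; a thread exits only when no facts are left and no other thread can still add one.

```python
import threading
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, ...]


class SharedTable:
    """Append-only fact table shared by all worker threads."""

    def __init__(self, facts: List[Fact]) -> None:
        self.cond = threading.Condition()  # a lock stands in for lock-free structures
        self.facts: List[Fact] = []
        self.known: Set[Fact] = set()
        self.next_index = 0  # next fact that some thread may claim
        self.busy = 0        # threads currently evaluating a claimed fact
        self._append(facts)

    def _append(self, facts) -> None:  # caller holds self.cond (or is __init__)
        for f in facts:
            if f not in self.known:
                self.known.add(f)
                self.facts.append(f)

    def claim(self) -> Optional[Tuple[int, Fact, Set[Fact]]]:
        """Hand the next unprocessed fact, plus all facts before it, to a thread."""
        with self.cond:
            while self.next_index >= len(self.facts) and self.busy > 0:
                self.cond.wait()  # a busy thread may still add facts
            if self.next_index >= len(self.facts):
                return None  # nothing left and no thread can add more
            i = self.next_index
            self.next_index += 1
            self.busy += 1
            return i, self.facts[i], set(self.facts[:i])

    def finish(self, derived: List[Fact]) -> None:
        with self.cond:
            self._append(derived)
            self.busy -= 1
            self.cond.notify_all()


def evaluate(fact: Fact, previous: Set[Fact]) -> List[Fact]:
    """Subqueries for A(x) ∧ R(x, y) → A(y), over earlier facts only."""
    if fact[0] == "R" and ("A", fact[1]) in previous:
        return [("A", fact[2])]
    if fact[0] == "A":
        return [("A", p[2]) for p in previous if p[0] == "R" and p[1] == fact[1]]
    return []


def worker(table: SharedTable) -> None:
    while (item := table.claim()) is not None:
        _, fact, previous = item
        derived: List[Fact] = []
        try:
            derived = evaluate(fact, previous)
        finally:
            table.finish(derived)  # always release the claim


def materialise_parallel(facts: List[Fact], n_threads: int = 4) -> Set[Fact]:
    table = SharedTable(facts)
    threads = [threading.Thread(target=worker, args=(table,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return set(table.facts)


example = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
           ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
print(sorted(materialise_parallel(example)))
```

Because facts are only ever appended, every fact with a smaller index already exists when a fact is claimed, so each rule instance is still found exactly once regardless of how threads interleave.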
  • 22. Parallel Materialisation of Datalog. SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY. The algorithm critically depends on: matching atoms ⟨t1, t2, t3⟩, where each ti is a constant or a variable; continuous concurrent updates. Our RDF storage data structure: hash-based indexes ⇒ a naturally parallel data structure. ‘Mostly’ lock-free: at least one thread makes progress most of the time. Compare: if a thread acquires a lock and dies, other threads are blocked. Main benefit: performance is less susceptible to scheduling decisions. Main technical challenge: reduce thread interference. When thread A writes to a location cached by thread B, the cache of B is invalidated. Our updates ensure that threads (typically) write to different locations.
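To make "matching atoms ⟨t1, t2, t3⟩" concrete, here is a toy, single-threaded Python sketch in which `None` plays the role of a variable. The class `TripleStore` is hypothetical: it registers each triple under all eight bound/unbound patterns it matches, so any atom is answered with one hash lookup. This trades eight-fold memory for simplicity and does not model RDFox's actual storage layout or its mostly lock-free concurrent updates.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable


class TripleStore:
    """Toy in-memory store: one hash index keyed by every bound/unbound pattern."""

    def __init__(self) -> None:
        self.index: Dict[Pattern, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        # Register the triple under all 2^3 = 8 patterns it matches,
        # e.g. (s, p, None), (None, p, o), (None, None, None).
        for bound in product((True, False), repeat=3):
            key = tuple(term if keep else None for term, keep in zip(triple, bound))
            self.index[key].add(triple)

    def match(self, pattern: Pattern) -> Set[Triple]:
        """Answer an atom <t1, t2, t3> with a single hash lookup."""
        return set(self.index.get(pattern, ()))


store = TripleStore()
store.add("a", "R", "b")
store.add("a", "R", "c")
store.add("b", "R", "d")
print(sorted(store.match(("a", "R", None))))
# → [('a', 'R', 'b'), ('a', 'R', 'c')]
```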
  • 23. Parallel Materialisation of Datalog. EVALUATION I: PARALLELISATION SPEEDUP. RDFox: an RDF store developed at Oxford University, http://www.cs.ox.ac.uk/isg/tools/RDFox/ [Figure: speedup vs. number of threads, two panels. Panel 1: ClarosL, ClarosLE, DBpediaL, DBpediaLE, LUBML 01K, LUBMU 01K. Panel 2: UOBML 01K, UOBMU 010, LUBMLE 01K, LUBML 05K, LUBMLE 05K, LUBMU 05K.] Small concurrency overhead; parallelisation pays off already with two threads. Speedup continues to increase after we exhaust all physical cores ⇒ hyperthreading and parallelism can compensate for CPU cache misses.
  • 24. Parallel Materialisation of Datalog. EVALUATION II: ORACLE’S SPARC T5. Machine specification: 4 TB of RAM; 128 physical cores that support 1024 virtual cores via hyperthreading.
    Name      | Initial triples | Resulting triples | Time, 1 thread (s) | Time, 1024 threads (s) | Speedup | Inference rate (triples/s)
    ClarosLE  | 18.8 M          | 533.7 M           | 7484               | 74                     | 101     | 7.2 M
    LUBML-1K  | 133.6 M         | 182.4 M           | 511                | 10                     | 51      | 4.9 M
    LUBMLE-1K | 133.6 M         | 332.6 M           | 5267               | 37                     | 142     | 5.4 M
    LUBML-140k: 8 G triples, materialised to 10.9 G triples. 20 threads: 2000 s, inference rate 1.45 M triples/s. 128 threads: 599 s, inference rate 4.84 M triples/s. The materialised dataset used about half of the RAM (2 TB). Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle.
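The derived columns of the SPARC T5 table can be checked with two one-line formulas. This sketch assumes speedup is the single-thread time divided by the multi-thread time, and that the inference rate is the number of newly derived triples divided by wall-clock time; under that assumption the LUBM figures above are reproduced. The function names are hypothetical.

```python
def speedup(time_one_thread: float, time_n_threads: float) -> float:
    """Ratio of single-threaded to multi-threaded wall-clock time."""
    return time_one_thread / time_n_threads


def inference_rate(initial: float, resulting: float, seconds: float) -> float:
    """Newly derived triples per second (assumed definition)."""
    return (resulting - initial) / seconds


# LUBML-1K: 511 s on 1 thread, 10 s on 1024 threads; 133.6 M -> 182.4 M triples
print(round(speedup(511, 10), 1))                            # 51.1
print(round(inference_rate(133.6e6, 182.4e6, 10) / 1e6, 2))  # 4.88 (M triples/s)
# LUBML-140k: 8 G -> 10.9 G triples
print(round(inference_rate(8e9, 10.9e9, 2000) / 1e6, 2))     # 1.45
print(round(inference_rate(8e9, 10.9e9, 599) / 1e6, 2))      # 4.84
```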
  • 25. Handling owl:sameAs via Rewriting
  • 26. Handling owl:sameAs via Rewriting. HANDLING owl:sameAs VIA REWRITING. Rewriting: replace all equal constants with one representative. A well-known approach, used in GraphDB, Oracle, WebPIE, . . . Much more efficient than direct materialisation. Open question I: effective parallelisation. Representatives must be maintained lock-free, and care is needed to ensure correctness and non-repetition of derivations. Open question II: query evaluation. We could expand the rewritten data before query evaluation, but that is inefficient. Better: evaluate queries on the rewritten data and expand the answers. Such a straightforward approach is incorrect: result cardinalities might be wrong, and the presence of FILTER and BIND can make results incorrect. B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015.
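The core idea of rewriting can be sketched with a union-find structure that maps each resource to a representative. This is a minimal illustration under stated assumptions, not the algorithm from the AAAI 2015 paper: it handles only explicit owl:sameAs triples (no rules deriving new equalities), rewrites subjects and objects but not predicates, is single-threaded, and picks the lexicographically smallest name as representative. `Representatives`, `rewrite` and `expand` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


class Representatives:
    """Union-find over resources; each equivalence class has one representative."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def merge(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            keep, drop = sorted((ra, rb))  # smaller name becomes representative
            self.parent[drop] = keep


def rewrite(triples: List[Triple]) -> Tuple[Representatives, Set[Triple]]:
    """Replace resources by representatives and drop owl:sameAs triples."""
    reps = Representatives()
    for s, p, o in triples:
        if p == SAME_AS:
            reps.merge(s, o)
    rewritten = {(reps.find(s), p, reps.find(o)) for s, p, o in triples if p != SAME_AS}
    return reps, rewritten


def expand(reps: Representatives, resource: str) -> List[str]:
    """All resources equal to `resource`, e.g. to expand a query answer."""
    rep = reps.find(resource)
    return sorted(x for x in list(reps.parent) if reps.find(x) == rep)


data = [(":a", SAME_AS, ":b"), (":b", SAME_AS, ":c"),
        (":a", ":knows", ":d"), (":c", ":knows", ":d")]
reps, rewritten = rewrite(data)
print(rewritten)           # {(':a', ':knows', ':d')}
print(expand(reps, ":c"))  # [':a', ':b', ':c']
```

Two :knows triples collapse into one, which is where the savings come from; as the slide warns, naively expanding answers computed on such collapsed data can yield wrong multiplicities in SPARQL.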
  • 27. Handling owl:sameAs via Rewriting. EVALUATION OF REWRITING.
    [Partial table row; column headers not recoverable] UOBM 279 4 2.2M; AX: 36M 1.2 332M 16,152M; REW: 9.4M 9.7M 0.4 33.8M 4,256M 686; factor: 3.2x 3.2x 9.9x 3.8x
    Table 3: Materialisation Times with Axiomatisation and Rewriting (each AX/REW cell: time in seconds / speedup over one thread; AX/REW column: ratio of AX time to REW time)
    Threads | Claros AX     | Claros REW | AX/REW | DBpedia AX  | DBpedia REW | AX/REW | OpenCyc AX   | OpenCyc REW | AX/REW
    1       | 2042.9 / 1.0  | 65.8 / 1.0 | 31.1   | 219.8 / 1.0 | 31.7 / 1.0  | 6.9    | 2093.7 / 1.0 | 119.9 / 1.0 | 17.5
    2       | 969.7 / 2.1   | 35.2 / 1.9 | 27.6   | 114.6 / 1.9 | 17.6 / 1.8  | 6.5    | 1326.5 / 1.6 | 78.3 / 1.5  | 16.9
    4       | 462.0 / 4.4   | 18.1 / 3.6 | 25.5   | 66.3 / 3.3  | 10.7 / 3.0  | 6.2    | 692.6 / 3.0  | 40.5 / 3.0  | 17.1
    8       | 237.2 / 8.6   | 9.9 / 6.7  | 24.1   | 36.1 / 6.1  | 5.2 / 6.0   | 6.9    | 351.3 / 6.0  | 23.0 / 5.2  | 15.2
    12      | 184.9 / 11.1  | 7.9 / 8.3  | 23.3   | 31.9 / 6.9  | 4.1 / 7.7   | 7.7    | 291.8 / 7.2  | 56.2 / 2.1  | 5.5
    16      | 153.4 / 13.3  | 6.9 / 9.6  | 22.3   | 27.5 / 8.0  | 3.6 / 8.8   | 7.7    | 254.0 / 8.2  | 52.3 / 2.3  | 4.9
    Threads | UniProt AX  | UniProt REW | AX/REW | UOBM AX      | UOBM REW     | AX/REW
    1       | 370.6 / 1.0 | 143.4 / 1.0 | 2.6    | 2696.7 / 1.0 | 1152.7 / 1.0 | 2.3
    2       | 232.3 / 1.6 | 86.7 / 1.7  | 2.7    | 1524.6 / 1.8 | 599.6 / 1.9  | 2.5
    4       | 129.2 / 2.9 | 46.5 / 3.1  | 2.8    | 813.3 / 3.3  | 318.3 / 3.6  | 2.6
    8       | 74.7 / 5.0  | 25.1 / 5.7  | 3.0    | 439.9 / 6.1  | 177.7 / 6.5  | 2.5
    12      | 61.0 / 6.1  | 19.9 / 7.2  | 3.1    | 348.9 / 7.7  | 152.7 / 7.6  | 2.3
    16      | 61.9 / 6.0  | 17.1 / 8.4  | 3.6    | 314.4 / 8.6  | 137.9 / 8.4  | 2.3
    "… mode takes less than ten seconds, these results are difficult to measure and are susceptible to skew. Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the reduction in the number of triples is negligible. This is due to the reduction in the number of derivations, mainly involving rules (≈1)–(≈5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than … connected by the :hasSameHomeTownWith property. This property is also symmetric and transitive so, for each pair of connected resources, the number of times each triple is derived by the transitivity rule is quadratic in the number of connected resources. This leads to a large number of duplicate derivations that do not involve equality. Thus, although it is helpful, rewriting does not reduce the number of derivations in the same way as, for example, on Claros, which explains the relatively modest speedup of REW over AX."
    Speedup is bigger than the reduction in the number of triples. The number of derivations is the determining factor, contrary to popular belief!
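A small, hypothetical illustration of why derivations rather than triples determine the cost: for n resources linked in a chain by a symmetric and transitive property (such as :hasSameHomeTownWith above), the fixpoint contains n² triples, but the transitivity rule has n³ instances over it, so the work grows faster than the output. The counting below is a simplified model of rule instances over the final fixpoint, not the measurement method used in the evaluation; the function names are ours.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def symmetric_transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    """Naive fixpoint of P(x,y) → P(y,x) and P(x,y) ∧ P(y,z) → P(x,z)."""
    facts = set(edges)
    while True:
        new = {(y, x) for (x, y) in facts}  # symmetry
        new |= {(x, z) for (x, y) in facts for (y2, z) in facts if y == y2}  # transitivity
        if new <= facts:
            return facts
        facts |= new


def triples_and_derivations(n: int) -> Tuple[int, int]:
    """Triples vs. transitivity-rule instances for a chain of n resources."""
    chain = {(i, i + 1) for i in range(n - 1)}
    closure = symmetric_transitive_closure(chain)
    instances = sum(1 for (x, y) in closure for (y2, z) in closure if y == y2)
    return len(closure), instances


for n in (5, 10, 20):
    print(n, triples_and_derivations(n))
# → 5 (25, 125)
# → 10 (100, 1000)
# → 20 (400, 8000)
```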
  • 28. Incremental Materialisation Maintenance
  • 29. Incremental Materialisation Maintenance. THE DRED ALGORITHM AT A GLANCE. Delete/Rederive (DRed): the state-of-the-art incremental maintenance algorithm. EXAMPLE: rules C0(x) ← A(x); C0(x) ← B(x); Ci(x) ← Ci−1(x) for 1 ≤ i ≤ n; C0(x) ← Cn(x). Explicit facts: A(a), B(a).
  • 30. Incremental Materialisation Maintenance. THE DRED ALGORITHM AT A GLANCE (continued). Materialise the initial facts: C0(a), C1(a), . . . , Cn(a) are derived.
  • 31. Incremental Materialisation Maintenance. THE DRED ALGORITHM AT A GLANCE (continued). Delete A(a) using DRed:
  • 32. Incremental Materialisation Maintenance. THE DRED ALGORITHM AT A GLANCE (continued). 1. Delete all facts with a derivation from A(a); a superscript ^D marks facts scheduled for deletion: C0(x)^D ← A(x)^D; C0(x)^D ← B(x)^D; Ci(x)^D ← Ci−1(x)^D for 1 ≤ i ≤ n; C0(x)^D ← Cn(x)^D.
  • 33. Incremental Materialisation Maintenance. THE DRED ALGORITHM AT A GLANCE (continued). 2. Rederive facts that have an alternative derivation: C0(x) ← C0(x)^D ∧ A(x); C0(x) ← C0(x)^D ∧ B(x); Ci(x) ← Ci(x)^D ∧ Ci−1(x) for 1 ≤ i ≤ n; C0(x) ← C0(x)^D ∧ Cn(x).
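A minimal Python sketch of DRed on the example above, assuming rules with a single unary body atom encoded as (head, body) pairs; `example_rules`, `propagate` and `dred_delete` are hypothetical names. Step 1 overdeletes everything reachable from the removed facts; step 2 rederives overdeleted facts that have a one-step derivation from the remaining facts and propagates them forward. For n = 3 it overdeletes A(a) and all of C0(a), …, C3(a), only to rederive C0(a), …, C3(a) from B(a).

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str]  # (predicate, constant), e.g. ("C0", "a")
Rule = Tuple[str, str]  # (head, body) encodes head(x) ← body(x)


def example_rules(n: int) -> List[Rule]:
    rules = [("C0", "A"), ("C0", "B")]
    rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
    rules.append(("C0", f"C{n}"))
    return rules


def consequences(fact: Fact, rules: List[Rule]) -> Set[Fact]:
    return {(head, fact[1]) for head, body in rules if body == fact[0]}


def propagate(start: Set[Fact], base: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Forward-chain from `start`, adding consequences to a copy of `base`."""
    result = set(base) | start
    todo = list(start)
    while todo:
        for g in consequences(todo.pop(), rules):
            if g not in result:
                result.add(g)
                todo.append(g)
    return result


def dred_delete(explicit, materialised, removed, rules):
    """Delete/Rederive for single-atom rules; returns (new facts, overdeleted)."""
    explicit = set(explicit) - set(removed)
    # Step 1: overdelete every fact with a derivation from a removed fact.
    overdeleted: Set[Fact] = set()
    todo = list(removed)
    while todo:
        f = todo.pop()
        if f not in overdeleted:
            overdeleted.add(f)
            todo.extend(g for g in consequences(f, rules) if g in materialised)
    remaining = set(materialised) - overdeleted
    # Step 2: rederive facts with a one-step derivation from remaining facts ...
    rederived = {f for f in overdeleted
                 if f in explicit or any(h == f[0] and (b, f[1]) in remaining for h, b in rules)}
    # ... and propagate the rederived facts forward.
    return propagate(rederived, remaining, rules), overdeleted


rules = example_rules(3)
explicit = {("A", "a"), ("B", "a")}
materialised = propagate(explicit, set(), rules)
after, overdeleted = dred_delete(explicit, materialised, {("A", "a")}, rules)
print(sorted(after), len(overdeleted))
# → [('B', 'a'), ('C0', 'a'), ('C1', 'a'), ('C2', 'a'), ('C3', 'a')] 5
```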
  • 34. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM. In RDF, a fact often has many alternative derivations ⇒ many facts get deleted in DRed's first step. The Backward/Forward (B/F) algorithm looks for alternative derivations immediately. Facts: A(a), B(a), C0(a), C1(a), . . . , Cn(a). B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015.
  • 35. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Delete A(a) using B/F:
  • 36. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued; "?" marks a fact being checked, "×" a deleted fact). Facts: A(a) ?, B(a), C0(a), C1(a), . . . , Cn(a). 1. Is A(a) derivable in any other way?
  • 37. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Facts: A(a) ×, B(a), C0(a), C1(a), . . . , Cn(a). 2. No ⇒ delete.
  • 38. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Facts: A(a) ×, B(a), C0(a) ?, C1(a), . . . , Cn(a). 3. As in DRed, identify C0(a) as derivable from A(a).
  • 39. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Facts: A(a) ×, B(a) ?, C0(a) ?, C1(a), . . . , Cn(a). 4. Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a).
  • 40. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Facts: A(a) ×, B(a), C0(a) ?, C1(a), . . . , Cn(a). 5. B(a) is explicit, so it is derivable.
  • 41. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). Facts: A(a) ×, B(a), C0(a), C1(a), . . . , Cn(a). 6. So C0(a) is derivable too.
  • 42. Incremental Materialisation Maintenance. IMPROVEMENT: THE B/F ALGORITHM (continued). 7. Stop propagation and terminate.
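For comparison with DRed, here is a simplified sketch in the spirit of B/F, again assuming single-atom rules encoded as (head, body) pairs. It is not the full algorithm from the AAAI 2015 paper: the backward search is a plain depth-first search without memoisation and is exponential in the worst case. Before deleting an affected fact, it applies rules backwards to look for a derivation from the remaining explicit facts; only if none exists is the fact deleted and its consequences checked in turn. `backward_forward_delete` and the helpers are hypothetical names.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]  # (predicate, constant)
Rule = Tuple[str, str]  # (head, body) encodes head(x) ← body(x)


def example_rules(n: int) -> List[Rule]:
    rules = [("C0", "A"), ("C0", "B")]
    rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
    rules.append(("C0", f"C{n}"))
    return rules


def consequences(fact: Fact, rules: List[Rule]) -> Set[Fact]:
    return {(head, fact[1]) for head, body in rules if body == fact[0]}


def backward_forward_delete(explicit, materialised, removed, rules):
    """Check each affected fact backwards before deleting and propagating it."""
    explicit = set(explicit) - set(removed)
    facts = set(materialised)
    deleted: Set[Fact] = set()
    checked: List[Fact] = []

    def provable(f: Fact, path: FrozenSet[Fact]) -> bool:
        # Apply rules 'backwards' until an explicit fact is reached;
        # `path` prevents looping through cycles such as C0 → ... → Cn → C0.
        if f in explicit:
            return True
        for head, body in rules:
            b = (body, f[1])
            if head == f[0] and b in facts and b not in path:
                if provable(b, path | {b}):
                    return True
        return False

    todo = list(removed)
    while todo:
        f = todo.pop()
        if f in deleted or f not in facts:
            continue
        checked.append(f)
        if provable(f, frozenset({f})):
            continue  # alternative derivation: keep f and stop propagating here
        facts.discard(f)
        deleted.add(f)
        todo.extend(consequences(f, rules))  # forward step, as in DRed
    return facts, deleted, checked


rules = example_rules(3)
explicit = {("A", "a"), ("B", "a")}
materialised = explicit | {(f"C{i}", "a") for i in range(4)}
facts, deleted, checked = backward_forward_delete(explicit, materialised, {("A", "a")}, rules)
print(deleted, checked)
# → {('A', 'a')} [('A', 'a'), ('C0', 'a')]
```

Only two facts are examined and one is deleted, whereas the DRed run on the same input overdeletes five facts.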
  • 43. Incremental Materialisation Maintenance. EVALUATION OF THE B/F ALGORITHM.
    Table 2: Experimental results. Column groups (separated by semicolons): |E⁻| |I ∖ I′|; Rematerialise: Time (s), Derivations Fwd; DRed: Time (s), Derivations |D|, DR2, DR4, DR5; B/F: Time (s), Derivations |C|, Bwd, Sat, Del, Prop.
    LUBM-1k-L (|E| = 133.6M, |I| = 182.4M, Mt = 121.5s, Md = 212.5M)
      100 113; 139.4 212.5M; 0.0 1.0k 1.1k 0.8k 1.0k; 0.0 0.5k 0.2k 0.3k 0.2k
      5.0k 5.5k; 101.8 212.5M; 0.2 55.5k 67.2k 46.9k 59.8k; 0.2 23.0k 9.3k 13.7k 7.4k
      2.5M 2.7M; 138.5 208.8M; 39.4 10.3M 15.2M 6.6M 11.5M; 32.8 10.0M 4.1M 5.6M 3.7M
      5.0M 5.5M; 91.8 205.0M; 54.8 17.8M 26.3M 10.5M 18.9M; 62.3 18.8M 7.8M 10.1M 7.5M
      7.5M 8.3M; 89.2 201.3M; 71.5 24.3M 35.5M 13.6M 24.3M; 85.4 26.7M 11.0M 14.0M 11.2M
      10.0M 11.0M; 99.5 197.5M; 127.9 30.0M 43.1M 15.9M 28.1M; 102.2 34.1M 14.0M 17.4M 15.0M
    UOBM-1k-Uo (|E| = 254.8M, |I| = 2.2G, Mt = 5034.0s, Md = 3.6G)
      100 160; 3482.0 3.6G; 8797.6 1.8G 2.6G 53.2M 2.6G; 5.4 0.8k 0.5k 1.3k 0.5k
      5.0k 85.2k; 3417.8 3.6G; 9539.3 1.8G 2.6G 53.2M 2.6G; 28.2 105.9k 17.9k 42.1k 104.1k
      17.0M 130.9M; 3903.1 3.4G; 8934.3 1.8G 2.7G 63.7M 2.5G; 988.8 175.8M 47.6M 104.0M 196.7M
      34.0M 269.0M; 4084.1 3.2G; 9492.5 1.9G 2.8G 68.4M 2.4G; 1877.2 340.7M 87.5M 182.3M 401.1M
      51.0M 422.8M; 4010.0 3.0G; 10659.3 1.9G 2.9G 71.5M 2.2G; 2772.7 513.7M 125.2M 246.8M 622.0M
      68.0M 581.4M; 3981.9 2.8G; 11351.6 1.9G 2.9G 73.3M 2.1G; 3737.3 687.0M 162.5M 289.5M 848.6M
    Claros-L (|E| = 18.8M, |I| = 74.2M, Mt = 78.9s, Md = 128.6M)
      100 212; 62.9 128.6M; 0.0 0.8k 1.0k 0.2k 0.5k; 0.0 0.6k 0.3k 0.7k 0.5k
      5.0k 11.3k; 62.8 128.6M; 0.4 37.8k 50.7k 10.9k 23.9k; 0.4 29.1k 18.8k 35.3k 26.8k
      0.6M 1.3M; 62.3 125.6M; 32.3 4.1M 5.5M 1.1M 2.5M; 14.9 3.1M 2.0M 3.6M 3.0M
      1.2M 2.6M; 61.2 122.6M; 53.2 7.8M 10.8M 2.0M 4.8M; 33.6 6.1M 3.8M 6.7M 6.0M
      1.7M 4.0M; 60.5 119.5M; 73.6 11.4M 15.9M 2.8M 6.8M; 47.8 8.9M 5.6M 9.5M 9.1M
      2.3M 5.5M; 60.0 116.3M; 91.0 14.8M 20.9M 3.6M 8.6M; 60.6 11.7M 7.3M 12.0M 12.3M
    Claros-LE (|E| = 18.8M, |I| = 533.7M, Mt = 4024.5s, Md = 12.9G)
      100 0.5k; 3992.8 12.6G; 0.0 1.3k 2.0k 0.3k 0.9k; 0.0 1.0k 0.7k 1.0k 1.1k
      2.5k 178.9k; 5235.1 12.6G; 8077.4 5.5M 11.7G 176.6k 11.7G; 10.3 216.4k 161.2k 8.8M 320.0k
      5.0k 427.5k; 4985.1 12.6G; 7628.2 6.0M 11.7G 186.0k 11.7G; 16.5 485.6k 369.0k 8.9M 769.3k
      7.5k 609.6k; 4855.0 12.6G; 7419.1 6.5M 11.7G 193.9k 11.7G; 19.5 683.4k 516.8k 9.0M 1.1M
      10.0k 780.8k; 5621.3 12.6G; 7557.9 6.8M 11.7G 207.6k 11.7G; 3907.2 6.0M 723.0M 11.7G 16.9M
    "Test Datasets: Table 2 summarises the properties of the four datasets we used in our tests. LUBM (Guo, Pan, and Heflin 2005) is a well-known RDF benchmark. We extracted the datalog fragment of the LUBM … a congruence relation: derivations with owl:sameAs tend to proliferate, so an efficient incremental algorithm would have to treat this property directly; we leave developing such an extension to our future work. Please note that omitting the …"
  • 44. Conclusion
  • 45. Conclusion. RESEARCH DIRECTIONS. Add a data/query/reasoning distribution layer: initial results are very promising, and the implementation is in progress. Future work: investigate the potential for data compression; improve join cardinality estimation; improve query planning.