Parallel Datalog Reasoning in RDFox Presentation

PARALLEL DATALOG REASONING IN RDFOX
Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks, Dan Olteanu
University of Oxford
September 25, 2014

TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 0/9

Introduction
TABLE OF CONTENTS
1 INTRODUCTION
3 EVALUATION
4 CONCLUSION

Introduction
CURRENT TRENDS IN KNOWLEDGE/DATABASE SYSTEMS
Materialisation is computationally intensive ) natural to parallelise
mid-range laptops have 4 cores, servers with 16 cores are routine
Price of RAM keeps falling
128 GB is routine, systems with 1 TB are emerging
in-memory databases: SAP’s HANA, Oracle’s TimesTen, YarcData’s Urika

Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
store data in RAM
effectively use modern multi-core architectures
Core focus: materialisation reasoning (recursive) datalog rules
e.g., hx; rdf :type; yi ^ hy; rdfs:subClassOf ; zi ! hx; rdf :type; zi
handle OWL 2 RL and the Semantic Web Rule Language (SWRL)
explicitly store all implied triples so that queries can be evaluated directly
used in a number of systems: Oracle’s database, OWLIM, WebPIE
Used in projects for reasoning beyond OWL 2 RL
‘combined approach’ for reasoning with OWL 2 EL
PAGOaA reasoner for OWL 2 DL comdining RDFox with HermiT

Our Approach to Parallel Materialisation
TABLE OF CONTENTS
1 INTRODUCTION
3 EVALUATION
4 CONCLUSION

EXISTING APPROACHES TO PARALLEL MATERIALISATION
Interquery parallelism: run independent rules in parallel
degree of parallelism limited by the number of independent rules
) does not distribute workload to cores evenly
Intraquery parallelism: partition rule instantiations to N threads
e.g., constrain the body of rules evaluated by thread i to (x mod N = i)
) static partitioning may not distribute workload well due to data skew
) dynamic partitioning may incur an overhead due to load balancing
Goal: distribute workload to threads evenly and with minimum overhead

SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ^ R(x; y) ! A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery:

) R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(a)

R(a,b)
) R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(a)

R(a,b)
R(a,c)
) R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(b)

R(a,b)
R(a,c)
R(b,d)
) R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(b)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
) A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(a,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
) R(c,f)
R(c,g)
A(b)
A(c)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(c)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
) R(c,g)
A(b)
A(c)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: A(c)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
) A(b)
A(c)
A(d)
A(e)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(b,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
) A(c)
A(d)
A(e)
A(f)
A(g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(c,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
) A(d)
A(e)
A(f)
A(g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(d,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
) A(e)
A(f)
A(g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(e,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
A(e)
) A(f)
A(g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(f,y)

R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
A(e)
A(f)
) A(g)
A(x) ^ R(x; y) ! A(y)
For each fact:
Current subquery: R(g,y)

PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of facts
ensures in practice that threads are equally loaded
Requires no thread synchronisation
) We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ^ R(x; y) ! A(y)
R(a; b); R(b; c); R(c; d); R(d; e); A(a)

A(x) ^ R(x; y) ! A(y)
R(a; b); R(b; c); R(c; d); R(d; e); A(a); A(b)

A(x) ^ R(x; y) ! A(y)
R(a; b); R(b; c); R(c; d); R(d; e); A(a); A(b); A(c)

A(x) ^ R(x; y) ! A(y)
R(a; b); R(b; c); R(c; d); R(d; e); A(a); A(b); A(c); A(d)

A(x) ^ R(x; y) ! A(y)
R(a; b); R(b; c); R(c; d); R(d; e); A(a); A(b); A(c); A(d); A(e)

SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The critically algorithm depends on:
matching atoms ht1; t2; t3i with ti a constant or variable
continuous concurrent updates
Our RDF storage data structure:
hash-based indexes ) naturally parallel data structure
‘mostly’ lock-free: at least one thread makes progress at most of the time
compare: if a thread acquire a lock and dies, other threads are blocked
main benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
when A writes to a location cached by B, the cache of B is invalidated
our updates ensure that threads (typically) write to different locations

Evaluation
TABLE OF CONTENTS
1 INTRODUCTION
3 EVALUATION
4 CONCLUSION

Evaluation
CONCURRENCY OVERHEAD AND PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
8 16 24 32
20
18
16
14
12
10
8
6
4
2
ClarosL
ClarosLE
DBpediaL
DBpediaLE
LUBML01K
LUBMU01K
8 16 24 32
20
18
16
14
12
10
8
6
4
2
UOBML01K
UOBMU010
LUBMLE 01K
LUBML05K
LUBMLE 05K
LUBMU05K
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
) hyperthreading and parallelism can compensate CPU cache misses

32 127 19.5 285 17.5 24 6.6 602 15.1 7 10.1 71 13.4 10 13.4 Mem Mem Mem 168 16.3 31 17.1
Evaluation
Seq. imp. 61 57 331 335 421 408 410 2610 2587 2643 6 798
Par. imp. 58 -4.9% 58 1.7% COMPARISON 334 0.9% 367 9.5% WITH 415 1.4% 415 THE 1.7% STATE 415 1.2% 2710 OF 3.8% THE 3553 ART
37% 2733 3.4% 6 0.0% 783 -1.9%
Triples 95.5M 555.1M 118.3M 1529.7M 182.4M 332.6M 219.0M 911.2M 1661.0M 1094.0M 35.6M 429.6M
Memory 4.2GB 18.0GB 6.1GB 51.9GB 9.3GB 13.8GB 11.1GB 49.0GB 75.5GB 53.3GB 1.1GB 20.2GB
Active 93.3% 98.8% 28.1% 94.5% 66.5% 90.3% 85.3% 66.5% 90.3% 85.3% 99.7% 99.1%
(b) Comparison of SysX with DBRDF and OWLIM-Lite
SysX PG-VP MDB-VP PG-TT MDB-TT OWLIM-Lite
Import Materialise Import Materialise Import Materialise Import Import Import Materialise
T B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t
ClarosL 48 89
2062 60
1138 165
25026 97
883 58
Mem
1174 217 896 94 300 54
14293 28
ClarosLE 4218 44 Mem Mem Time
DBpediaL 143 67
Mem
354 49
274 69
5844 148
15499 51
5968 174 4492 92 1735 171
14855 164
DBpediaLE 9538 49 Mem Mem Time
LUBML01K
332 74
71 65
7736 144
948 127
6136 56
27 41
7947 187 5606 104 2409 40
316 36
LUBMLE 01K 765 54 15632 112 Mem Time
LUBMU01K 113 67 Time 138 44 Time
UOBMU010 5 68 2501 43 116 154 Time 96 50 Mem 120 203 96 85 32 69 11311 24
UOBML01K 632 64 467 63 13864 119 6708 107 11063 41 358 33 14901 176 10892 101 4778 38 Time
Figure 2: Results of our Empirical Evaluation
DBRDF: an implementation using RDBMS and known techniques
PostgreSQL (row store) or MonetDB (column store) as underlying engine
vertical partitioning (VP) or triple table (TT) storage model
averages over three runs. OWLIM-Lite
materialisation and import, so we loaded each
the test datalog program and once with
subtracted the two times; Ontotext con-firmed
OWLIM-Lite: a commercial RDF store by OntoText
good materialisation time estimate.
Table (a) in Figure 2 show the speedup
number of used threads. The middle part
sequential and parallel import times,
slowdown for the parallel version. The
shows the number of triples, the mem-ory
speedup as s(1) = 1/(1 − p(1)); assuming that
p(1) = p(32), we get s(1) between 8.3 and 50. With 32
threads we thus already approach this maximum, which ex-plains
the flattening of the curves in Figure 2.
Table (b) in Figure 2 compares SysX with DBRDF on
PostgreSQL (PG) and MonetDB (MDB) with VP or TT
scheme, and with OWLIM-Lite. Columns T show the times
in seconds, and columns B/t show the number of bytes per
triple. Import in DBRDF is about 20 times slower than in
SysX; we found out that half of the import time is used in

Conclusion
TABLE OF CONTENTS
1 INTRODUCTION
3 EVALUATION
4 CONCLUSION

Conclusion
RESEARCH DIRECTIONS
Under development/already finished:
Deal with owl:sameAs via rewriting
Incremental deletions/additions
Add a data/query/reasoning distribution layer
Future work:
Investigate potential for data compression
Improve join cardinality estimation
Improve query planning

Parallel Datalog Reasoning in RDFox Presentation

Recomendados

Recomendados

Mais conteúdo relacionado

Destaque

Destaque (20)

Semelhante a Parallel Datalog Reasoning in RDFox Presentation

Semelhante a Parallel Datalog Reasoning in RDFox Presentation (20)

Último

Último (20)

Parallel Datalog Reasoning in RDFox Presentation