Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL
1. Digital Enterprise Research Institute www.deri.ie
Improve Efficiency of Mapping Data
between XML and RDF with XSPARQL
Stefan Bischof, Nuno Lopes, and Axel Polleres
Int. Conf. on Web Reasoning and Rule Systems
August 28, 2011
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
2. XSPARQL: Bridging the gap of XML and RDF
[Diagram: SPARQL queries RDF, XQuery queries XML, and XSPARQL sits between the two. Example: FOAF, XSPARQL, XML.]
3. Problem: Evaluating Nested Graph Patterns
for $p $name from <persons.rdf>
where { $p a foaf:Person .
        $p foaf:name $name . }
return
  <person>
    <name>{ $name }</name>
    { for $fname from <persons.rdf>
      where { $p foaf:knows $friend .
              $friend foaf:name $fname . }
      return
        <friend>{ $fname }</friend> }
  </person>
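The cost of the nested pattern above can be illustrated with a small Python sketch. It is not XSPARQL and not the prototype's code: the in-memory triple list, the `CountingStore` class and its method names are hypothetical stand-ins for persons.rdf and a SPARQL engine. The point it tries to show is that the inner graph pattern is evaluated once per outer binding.

```python
# Hypothetical toy data standing in for persons.rdf.
TRIPLES = [
    ("alice", "rdf:type", "foaf:Person"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "rdf:type", "foaf:Person"),
    ("bob", "foaf:name", "Bob"),
    ("carol", "rdf:type", "foaf:Person"),
    ("carol", "foaf:name", "Carol"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "carol"),
    ("bob", "foaf:knows", "carol"),
]


class CountingStore:
    """Stand-in for a SPARQL engine that counts how often it is called."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = 0
        self.names = {s: o for s, p, o in triples if p == "foaf:name"}

    def persons(self):
        # Outer pattern: { $p a foaf:Person . $p foaf:name $name . }
        self.calls += 1
        return [(s, self.names[s]) for s, p, o in self.triples
                if p == "rdf:type" and o == "foaf:Person"]

    def friend_names(self, person):
        # Inner pattern: { $p foaf:knows $friend . $friend foaf:name $fname . }
        self.calls += 1
        return [self.names[o] for s, p, o in self.triples
                if s == person and p == "foaf:knows"]


def naive_mapping(store):
    """One inner call per person returned by the outer call."""
    return [(name, store.friend_names(person))
            for person, name in store.persons()]


store = CountingStore(TRIPLES)
mapping = naive_mapping(store)
print(mapping)
print("engine calls:", store.calls)
```

With three persons the sketch makes four engine calls (one outer, three inner), so the number of calls grows with the size of the outer result.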
3
4. One Approach: Nested Loop Join in XQuery
friendlist :=
  for $p1 $fname from <persons.rdf>
  where { $p1 foaf:knows $friend .
          $friend foaf:name $fname . }
for $p $name from <persons.rdf>
where { $p a foaf:Person .
        $p foaf:name $name . }
return
  <person>
    <name>{ $name }</name>
    { for $friend in friendlist
      where $p = $friend/$p1    (: join, evaluated in XQuery :)
      return
        <friend>{ $friend/$fname }</friend> }
  </person>
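The same idea can be sketched in Python: fetch all (person, friend name) pairs once, then join in the host language. This is an illustration under assumptions, not the prototype's rewriting. The toy data, the `Store` class and the function names are hypothetical.

```python
# Hypothetical toy data standing in for persons.rdf.
TRIPLES = [
    ("alice", "rdf:type", "foaf:Person"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "rdf:type", "foaf:Person"),
    ("bob", "foaf:name", "Bob"),
    ("carol", "rdf:type", "foaf:Person"),
    ("carol", "foaf:name", "Carol"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "carol"),
    ("bob", "foaf:knows", "carol"),
]


class Store:
    """Stand-in for a SPARQL engine that counts how often it is called."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = 0
        self.names = {s: o for s, p, o in triples if p == "foaf:name"}

    def persons(self):
        self.calls += 1
        return [(s, self.names[s]) for s, p, o in self.triples
                if p == "rdf:type" and o == "foaf:Person"]

    def all_friend_pairs(self):
        # Plays the role of friendlist: every ($p1, $fname) pair in one call.
        self.calls += 1
        return [(s, self.names[o]) for s, p, o in self.triples
                if p == "foaf:knows"]


def hoisted_mapping(store):
    friendlist = store.all_friend_pairs()      # single call, outside the loop
    result = []
    for person, name in store.persons():       # single call
        # Nested loop join in the host language: where $p = $friend/$p1
        friends = [fname for p1, fname in friendlist if p1 == person]
        result.append((name, friends))
    return result


store = Store(TRIPLES)
mapping = hoisted_mapping(store)
print(mapping)
print("engine calls:", store.calls)
```

In this sketch the number of engine calls stays at two however many persons there are. The join itself is still a nested loop, now run in memory.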
4
5. Evaluation Results
[Plot: Time (sec) against Dataset Size (MB), axis ticks at powers of ten. Series labels recoverable from the slide: naive XSPARQL implementation, Nested Loop (XQuery WHERE clause), Nested Loop (XPath), Sort-Merge, Merge Graph Patterns, Named Graph. Annotation: scales with number of saved SPARQL calls.]
6. Future Work/My PhD Proposal
• Formalise the integrated language XSPARQL
– Formalism combining XQuery (functional) with SPARQL (relational algebra)
• Optimise XSPARQL using this formal model
– Currently only manual optimisations
– Useful for any approach manipulating both XML and RDF data
• RDFS + OWL reasoning
– Add different kinds of reasoning to the formal model
– SPARQL 1.1 entailment regimes
7. Thanks for your attention!
Questions about XSPARQL, syntax, semantics, implementation, prototype, optimisation, performance, … visit us at our poster in the afternoon!

XSPARQL: Bridging the gap of XML and RDF
‣ Language to map data between XML and RDF
‣ Combines the strengths of XQuery and SPARQL query languages
‣ Provides XQuery’s function library to SPARQL
‣ Provides SPARQL’s graph pattern matching facility to XQuery
[Diagram: XQuery and SPARQL; XML, XSPARQL, RDF]

Prototype: Rewrite XSPARQL to XQuery
‣ Uses standard XQuery and SPARQL engines
[Diagram: XSPARQL query, XML data, RDF data; XSPARQL rewriter, XQuery query, XQuery engine, SPARQL engine; output: XML or RDF (RDF/XML …)]
‣ Try the prototype http://xsparql.deri.org/demo

Problem: Evaluating Nested Graph Patterns
‣ Loops with nested graph patterns result in a large number of interactions between XQuery and SPARQL engines
‣ Prototype evaluates such joins naively as nested loop join
‣ Prototype is unable to exploit high similarity of the SPARQL calls

Proposed Optimisations
‣ Minimize communication overhead for problematic queries
‣ Reduce the number of interactions between XQuery and SPARQL
‣ Perform only a static number of SPARQL calls by moving the join
‣ Move join to pure XQuery
– Nested loop join using an XQuery WHERE clause or XPath
– Tail recursive implementation of sort-merge join
‣ Move join to SPARQL
– Join by merging SPARQL graph patterns
– Join using named graph injection in triple store

Evaluation: Optimisations on several data sizes
‣ XMark benchmarks for XQuery adapted to XSPARQL use case
‣ Optimisations are applicable for the 3 slowest out of 20 queries

Results: XSPARQL can be faster
‣ Optimisations performed always better than standard XSPARQL
‣ SPARQL join optimisations were the fastest (when applicable)
[Plot: Time (sec) against Dataset Size (MB), axis ticks at powers of ten. Series labels recoverable from the poster: naive XSPARQL implementation, Nested Loop (XQuery WHERE clause), Nested Loop (XPath), Sort-Merge, Merge Graph Patterns, Named Graph. Annotation: scales with number of saved SPARQL calls.]

Conclusion: Maintainable and Efficient Mapping
‣ Performance of standard XSPARQL is drastically reduced for queries containing nested graph patterns
‣ Performance of such queries improves with different optimisations
‣ XSPARQL can provide better performance than ad-hoc setups for mapping data between XML and RDF

Future Work: More Optimisations and Features
‣ Query also relational databases
‣ Create a concise formalisation of XSPARQL
‣ Exploit properties of XSPARQL fragments for optimisation
‣ Support SPARQL 1.1 and SPARQL 1.1 Entailment Regimes

More information http://xsparql.deri.org/

Acknowledgements
This work has been funded by Science Foundation Ireland, Grant No. SFI/08/CI/I1380 (Lion-2) and by an IRCSET scholarship

Enabling Networked Knowledge
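The sort-merge join listed among the proposed optimisations can be sketched in Python as follows. The poster describes a tail recursive implementation in XQuery; this sketch uses an ordinary loop instead and only illustrates the join strategy. The function name, the tuple layout and the sample data are assumptions made for the example, and person identifiers are assumed to be unique on the left side.

```python
def sort_merge_group_join(persons, friend_pairs):
    """Join (person_id, name) rows with (person_id, friend_name) rows.

    Both inputs are sorted on person_id and then merged in a single pass.
    Persons without friends are kept with an empty list.
    Assumes each person_id appears at most once in `persons`.
    """
    left = sorted(persons)
    right = sorted(friend_pairs)
    out = []
    j = 0
    for pid, name in left:
        # Skip friend rows whose key sorts before the current person.
        while j < len(right) and right[j][0] < pid:
            j += 1
        # Collect all friend rows with a matching key.
        friends = []
        while j < len(right) and right[j][0] == pid:
            friends.append(right[j][1])
            j += 1
        out.append((name, friends))
    return out


# Hypothetical results of two independent queries, in arbitrary order.
persons = [("bob", "Bob"), ("alice", "Alice"), ("carol", "Carol")]
friend_pairs = [("bob", "Carol"), ("alice", "Carol"), ("alice", "Bob")]

joined = sort_merge_group_join(persons, friend_pairs)
print(joined)
```

After sorting, each input is scanned once, so the merge step is linear in the combined input size, whereas a nested loop join compares every person with every friend pair.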