Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)
1. ICWE 2012 Tutorial
An Introduction to SPARQL and
Queries over Linked Data
Chapter 3: Querying Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
2. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
Linked Data Queries
Olaf Hartig - ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data" - Chapter 3: Querying Linked Data 2
3. SPARQL Endpoints
● SPARQL query processing service
● Supports the SPARQL protocol
● Issuing a SPARQL query is an HTTP GET request with parameter query
(a URL-encoded string with the SPARQL query)
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
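The request line above can be produced with a few lines of Python. This is only a minimal sketch: the helper name build_query_url and the example query are assumptions for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(endpoint, sparql):
    """Return the GET URL for a SPARQL protocol query request.

    The query text is passed in the 'query' parameter as a
    URL-encoded string (spaces become '+', ':' becomes '%3A', ...).
    """
    return endpoint + "?" + urlencode({"query": sparql})


# Hypothetical example query; any SPARQL query string works the same way.
query = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT ?label WHERE { <http://dbpedia.org/resource/Berlin> rdfs:label ?label }"
)
url = build_query_url("http://dbpedia.org/sparql", query)
print(url)
```

Decoding the parameter again (for example with urllib.parse.parse_qs) yields the original query string, which is a quick way to check the encoding.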
4. Query Result Formats
● For SELECT and ASK queries: XML, JSON, plain text
● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...
● How to request?
● HTTP Accept header
GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json
● Non-standard alternative: parameter out
GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
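As a rough illustration of the Accept header approach, the following sketch builds (but does not send) a request asking for JSON results and flattens a result document in the standard SPARQL JSON results format. The helper names and the sample document are assumptions made up for this example; the sample data does not come from a real endpoint.

```python
import json
from urllib.request import Request


def make_request(url):
    """Build a GET request that asks for SPARQL results as JSON."""
    return Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "my-sparql-client/0.1",
    })


def rows(result_doc):
    """Flatten a SPARQL JSON result document into a list of dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result_doc["results"]["bindings"]
    ]


# Hand-written sample response (hypothetical data).
SAMPLE = json.loads("""
{ "head": { "vars": ["v", "ve"] },
  "results": { "bindings": [
    { "v":  { "type": "uri",     "value": "http://example.org/volcano1" },
      "ve": { "type": "literal", "value": "2001" } }
  ] } }
""")

request = make_request("http://dbpedia.org/sparql?query=...")
print(request.get_header("Accept"))
print(rows(SAMPLE))
```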
5. SPARQL Client Libraries
● More convenient than working at the protocol level:
● SPARQL JavaScript Library
http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
● ARC for PHP http://arc.semsol.org/
● RAP – RDF API for PHP
http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
● Jena / ARQ (Java) http://jena.sourceforge.net/
● Sesame (Java) http://www.openrdf.org/
● SPARQL Wrapper (Python)
http://sparql-wrapper.sourceforge.net/
● PySPARQL (Python)
http://code.google.com/p/pysparql/
6. SPARQL Client Libraries
● Example with Jena ARQ:
import com.hp.hpl.jena.query.*;

String service = "...";       // address of the SPARQL endpoint
String query = "SELECT ...";  // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
try {
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
        QuerySolution s = results.nextSolution();
        // … process the solution s
    }
} finally {
    e.close();  // release resources even if the query fails
}
7. SPARQL Endpoints
● Several Linked Data sets exposed via SPARQL endpoint
● DBpedia http://dbpedia.org/sparql
● Musicbrainz http://dbtune.org/musicbrainz/sparql
● Semantic Web dog food http://data.semanticweb.org/sparql
● etc. http://esw.w3.org/topic/SparqlEndpoints
● Send your query, receive the result
8. SPARQL Endpoints
● Querying a single dataset is quite boring compared to
issuing SPARQL queries over multiple datasets
10. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
11. Querying a Given Collection
● Some public SPARQL endpoints provide access to a
collection of data from multiple sources
● http://lod.openlinksw.com/sparql
● http://sparql.sindice.com/
● Pros:
● Nothing to set up
● Good query execution times
● Cons:
● Queried data might be out of date
● Not all relevant data in the collection
12. Setting up Your Own Collection
● RDF-specific DBMSs:
● Virtuoso http://virtuoso.openlinksw.com/
● Allegro Graph http://www.franz.com/agraph/allegrograph/
● Bigdata http://www.systap.com/bigdata.htm
● OWLIM http://www.ontotext.com/owlim
● 4store http://4store.org/
● Jena TDB
http://jena.apache.org/
● Sesame
http://www.openrdf.org/
● etc.
13. Populating Your Own Collection
● Datasets provided as RDF dumps
● (Focused) crawling
● ldspider http://code.google.com/p/ldspider/
14. Setting up Your Own Collection
● Pros:
● All relevant data
● Independent of existence, availability, efficiency of SPARQL endpoints
● Good query execution times (once set up properly)
● Cons:
● Effort to set up
● Effort to operate
● Queried data might be out of date
15. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
16. SPARQL Endpoint Federation
● Idea of federated query processing:
● Querying a query federation service (mediator)
● Mediator distributes sub-queries to relevant sources
● Finally, mediator combines sub-results
● Prototypes:
● FedX
● SPLENDID
● ANAPSID
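The mediator idea can be sketched in a few lines. This is a toy model under stated assumptions, not how FedX, SPLENDID or ANAPSID work internally: the Endpoint class is an in-memory stand-in for a remote SPARQL endpoint, every triple pattern is simply sent to every endpoint (real systems select relevant sources first), and all data values are hypothetical.

```python
def extend(tp, triple, mu):
    """Return mu extended so that tp matches triple, or None if impossible."""
    result = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):              # variable: bind or compare
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                   # constant: must be identical
            return None
    return result


class Endpoint:
    """Stand-in for a remote SPARQL endpoint holding a list of triples."""

    def __init__(self, triples):
        self.triples = triples

    def answer(self, tp, mu):
        """Answer the sub-query 'tp with the bindings of mu applied'."""
        found = (extend(tp, t, mu) for t in self.triples)
        return [m for m in found if m is not None]


def mediate(patterns, endpoints):
    """Distribute each triple pattern to all endpoints and join the sub-results."""
    solutions = [{}]
    for tp in patterns:
        solutions = [m for mu in solutions
                     for ep in endpoints
                     for m in ep.answer(tp, mu)]
    return solutions


# Hypothetical data spread over two sources.
geo = Endpoint([("ex:v1", "rdf:type", "umbel-sc:Volcano"),
                ("ex:v1", "p:location", "dbpedia:Italy"),
                ("ex:v2", "rdf:type", "umbel-sc:Volcano")])
eruptions = Endpoint([("ex:v1", "p:lastEruption", "2001")])

patterns = [("?v", "rdf:type", "umbel-sc:Volcano"),
            ("?v", "p:location", "dbpedia:Italy"),
            ("?v", "p:lastEruption", "?ve")]
result = mediate(patterns, [geo, eruptions])
print(result)
```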
17. SPARQL Endpoint Federation
● Pros:
● Queried data is up to date
● Cons:
● All relevant datasets must be exposed via a SPARQL endpoint
● Effort to set up mediator
18. SPARQL 1.1 Federation Extension
● SERVICE pattern in SPARQL 1.1
● Explicitly specify query patterns whose execution
must be distributed to a remote SPARQL endpoint
SELECT ?v ?ve WHERE
{
  ?v rdf:type umbel-sc:Volcano ;
     p:location dbpedia:Italy .
  SERVICE <http://volcanos.example.org/query> {
    ?v p:lastEruption ?ve }
}
19. For all these approaches ...
● … you have to know the relevant data sources beforehand
● When selecting a SPARQL endpoint over an existing
collection of datasets
● When setting up your own collection
● When configuring your federation system
● When using the SERVICE pattern
● … you restrict yourself to the selected sources
● … you do not tap the full potential of the Web
20. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
21. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns) on a continuously augmented set of data
● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset
[Figure: the discovered data]
22. Main Idea
[Figure, animation step: the example query as a graph with the nodes http://.../movie2449, ?actor and ?loc and the edges filmingLocation, actor_in and lives_in, shown next to the discovered data]
23. Main Idea
[Figure, animation step: the URI http://.../movie2449 from the query is looked up; the retrieved data is added to the queried data]
24. Main Idea
[Figure, animation step: the queried data now contains an actor_in edge between http://mdb.../Paul and http://.../movie2449; this yields the intermediate solution ?actor = http://mdb.../Paul]
25. Main Idea
[Figure, animation step: the URI http://mdb.../Paul from the intermediate solution ?actor = http://mdb.../Paul is looked up]
26. Main Idea
[Figure, animation step: the retrieved data contains a lives_in edge from http://mdb.../Paul to http://geo.../Berlin; the intermediate solution ?actor = http://mdb.../Paul is extended to ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
27. Main Idea
[Figure, animation step: the solutions computed so far are ?actor = http://mdb.../Paul and ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
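The alternation between evaluating triple patterns and looking up URIs can be sketched as follows. This is a simplified model, not the actual implementation: the Web is simulated by a dict from URIs to the triples retrieved for them, the example.org URIs are hypothetical stand-ins for the elided URIs of the example, the content of each simulated document (including the edge directions) is an assumption, and the loop simply runs until no new URI is discovered.

```python
MOVIE = "http://example.org/movie2449"   # hypothetical stand-in URIs
PAUL = "http://example.org/mdb/Paul"
BERLIN = "http://example.org/geo/Berlin"

# Simulated Web: looking up a URI returns the triples of its document.
WEB = {
    MOVIE: [(PAUL, "actor_in", MOVIE), (MOVIE, "filmingLocation", BERLIN)],
    PAUL: [(PAUL, "lives_in", BERLIN)],
}
QUERY = [("?actor", "actor_in", MOVIE),
         (MOVIE, "filmingLocation", "?loc"),
         ("?actor", "lives_in", "?loc")]


def is_uri(term):
    return term.startswith("http://")


def extend(tp, triple, mu):
    """Return mu extended so that tp matches triple, or None if impossible."""
    result = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, dataset):
    """Return (complete solutions, all intermediate solutions)."""
    current, intermediate = [{}], []
    for tp in patterns:
        nxt = []
        for mu in current:
            for triple in dataset:
                m = extend(tp, triple, mu)
                if m is not None:
                    nxt.append(m)
        current = nxt
        intermediate.extend(current)
    return current, intermediate


def link_traversal(patterns, web):
    """Alternate between URI look-ups and evaluation until nothing new appears."""
    dataset, looked_up = set(), set()
    todo = {t for tp in patterns for t in tp if is_uri(t)}  # URIs in the query
    while todo:
        uri = todo.pop()
        looked_up.add(uri)
        dataset.update(web.get(uri, []))        # augment the query-local dataset
        _, intermediate = evaluate(patterns, dataset)
        for mu in intermediate:                 # URIs in intermediate solutions
            todo.update(u for u in mu.values()
                        if is_uri(u) and u not in looked_up)
    return evaluate(patterns, dataset)[0], looked_up


solutions, looked_up = link_traversal(QUERY, WEB)
print(solutions)
```

Note that the data about Paul is only found because his URI shows up in an intermediate solution; no source had to be known beforehand except the URI mentioned in the query.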
28. “Real World” Example
Return phone numbers of authors of ontology engineering papers at ESWC'09.
SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf
       <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}
Result size: 2
# of retrieved docs: 297
# of accessed servers: 16
Avg. execution time: 1 min 30 sec
29. Summary
O. Hartig and A. Langegger. A Database Perspective on Consuming
Linked Data on the Web. Datenbankspektrum 10(2), 2010
30. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
➢ Foundations
➢ Iterator Based Implementation
➢ Query Planning
31. SPARQL Pattern Evaluation
$\mathrm{eval}(P, G) = \{ \mu_1, \mu_2, \ldots \}$
[Figure: the example query pattern P (nodes http://.../movie2449, ?actor, ?loc; edges filmingLocation, actor_in, lives_in) with the solution ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin]
32. SPARQL Linked Data Query
[Figure: the example query pattern P (nodes http://.../movie2449, ?actor, ?loc; edges filmingLocation, actor_in, lives_in)]
$Q^{P}(W) = \{ \mu_1, \mu_2, \ldots \}$
Solution: ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin
33. Full-Web Semantics
$Q^{P}(W) = \mathrm{eval}(P, \mathrm{AllData}(W)) = \{ \mu_1, \mu_2, \ldots \}$
34. Reachability-based Semantics
● Seed URIs S
● Reachability criterion c
35. Reachability-based Semantics
$Q_{c}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
36. Reachability-based Semantics
$Q_{c_{\mathrm{All}}}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
37. Reachability-based Semantics
$Q_{c_{\mathrm{None}}}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
38. Reachability-based Semantics
$Q_{c_{\mathrm{Match}}}^{P,S}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$
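To make the three criteria more tangible, here is a small sketch of how the reachable part of a Web could be computed from seed URIs. The slides do not spell out the criteria, so the definitions used here are assumptions: cAll follows every URI found in reachable data, cNone follows none (only the seed documents are reachable), and cMatch follows only URIs that occur in triples matching some triple pattern of the query. The Web is simulated by a dict and all data is hypothetical.

```python
def matches(tp, triple):
    """True if the triple pattern tp matches the triple."""
    bound = {}
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if bound.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


# Reachability criteria: may the URI found in this triple be followed?
def c_all(triple, uri, patterns):
    return True


def c_none(triple, uri, patterns):
    return False


def c_match(triple, uri, patterns):
    return any(matches(tp, triple) for tp in patterns)


def reachable(web, seeds, criterion, patterns):
    """Return the URIs whose documents are reachable from the seeds."""
    reached, todo = set(), list(seeds)
    while todo:
        uri = todo.pop()
        if uri in reached or uri not in web:   # already seen, or no document
            continue
        reached.add(uri)
        for triple in web[uri]:
            for term in triple:
                if criterion(triple, term, patterns):
                    todo.append(term)
    return reached


# Hypothetical Web of four documents.
WEB = {
    "ex:a": [("ex:a", "knows", "ex:b"), ("ex:a", "likes", "ex:c")],
    "ex:b": [("ex:b", "knows", "ex:d")],
    "ex:c": [],
    "ex:d": [],
}
P = [("?x", "knows", "?y")]
S = ["ex:a"]

print(sorted(reachable(WEB, S, c_all, P)))
print(sorted(reachable(WEB, S, c_none, P)))
print(sorted(reachable(WEB, S, c_match, P)))
```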
39. Computability
$Q_{c_{\mathrm{Match}}}^{P,S}(W)$
● (Ordinary) Turing machines (TM) unsuitable:
● Limited data access capabilities not properly captured
● Web machines
● Abiteboul and Vianu, 1997
● Mendelzon and Milo, 1997
40. LD Machine
● Multi-tape Turing machine
➔ Web Input # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
➔ Input
➔ Work
➔ Output
● Access to Web input is restricted
● Only by performing a particular procedure in a particular state
41. Finitely Computable LD Queries
➔ Web Input # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
➔ Input
➔ Work
➔ Output # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
● For Q there exists an LD machine MQ such that for any W it holds that:
● MQ halts after a finite number of computation steps, and
● MQ outputs the complete result Q(W )
[Figure: timeline of computation steps from step 1 to step k]
42. Eventually Computable LD Queries
➔ Web Input # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
➔ Input
➔ Work
➔ Output # enc(μ1) # enc(μ2)
● For Q there exists an LD machine MQ such that for any W it holds that:
1. Output always encodes a subset of query result Q(W ), and
2. Each μ ∈ Q(W ) eventually appears on the output
✗ No guarantee for termination
[Figure: open-ended timeline of computation steps: ..., step k-3, step k-2, step k-1, step k, step k+1, step k+2, ...]
43. Main Results for cMatch-Semantics
Theorem: Any satisfiable SPARQL based Linked Data
query $Q_{c_{\mathrm{Match}}}^{P,S}$ under cMatch-semantics that is monotonic is
at least eventually computable;
any non-monotonic $Q_{c_{\mathrm{Match}}}^{P,S}$ is either finitely computable
or not even eventually computable.
Problem: TERMINATION(cMatch)
Web Input: W – a (potentially infinite) Web of Linked Data
Ord. Input: S – a finite but nonempty set of seed URIs
            P – a SPARQL expression
Question: Does an LD machine exist that computes $Q_{c_{\mathrm{Match}}}^{P,S}(W)$
and halts?
Theorem: TERMINATION(cMatch) is not LD machine decidable.
44. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
➢ Foundations
➢ Iterator Based Implementation
➢ Query Planning
56. Alternative Execution Order
tp1 = ( ?b , rdf:type , <http://.../Book> ) I1
tp2 = ( ?p , ex:interested_in , ?b ) I2
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3
query-local dataset:
:
<http://.../alice> ex:affiliated_with <http://.../orgaX>
:
[Figure: pipeline of the iterators I1, I2 and I3 with pending "Next?" requests at I3 and I2; I1 reports "END!"]
57. Alternative Execution Order
tp1 = ( ?b , rdf:type , <http://.../Book> ) I1
tp2 = ( ?p , ex:interested_in , ?b ) I2
tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3
[Figure: all three iterators I1, I2 and I3 report "END!" over the query-local dataset]
● Computed query result may depend on the order of triple patterns
(= logical query execution plan)
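The order dependence can be reproduced with a strongly simplified model of an iterator pipeline. Everything here is an assumption for illustration: the Web is a dict, the example.org URIs stand in for the elided URIs of the slides, the document contents are made up, and each iterator evaluates its triple pattern only over a snapshot of the query-local dataset taken when it is asked for input (data discovered later is not revisited).

```python
ORGA, BOOK = "http://example.org/orgaX", "http://example.org/Book"
ALICE, B1 = "http://example.org/alice", "http://example.org/b1"

# Simulated Web (hypothetical documents); the vocabulary URI has no instance data.
WEB = {
    ORGA: [(ALICE, "ex:affiliated_with", ORGA)],
    ALICE: [(ALICE, "ex:interested_in", B1)],
    B1: [(B1, "rdf:type", BOOK)],
    BOOK: [],
}
TP_AFF = ("?p", "ex:affiliated_with", ORGA)
TP_INT = ("?p", "ex:interested_in", "?b")
TP_TYPE = ("?b", "rdf:type", BOOK)


def extend(tp, triple, mu):
    """Return mu extended so that tp matches triple, or None if impossible."""
    result = dict(mu)
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def execute(plan, web):
    """Run the plan as a pipeline of iterators, one per triple pattern."""
    dataset = set()

    def look_up(term):
        if term.startswith("http://"):
            dataset.update(web.get(term, []))

    for tp in plan:                       # seed look-ups: URIs in the query
        for term in tp:
            look_up(term)

    def iterate(i, mu):
        if i == len(plan):
            yield mu
            return
        for triple in list(dataset):      # snapshot: later data is not revisited
            m = extend(plan[i], triple, mu)
            if m is not None:
                for value in m.values():  # look up URIs of the new solution
                    look_up(value)
                yield from iterate(i + 1, m)

    return list(iterate(0, {}))


print(execute([TP_AFF, TP_INT, TP_TYPE], WEB))   # finds a solution
print(execute([TP_TYPE, TP_INT, TP_AFF], WEB))   # first iterator ends immediately
```

In the second order, the first iterator finds no rdf:type triple in the initial query-local dataset and ends, so the whole pipeline ends without a result.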
58. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
➢ Foundations
➢ Iterator Based Implementation
➢ Query Planning
59. Query Plan Selection
● Assessment criteria:
● Cost (query execution time)
● Benefit (size of computed result)
● Cost and benefit must be estimated without plan execution
● Estimation impossible due to “zero knowledge”
● Heuristic Based Plan Selection
● DEPENDENCY RESPECT RULE
● SEED TP RULE
● NO VOCAB SEED RULE
● FILTERING TP RULE
● Assumptions about $Q_{c_{\mathrm{Match}}}^{P,S}$:
● P refers to instance data
● S = uris(P)
61. DEPENDENCY RESPECT RULE
Use a dependency respecting query plan
● Dependency respect: a variable from each triple pattern (except the first)
already occurs in one of the preceding triple patterns
Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Plan (dependency respecting √):
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1
tp2 = ( ?p , ex:interested_in , ?b ) I2
tp3 = ( ?b , rdf:type , <http://.../Book> ) I3
64. DEPENDENCY RESPECT RULE
Use a dependency respecting query plan
● Dependency respect: a variable from each triple pattern (except the first)
already occurs in one of the preceding triple patterns
● Rationale: Avoid cartesian products
Query:
?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
Plan (not dependency respecting):
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1
tp2 = ( ?b , rdf:type , <http://.../Book> ) I2
tp3 = ( ?p , ex:interested_in , ?b ) I3
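A check for the dependency respect property fits in a few lines. The sketch below assumes triple patterns are represented as tuples of strings with variables starting with "?"; the function names are made up for this example.

```python
def variables(tp):
    """Return the set of variables in a triple pattern."""
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan):
    """True if every triple pattern after the first shares a variable
    with at least one of the preceding triple patterns."""
    seen = set(variables(plan[0]))
    for tp in plan[1:]:
        if not (variables(tp) & seen):
            return False        # would require a cartesian product
        seen |= variables(tp)
    return True


TP_AFF = ("?p", "ex:affiliated_with", "<http://.../orgaX>")
TP_INT = ("?p", "ex:interested_in", "?b")
TP_TYPE = ("?b", "rdf:type", "<http://.../Book>")

print(is_dependency_respecting([TP_AFF, TP_INT, TP_TYPE]))  # plan with the check mark
print(is_dependency_respecting([TP_AFF, TP_TYPE, TP_INT]))  # plan with the cartesian product
```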
65. SEED TP RULE
Use a plan with a seed triple pattern
● Potential seed triple pattern
… is a triple pattern that contains at least one HTTP URI
● Seed triple pattern of a plan
… is the first triple pattern in the plan and
… is a potential seed triple pattern
● Recall: S = uris(P)
● Rationale: good starting point
Query (√ = potential seed triple pattern):
?p ex:affiliated_with <http://.../orgaX> √
?p ex:interested_in ?b √
?b rdf:type <http://.../Book> √
66. NO VOCAB SEED RULE
Avoid a seed triple pattern with vocabulary terms
● Not only vocabulary term URIs in the seed triple pattern
● Patterns to avoid:
?s ex:any_property ?o
?s rdf:type ex:any_class
● Rationale: URIs for vocabulary terms usually resolve to
vocabulary definitions with little instance data
Query (√ = suitable seed triple pattern):
?p ex:affiliated_with <http://.../orgaX> √
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
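The SEED TP RULE and the NO VOCAB SEED RULE can be approximated by two small predicates. This sketch rests on assumptions: every non-variable term is taken to be an HTTP URI (prefixed names included, literals ignored), and a URI counts as a vocabulary term if it is in predicate position or is the object of rdf:type, mirroring the two patterns to avoid.

```python
def is_var(term):
    return term.startswith("?")


def is_potential_seed(tp):
    """SEED TP RULE: the pattern contains at least one URI."""
    return any(not is_var(term) for term in tp)


def has_instance_uri(tp):
    """NO VOCAB SEED RULE: some URI is not a vocabulary term, i.e. it is
    a subject, or an object of a predicate other than rdf:type."""
    s, p, o = tp
    return (not is_var(s)) or (not is_var(o) and p != "rdf:type")


def suitable_seeds(patterns):
    return [tp for tp in patterns
            if is_potential_seed(tp) and has_instance_uri(tp)]


QUERY = [("?p", "ex:affiliated_with", "<http://.../orgaX>"),
         ("?p", "ex:interested_in", "?b"),
         ("?b", "rdf:type", "<http://.../Book>")]

print([is_potential_seed(tp) for tp in QUERY])
print(suitable_seeds(QUERY))
```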
67. FILTERING TP RULE
Use a plan where all filtering triple patterns are
as close to the seed triple pattern as possible
● Filtering triple pattern: each variable already occurs in one
of the preceding triple patterns
● For each result consumed as input a filtering TP can
only report 1 or 0 results as output
● Rationale: Reduce cost
Example:
tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1
  { ?p = <http://.../alice> }
tp2 = ( ?p , ex:interested_in , ?b ) I2
  tp2' = ( <http://.../alice> , ex:interested_in , ?b )
  { ?p = <http://.../alice> , ?b = <http://.../b1> }
tp3 = ( ?b , rdf:type , <http://.../Book> ) I3
  tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
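Whether a triple pattern is a filtering TP depends only on its position in the plan. A minimal sketch, assuming the same tuple representation as before (variables start with "?") and a made-up function name:

```python
def variables(tp):
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan):
    """Return the 1-based positions of the filtering triple patterns,
    i.e. patterns whose variables all occur in preceding patterns."""
    seen, positions = set(), []
    for position, tp in enumerate(plan, start=1):
        if position > 1 and variables(tp) <= seen:
            positions.append(position)
        seen |= variables(tp)
    return positions


PLAN = [("?p", "ex:affiliated_with", "<http://.../orgaX>"),
        ("?p", "ex:interested_in", "?b"),
        ("?b", "rdf:type", "<http://.../Book>")]

print(filtering_positions(PLAN))
```

In the plan above only tp3 is filtering: once ?b is bound, tp3' contains no variable, so it can only confirm or reject the input solution.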
68. Evaluation Procedure
● Generate all possible plans
● Execute each plan:
● 5 runs (+ 1 initial warm-up run)
● Use an initially empty query-local dataset for each run
● Measure for each plan:
● Avg. execution time
● Avg. number of RDF documents retrieved during execution
● Avg. number of query results
69. Evaluation Query (Example)
Of what genus are the species that are
● classified in the same family as the American Badger,
● and expected in the same states as the American Badger?
SELECT ?spec ?genus WHERE {
  geospecies:4qyn7 gs:inFamily ?fam .
  ?fam skos:narrowerTransitive ?spec .
  ?spec skos:closeMatch ?sp2 .
  ?sp2 rdfs:subClassOf ?genus .
  ?spec gs:isExpectedIn ?loc .
  geospecies:4qyn7 gs:isExpectedIn ?loc .
  ?loc rdf:type gs:State . }
● 2 potential seed triple patterns that
satisfy our NO VOCAB SEED RULE
● 56 different dependency respecting
plans, each contains 2 filtering TPs
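The plan counts stated for this query can be re-derived by brute force. The sketch below enumerates all orderings of the seven triple patterns and keeps those that start with a seed triple pattern allowed by the NO VOCAB SEED RULE and that are dependency respecting. The encoding of the rules is an assumption (a URI counts as a vocabulary term if it is a predicate or the object of rdf:type); under this encoding the enumeration should reproduce both numbers.

```python
from itertools import permutations

BGP = [
    ("geospecies:4qyn7", "gs:inFamily", "?fam"),
    ("?fam", "skos:narrowerTransitive", "?spec"),
    ("?spec", "skos:closeMatch", "?sp2"),
    ("?sp2", "rdfs:subClassOf", "?genus"),
    ("?spec", "gs:isExpectedIn", "?loc"),
    ("geospecies:4qyn7", "gs:isExpectedIn", "?loc"),
    ("?loc", "rdf:type", "gs:State"),
]


def variables(tp):
    return {term for term in tp if term.startswith("?")}


def is_allowed_seed(tp):
    """Seed TP that contains a URI which is not a vocabulary term."""
    s, p, o = tp
    return (not s.startswith("?")) or (not o.startswith("?") and p != "rdf:type")


def is_dependency_respecting(plan):
    seen = set(variables(plan[0]))
    for tp in plan[1:]:
        if not (variables(tp) & seen):
            return False
        seen |= variables(tp)
    return True


def filtering_tps(plan):
    """Triple patterns whose variables all occur in preceding patterns."""
    seen, found = set(variables(plan[0])), []
    for tp in plan[1:]:
        if variables(tp) <= seen:
            found.append(tp)
        seen |= variables(tp)
    return found


seeds = [tp for tp in BGP if is_allowed_seed(tp)]
plans = [plan for plan in permutations(BGP)
         if is_allowed_seed(plan[0]) and is_dependency_respecting(plan)]

print(len(seeds))                                  # 2
print(len(plans))                                  # 56
print({len(filtering_tps(plan)) for plan in plans})  # {2}
```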
70. Measurements
[Figure: two plots over the measured plans: number of query results (0 to 30) vs. query exec. times (in seconds, 0 to 180), and number of retrieved documents (0 to 400) vs. query exec. times (in seconds, 0 to 180)]
[Figure: percentage of plans in each group with a filtering TP in specific positions; panels "1st Filtering TP" and "2nd Filtering TP"; x-axis: TP position in the ordered BGP (1 to 7); y-axis: 0 to 100]
71. Summary (Linked Data Queries)
● Theoretical foundations of Linked Data queries
● Full-Web semantics, (family of) reachability based semantics
● Theoretical properties of queries (e.g. computability)
● Link traversal based query execution
● Novel paradigm for executing Linked Data queries
● Sound and complete for conjunctive Linked Data queries
under cMatch-semantics
● Iterator implementation of the LTBQE paradigm
● Trades off completeness for a termination guarantee
● Degree of completeness depends on execution order of TPs
● Heuristic based plan selection
72. Chapter 3
Accessing a SPARQL Endpoint
Queries over Multiple Datasets
➢ Query a given collection
➢ Manage your own collection
➢ Use a query federation system
➢ Link traversal based query execution
Linked Data Queries
➢ Foundations
➢ Iterator Based Implementation
➢ Query Planning
73. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)