Querying data on the Web – client or server?

Ruben Verborgh
Ghent University – iMinds

The current Semantic Web has many implicit assumptions:
We should be able to answer all queries.
Complexity is more important than availability.
Data servers need to be expensive.

Those assumptions are not necessarily wrong.
They’re also not necessarily the only possible ones.

Some queries are hard to answer.
Availability is a top priority.
Low-cost data servers have potential.
Let’s rethink our assumptions, just to see what’s possible.

Different assumptions lead to a different Semantic Web.
Maybe they bring us closer to the Web We Want.

The Semantic Web’s assumptions
Client-side query execution
New query opportunities

1. Clients need a different protocol.

The Web for humans offers an HTTP interface to HTML.
[Diagram: a client accesses data over HTTP, as HTML.]

The Web for applications offers an HTTP interface to JSON.
[Diagram: a client accesses data over HTTP, as JSON.]

The Semantic Web offers an HTTP interface to RDF.
[Diagram: a client accesses data over HTTP, as RDF.]

The Semantic Web also offers a SPARQL interface to RDF.
[Diagram: a client accesses RDF data over HTTP through a SPARQL interface.]

Semantic Web clients were perceived as very limited, unlike “simple” JSON clients.
Documents need a new language.
Querying needs a new protocol.

2. Live queries require that protocol.

There are 3 common ways to publish Linked Data: Linked Data documents, public SPARQL endpoints, and downloadable data dumps.

…and that’s not always a good thing.
Public SPARQL endpoints offer a very powerful interface.
Clients can ask any query… …if the endpoint is available.
Hosting an endpoint is costly.

Linked Data documents seem to work like the Web.
They are low-cost to host.
Clients solve queries by traversing links, but many queries cannot be solved this way.

Downloadable data dumps have high availability: set up your own endpoint.
But the data is not live.
You’re not really querying the Web.

3. Clients can request any query.

The query language abstracts away the steps needed to solve it.
In SPARQL, asking a simple query is as easy as asking a difficult one.
In contrast to the rest of the Web, clients are in control.

With a JSON interface, the server decides how clients access data.
[Diagram: a client accesses data over HTTP, as JSON.]

With a SPARQL interface, clients decide how they access data.
[Diagram: a client accesses RDF data over HTTP through a SPARQL interface.]

Clients can ask anything, including queries that bring servers down.
The majority of public SPARQL endpoints have less than 95% availability.
That means the endpoint, and thus your application, doesn’t work for 1.5 days or more each month.
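The 1.5-day figure follows from simple arithmetic. This is a minimal sketch. It assumes a 30-day month and assumes that downtime is spread evenly. The slide states neither. `monthly_downtime_days` is a hypothetical helper.

```python
def monthly_downtime_days(availability: float, days_in_month: float = 30) -> float:
    """Expected days per month that an endpoint is unavailable.

    Assumes the unavailable fraction of time (1 - availability) is spread
    evenly over an (assumed) 30-day month, so it maps directly onto days.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * days_in_month


# 95% availability gives 1.5 days of downtime; anything below 95% gives more.
for availability in (0.99, 0.95, 0.90):
    print(f"{availability:.0%} available -> {monthly_downtime_days(availability):.2f} days down")
```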

“If you have operational need for SPARQL accessible data, you must have your own infrastructure. No public endpoints. Public endpoints are for lookups and discovery; sort of a dataset demo.”
– Orri Erling, OpenLink (2014)

SEMANTIC (things we happen to have downloaded from the) WEB

If you want to study a subject on Wikipedia, do you download all 4,614,000 articles first?

The Semantic Web’s assumptions
Client-side query execution
New query opportunities

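Any fragment of a Linked Data set is called a Linked Data Fragment.
[Figure: a spectrum of Linked Data Fragments ordered by selector: data dump (all), dereferencing (subject), SPARQL endpoint (SPARQL query). Data dumps require high client effort; SPARQL endpoints require high server effort.]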

Each type of Linked Data Fragment is defined by three characteristics:
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?

SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)

Linked Data Document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents

data dump
selector: everything
metadata: number of triples, file size
controls: (none)
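Because every fragment type has the same three characteristics, a single data structure can describe all of them. This is a minimal sketch. The class `FragmentType` and its field names are hypothetical and do not come from any LDF library. Only the values are taken from the three slides above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentType:
    """A type of Linked Data Fragment, described by its three characteristics."""
    name: str
    selector: str                                        # What data does it contain?
    metadata: List[str] = field(default_factory=list)   # What do we know about it?
    controls: List[str] = field(default_factory=list)   # What can we do next?

    def describe(self) -> str:
        # Empty lists are shown as "(none)", as on the slides
        meta = ", ".join(self.metadata) or "(none)"
        ctrl = ", ".join(self.controls) or "(none)"
        return f"{self.name} | selector: {self.selector} | metadata: {meta} | controls: {ctrl}"


FRAGMENT_TYPES = [
    FragmentType("SPARQL CONSTRUCT result", selector="a SPARQL query"),
    # The slide's metadata list continues after "maintainer"; only the named items are kept.
    FragmentType("Linked Data document", selector="a specific entity",
                 metadata=["creator", "maintainer"],
                 controls=["links to other LD documents"]),
    FragmentType("data dump", selector="everything",
                 metadata=["number of triples", "file size"]),
]

for fragment_type in FRAGMENT_TYPES:
    print(fragment_type.describe())
```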

Can we query fragments that balance client and server effort?
[Figure: the same spectrum, with triple pattern fragments (selector: triple pattern) placed between dereferencing (subject) and SPARQL endpoint (SPARQL query).]

Triple pattern fragment
selector: triple pattern
metadata: total number of matches
controls: access to all other fragments
Triple pattern fragments are cheap yet enable efficient querying.

[Figure: a triple pattern fragment page showing its data (the first 100 matches), its metadata (the total count), and its controls (links to other fragments).]
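The following is a minimal sketch of how a server might assemble one page of a triple pattern fragment from these three parts. Several details are hypothetical illustrations rather than a real server's wire format: the server URL, the parameter names `subject`/`predicate`/`object`/`page`, the `search` URL template, and the dictionary layout. `None` stands for a variable in the pattern.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable

PAGE_SIZE = 100  # the page shows the first 100 matches
BASE_URL = "http://example.org/fragments"  # hypothetical server address


def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(p is None or p == t for t, p in zip(triple, pattern))


def fragment_page(dataset: List[Triple], pattern: Pattern, page: int = 1) -> Dict:
    """Build one page of a triple pattern fragment: data, metadata, and controls."""
    all_matches = [t for t in dataset if matches(t, pattern)]
    start = (page - 1) * PAGE_SIZE
    params = {name: value
              for name, value in zip(("subject", "predicate", "object"), pattern)
              if value is not None}
    # Hypothetical URL template through which any other fragment can be requested
    controls = {"search": BASE_URL + "{?subject,predicate,object}"}
    if start + PAGE_SIZE < len(all_matches):
        controls["next"] = BASE_URL + "?" + urlencode({**params, "page": page + 1})
    return {
        "data": all_matches[start:start + PAGE_SIZE],
        "metadata": {"totalCount": len(all_matches)},  # exact here; real servers may estimate
        "controls": controls,
    }


# Usage with a made-up dataset of 250 artists
dataset = [(f"ex:artist{i}", "rdf:type", "dbpedia-owl:Artist") for i in range(250)]
first = fragment_page(dataset, (None, "rdf:type", "dbpedia-owl:Artist"))
print(len(first["data"]), first["metadata"]["totalCount"], first["controls"]["next"])
```

The count is what lets a client plan its query before it downloads all the data. The “±” counts on the following slides suggest that real servers may return estimates.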

Triple pattern fragment servers enable clients to execute queries.
Other APIs exist, but they are specific; triple patterns work on all datasets.
Triple pattern fragments combine data, metadata, and controls.

How can we answer this query using only triple pattern fragments?
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?city WHERE {
  ?person a dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace ?city.
  ?city foaf:name "York"@en.
}

Get the corresponding fragments and read the count metadata:
?person a dbpedia-owl:Artist. (±61,000 matches)
  dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
  dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
  …
?person dbpedia-owl:birthPlace ?city. (±470,000 matches)
  dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
  dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
  …
?city foaf:name "York"@en. (12 matches)
  dbpedia:York foaf:name "York"@en.
  dbpedia:York,_Ontario foaf:name "York"@en.
  …

Start with the smallest fragment: ?city foaf:name "York"@en. has only 12 matches, versus ±61,000 and ±470,000 for the other two patterns.
Start with the first match: ?city = dbpedia:York.
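The “start with the smallest fragment” step is a greedy choice that uses only the count metadata. Here is a minimal sketch of that choice with the counts from the slides. The helper `smallest_pattern` and the tuple encoding of patterns are hypothetical.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, str, str]


def smallest_pattern(counts: Dict[Pattern, int]) -> Pattern:
    """Return the triple pattern whose fragment reports the fewest matches."""
    return min(counts, key=lambda pattern: counts[pattern])


# Count metadata as read from the first page of each fragment
# (the first two values are approximate, as on the slides).
counts = {
    ("?person", "a", "dbpedia-owl:Artist"): 61_000,
    ("?person", "dbpedia-owl:birthPlace", "?city"): 470_000,
    ("?city", "foaf:name", '"York"@en'): 12,
}
print(smallest_pattern(counts))
```

Starting from 12 candidate cities keeps the number of follow-up requests small. Starting from ±470,000 birth places would not.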

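PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
  ?person a dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:York foaf:name "York"@en.
}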
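Binding ?city to dbpedia:York rewrites the remaining patterns into the query above. This is a minimal sketch of that substitution step. The function name `bind` and the tuple encoding are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def bind(patterns: List[Pattern], binding: Dict[str, str]) -> List[Pattern]:
    """Substitute bound variables; unbound variables and constants stay as they are."""
    return [tuple(binding.get(term, term) for term in pattern) for pattern in patterns]


query = [
    ("?person", "a", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
for pattern in bind(query, {"?city": "dbpedia:York"}):
    print(" ".join(pattern) + ".")
```

The last pattern is now fully bound. It no longer selects anything and only has to be checked.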

?person dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:John_Flaxman dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:Joseph_Hansom dbpedia-owl:birthPlace dbpedia:York.
  …

?person a dbpedia-owl:Artist. (±61,000 matches)
?person dbpedia-owl:birthPlace dbpedia:York. (75 matches)

Start with the smallest fragment: ?person dbpedia-owl:birthPlace dbpedia:York. has 75 matches, versus ±61,000 for ?person a dbpedia-owl:Artist.
Start with the first match: ?person = dbpedia:John_Flaxman.

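PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK {
  dbpedia:John_Flaxman a dbpedia-owl:Artist.
  dbpedia:John_Flaxman dbpedia-owl:birthPlace dbpedia:York.
  dbpedia:York foaf:name "York"@en.
}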

Get the corresponding fragment: dbpedia:John_Flaxman a dbpedia-owl:Artist. has 1 match, so the triple exists.
Output the match:
?person = dbpedia:John_Flaxman
?city = dbpedia:York
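A fully bound pattern behaves like the ASK query: its fragment has a count of 1 if the triple exists and 0 otherwise. This is a minimal sketch of that check. The toy dataset holds only triples shown on the slides, so it is not real DBpedia. The helpers `count` and `check` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data, limited to triples shown on the slides.
DATASET: List[Triple] = [
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
]

QUERY = [
    ("?person", "a", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]


def count(dataset: List[Triple], triple: Triple) -> int:
    """Count metadata of the fragment for a fully bound triple pattern (0 or 1)."""
    return sum(1 for t in dataset if t == triple)


def check(dataset: List[Triple], binding: Dict[str, str],
          patterns: List[Triple]) -> Optional[Dict[str, str]]:
    """Substitute the binding; output it only if every resulting triple exists."""
    ground = [tuple(binding.get(term, term) for term in p) for p in patterns]
    return binding if all(count(dataset, t) > 0 for t in ground) else None


print(check(DATASET, {"?person": "dbpedia:John_Flaxman", "?city": "dbpedia:York"}, QUERY))
```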

Recursively repeat the process for all bindings: every other match of ?person dbpedia-owl:birthPlace dbpedia:York, and every other match of ?city foaf:name "York"@en.
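Put together, the steps above form a recursive, greedy client-side algorithm. This is a minimal sketch under stated assumptions. It is not the actual Linked Data Fragments client.
- `fragment` stands in for HTTP requests and returns all matches at once. A real client would read the count from the first page and then page through the results.
- The toy dataset contains only triples shown on the slides, so its results differ from real DBpedia.
- All function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(dataset: List[Triple], pattern: Triple) -> List[Triple]:
    """Stand-in for fetching a triple pattern fragment; its length acts as the count."""
    return [t for t in dataset if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def substitute(pattern: Triple, binding: Binding) -> Triple:
    return tuple(binding.get(term, term) for term in pattern)


def solve(dataset: List[Triple], patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    binding = binding or {}
    if not patterns:
        yield dict(binding)  # every pattern matched: output the solution
        return
    bound = [substitute(p, binding) for p in patterns]
    fragments = [fragment(dataset, p) for p in bound]
    # Start with the smallest fragment...
    i = min(range(len(bound)), key=lambda k: len(fragments[k]))
    remaining = patterns[:i] + patterns[i + 1:]
    # ...and recursively repeat the process for each of its matches.
    for triple in fragments[i]:
        extended = dict(binding)
        consistent = True
        for p, v in zip(bound[i], triple):
            if is_var(p) and extended.setdefault(p, v) != v:
                consistent = False  # same variable would get two different values
                break
        if consistent:
            yield from solve(dataset, remaining, extended)


# Hypothetical toy data built from triples shown on the slides
DATASET: List[Triple] = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Jacques_L'enfant", "dbpedia-owl:birthPlace", "dbpedia:Beauce"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "a", "dbpedia-owl:Artist"),
]
QUERY = [
    ("?person", "a", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
print(list(solve(DATASET, QUERY)))
```

The solver is a generator, so solutions are output as soon as they are found. This matters for the streaming argument later in the deck.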

This way of querying changes the usual assumptions.
Use the Web’s protocol, HTTP.
Don’t be smart; enable intelligence.
Some queries will be hard or slow.

Querying semantic datasources means managing expectations.
[Figure: along the spectrum from data dump via dereferencing and triple pattern fragments to SPARQL endpoint, availability decreases from high to low, while freshness and speed increase from low to high.]

Coupling access and processing leads to low availability.
[Figure (a): many clients sending queries to a single SPARQL server.]
(a) SPARQL endpoints perform all processing on the server, leading to fast query execution with low data bandwidth, and a rapidly overloaded server.

[Figure (b): many clients sending simple requests to a single LDF server.]
(b) LDF servers only support simple requests and can thus handle far higher loads. Clients perform the querying, so they need more (cacheable) data.
Enabling clients to query leads to high scalability.
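Caption (b) mentions cacheable data. The effect can be shown with a toy example. It assumes that many clients running the same query request identical fragments. Here `functools.lru_cache` stands in for an HTTP cache between clients and server, and `ToyFragmentServer` is hypothetical.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = variable


class ToyFragmentServer:
    """Hypothetical LDF server that counts how often it computes a fragment."""

    def __init__(self, dataset: List[Triple]):
        self.dataset = dataset
        self.computations = 0

    def fragment(self, pattern: Pattern) -> Tuple[Triple, ...]:
        self.computations += 1
        return tuple(t for t in self.dataset
                     if all(p is None or p == v for p, v in zip(pattern, t)))


server = ToyFragmentServer([
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
])


@lru_cache(maxsize=1024)
def cached_fragment(pattern: Pattern) -> Tuple[Triple, ...]:
    # Stand-in for an HTTP cache: identical requests never reach the server again.
    return server.fragment(pattern)


# 100 clients running the same query request the same simple fragments.
for client in range(100):
    cached_fragment((None, "foaf:name", '"York"@en'))
    cached_fragment((None, "a", "dbpedia-owl:Artist"))

print(server.computations)  # 2
```

Because the requests are simple and repeat across clients, the server computes each fragment only once. Arbitrary SPARQL queries are far less likely to coincide.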

Show a sorted list of molecules that match certain characteristics.
…
[Figure: the Molecules application, built with an endpoint approach and with a fragment approach.]

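[Figure: in the endpoint approach, the Molecules application sends its query to a SPARQL endpoint.]
Endpoint approach:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?mol (MIN(?name) AS ?label)
WHERE {
  ?mol rdfs:label ?name;
  …
  …
}
GROUP BY ?mol
ORDER BY ?label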


Endpoint approach, consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn’t matter; we’re waiting anyway.

Fragments approach: no blocking operators; streaming matters.
SELECT ?mol ?name
WHERE {
  …
  …
}
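In the fragments approach, the client can take over DISTINCT, MIN, and sorting itself, and update the list after every streamed result instead of blocking until the end. This is a minimal sketch. The molecule IRIs and names are made up, because the slide elides the actual query. `sorted_snapshots` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def sorted_snapshots(results: Iterable[Tuple[str, str]]) -> Iterator[List[Tuple[str, str]]]:
    """Turn streamed (?mol, ?name) results into progressively better sorted lists."""
    smallest: Dict[str, str] = {}  # client-side DISTINCT + MIN: one name per molecule
    for mol, name in results:
        if mol not in smallest or name < smallest[mol]:
            smallest[mol] = name
        # Client-side ORDER BY on what has arrived so far; re-sorting on every
        # result keeps the sketch simple rather than efficient.
        yield sorted(smallest.items(), key=lambda item: item[1])


# Made-up results arriving one at a time
stream = iter([
    ("ex:mol1", "Water"),
    ("ex:mol2", "Ethanol"),
    ("ex:mol1", "Dihydrogen monoxide"),
])
for snapshot in sorted_snapshots(stream):
    print(snapshot)
```

Each snapshot can be shown to the user immediately. A later result can still move a molecule up in the list.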

[Figure: the Molecules application built with the fragments approach.]

Federation also becomes substantially easier.
The algorithm remains the same when clients use one or multiple triple pattern fragment servers.
Avoid the unavailability cascade.
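One way to see why the algorithm stays the same: a client can treat the union of a pattern's fragments across several servers as one larger fragment. This is a minimal sketch. It assumes that deduplicating the union is acceptable and that the union's size serves as the count. Both servers and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def fragment(source: Sequence[Triple], pattern: Triple) -> List[Triple]:
    """Matches of a pattern on one server (terms starting with '?' are variables)."""
    return [t for t in source
            if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]


def federated_fragment(sources: Sequence[Sequence[Triple]], pattern: Triple) -> List[Triple]:
    """Union of a pattern's fragments over several servers, without duplicates.

    A client that calls this instead of `fragment` can keep the same
    smallest-fragment-first algorithm; only where the matches come from changes.
    """
    seen = set()
    union: List[Triple] = []
    for source in sources:
        for triple in fragment(source, pattern):
            if triple not in seen:
                seen.add(triple)
                union.append(triple)
    return union


# Two hypothetical servers with overlapping data
server_a = [("dbpedia:York", "foaf:name", '"York"@en'),
            ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York")]
server_b = [("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
            ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York")]
print(federated_fragment([server_a, server_b],
                         ("?person", "dbpedia-owl:birthPlace", "dbpedia:York")))
```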

An optimal solution doesn’t exist.
We should look at all APIs: data dumps, dereferencing, triple pattern fragments, and SPARQL endpoints.

Servers indicate what they do, enabling clients to query optimally.
“This server supports triple patterns 
and full-text search on objects.”
“This server supports SPARQL queries 
with up to 2 joins.”
“This server supports Linked Data documents.”
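A client could read such self-descriptions and choose a strategy for each server. This is a minimal sketch. The fields of `ServerDescription`, the URLs, and the decision order in `choose_strategy` are all hypothetical, and no standard vocabulary is implied. The York query's three triple patterns are counted as two joins.

```python
from dataclasses import dataclass


@dataclass
class ServerDescription:
    """Hypothetical self-description a server could publish about its interface."""
    url: str
    triple_patterns: bool = False
    full_text_search: bool = False  # not used below; a smarter client could exploit it
    sparql_max_joins: int = -1      # -1 means: no SPARQL support
    ld_documents: bool = False


def choose_strategy(server: ServerDescription, joins_in_query: int) -> str:
    """Pick how to evaluate a query, based only on what the server says it supports."""
    if server.sparql_max_joins >= joins_in_query:
        return "send the whole query"
    if server.triple_patterns:
        return "split into triple pattern fragments, join on the client"
    if server.ld_documents:
        return "traverse Linked Data documents"
    return "no suitable interface"


# The three descriptions quoted above, with made-up URLs
servers = [
    ServerDescription("http://a.example.org/", triple_patterns=True, full_text_search=True),
    ServerDescription("http://b.example.org/", sparql_max_joins=2),
    ServerDescription("http://c.example.org/", ld_documents=True),
]
for server in servers:
    print(server.url, "->", choose_strategy(server, joins_in_query=2))
```

The server states what it can do, and the client decides how much work to delegate.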

Different assumptions lead to different trade-offs.
Live querying of public data is possible at low cost, but at slower speeds… …for now :-)

Let your browser solve a SPARQL query: client.linkeddatafragments.org
Ruben Verborgh
Ghent University – iMinds
