Querying data on the Web:

client or server?
Ruben Verborgh
Ghent University – iMinds
The current Semantic Web

has many implicit assumptions.
We should be able

to answer all queries.
Complexity is more important

than availability.
Data servers

need to be expensive.
Those assumptions are

not necessarily wrong.
They’re also not necessarily

the only possible ones.
Some queries are

hard to answer.
Availability is
a top priority.
Low-cost data servers

have potential.
Let’s rethink our assumptions,

just to see what’s possible.
Different assumptions lead

to a different Semantic Web.
Maybe they bring us closer

to the Web We Want.
…but what do we want?
The Semantic Web’s assumptions
Client-side query execution
Querying data on the Web:

client or server?
New query opportunities
1. Clients need a different protocol.
The Web for humans offers
an HTTP interface to HTML.
[Diagram: client and data, connected over HTTP, exchanging HTML]
The Web for applications offers
an HTTP interface to JSON.
[Diagram: client and data, connected over HTTP, exchanging JSON]
The Web for applications offers
an HTTP interface to RDF.
[Diagram: client and data, connected over HTTP, exchanging RDF]
The Web for applications offers
a SPARQL interface to RDF.
[Diagram: client and data, connected over HTTP with SPARQL on top, exchanging RDF]
Semantic Web clients were

perceived as very limited.
Documents need a new language.
Querying needs a new protocol.
…unlike “simple” JSON clients.
1. Clients need a different protocol.
2. Live queries require that protocol.
There are 3 common ways

to publish Linked Data.
public SPARQL endpoints
Linked Data documents
downloadable data dumps
Public SPARQL endpoints

offer a very powerful interface.
Clients can ask any query…
…if the endpoint is available.
…and that’s not always a good thing.
Hosting an endpoint is costly.
Linked Data documents

seem to work like the Web.
Low-cost to host.
Solve queries by traversing links.
Many queries cannot be solved.
Downloadable data dumps

have high availability.
Set up your own endpoint.
Data is not live.
You’re not really querying the Web.
1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
The query language abstracts away

the steps needed to solve it.
In SPARQL, asking a simple query

is as easy as asking a difficult one.
In contrast to the rest of the Web,

clients are in control.
With a JSON interface, the server
decides how clients access data.
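[Diagram: client and data, connected over HTTP, exchanging JSON]
With a SPARQL interface, clients

decide how they access data.
[Diagram: client and data, connected over HTTP with SPARQL on top, exchanging RDF]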
Clients can ask anything, including

queries that bring servers down.
The majority

of public SPARQL endpoints

have less than 95% availability.
That means the endpoint

—and thus your application—

doesn’t work 1.5 days each month.
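The 1.5-day figure follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 30-day month (the function name and the month length are illustrative, not from the slides):

```python
def downtime_days(availability: float, days_in_period: float = 30.0) -> float:
    """Expected downtime in days for an availability given as a fraction (0.95 = 95%)."""
    return (1.0 - availability) * days_in_period


# 95% availability over an assumed 30-day month
print(f"{downtime_days(0.95):.1f} days of downtime per month")
```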
If you have operational need

for SPARQL accessible data,

you must have your own infrastructure.
No public endpoints.

Public endpoints are for lookups and discovery;

sort of a dataset demo.
—Orri Erling, OpenLink (2014)
SEMANTIC things we happen to have

downloaded from the
WEB
If you want to study

a subject on Wikipedia,
do you download all

4,614,000 articles first?
1. Clients need a different protocol.
2. Live queries require that protocol.
3. Clients can request any query.
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:

client or server?
Any fragment of a Linked Data set

is called a Linked Data Fragment.
[Diagram: an axis between high client effort and high server effort, showing each interface with its SELECTOR: data dump (all), dereferencing (subject), SPARQL endpoint (SPARQL query)]
Each type of Linked Data Fragment

is defined by three characteristics.
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?
SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)
Linked Data Document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents
data dump
selector: everything
metadata: number of triples, file size
controls: (none)
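One way to make the three characteristics concrete is a small record type. This is only an illustrative sketch: the class and field names are assumptions, and the values are copied from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentType:
    """Hypothetical record for one type of Linked Data Fragment."""
    name: str
    selector: str   # what data does it contain?
    metadata: str   # what do we know about it?
    controls: str   # what can we do next?


FRAGMENT_TYPES = [
    FragmentType("SPARQL CONSTRUCT result", "a SPARQL query", "(none)", "(none)"),
    FragmentType("Linked Data document", "a specific entity",
                 "creator, maintainer, ...", "links to other LD documents"),
    FragmentType("data dump", "everything", "number of triples, file size", "(none)"),
]

for t in FRAGMENT_TYPES:
    print(f"{t.name}: selector={t.selector}; metadata={t.metadata}; controls={t.controls}")
```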
Can we query fragments that

balance client and server effort?
[Diagram: the same axis between high client effort and high server effort, now with triple pattern fragments (selector: triple pattern) placed between the data dump and the SPARQL endpoint]
Triple pattern fragments are cheap

yet enable efficient querying.
selector: triple pattern
metadata: total number of matches
controls: access to all other fragments
data (first 100)
controls (other fragments)
metadata (total count)
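A server-side sketch of how such a fragment could be assembled from an in-memory list of triples. Everything here is an assumption for illustration (the function names, the dictionary layout, `None` as a wildcard); only the page size of 100, the total count, and the idea of controls leading to other fragments come from the slide.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A triple matches when every non-wildcard position is equal."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def fragment(dataset: List[Triple], pattern: Pattern,
             page: int = 1, page_size: int = 100) -> Dict:
    """Return one page of matches plus count metadata and a paging control."""
    found = [t for t in dataset if matches(t, pattern)]
    start = (page - 1) * page_size
    has_next = start + page_size < len(found)
    return {
        "data": found[start:start + page_size],            # e.g. the first 100 matches
        "metadata": {"total_count": len(found)},            # total number of matches
        "controls": {"next_page": page + 1 if has_next else None},
    }
```

The server only filters and slices; joins are left to the client, which is what keeps the interface cheap.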
Triple pattern fragment servers

enable clients to execute queries.
Combine data, metadata & controls.
Other APIs exist, but are specific.
Triple patterns work on all datasets.
How to answer this query using

only triple pattern fragments?
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
Get the corresponding fragments

?person a dbpedia-owl:Artist.
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpedia-owl:birthPlace ?city.
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
Get the corresponding fragments

and read the count metadata.
?person a dbpedia-owl:Artist. ±61,000
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpedia-owl:birthPlace ?city. ±470,000
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
?city foaf:name "York"@en. 12
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
Start with the smallest fragment.

Start with the first match.
?person a dbpedia-owl:Artist. ±61,000
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpedia-owl:birthPlace ?city. ±470,000
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
?city foaf:name "York"@en. 12
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
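The selection step can be sketched in a few lines, assuming the client has already read the count metadata of each pattern's fragment. The counts are the approximate values from the slides; the dictionary layout and function name are illustrative.

```python
# Approximate match counts read from each fragment's metadata (values from the slides).
counts = {
    "?person a dbpedia-owl:Artist.": 61_000,
    "?person dbpedia-owl:birthPlace ?city.": 470_000,
    '?city foaf:name "York"@en.': 12,
}


def smallest_pattern(pattern_counts: dict) -> str:
    """Pick the triple pattern whose fragment has the fewest matches."""
    return min(pattern_counts, key=pattern_counts.get)


print(smallest_pattern(counts))
```

Starting from the 12-match fragment means the client only has to follow up on 12 candidate bindings for ?city, rather than tens of thousands of artists.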
How to answer this query using

only triple pattern fragments?
SELECT ?person WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
Get the corresponding fragments

?person a dbpedia-owl:Artist.
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpo:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
Get the corresponding fragments

and read the count metadata.
?person a dbpedia-owl:Artist. ±61,000
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpo:birthPlace dbpedia:York. 75
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
Start with the smallest fragment.

Start with the first match.
?person a dbpedia-owl:Artist. ±61,000
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
?person dbpo:birthPlace dbpedia:York. 75
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
How to answer this query using

only triple pattern fragments?
ASK {
dbp:John_Flaxman a dbpo:Artist.
dbp:John_Flaxman dbpo:birthPlace dbp:York.
dbp:York foaf:name "York"@en.
}
Get the corresponding fragment

and read the count metadata.
dbpedia:John_Flaxman a dbpedia-owl:Artist. 1
dbpedia:John_Flaxman a dbpedia-owl:Artist.
Output the match:
?person = dbpedia:John_Flaxman

?city = dbpedia:York
Recursively repeat the process

for all bindings.
?person dbpo:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.

…
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.

…
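The whole procedure (fetch fragments, start with the smallest, bind, recurse) can be sketched as a recursive generator. This is a toy illustration, not the actual client implementation: `fetch_fragment` stands in for an HTTP request to a triple pattern fragment server, the dataset is a handful of triples modelled on the slides, and all names are assumptions. A real client would read only the count metadata before choosing a pattern, instead of fetching every fragment in full.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy dataset modelled on the slides; it stands in for a remote server.
DATA: List[Triple] = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpo:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpo:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpo:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpo:Artist"),
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpo:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpo:Artist"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fetch_fragment(pattern: Triple) -> List[Triple]:
    """Hypothetical stand-in for requesting one triple pattern fragment over HTTP."""
    return [t for t in DATA if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def substitute(pattern: Triple, bindings: Dict[str, str]) -> Triple:
    """Replace bound variables in a pattern by their values."""
    return tuple(bindings.get(term, term) for term in pattern)


def solve(patterns: List[Triple],
          bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)  # nothing left to match: output the solution
        return
    # Get the corresponding fragments and start with the smallest one.
    fragments = [(p, fetch_fragment(p)) for p in patterns]
    pattern, found = min(fragments, key=lambda f: len(f[1]))
    rest = [p for p in patterns if p is not pattern]
    # Recursively repeat the process for each match of the smallest fragment.
    for triple in found:
        new = {term: value for term, value in zip(pattern, triple) if is_var(term)}
        yield from solve([substitute(p, new) for p in rest], {**bindings, **new})


QUERY = [
    ("?person", "rdf:type", "dbpo:Artist"),
    ("?person", "dbpo:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]

for solution in solve(QUERY):
    print(solution)
```

Because `solve` is a generator, each solution is available as soon as it is found, without waiting for the full result set.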
This way of querying

changes the usual assumptions.
Use the Web’s protocol HTTP.
Don’t be smart; enable intelligence.
Some queries will be hard / slow.
Querying semantic data sources

means managing expectations.
[Diagram: data dump, dereferencing, triple pattern fragments, and SPARQL endpoint on an axis between high client effort and high server effort, with scales for availability (low to high) and freshness / speed (low to high)]
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:

client or server?
Coupling access and processing

leads to low availability.
[Figure (a): many clients connected to one SPARQL server]
(a) SPARQL endpoints perform all processing on the server, leading to fast
query execution with low data bandwidth, and a rapidly overloaded server.
Enabling clients to query

leads to high scalability.
[Figure (b): many clients connected to one LDF server]
(b) LDF servers only support simple requests and can thus handle far higher
loads. Clients perform the querying, so they need more (cacheable) data.
Show a sorted list of molecules

that match certain characteristics.
[Diagram: the Molecules application, either with the endpoint approach (a SPARQL endpoint) or the fragment approach]
endpoint approach
SELECT DISTINCT(?mol) MIN(?name)
WHERE {
?mol rdfs:label ?name;
…
…
}
ORDER BY ?name
Consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn’t matter; we’re waiting anyway.
Show a sorted list of molecules

that match certain characteristics.
fragments approach
SELECT ?mol ?name
WHERE {
?mol rdfs:label ?name;
…
…
}
No blocking operators; streaming matters.
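With the blocking operators removed from the query, the client can do the de-duplication, minimum, and sorting itself while results stream in. A minimal sketch of that idea, assuming results arrive as (molecule, name) pairs; the function name and the sample data are hypothetical.

```python
import bisect
from typing import Dict, Iterable, Iterator, List, Tuple


def incremental_sorted_view(results: Iterable[Tuple[str, str]]) -> Iterator[List[str]]:
    """After each streamed result, yield the sorted list with one (smallest) name per molecule."""
    best: Dict[str, str] = {}   # molecule -> smallest name seen so far
    names: List[str] = []       # always kept sorted
    for mol, name in results:
        old = best.get(mol)
        if old is not None and old <= name:
            continue                                   # nothing changes for this molecule
        if old is not None:
            names.pop(bisect.bisect_left(names, old))  # drop the previous, larger name
        best[mol] = name
        bisect.insort(names, name)                     # insert at its sorted position
        yield list(names)                              # the UI could re-render here


stream = [("m2", "water"), ("m1", "ethanol"), ("m2", "aqua")]
for view in incremental_sorted_view(stream):
    print(view)
```

The list is usable after the first result, whereas a server-side ORDER BY can only answer once all results are known.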
The algorithm remains the same

when clients use one or multiple

triple pattern fragment servers.
Federation also becomes

substantially easier.
Avoid the unavailability cascade.
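One way to read "the algorithm remains the same" is that the client treats several servers as a single source: for each triple pattern it asks every server and merges the answers. The sketch below assumes exactly that (matches concatenated, counts summed); servers are modelled as plain functions, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard
Server = Callable[[Pattern], Dict]


def make_server(dataset: List[Triple]) -> Server:
    """Build a toy in-memory stand-in for one triple pattern fragment server."""
    def lookup(pattern: Pattern) -> Dict:
        data = [t for t in dataset
                if all(p is None or p == v for p, v in zip(pattern, t))]
        return {"data": data, "total_count": len(data)}
    return lookup


def federated_fragment(servers: List[Server], pattern: Pattern) -> Dict:
    """Merge the fragments of several servers for the same triple pattern."""
    data: List[Triple] = []
    total = 0
    for lookup in servers:
        frag = lookup(pattern)
        data.extend(frag["data"])      # concatenate the matches
        total += frag["total_count"]   # sum of counts (may overcount shared triples)
    return {"data": data, "total_count": total}


server_a = make_server([("dbpedia:John_Flaxman", "dbpo:birthPlace", "dbpedia:York")])
server_b = make_server([("dbpedia:Joseph_Hansom", "dbpo:birthPlace", "dbpedia:York"),
                        ("dbpedia:York", "foaf:name", '"York"@en')])
print(federated_fragment([server_a, server_b], (None, "dbpo:birthPlace", "dbpedia:York")))
```

The merged result has the same shape as a single server's fragment, so the query algorithm on top of it does not need to change.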
An optimal solution doesn’t exist.

We should look at all APIs.
[Diagram: data dump, dereferencing, triple pattern fragments, SPARQL endpoint]
Servers indicate what they do,

enabling clients to query optimally.
“This server supports triple patterns

and full-text search on objects.”
“This server supports SPARQL queries

with up to 2 joins.”
“This server supports Linked Data documents.”
The Semantic Web’s assumptions
Client-side query execution
New query opportunities
Querying data on the Web:

client or server?
Different assumptions

lead to different trade-offs.
Live querying of public data
is possible at low cost,

but at slower speeds…
…for now :-)
Let your browser

solve a SPARQL query:

client.linkeddatafragments.org
Ruben Verborgh
Ghent University – iMinds

Querying data on the Web – client or server?

  • 1. Querying data on the Web:
 client or server? Ruben Verborgh Ghent University – iMinds
  • 2. The current Semantic Web
 has many implicit assumptions. We should be able
 to answer all queries. Complexity is more important
 than availability. Data servers
 need to be expensive.
  • 3. Those assumptions are
 not necessarily wrong. They’re also not necessarily
 the only possible ones.
  • 4. Some queries are
 hard to answer. Availability is a top priority. Low-cost data servers
 have potential. Let’s rethink our assumptions,
 just to see what’s possible.
  • 5. Different assumptions lead
 to a different Semantic Web. Maybe they bring us closer
 to the Web We Want.
  • 6. …but what do we want?
  • 7. The Semantic Web’s assumptions Client-side query execution Querying data on the Web:
 client or server? New query opportunities
  • 8. 1. Clients need a different protocol.
  • 9. The Web for humans offers an HTTP interface to HTML. client dataHTTP HTML
  • 10. The Web for applications offers an HTTP interface to JSON. client dataHTTP JSON
  • 11. The Web for applications offers an HTTP interface to RDF. client dataHTTP RDF
  • 12. The Web for applications offers an SPARQL interface to RDF. client dataHTTP RDF SPARQL
  • 13. Documents need a new language. Semantic Web clients were
 perceived as very limited. Querying needs a new protocol. …unlike “simple” JSON clients.
  • 14. 1. Clients need a different protocol. 2. Live queries require that protocol.
  • 15. public SPARQL endpoints There are 3 common ways
 to publish Linked Data. Linked Data documents downloadable data dumps
  • 16. …and that’s not always a good thing. Public SPARQL endpoints
 offer a very powerful interface. Clients can ask any query… …if the endpoint is available. Hosting an endpoint is costly.
  • 17. Low-cost to host. Linked Data documents
 seem to work like the Web. Solve queries by traversing links. Many queries cannot be solved.
  • 18. Set up your own endpoint. Downloadable data dumps
 have high availability. Data is not live. You’re not really querying the Web.
  • 19. 1. Clients need a different protocol. 2. Live queries require that protocol. 3. Clients can request any query.
  • 20. The query language abstracts away
 the steps needed to solve it. In SPARQL, asking a simple query
 is as easy as asking a difficult one. In contrast to the rest of the Web,
 clients are in control.
  • 21. With a JSON interface, the server decides how clients access data. client dataHTTP JSON
  • 22. client dataHTTP RDF SPARQL With a SPARQL interface, clients
 decide how they access data.
  • 23. Clients can ask anything, also
 queries that bring servers down. The majority
 of public SPARQL endpoints
 has less than 95% availability. That means the endpoint
 —and thus your application—
 doesn’t work 1.5 days each month.
  • 24. If you have operational need
 for SPARQL accessible data,
 you must have your own infrastructure. No public endpoints.
 Public endpoints are for lookups and discovery;
 sort of a dataset demo. —Orri Erling, OpenLink (2014)
  • 25. SEMANTICthings we happen to have
 downloaded from the WEB
  • 26. If you want to study
 a subject on Wikipedia, do you download all
 4,614,000 articles first?
  • 27. 1. Clients need a different protocol. 2. Live queries require that protocol. 3. Clients can request any query.
  • 28. The Semantic Web’s assumptions Client-side query execution New query opportunities Querying data on the Web:
 client or server?
  • 29. data
 dump SPARQL
 endpoint Any fragment of a Linked Data set
 is called a Linked Data Fragment. derefer-
 encing high server efforthigh client effort all subject SPARQL querySELECTOR
  • 30. Each type of Linked Data Fragment
 is defined by three characteristics. selector metadata controls What data does it contain? What do we know about it? What can we do next?
  • 31. a SPARQL query (none) (none) SPARQL CONSTRUCT result selector metadata controls Each type of Linked Data Fragment
 is defined by three characteristics.
  • 32. a specific entity creator, maintainer, … links to other LD documents Linked Data Document selector metadata controls Each type of Linked Data Fragment
 is defined by three characteristics.
  • 33. everything (none) data dump number of triples, file size selector metadata controls Each type of Linked Data Fragment
 is defined by three characteristics.
  • 34. Can we query fragments that
 balance client and server effort? data
 dump SPARQL
 endpoint triple
 pattern
 fragments derefer-
 encing high server efforthigh client effort all subject SPARQL querytriple pattern
  • 35. triple pattern total number of matches access to all other fragments selector metadata controls Triple pattern fragments are cheap
 yet enable efficient querying.
  • 36. data (first 100) controls (other fragments) metadata (total count)
  • 37. Other APIs exist, but are specific. Triple pattern fragment servers
 enable clients to execute queries. Triple patterns work on all datasets. Combine data, metadata & controls.
  • 38. How to answer this query using
 only triple pattern fragments? SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. }
  • 39. Get the corresponding fragments
 ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. dbpedia:York foaf:name “York”@en. dbpedia:York,_Ontario foaf:name “York”@en.
 … dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
 … dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
 …
  • 40. Get the corresponding fragments
 and read the count metadata. ?person a dbpedia-owl:Artist. ±61,000 ±470,000 12 ?person dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. dbpedia:York foaf:name “York”@en. dbpedia:York,_Ontario foaf:name “York”@en.
 … dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
 … dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
 …
  • 41. Start with the smallest fragment.
 Start with the first match. ?person a dbpedia-owl:Artist ±61, ±470, 12 ?person dbpedia-owl:birthPlace ?city foaf:name "York"@en. dbpedia:York foaf:name “York”@en. dbpedia:York,_Ontario foaf:name “York”@en.
 … dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency. dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce. … dbpedia:Aamir_Zaki dbpedia:Ahmad_Morid a dbpedia-owl:Artist. …
  • 42. How to answer this query using
 only triple pattern fragments? SELECT ?person WHERE { ?person a dbpedia-owl:Artist. ?person dbpedia-owl:birthPlace dbpedia:York. dbpedia:York foaf:name "York"@en. }
  • 43. Get the corresponding fragments
 ?person a dbpedia-owl:Artist. ?person dbpo:birthPlace dbpedia:York. dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York. dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
 … dbpedia:Aamir_Zaki a dbpedia-owl:Artist. dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
 …
 • 44. Get the corresponding fragments
 and read the count metadata.
 ?person a dbpedia-owl:Artist. (±61,000)
   dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
   dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
   …
 ?person dbpo:birthPlace dbpedia:York. (75)
   dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
   dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
   …
 • 45. Start with the smallest fragment.
 Start with the first match.
 ?person a dbpedia-owl:Artist. (±61,000)
   dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
   dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
   …
 ?person dbpo:birthPlace dbpedia:York. (75)
   dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
   dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
   …
 • 46. How to answer this query using
 only triple pattern fragments?
 ASK {
   dbp:John_Flaxman a dbpo:Artist.
   dbp:John_Flaxman dbpo:birthPlace dbp:York.
   dbp:York foaf:name "York"@en.
 }
 • 47. Get the corresponding fragment
 and read the count metadata.
 dbpedia:John_Flaxman a dbpedia-owl:Artist. (1)
   dbpedia:John_Flaxman a dbpedia-owl:Artist.
 Output the match:
   ?person = dbpedia:John_Flaxman
   ?city = dbpedia:York
 • 48. Recursively repeat the process
 for all bindings.
 ?person dbpo:birthPlace dbpedia:York.
   dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
   dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
   …
 ?city foaf:name "York"@en.
   dbpedia:York foaf:name "York"@en.
   dbpedia:York,_Ontario foaf:name "York"@en.
   …
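
 The steps on slides 39 to 48 can be sketched in a few lines of Python. This is a minimal illustration, not the author's client: the `fragment` function is a hypothetical in-memory stand-in for an HTTP request to a triple pattern fragments server, the `DATA` list is made-up sample data reusing names from the slides, and the counts are exact rather than estimated.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical sample data; a real client would fetch fragments over HTTP.
DATA: List[Triple] = [
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "a", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(pattern: Triple) -> Tuple[List[Triple], int]:
    """Stand-in for one fragment request: matching triples plus count metadata."""
    matches = [t for t in DATA
               if all(is_var(p) or p == v for p, v in zip(pattern, t))]
    return matches, len(matches)


def substitute(pattern: Triple, bindings: Bindings) -> Triple:
    """Replace bound variables in a pattern by their values."""
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def solve(patterns: Sequence[Triple],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)  # every pattern is satisfied: output the match
        return
    # Get the corresponding fragments and read the count metadata.
    fragments = [fragment(p) for p in patterns]
    # Start with the smallest fragment.
    best = min(range(len(patterns)), key=lambda i: fragments[i][1])
    matches, count = fragments[best]
    if count == 0:
        return  # one pattern has no matches, so this branch has no solutions
    rest = [p for i, p in enumerate(patterns) if i != best]
    # Recursively repeat the process for all bindings.
    for triple in matches:
        new = {term: value
               for term, value in zip(patterns[best], triple) if is_var(term)}
        yield from solve([substitute(p, new) for p in rest], {**bindings, **new})


QUERY = [
    ("?person", "a", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
results = list(solve(QUERY))
```

 Because `solve` is a generator, each solution is available as soon as it is found. On this sample data the only solution binds `?person` to `dbpedia:John_Flaxman` and `?city` to `dbpedia:York`.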
 • 49. This way of querying
 changes the usual assumptions.
 Use the Web’s protocol HTTP.
 Don’t be smart; enable intelligence.
 Some queries will be hard / slow.
 • 50. Querying semantic datasources
 means managing expectations.
 [Figure: data dump, SPARQL endpoint, triple pattern fragments, and dereferencing placed along axes labeled "high server effort" vs. "high client effort", "low availability" vs. "high availability", and "low freshness / speed" vs. "high freshness / speed".]
 • 51. Querying data on the Web:
 client or server?
 The Semantic Web’s assumptions / Client-side query execution / New query opportunities
 • 52. Coupling access and processing
 leads to low availability.
 [Figure: one SPARQL server connected to many clients.]
 (a) SPARQL endpoints perform all processing on the server, leading to fast query execution with low data bandwidth, and a rapidly overloaded server.
 • 53. Enabling clients to query
 leads to high scalability.
 [Figure: one LDF server connected to many clients.]
 (b) LDF servers only support simple requests and can thus handle far higher loads. Clients perform the querying, so they need more (cacheable) data.
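
 To make "simple requests" concrete, here is a hedged sketch of how a client might build the URL of one triple pattern fragment. The base URL is hypothetical, and the query parameter names `subject`, `predicate`, and `object` are an assumption; a real client would read the URL template from the hypermedia controls in the server's response instead of hard-coding it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def fragment_url(base: str,
                 subject: Optional[str] = None,
                 predicate: Optional[str] = None,
                 obj: Optional[str] = None) -> str:
    """Build the URL of a single triple pattern fragment.

    Components left as None act as variables and are omitted.
    Parameter names are assumed, not taken from the slides.
    """
    params = {name: value
              for name, value in (("subject", subject),
                                  ("predicate", predicate),
                                  ("object", obj))
              if value is not None}
    return f"{base}?{urlencode(params)}" if params else base


# Pattern: ?person dbpedia-owl:birthPlace dbpedia:York
url = fragment_url(
    "http://example.org/dataset",  # hypothetical server
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj="http://dbpedia.org/resource/York",
)
decoded = parse_qs(urlsplit(url).query)
```

 Every request is a plain GET on a URL that depends only on the pattern, so identical requests from different clients can be served from an HTTP cache.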
 • 54. Show a sorted list of molecules
 that match certain characteristics.
 [Figure: Molecules, with two labeled options: endpoint approach and fragment approach.]
 • 55. Show a sorted list of molecules
 that match certain characteristics.
 [Figure, endpoint approach: labels Molecules, SPARQL endpoint, Molecules.]
 • 56. Show a sorted list of molecules
 that match certain characteristics.
 Endpoint approach:
 SELECT DISTINCT(?mol) MIN(?name) WHERE {
   ?mol rdfs:label ?name;
   … …
 }
 ORDER BY ?name
 • 58. Show a sorted list of molecules
 that match certain characteristics.
 Endpoint approach. Consequences:
   DISTINCT: keep all results in memory
   MIN: keep all results in memory, blocking
   ORDER BY: keep all results in memory, blocking
 Doesn’t matter; we’re waiting anyway.
 • 59. Show a sorted list of molecules
 that match certain characteristics.
 Fragments approach: no blocking operators; streaming matters.
 SELECT ?mol ?name WHERE {
   ?mol rdfs:label ?name;
   … …
 }
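
 The difference between a blocking operator and a streaming one can be sketched as follows. This is one possible reading of the slide, not the author's code: the `results` generator is a hypothetical stand-in for solutions arriving one at a time from a client-side query, and the molecule names are invented sample data.

```python
import bisect
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (molecule, name)


def results() -> Iterator[Row]:
    """Hypothetical solutions arriving one by one from a client-side query."""
    yield ("mol:water", "Water")
    yield ("mol:ethanol", "Ethanol")
    yield ("mol:benzene", "Benzene")


def blocking_sorted(rows: Iterable[Row]) -> List[Row]:
    """ORDER BY style: nothing can be shown until every row has arrived."""
    return sorted(rows, key=lambda row: row[1])


def streaming_sorted(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """Insert each arriving row at its sorted position and emit a snapshot,
    so the application can display a growing sorted list immediately."""
    shown: List[Tuple[str, str]] = []  # kept sorted as (name, molecule)
    for mol, name in rows:
        bisect.insort(shown, (name, mol))
        yield [(m, n) for n, m in shown]


snapshots = list(streaming_sorted(results()))
final = blocking_sorted(results())
```

 Both approaches end with the same sorted list, but the streaming version already has something to display after the first result arrives.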
 • 60. Show a sorted list of molecules
 that match certain characteristics.
 [Figure, fragments approach: labels Molecules, Molecules, Molecules.]
 • 61. The algorithm remains the same
 when clients use one or multiple
 triple pattern fragment servers.
 Federation also becomes substantially easier.
 Avoid the unavailability cascade.
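
 One way to picture why the algorithm stays the same is to treat a federation as a single virtual fragment: the union of each server's matches, with the counts added up. The sketch below is an illustration under assumptions. Servers are modeled as hypothetical in-memory callables, and skipping an unreachable server instead of failing the whole query is one possible policy for avoiding the unavailability cascade, not something the slides prescribe.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Fragment = Tuple[List[Triple], int]
Server = Callable[[Triple], Fragment]


def make_server(triples: Sequence[Triple]) -> Server:
    """Hypothetical in-memory stand-in for one triple pattern fragments server."""
    def fragment(pattern: Triple) -> Fragment:
        matches = [t for t in triples
                   if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]
        return matches, len(matches)
    return fragment


def offline_server(pattern: Triple) -> Fragment:
    """Simulates a server that cannot be reached."""
    raise ConnectionError("server unavailable")


def federated_fragment(pattern: Triple, servers: Sequence[Server]) -> Fragment:
    """Combine the fragments of all reachable servers for one pattern."""
    matches: List[Triple] = []
    total = 0
    for server in servers:
        try:
            server_matches, count = server(pattern)
        except ConnectionError:
            continue  # assumed policy: one failing source does not fail the query
        matches.extend(server_matches)  # note: duplicates are not removed here
        total += count
    return matches, total


server_a = make_server([("dbpedia:John_Flaxman", "dbpo:birthPlace", "dbpedia:York")])
server_b = make_server([("dbpedia:Joseph_Hansom", "dbpo:birthPlace", "dbpedia:York")])
matches, total = federated_fragment(
    ("?person", "dbpo:birthPlace", "dbpedia:York"),
    [server_a, offline_server, server_b],
)
```

 A client that calls `federated_fragment` wherever it would otherwise request a fragment from a single server needs no other changes.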
 • 62. An optimal solution doesn’t exist.
 We should look at all APIs.
 [Figure: data dump, SPARQL endpoint, triple pattern fragments, dereferencing.]
 • 63. Servers indicate what they do,
 enabling clients to query optimally.
 “This server supports triple patterns and full-text search on objects.”
 “This server supports SPARQL queries with up to 2 joins.”
 “This server supports Linked Data documents.”
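
 A client could use such self-descriptions to pick a strategy per server. The sketch below is purely illustrative: the `Capabilities` fields, the strategy names, and the preference order (delegate to the server when it can handle the query, otherwise fall back to fragments, then to documents) are all assumptions, not an existing vocabulary or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical summary of what a server says it supports."""
    sparql_max_joins: Optional[int] = None  # None: no SPARQL support advertised
    triple_patterns: bool = False
    documents: bool = False


def choose_strategy(server: Capabilities, joins_needed: int) -> str:
    """Pick how to evaluate a query with `joins_needed` joins on this server."""
    if server.sparql_max_joins is not None and joins_needed <= server.sparql_max_joins:
        return "send the whole query to the server"
    if server.triple_patterns:
        return "evaluate client-side over triple pattern fragments"
    if server.documents:
        return "traverse Linked Data documents"
    return "cannot query this server"


limited_sparql = Capabilities(sparql_max_joins=2, triple_patterns=True)
documents_only = Capabilities(documents=True)
```

 With these definitions, a query with two joins is delegated to `limited_sparql`, a query with three joins falls back to triple pattern fragments on the same server, and `documents_only` is handled by link traversal.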
 • 64. Querying data on the Web:
 client or server?
 The Semantic Web’s assumptions / Client-side query execution / New query opportunities
 • 65. Different assumptions
 lead to different trade-offs.
 Live querying of public data is possible at low cost, but at slower speeds… for now :-)
 • 66. Let your browser
 solve a SPARQL query:
 client.linkeddatafragments.org
 Ruben Verborgh
 Ghent University – iMinds