SlideShare uma empresa Scribd logo
1 de 115
Baixar para ler offline
Web Data Management in RDF Age
M. Tamer ¨Ozsu
University of Waterloo
David R. Cheriton School of Computer Science
1ICDCS’17/2017-06-07
Acknowledgements
This presentation draws upon collaborative research and
discussions with the following colleagues (in alphabetical order)
G¨une¸s Alu¸c, University of Waterloo; now at SAP
Khuzaima Daudjee, University of Waterloo
Olaf Hartig, University of Waterloo; now at Link¨oping Univ.
Lei Chen, Hong Kong University of Science & Technology
Lei Zou, Peking University
2ICDCS’17/2017-06-07
Web Data Management
A long term research interest in the DB community
2000 2004
2011 2011
3ICDCS’17/2017-06-07
Interest Due to Properties of Web Data
Lack of a schema
Data is at best “semi-structured”
Missing data, additional attributes, similar data but not
identical
Volatility
Changes frequently
May conform to one schema now, but not later
Scale
Does it make sense to talk about a schema for Web?
How do you capture “everything”?
Querying difficulty
What is the user language?
What are the primitives?
Arent search engines or metasearch engines sufficient?
4ICDCS’17/2017-06-07
More Recent Approaches to Web Querying
Fusion Tables
Users contribute data in spreadsheet, CVS, KML format
Possible joins between multiple data sets
Extensive visualization
5ICDCS’17/2017-06-07
More Recent Approaches to Web Querying
Fusion Tables
Users contribute data in spreadsheet, CVS, KML format
Possible joins between multiple data sets
Extensive visualization
XML
Data exchange language
Primarily tree based structure
<list title="MOVIES">
<film>
<title>The Shining</title>
<director>Stanley Kubrick</director>
<actor>Jack Nicholson</actor>
</film>
<film>
<title>Spartacus</title>
<director>Stanley Kubrick</director>
</film>
<film>
<title>The Passenger</title>
<actor>Jack Nicholson</actor>
</film>
...
</list>
root
film
title
“The Shining”
director
“Stanley Kubrick”
actor
“Jack Nicholson”
film
...
film
title
“The Passenger”
actor
“Jack Nicholson”
5ICDCS’17/2017-06-07
More Recent Approaches to Web Querying
Fusion Tables
Users contribute data in spreadsheet, CVS, KML format
Possible joins between multiple data sets
Extensive visualization
XML
Data exchange language
Primarily tree based structure
Linked Open Data (LOD)
W3C work; community effort
Maintains autonomy of data sources
Low barrier to entry
5ICDCS’17/2017-06-07
Traditional Hypertext-based Web Access
IMDb World
Book
Data exposed
to the Web
via HTML
6ICDCS’17/2017-06-07
Linked Data Publishing Principles
IMDb World
Book
(http://...linkedmdb.../Shining,releaseDate, 23 May 1980)
(http://...linkedmdb.../Shining, filmLocation, http://cia.../UK)
(http://...linkedmdb.../29704,actedIn, http://...linkedmdb.../Shining)
...
(http://cia.../UK, hasPopulation, 63230000)
...
Shining
UK
Data model: RDF
Global identifier: URI
Access mechanism: HTTP
Connection: data links
7ICDCS’17/2017-06-07
Linked Object Data – Closer Look
8ICDCS’17/2017-06-07
LOD Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 3000 datasets with
>84B triples
Size almost doubling every year
9ICDCS’17/2017-06-07
LOD Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 3000 datasets with
>84B triples
Size almost doubling every year
As of March 2009
LinkedCT
Reactome
Taxonomy
KEGG
PubMed
GeneID
Pfam
UniProt
OMIM
PDB
Symbol
ChEBI
Daily
Med
Disea-
some
CAS
HGNC
Inter
Pro
Drug
Bank
UniParc
UniRef
ProDom
PROSITE
Gene
Ontology
Homolo
Gene
Pub
Chem
MGI
UniSTS
GEO
Species
Jamendo
BBC
Programm
es
Music-
brainz
Magna-
tune
BBC
Later +
TOTP
Surge
Radio
MySpace
Wrapper
Audio-
Scrobbler
Linked
MDB
BBC
John
Peel
BBC
Playcount
Data
Gov-
Track
US
Census
Data
riese
Geo-
names
lingvoj
World
Fact-
book
Euro-
stat
IRIT
Toulouse
SW
Conference
Corpus
RDF Book
Mashup
Project
Guten-
berg
DBLP
Hannover
DBLP
Berlin
LAAS-
CNRS
Buda-
pest
BME
IEEE
IBM
Resex
Pisa
New-
castle
RAE
2001
CiteSeer
ACM
DBLP
RKB
Explorer
eprints
LIBRIS
Semantic
Web.org Eurécom
ECS
South-
ampton
RevyuSIOC
Sites
Doap-
space
Flickr
exporter
FOAF
profiles
flickr
wrappr
Crunch
Base
Sem-
Web-
Central
Open-
Guides
Wiki-
company
QDOS
Pub
Guide
Open
Calais
RDF
ohloh
W3C
WordNet
Open
Cyc
UMBEL
Yago
DBpedia
Freebase
Virtuoso
Sponger
March ’09:
89 datasets
9ICDCS’17/2017-06-07
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
LOD Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 3000 datasets with
>84B triples
Size almost doubling every year
As of September 2010
Music
Brainz
(zitgist)
P20
YAGO
World
Fact-
book
(FUB)
WordNet
(W3C)
WordNet
(VUA)
VIVO UF
VIVO
Indiana
VIVO
Cornell
VIAF
URI
Burner
Sussex
Reading
Lists
Plymouth
Reading
Lists
UMBEL
UK Post-
codes
legislation
.gov.uk
Uberblic
UB
Mann-
heim
TWC LOGD
Twarql
transport
data.gov
.uk
totl.net
Tele-
graphis
TCM
Gene
DIT
Taxon
Concept
The Open
Library
(Talis)
t4gm
Surge
Radio
STW
RAMEAU
SH
statistics
data.gov
.uk
St.
Andrews
Resource
Lists
ECS
South-
ampton
EPrints
Semantic
Crunch
Base
semantic
web.org
Semantic
XBRL
SW
Dog
Food
rdfabout
US SEC
Wiki
UN/
LOCODE
Ulm
ECS
(RKB
Explorer)
Roma
RISKS
RESEX
RAE2001
Pisa
OS
OAI
NSF
New-
castle
LAAS
KISTI
JISC
IRIT
IEEE
IBM
Eurécom
ERA
ePrints
dotAC
DEPLOY
DBLP
(RKB
Explorer)
Course-
ware
CORDIS
CiteSeer
Budapest
ACM
riese
Revyu
research
data.gov
.uk
reference
data.gov
.uk
Recht-
spraak.
nl
RDF
ohloh
Last.FM
(rdfize)
RDF
Book
Mashup
PSH
Product
DB
PBAC
Poké-
pédia
Ord-
nance
Survey
Openly
Local
The Open
Library
Open
Cyc
Open
Calais
OpenEI
New
York
Times
NTU
Resource
Lists
NDL
subjects
MARC
Codes
List
Man-
chester
Reading
Lists
Lotico
The
London
Gazette
LOIUS
lobid
Resources
lobid
Organi-
sations
Linked
MDB
Linked
LCCN
Linked
GeoData
Linked
CT
Linked
Open
Numbers
lingvoj
LIBRIS
Lexvo
LCSH
DBLP
(L3S)
Linked
Sensor Data
(Kno.e.sis)
Good-
win
Family
Jamendo
iServe
NSZL
Catalog
GovTrack
GESIS
Geo
Species
Geo
Names
Geo
Linked
Data
(es)
GTAA
STITCH
SIDER
Project
Guten-
berg
(FUB)
Medi
Care
Euro-
stat
(FUB)
Drug
Bank
Disea-
some
DBLP
(FU
Berlin)
Daily
Med
Freebase
flickr
wrappr
Fishes
of Texas
FanHubz
Event-
Media
EUTC
Produc-
tions
Eurostat
EUNIS
ESD
stan-
dards
Popula-
tion (En-
AKTing)
NHS
(EnAKTing)
Mortality
(En-
AKTing)
Energy
(En-
AKTing)
CO2
(En-
AKTing)
education
data.gov
.uk
ECS
South-
ampton
Gem.
Norm-
datei
data
dcs
MySpace
(DBTune)
Music
Brainz
(DBTune)
Magna-
tune
John
Peel
(DB
Tune)
classical
(DB
Tune)
Audio-
scrobbler
(DBTune)
Last.fm
Artists
(DBTune)
DB
Tropes
dbpedia
lite
DBpedia
Pokedex
Airports
NASA
(Data
Incu-
bator)
Music
Brainz
(Data
Incubator)
Moseley
Folk
Discogs
(Data In-
cubator)
Climbing
Linked Data
for Intervals
Cornetto
Chronic-
ling
America
Chem2
Bio2RDF
biz.
data.
gov.uk
UniSTS
UniRef
Uni
Path-
way
UniParc
Taxo-
nomy
UniProt
SGD
Reactome
PubMed
Pub
Chem
PRO-
SITE
ProDom
Pfam PDB
OMIM
OBO
MGI
KEGG
Reaction
KEGG
Pathway
KEGG
Glycan
KEGG
Enzyme
KEGG
Drug
KEGG
Cpd
InterPro
Homolo
Gene
HGNC
Gene
Ontology
GeneID
Gen
Bank
ChEBI
CAS
Affy-
metrix
BibBase
BBC
Wildlife
Finder
BBC
Program
mes
BBC
Music
rdfabout
US Census
Media
Geographic
Publications
Government
Cross-domain
Life sciences
User-generated content
September ’10:
203 datasets
9ICDCS’17/2017-06-07
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
LOD Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 3000 datasets with
>84B triples
Size almost doubling every year
As of September 2011
Music
Brainz
(zitgist)
P20
Turismo
de
Zaragoza
yovisto
Yahoo!
Geo
Planet
YAGO
World
Fact-
book
El
Viajero
Tourism
WordNet
(W3C)
WordNet
(VUA)
VIVO UF
VIVO
Indiana
VIVO
Cornell
VIAF
URI
Burner
Sussex
Reading
Lists
Plymouth
Reading
Lists
UniRef
UniProt
UMBEL
UK Post-
codes
legislation
data.gov.uk
Uberblic
UB
Mann-
heim
TWC LOGD
Twarql
transport
data.gov.
uk
Traffic
Scotland
theses.
fr
Thesau-
rus W
totl.net
Tele-
graphis
TCM
Gene
DIT
Taxon
Concept
Open
Library
(Talis)
tags2con
delicious
t4gm
info
Swedish
Open
Cultural
Heritage
Surge
Radio
Sudoc
STW
RAMEAU
SH
statistics
data.gov.
uk
St.
Andrews
Resource
Lists
ECS
South-
ampton
EPrints
SSW
Thesaur
us
Smart
Link
Slideshare
2RDF
semantic
web.org
Semantic
Tweet
Semantic
XBRL
SW
Dog
Food
Source Code
Ecosystem
Linked Data
US SEC
(rdfabout)
Sears
Scotland
Geo-
graphy
Scotland
Pupils &
Exams
Scholaro-
meter
WordNet
(RKB
Explorer)
Wiki
UN/
LOCODE
Ulm
ECS
(RKB
Explorer)
Roma
RISKS
RESEX
RAE2001
Pisa
OS
OAI
NSF
New-
castle
LAAS
KISTI
JISC
IRIT
IEEE
IBM
Eurécom
ERA
ePrints dotAC
DEPLOY
DBLP
(RKB
Explorer)
Crime
Reports
UK
Course-
ware
CORDIS
(RKB
Explorer)
CiteSeer
Budapest
ACM
riese
Revyu
research
data.gov.
ukRen.
Energy
Genera-
tors
reference
data.gov.
uk
Recht-
spraak.
nl
RDF
ohloh
Last.FM
(rdfize)
RDF
Book
Mashup
Rådata
nå!
PSH
Product
Types
Ontology
Product
DB
PBAC
Poké-
pédia
patents
data.go
v.uk
Ox
Points
Ord-
nance
Survey
Openly
Local
Open
Library
Open
Cyc
Open
Corpo-
rates
Open
Calais
OpenEI
Open
Election
Data
Project
Open
Data
Thesau-
rus
Ontos
News
Portal
OGOLOD
Janus
AMP
Ocean
Drilling
Codices
New
York
Times
NVD
ntnusc
NTU
Resource
Lists
Norwe-
gian
MeSH
NDL
subjects
ndlna
my
Experi-
ment
Italian
Museums
medu-
cator
MARC
Codes
List
Man-
chester
Reading
Lists
Lotico
Weather
Stations
London
Gazette
LOIUS
Linked
Open
Colors
lobid
Resources
lobid
Organi-
sations
LEM
Linked
MDB
LinkedL
CCN
Linked
GeoData
LinkedCT
Linked
User
Feedback
LOV
Linked
Open
Numbers
LODE
Eurostat
(Ontology
Central)
Linked
EDGAR
(Ontology
Central)
Linked
Crunch-
base
lingvoj
Lichfield
Spen-
ding
LIBRIS
Lexvo
LCSH
DBLP
(L3S)
Linked
Sensor Data
(Kno.e.sis)
Klapp-
stuhl-
club
Good-
win
Family
National
Radio-
activity
JP
Jamendo
(DBtune)
Italian
public
schools
ISTAT
Immi-
gration
iServe
IdRef
Sudoc
NSZL
Catalog
Hellenic
PD
Hellenic
FBD
Piedmont
Accomo-
dations
GovTrack
GovWILD
Google
Art
wrapper
gnoss
GESIS
GeoWord
Net
Geo
Species
Geo
Names
Geo
Linked
Data
GEMET
GTAA
STITCH
SIDER
Project
Guten-
berg
Medi
Care
Euro-
stat
(FUB)
EURES
Drug
Bank
Disea-
some
DBLP
(FU
Berlin)
Daily
Med
CORDIS
(FUB)
Freebase
flickr
wrappr
Fishes
of Texas
Finnish
Munici-
palities
ChEMBL
FanHubz
Event
Media
EUTC
Produc-
tions
Eurostat
Europeana
EUNIS
EU
Insti-
tutions
ESD
stan-
dards
EARTh
Enipedia
Popula-
tion (En-
AKTing)
NHS
(En-
AKTing) Mortality
(En-
AKTing)
Energy
(En-
AKTing)
Crime
(En-
AKTing)
CO2
Emission
(En-
AKTing)
EEA
SISVU
educatio
n.data.g
ov.uk
ECS
South-
ampton
ECCO-
TCP
GND
Didactal
ia
DDC Deutsche
Bio-
graphie
data
dcs
Music
Brainz
(DBTune)
Magna-
tune
John
Peel
(DBTune)
Classical
(DB
Tune)
Audio
Scrobbler
(DBTune)
Last.FM
artists
(DBTune)
DB
Tropes
Portu-
guese
DBpedia
dbpedia
lite
Greek
DBpedia
DBpedia
data-
open-
ac-uk
SMC
Journals
Pokedex
Airports
NASA
(Data
Incu-
bator)
Music
Brainz
(Data
Incubator)
Moseley
Folk
Metoffice
Weather
Forecasts
Discogs
(Data
Incubator)
Climbing
data.gov.uk
intervals
Data
Gov.ie
data
bnf.fr
Cornetto
reegle
Chronic-
ling
America
Chem2
Bio2RDF
Calames
business
data.gov.
uk
Bricklink
Brazilian
Poli-
ticians
BNB
UniSTS
UniPath
way
UniParc
Taxono
my
UniProt
(Bio2RDF)
SGD
Reactome
PubMed
Pub
Chem
PRO-
SITE
ProDom
Pfam
PDB
OMIM
MGI
KEGG
Reaction
KEGG
Pathway
KEGG
Glycan
KEGG
Enzyme
KEGG
Drug
KEGG
Com-
pound
InterPro
Homolo
Gene
HGNC
Gene
Ontology
GeneID
Affy-
metrix
bible
ontology
BibBase
FTS
BBC
Wildlife
Finder
BBC
Program
mes BBC
Music
Alpine
Ski
Austria
LOCAH
Amster-
dam
Museum
AGROV
OC
AEMET
US Census
(rdfabout)
Media
Geographic
Publications
Government
Cross-domain
Life sciences
User-generated content
September ’11:
295 datasets, 25B
triples
9ICDCS’17/2017-06-07
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
LOD Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 3000 datasets with
>84B triples
Size almost doubling every year
April ’14:
570 datasets, ???
triples
9ICDCS’17/2017-06-07
Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked
Data Best Practices in Different Topical Domains. In Proc. ISWC, 2014.
Globally Distributed Network of Data
10ICDCS’17/2017-06-07
Outline
1 RDF Technology [¨Ozsu, 2016]
Data Warehousing Approach
Distributed RDF Processing
2 Federated RDF Systems
SPARQL Endpoint Federation
General RDF Federation
3 LOD – Live Querying Approach [Hartig, 2013a]
Traversal-based approaches
Index-based approaches
Hybrid approaches
4 Conclusions
11ICDCS’17/2017-06-07
Outline
1 RDF Technology [¨Ozsu, 2016]
Data Warehousing Approach
Distributed RDF Processing
2 Federated RDF Systems
SPARQL Endpoint Federation
General RDF Federation
3 LOD – Live Querying Approach [Hartig, 2013a]
Traversal-based approaches
Index-based approaches
Hybrid approaches
4 Conclusions
12ICDCS’17/2017-06-07
RDF Introduction
Everything is an uniquely named
resource
http://data.linkedmdb.org/resource/actor/JN29704
13ICDCS’17/2017-06-07
RDF Introduction
Everything is an uniquely named
resource
Prefixes can be used to shorten the
names
xmlns:y=http://data.linkedmdb.org/resource/actor/
y:JN29704
13ICDCS’17/2017-06-07
RDF Introduction
Everything is an uniquely named
resource
Prefixes can be used to shorten the
names
Properties of resources can be
defined
xmlns:y=http://data.linkedmdb.org/resource/actor/
y:JN29704
y:JN29704:hasName “Jack Nicholson”
y:JN29704:BornOnDate “1937-04-22”
13ICDCS’17/2017-06-07
RDF Introduction
Everything is an uniquely named
resource
Prefixes can be used to shorten the
names
Properties of resources can be
defined
Relationships with other resources
can be defined
xmlns:y=http://data.linkedmdb.org/resource/actor/
y:JN29704
y:JN29704:hasName “Jack Nicholson”
y:JN29704:BornOnDate “1937-04-22”
y:TS2014:title “The Shining”
y:TS2014:releaseDate “1980-05-23”
y:TS2014
JN29704:movieActor
13ICDCS’17/2017-06-07
RDF Introduction
Everything is an uniquely named
resource
Prefixes can be used to shorten the
names
Properties of resources can be
defined
Relationships with other resources
can be defined
Resource descriptions can be
contributed by different
people/groups and can be located
anywhere in the web
Integrated web “database”
xmlns:y=http://data.linkedmdb.org/resource/actor/
y:JN29704
y:JN29704:hasName “Jack Nicholson”
y:JN29704:BornOnDate “1937-04-22”
y:TS2014:title “The Shining”
y:TS2014:releaseDate “1980-05-23”
y:TS2014
JN29704:movieActor
13ICDCS’17/2017-06-07
RDF Data Model
Triple: Subject, Predicate (Property),
Object (s, p, o)
Subject: the entity that is described
(URI or blank node)
Predicate: a feature of the entity (URI)
Object: value of the feature (URI,
blank node or literal)
(s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L)
Set of RDF triples is called an RDF graph
U
Subject Object
U B U B L
U: set of URIs
B: set of blank nodes
L: set of literals
Predicate
Subject Predicate Object
http://...imdb.../film/2014 rdfs:label “The Shining”
http://...imdb.../film/2014 movie:releaseDate “1980-05-23”
http://...imdb.../29704 movie:actor name “Jack Nicholson”
. . . . . . . . .
14ICDCS’17/2017-06-07
RDF Example Instance
Prefixes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/
bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/
lexvo=http://lexvo.org/id/;wp=http://en.wikipedia.org/wiki/
Subject Predicate Object
mdb: film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:initial release date “1980-05-23”’
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2014 movie:actor mdb:actor/29704
mdb:film/2014 movie:actor mdb: actor/30013
mdb:film/2014 movie:music contributor mdb: music contributor/4110
mdb:film/2014 foaf:based near geo:2635167
mdb:film/2014 movie:relatedBook bm:0743424425
mdb:film/2014 movie:language lexvo:iso639-3/eng
mdb:director/8476 movie:director name “Stanley Kubrick”
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:film/424 movie:director mdb:director/8476
mdb:film/424 rdfs:label “Spartacus”
mdb:actor/29704 movie:actor name “Jack Nicholson”
mdb:film/1267 movie:actor mdb:actor/29704
mdb:film/1267 rdfs:label “The Last Tycoon”
mdb:film/3418 movie:actor mdb:actor/29704
mdb:film/3418 rdfs:label “The Passenger”
geo:2635167 gn:name “United Kingdom”
geo:2635167 gn:population 62348447
geo:2635167 gn:wikipediaArticle wp:United Kingdom
bm:books/0743424425 dc:creator bm:persons/Stephen+King
bm:books/0743424425 rev:rating 4.7
bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer
lexvo:iso639-3/eng rdfs:label “English”
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn
URI Literal
URI
15ICDCS’17/2017-06-07
RDF Graph
mdb:film/2014
“1980-05-23”
movie:initial release date
“The Shining”
refs:label
mob:music contributor
music contributor
lexvo:iso639 3/eng
language
bm:books/0743424425
4.7
rev:rating
bm:persons/StephenKing
dc:creator
bm:offers/0743424425amazonOffer
geo:2635167
“United Kingdom”
gn:name
62348447
gn:population
wp:UnitedKingdom
gn:wikipediaArticle
mdb:actor/29704
“Jack Nicholson”
movie:actor name
mdb:film/3418
“The Passenger”
refs:label
mdb:film/1267
“The Last Tycoon”
refs:label
mdb:director/8476
“Stanley Kubrick”
movie:director name
mdb:film/2685
“A Clockwork Orange”
refs:label
mdb:film/424
“Spartacus”
refs:label
mdb:actor/30013
“Shelley Duvall”
movie:actor name
“English”
rdf:label
lexvo:iso3166/CA
lvont:usedIn
lexvo:script/latin
lvont:usesScript
movie:relatedBook
scam:hasOffer
foaf:based near
movie:actor
movie:director
movie:actor
movie:actor movie:actor
movie:director movie:director
16ICDCS’17/2017-06-07
UniProt in RDF http://www.uniprot.org
UniProt collects data from >150 biological resources
Claim: “lack of a common standard to represent and link
information makes data integration an expensive business” ⇒
RDF can help
17ICDCS’17/2017-06-07
UniProt in RDF – What does the data look like?
UniProt accession for the human CYP51 protein – Q16850
Encode it as RDF:
18ICDCS’17/2017-06-07
http://purl.uniprot.org/uniprot/Q16850.rdf
UniProt in RDF – What does the data look like?
UniProt accession for the human CYP51 protein – Q16850
Encode it as RDF:
XML/RDF format
<rdf:Description 2-¿rdf:about=”http://purl.uniprot.org/citations/8619637”>
<rdf:type 2-¿rdf:resource=”http://purl.uniprot.org/core/Journal Citation”/>
<title>The ubiquitously expressed human CYP51 encodes lanosterol 14 alpha-demethylase, a
cytochrome P450 whose expression is regulated by oxysterols.</title>
<author>Stroemstedt M.</author>
<author>Rozman D.</author>
<author>Waterman M.R.</author>
<skos:exactMatch rdf:resource=”http://purl.uniprot.org/pubmed/8619637”/>
<foaf:primaryTopicOf rdf:resource=”https://www.ncbi.nlm.nih.gov/pubmed/8619637”/>
<dcterms:identifier>doi:10.1006/abbi.1996.0193</dcterms:identifier>
<date rdf:datatype=”http://www.w3.org/2001/XMLSchema#gYear”>1996</date>
<name>Arch. Biochem. Biophys.</name>
<volume>329</volume>
<pages>73-81</pages>
</rdf:Description>
Subject
Predicate
Object
18ICDCS’17/2017-06-07
http://purl.uniprot.org/uniprot/Q16850.rdf
UniProt in RDF – What does the data look like?
UniProt accession for the human CYP51 protein – Q16850
Encode it as RDF:
XML/RDF format
<rdf:Description 2-¿rdf:about=”http://purl.uniprot.org/citations/8619637”>
<rdf:type 2-¿rdf:resource=”http://purl.uniprot.org/core/Journal Citation”/>
<title>The ubiquitously expressed human CYP51 encodes lanosterol 14 alpha-demethylase, a
cytochrome P450 whose expression is regulated by oxysterols.</title>
<author>Stroemstedt M.</author>
<author>Rozman D.</author>
<author>Waterman M.R.</author>
<skos:exactMatch rdf:resource=”http://purl.uniprot.org/pubmed/8619637”/>
<foaf:primaryTopicOf rdf:resource=”https://www.ncbi.nlm.nih.gov/pubmed/8619637”/>
<dcterms:identifier>doi:10.1006/abbi.1996.0193</dcterms:identifier>
<date rdf:datatype=”http://www.w3.org/2001/XMLSchema#gYear”>1996</date>
<name>Arch. Biochem. Biophys.</name>
<volume>329</volume>
<pages>73-81</pages>
</rdf:Description>
This can be shown as a table <Subject, Predicate, Object >
Subject
Predicate
Object
18ICDCS’17/2017-06-07
http://purl.uniprot.org/uniprot/Q16850.rdf
RDF Query Model – SPARQL
Query Model - SPARQL Protocol and RDF Query Language
Given U (set of URIs), L (set of literals), and V (set of
variables), a SPARQL expression is defined recursively:
an atomic triple pattern, which is an element of
(U ∪ V ) × (U ∪ V ) × (U ∪ V ∪ L)
?x rdfs:label “The Shining”
P FILTER R, where P is a graph pattern expression and R is a
built-in SPARQL condition (i.e., analogous to a SQL predicate)
?x rev:rating ?p FILTER(?p > 3.0)
P1 AND/OPT/UNION P2, where P1 and P2 are graph
pattern expressions
19ICDCS’17/2017-06-07
RDF Query Model – SPARQL
Query Model - SPARQL Protocol and RDF Query Language
Given U (set of URIs), L (set of literals), and V (set of
variables), a SPARQL expression is defined recursively:
an atomic triple pattern, which is an element of
(U ∪ V ) × (U ∪ V ) × (U ∪ V ∪ L)
?x rdfs:label “The Shining”
P FILTER R, where P is a graph pattern expression and R is a
built-in SPARQL condition (i.e., analogous to a SQL predicate)
?x rev:rating ?p FILTER(?p > 3.0)
P1 AND/OPT/UNION P2, where P1 and P2 are graph
pattern expressions
Example:
SELECT ?name
WHERE {
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .
?d movie : director name ” Stanley Kubrick ” .
?m movie : relatedBook ?b . ?b rev : r a t i n g ? r .
FILTER(? r > 4.0)
}
19ICDCS’17/2017-06-07
SPARQL Queries
SELECT ?name
WHERE {
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .
?d movie : director name ” Stanley Kubrick ” .
?m movie : relatedBook ?b . ?b rev : r a t i n g ? r .
FILTER(? r > 4.0)
}
?m ?d
movie:director
?name
rdfs:label
?b
movie:relatedBook
“Stanley Kubrick”
movie:director name
?r
rev:rating
FILTER(?r > 4.0)
20ICDCS’17/2017-06-07
UniProt in RDF – The Data Can Be Queried
RDF encoded UniProt data can be queried using SPARQL:
http://sparql.uniprot.org/sparql
21ICDCS’17/2017-06-07
UniProt in RDF – The Data Can Be Queried
RDF encoded UniProt data can be queried using SPARQL:
http://sparql.uniprot.org/sparql
Get the GO function for Q16850 (from UniProt SPARQL endpoint)
PREFIX upc:< http :// p u r l . u n i p r o t . org / core/>
PREFIX r d f : <http ://www. w3 . org /1999/02/22− rdf −syntax−ns#>
SELECT ? goid ? g o l a b e l
WHERE {
<http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850> a upc : Protein ;
upc : c l a s s i f i e d W i t h ? keyword .
? keyword r d f s : seeAlso ? goid .
? goid r d f s : l a b e l ? g o l a b e l .
}
Find the differential expression of probes and the p
value that map to Q16850 (from Expression Atlas SPARQL endpoint)
PREFIX r d f s : <http ://www. w3 . org /2000/01/ rdf −schema#>
PREFIX a t l a s t e r m s : <http :// r d f . ebi . ac . uk/ terms / a t l a s />
SELECT d i s t i n c t ? valueLabel ? pvalue
WHERE {
? value r d f s : l a b e l ? valueLabel .
? value a t l a s t e r m s : pValue ? pvalue .
? value a t l a s t e r m s : isMeasurementOf ? probe .
? probe a t l a s t e r m s : dbXref <http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850> .
}
ORDER BY ASC(? pvalue )
21ICDCS’17/2017-06-07
Na¨ıve Triple Store Design
SELECT ?name
WHERE {
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .
?d movie : director name ” Stanley Kubrick ” .
?m movie : relatedBook ?b . ?b rev : r a t i n g ? r .
FILTER(? r > 4.0)
}
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:initial release date “1980-05-23”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2014 movie:actor mdb:actor/29704
mdb:film/2014 movie:actor mdb: actor/30013
mdb:film/2014 movie:music contributor mdb: music contributor/4110
mdb:film/2014 foaf:based near geo:2635167
mdb:film/2014 movie:relatedBook bm:0743424425
mdb:film/2014 movie:language lexvo:iso639-3/eng
mdb:director/8476 movie:director name “Stanley Kubrick”
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:film/424 movie:director mdb:director/8476
mdb:film/424 rdfs:label “Spartacus”
mdb:actor/29704 movie:actor name “Jack Nicholson”
mdb:film/1267 movie:actor mdb:actor/29704
mdb:film/1267 rdfs:label “The Last Tycoon”
mdb:film/3418 movie:actor mdb:actor/29704
mdb:film/3418 rdfs:label “The Passenger”
geo:2635167 gn:name “United Kingdom”
geo:2635167 gn:population 62348447
geo:2635167 gn:wikipediaArticle wp:United Kingdom
bm:books/0743424425 dc:creator bm:persons/Stephen+King
bm:books/0743424425 rev:rating 4.7
bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer
lexvo:iso639-3/eng rdfs:label “English”
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn
22ICDCS’17/2017-06-07
Na¨ıve Triple Store Design
SELECT ?name
WHERE {
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .
?d movie : director name ” Stanley Kubrick ” .
?m movie : relatedBook ?b . ?b rev : r a t i n g ? r .
FILTER(? r > 4.0)
}
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:initial release date “1980-05-23”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2014 movie:actor mdb:actor/29704
mdb:film/2014 movie:actor mdb: actor/30013
mdb:film/2014 movie:music contributor mdb: music contributor/4110
mdb:film/2014 foaf:based near geo:2635167
mdb:film/2014 movie:relatedBook bm:0743424425
mdb:film/2014 movie:language lexvo:iso639-3/eng
mdb:director/8476 movie:director name “Stanley Kubrick”
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:film/424 movie:director mdb:director/8476
mdb:film/424 rdfs:label “Spartacus”
mdb:actor/29704 movie:actor name “Jack Nicholson”
mdb:film/1267 movie:actor mdb:actor/29704
mdb:film/1267 rdfs:label “The Last Tycoon”
mdb:film/3418 movie:actor mdb:actor/29704
mdb:film/3418 rdfs:label “The Passenger”
geo:2635167 gn:name “United Kingdom”
geo:2635167 gn:population 62348447
geo:2635167 gn:wikipediaArticle wp:United Kingdom
bm:books/0743424425 dc:creator bm:persons/Stephen+King
bm:books/0743424425 rev:rating 4.7
bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer
lexvo:iso639-3/eng rdfs:label “English”
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn
SELECT T1 . o b j e c t
FROM
T as T1 , T as T2 , T as T3 ,
T as T4 , T as T5
WHERE T1 . p=” r d f s : l a b e l ”
AND T2 . p=” movie : relatedBook ”
AND T3 . p=” movie : d i r e c t o r ”
AND T4 . p=” rev : r a t i n g ”
AND T5 . p=” movie : d i r e c t o r n a m e ”
AND T1 . s=T2 . s
AND T1 . s=T3 . s
AND T2 . o=T4 . s
AND T3 . o=T5 . s
AND T4 . o > 4.0
AND T5 . o=” S t a n l e y Kubrick ”
22ICDCS’17/2017-06-07
Na¨ıve Triple Store Design
SELECT ?name
WHERE {
?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .
?d movie : director name ” Stanley Kubrick ” .
?m movie : relatedBook ?b . ?b rev : r a t i n g ? r .
FILTER(? r > 4.0)
}
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:initial release date “1980-05-23”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2014 movie:actor mdb:actor/29704
mdb:film/2014 movie:actor mdb: actor/30013
mdb:film/2014 movie:music contributor mdb: music contributor/4110
mdb:film/2014 foaf:based near geo:2635167
mdb:film/2014 movie:relatedBook bm:0743424425
mdb:film/2014 movie:language lexvo:iso639-3/eng
mdb:director/8476 movie:director name “Stanley Kubrick”
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:film/424 movie:director mdb:director/8476
mdb:film/424 rdfs:label “Spartacus”
mdb:actor/29704 movie:actor name “Jack Nicholson”
mdb:film/1267 movie:actor mdb:actor/29704
mdb:film/1267 rdfs:label “The Last Tycoon”
mdb:film/3418 movie:actor mdb:actor/29704
mdb:film/3418 rdfs:label “The Passenger”
geo:2635167 gn:name “United Kingdom”
geo:2635167 gn:population 62348447
geo:2635167 gn:wikipediaArticle wp:United Kingdom
bm:books/0743424425 dc:creator bm:persons/Stephen+King
bm:books/0743424425 rev:rating 4.7
bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer
lexvo:iso639-3/eng rdfs:label “English”
lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA
lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn
SELECT T1 . o b j e c t
FROM
T as T1 , T as T2 , T as T3 ,
T as T4 , T as T5
WHERE T1 . p=” r d f s : l a b e l ”
AND T2 . p=” movie : relatedBook ”
AND T3 . p=” movie : d i r e c t o r ”
AND T4 . p=” rev : r a t i n g ”
AND T5 . p=” movie : d i r e c t o r n a m e ”
AND T1 . s=T2 . s
AND T1 . s=T3 . s
AND T2 . o=T4 . s
AND T3 . o=T5 . s
AND T4 . o > 4.0
AND T5 . o=” S t a n l e y Kubrick ”
Easy to implement
but
too many self-joins!
22ICDCS’17/2017-06-07
Exhaustive Indexing
RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore
[Weiss et al., 2008]
Strings are mapped to ids using a mapping table
Create indexes for permutations of the three columns: SPO,
SOP, PSO, POS, OPS, OSP
Subject Property Object
0 1 2
0 3 4
5 6 7
8 9 5
...
...
...
ID Value
0 mdb: film/2014
1 rdfs:label
2 “The Shining”
3 movie:initial release date
4 “1980-05-23”
5 mdb:director/8476
6 movie:director name
7 “Stanley Kubrick”
8 mdb:film/2685
9 movie:director23ICDCS’17/2017-06-07
Exhaustive Indexing
RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore
[Weiss et al., 2008]
Strings are mapped to ids using a mapping table
Create indexes for permutations of the three columns: SPO,
SOP, PSO, POS, OPS, OSP
Each triple pattern can be answered by a range query
Joins between triple patterns computed using merge join
Subject Property Object
0 1 2
0 3 4
5 6 7
8 9 5
...
...
...
ID Value
0 mdb: film/2014
1 rdfs:label
2 “The Shining”
3 movie:initial release date
4 “1980-05-23”
5 mdb:director/8476
6 movie:director name
7 “Stanley Kubrick”
8 mdb:film/2685
9 movie:director23ICDCS’17/2017-06-07
Exhaustive Indexing
RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore
[Weiss et al., 2008]
Strings are mapped to ids using a mapping table
Create indexes for permutations of the three columns: SPO,
SOP, PSO, POS, OPS, OSP
Each triple pattern can be answered by a range query
Joins between triple patterns computed using merge join
Subject Property Object
0 1 2
0 3 4
5 6 7
8 9 5
...
...
...
ID Value
0 mdb: film/2014
1 rdfs:label
2 “The Shining”
3 movie:initial release date
4 “1980-05-23”
5 mdb:director/8476
6 movie:director name
7 “Stanley Kubrick”
8 mdb:film/2685
9 movie:director
Advantages
Eliminates some of the joins – they become range queries
Merge join is easy and fast
23ICDCS’17/2017-06-07
Exhaustive Indexing
RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore
[Weiss et al., 2008]
Strings are mapped to ids using a mapping table
Create indexes for permutations of the three columns: SPO,
SOP, PSO, POS, OPS, OSP
Each triple pattern can be answered by a range query
Joins between triple patterns computed using merge join
Subject Property Object
0 1 2
0 3 4
5 6 7
8 9 5
...
...
...
ID Value
0 mdb: film/2014
1 rdfs:label
2 “The Shining”
3 movie:initial release date
4 “1980-05-23”
5 mdb:director/8476
6 movie:director name
7 “Stanley Kubrick”
8 mdb:film/2685
9 movie:director
Advantages
Eliminates some of the joins – they become range queries
Merge join is easy and fast
Disadvantages
Space usage
Expensive updates
23ICDCS’17/2017-06-07
Property Tables
Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF
[Bornea et al., 2013]
Clustered property table: group together the properties that
tend to occur in the same (or similar) subjects
Property-class table: cluster the subjects with the same type
of property into one property table
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:actor/29704 movie:actor name “Jack Nicholson”
. . . . . . . . .
Subject refs:label movie:director
mob:film/2014 “The Shining” mob:director/8476
mob:film/2685 “The Clockwork Orange” mob:director/8476
Subject movie:actor name
mdb:actor “Jack Nicholson”
24ICDCS’17/2017-06-07
Property Tables
Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF
[Bornea et al., 2013]
Clustered property table: group together the properties that
tend to occur in the same (or similar) subjects
Property-class table: cluster the subjects with the same type
of property into one property table
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:actor/29704 movie:actor name “Jack Nicholson”
. . . . . . . . .
Subject refs:label movie:director
mob:film/2014 “The Shining” mob:director/8476
mob:film/2685 “The Clockwork Orange” mob:director/8476
Subject movie:actor name
mdb:actor “Jack Nicholson”
Advantages
Fewer joins
If the data is structured, we have a relational system – similar
to normalized relations
24ICDCS’17/2017-06-07
Property Tables
Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF
[Bornea et al., 2013]
Clustered property table: group together the properties that
tend to occur in the same (or similar) subjects
Property-class table: cluster the subjects with the same type
of property into one property table
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:actor/29704 movie:actor name “Jack Nicholson”
. . . . . . . . .
Subject refs:label movie:director
mob:film/2014 “The Shining” mob:director/8476
mob:film/2685 “The Clockwork Orange” mob:director/8476
Subject movie:actor name
mdb:actor “Jack Nicholson”
Advantages
Fewer joins
If the data is structured, we have a relational system – similar
to normalized relations
Disadvantages
Potentially a lot of NULLs
Clustering is not trivial
Multi-valued properties are complicated
24ICDCS’17/2017-06-07
Vertical Partitioning
Binary Tables [Abadi et al., 2007, 2009]:
Grouping by properties – for each property, build a two-column
table, containing both subject and object, ordered by subjects
n two column tables (n is the number of unique properties in
the data)
Subject Property Object
mdb:film/2014 rdfs:label “The Shining”
mdb:film/2014 movie:director mdb:director/8476
mdb:film/2685 movie:director mdb:director/8476
mdb:film/2685 rdfs:label “A Clockwork Orange”
mdb:actor/29704 movie:actor name “Jack Nicholson”
. . . . . . . . .
Subject Object
mdb:film/2014 mdb:director/8476
mdb:film/2685 mdb:director/8476
movie:director
Subject Object
mob:film/2014 “The Shining”
mob:film/2685 “The Clockwork Orange”
refs:label
Subject Object
mdb:actor/29704 “Jack Nicholson”
movie:actor name
25ICDCS’17/2017-06-07
Vertical Partitioning
Binary Tables [Abadi et al., 2007, 2009]:
Grouping by properties – for each property, build a two-column
table, containing both subject and object, ordered by subjects
n two column tables (n is the number of unique properties in
the data)
Advantages
Supports multi-valued properties
No NULLs
No clustering
Read only needed attributes (i.e. less I/O)
Good performance for subject-subject joins
25ICDCS’17/2017-06-07
Vertical Partitioning
Binary Tables [Abadi et al., 2007, 2009]:
Grouping by properties – for each property, build a two-column
table, containing both subject and object, ordered by subjects
n two column tables (n is the number of unique properties in
the data)
Advantages
Supports multi-valued properties
No NULLs
No clustering
Read only needed attributes (i.e. less I/O)
Good performance for subject-subject joins
Disadvantages
Not useful for subject-object joins
Expensive inserts
25ICDCS’17/2017-06-07
Vertical Partitioning
Binary Tables [Abadi et al., 2007, 2009]:
Grouping by properties – for each property, build a two-column
table, containing both subject and object, ordered by subjects
n two column tables (n is the number of unique properties in
the data)
TripleBit [Yuan et al., 2013]:
Create a table with |triple| columns, |objects| + |subjects| rows
with “1” if object/subject exists in triple; groups columns by
predicate
Compress columns (since they are sparse); partition by
predicate, then partition into chunks
(P,S,O) and (P,O,S) indexes on the chunks
25ICDCS’17/2017-06-07
Graph-based Approach [Zou and ¨Ozsu, 2017]
Answering SPARQL query ≡ subgraph matching using
homomorphism
gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013]
?m ?d
movie:director
?name
rdfs:label
?b
movie:relatedBook
“Stanley Kubrick”
movie:director name
?r
rev:rating
FILTER(?r > 4.0)
mdb:film/2014
“1980-05-23”
movie:initial release date
“The Shining”
refs:label
mob:music contributor
music contributor
lexvo:iso639 3/eng
language
bm:books/0743424425
4.7
rev:rating
bm:persons/StephenKing
dc:creator
bm:offers/0743424425amazonOffer
geo:2635167
“United Kingdom”
gn:name
62348447
gn:population
wp:UnitedKingdom
gn:wikipediaArticle
mdb:actor/29704
“Jack Nicholson”
movie:actor name
mdb:film/3418
“The Passenger”
refs:label
mdb:film/1267
“The Last Tycoon”
refs:label
mdb:director/8476
“Stanley Kubrick”
movie:director name
mdb:film/2685
“A Clockwork Orange”
refs:label
mdb:film/424
“Spartacus”
refs:label
mdb:actor/30013
“Shelley Duvall”
movie:actor name
“English”
rdf:label
lexvo:iso3166/CA
lvont:usedIn
lexvo:script/latin
lvont:usesScript
movie:relatedBook
scam:hasOffer
foaf:based near
movie:actor
movie:director
movie:actor
movie:actor movie:actor
movie:director movie:director
Subgraph
M
atching
26ICDCS’17/2017-06-07
Graph-based Approach [Zou and ¨Ozsu, 2017]
Answering SPARQL query ≡ subgraph matching using
homomorphism
gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013]
?m ?d
movie:director
?name
rdfs:label
?b
movie:relatedBook
“Stanley Kubrick”
movie:director name
?r
rev:rating
FILTER(?r > 4.0)
mdb:film/2014
“1980-05-23”
movie:initial release date
“The Shining”
refs:label
mob:music contributor
music contributor
lexvo:iso639 3/eng
language
bm:books/0743424425
4.7
rev:rating
bm:persons/StephenKing
dc:creator
bm:offers/0743424425amazonOffer
geo:2635167
“United Kingdom”
gn:name
62348447
gn:population
wp:UnitedKingdom
gn:wikipediaArticle
mdb:actor/29704
“Jack Nicholson”
movie:actor name
mdb:film/3418
“The Passenger”
refs:label
mdb:film/1267
“The Last Tycoon”
refs:label
mdb:director/8476
“Stanley Kubrick”
movie:director name
mdb:film/2685
“A Clockwork Orange”
refs:label
mdb:film/424
“Spartacus”
refs:label
mdb:actor/30013
“Shelley Duvall”
movie:actor name
“English”
rdf:label
lexvo:iso3166/CA
lvont:usedIn
lexvo:script/latin
lvont:usesScript
movie:relatedBook
scam:hasOffer
foaf:based near
movie:actor
movie:director
movie:actor
movie:actor movie:actor
movie:director movie:director
Subgraph
M
atching
Advantages
Maintains the graph structure
Full set of queries can be handled
26ICDCS’17/2017-06-07
Graph-based Approach [Zou and ¨Ozsu, 2017]
Answering SPARQL query ≡ subgraph matching using
homomorphism
gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013]
?m ?d
movie:director
?name
rdfs:label
?b
movie:relatedBook
“Stanley Kubrick”
movie:director name
?r
rev:rating
FILTER(?r > 4.0)
mdb:film/2014
“1980-05-23”
movie:initial release date
“The Shining”
refs:label
mob:music contributor
music contributor
lexvo:iso639 3/eng
language
bm:books/0743424425
4.7
rev:rating
bm:persons/StephenKing
dc:creator
bm:offers/0743424425amazonOffer
geo:2635167
“United Kingdom”
gn:name
62348447
gn:population
wp:UnitedKingdom
gn:wikipediaArticle
mdb:actor/29704
“Jack Nicholson”
movie:actor name
mdb:film/3418
“The Passenger”
refs:label
mdb:film/1267
“The Last Tycoon”
refs:label
mdb:director/8476
“Stanley Kubrick”
movie:director name
mdb:film/2685
“A Clockwork Orange”
refs:label
mdb:film/424
“Spartacus”
refs:label
mdb:actor/30013
“Shelley Duvall”
movie:actor name
“English”
rdf:label
lexvo:iso3166/CA
lvont:usedIn
lexvo:script/latin
lvont:usesScript
movie:relatedBook
scam:hasOffer
foaf:based near
movie:actor
movie:director
movie:actor
movie:actor movie:actor
movie:director movie:director
Subgraph
M
atching
Advantages
Maintains the graph structure
Full set of queries can be handled
Disadvantages
Graph pattern matching is expensive
26ICDCS’17/2017-06-07
Two Systems
gStoreSystem Architecture
Offline Online
Storage
Input Input
RDF Parser
RDF Graph
Builder
Encoding
Module
VS*-tree
builder
RDF data
RDF Triples
RDF Graph
Signature Graph
Key-Value
Store
VS*-tree
Store
SPARQL
Parser
SPARQL Query
Encoding
Module
VS*-tree
Query Graph
Filter
Module
Join
Module
Signature Graph
Node Candidate
Results
Fig. 4. System Architecture
bitstrings, denoted as vS ig(u). We encode query Q with the
same encoding method. Consequently, the match between Q
and G can be verified by simply checking the match between
corresponding encoded bitstrings.
Given a vertex u, we encode each of its adjacent edges
e(eLabel, nLabel) into a bitstring, where eLabel is the edge
chameleon-db
Structural Index
...
Vertex Index
Spill Index
ClusterIndexStorageSystem
StorageAdvisor
Query
Engine Plan Generation Evaluation
27ICDCS’17/2017-06-07
Two Systems
gStoreSystem Architecture
Offline Online
Storage
Input Input
RDF Parser
RDF Graph
Builder
Encoding
Module
VS*-tree
builder
RDF data
RDF Triples
RDF Graph
Signature Graph
Key-Value
Store
VS*-tree
Store
SPARQL
Parser
SPARQL Query
Encoding
Module
VS*-tree
Query Graph
Filter
Module
Join
Module
Signature Graph
Node Candidate
Results
Fig. 4. System Architecture
bitstrings, denoted as vS ig(u). We encode query Q with the
same encoding method. Consequently, the match between Q
and G can be verified by simply checking the match between
corresponding encoded bitstrings.
Given a vertex u, we encode each of its adjacent edges
e(eLabel, nLabel) into a bitstring, where eLabel is the edge
12,000 lines of C++ code under
Linux (plus code for SPARQL parser)
Encode each vertex of RDF graph as
a bit array capturing the
neighbourhood relationship (G∗
)
Build a multilevel summary tree index
(VS∗
-tree) to capture “connections”
1111 1111
0110 1111 1101 1101
0000 1110 0110 1001 1100 1001 1001 1101
0000 1000
0000 0100 0000 0010
0010 1000
0100 0001
1000 0001
0000 1001
0100 1000
1001 1000
0001 0100
0001 0001
005
004 006
001
002
003
007
011
008
009
010
d1
1
d2
1 d2
2
d3
1 d3
2 d3
3 d3
4
G3
G2
G1
11101
10010
10001 01100
10000 00001 01100
00010
10000
01000
01000
10000
10000
10000
10000
00010
00100
01000
01000
01000
01000
27ICDCS’17/2017-06-07
Two Systems
gStoreSystem Architecture
Offline Online
Storage
Input Input
RDF Parser
RDF Graph
Builder
Encoding
Module
VS*-tree
builder
RDF data
RDF Triples
RDF Graph
Signature Graph
Key-Value
Store
VS*-tree
Store
SPARQL
Parser
SPARQL Query
Encoding
Module
VS*-tree
Query Graph
Filter
Module
Join
Module
Signature Graph
Node Candidate
Results
Fig. 4. System Architecture
bitstrings, denoted as vS ig(u). We encode query Q with the
same encoding method. Consequently, the match between Q
and G can be verified by simply checking the match between
corresponding encoded bitstrings.
Given a vertex u, we encode each of its adjacent edges
e(eLabel, nLabel) into a bitstring, where eLabel is the edge
Encode the query graph similarly (Q∗
)
Find candidate matching nodes of Q∗
in G∗
using VS*-tree
Multiway join of the candidates
12,000 lines of C++ code under
Linux (plus code for SPARQL parser)
Encode each vertex of RDF graph as
a bit array capturing the
neighbourhood relationship (G∗
)
Build a multilevel summary tree index
(VS∗
-tree) to capture “connections”
1111 1111
0110 1111 1101 1101
0000 1110 0110 1001 1100 1001 1001 1101
0000 1000
0000 0100 0000 0010
0010 1000
0100 0001
1000 0001
0000 1001
0100 1000
1001 1000
0001 0100
0001 0001
005
004 006
001
002
003
007
011
008
009
010
d1
1
d2
1 d2
2
d3
1 d3
2 d3
3 d3
4
G3
G2
G1
11101
10010
10001 01100
10000 00001 01100
00010
10000
01000
01000
10000
10000
10000
10000
00010
00100
01000
01000
01000
01000
27ICDCS’17/2017-06-07
Two Systems
35,000 lines of C++ code under Linux
(plus code for SPARQL 1.0 parser)
Adaptivity to workload due to
variability of Web workloads and the
variability of composition of SPARQL
triple patterns
An experiment [Alu¸c et al., 2014a]
No single system is a sole winner
across all queries
No single system is the sole loser
across all queries, either
2–5 orders of magnitude difference
in the performance between the
best and the worst system for a
given query
The winner in one query may
timeout in another
Performance difference widens as
dataset size increases
Group-by-query approach [Alu¸c et al.,
2014b]
chameleon-db
Structural Index
...
Vertex Index
Spill Index
ClusterIndexStorageSystem
StorageAdvisor
Query
Engine Plan Generation Evaluation
27ICDCS’17/2017-06-07
Remember the Environment
Distributed environment
Some (not all) of the data
sites can process SPARQL
queries – SPARQL
endpoints
See next section
28ICDCS’17/2017-06-07
Remember the Environment
Distributed environment
Some (not all) of the data
sites can process SPARQL
queries – SPARQL
endpoints
See next section
Alternatives
Cloud-based approaches
Data re-distribution +
query decomposition
Data re-distribution +
partial evaluation
28ICDCS’17/2017-06-07
Cloud-based Solutions [Kaoudi and Manolescu, 2015]
RDF data warehouse D is partitioned ({D1, . . . , Dn}) and
placed on cloud platforms (such as HDFS, HBase)
29ICDCS’17/2017-06-07
Cloud-based Solutions [Kaoudi and Manolescu, 2015]
RDF data warehouse D is partitioned ({D1, . . . , Dn}) and
placed on cloud platforms (such as HDFS, HBase)
SPARQL query is run through MapReduce jobs
Data parallel execution
29ICDCS’17/2017-06-07
Cloud-based Solutions [Kaoudi and Manolescu, 2015]
RDF data warehouse D is partitioned ({D1, . . . , Dn}) and
placed on cloud platforms (such as HDFS, HBase)
SPARQL query is run through MapReduce jobs
Data parallel execution
Examples: HARD [Rohloff and Schantz, 2010] , HadoopRDF
[Husain et al., 2011] , EAGRE [Zhang et al., 2013] and
JenaHBase [Khadilkar et al., 2012]
29ICDCS’17/2017-06-07
Cloud-based Solutions [Kaoudi and Manolescu, 2015]
RDF data warehouse D is partitioned ({D1, . . . , Dn}) and
placed on cloud platforms (such as HDFS, HBase)
SPARQL query is run through MapReduce jobs
Data parallel execution
Examples: HARD [Rohloff and Schantz, 2010] , HadoopRDF
[Husain et al., 2011] , EAGRE [Zhang et al., 2013] and
JenaHBase [Khadilkar et al., 2012]
High scalability and fault-tolerance
Possibly low performance since MapReduce is not suitable for
graph processing
29ICDCS’17/2017-06-07
Partition-based Approaches
(Offline) Partition an RDF data warehouse (graph) into
several fragments that are distributed to sites
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
30ICDCS’17/2017-06-07
Partition-based Approaches
(Offline) Partition an RDF data warehouse (graph) into
several fragments that are distributed to sites
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
Partitioning alternatives
Table-based (e.g., [Husain et al., 2011])
Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013])
Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013])
30ICDCS’17/2017-06-07
Partition-based Approaches
(Offline) Partition an RDF data warehouse (graph) into
several fragments that are distributed to sites
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
Partitioning alternatives
Table-based (e.g., [Husain et al., 2011])
Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013])
Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013])
(Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒
query graph is decomposed
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
30ICDCS’17/2017-06-07
Partition-based Approaches
(Offline) Partition an RDF data warehouse (graph) into
several fragments that are distributed to sites
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
Partitioning alternatives
Table-based (e.g., [Husain et al., 2011])
Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013])
Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013])
(Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒
query graph is decomposed
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
Examples: GraphPartition [Huang et al., 2011], WARP [Hose
and Schenkel, 2013] , Partout [Galarraga et al., 2014] ,
Vertex-block [Lee and Liu, 2013]
30ICDCS’17/2017-06-07
Partition-based Approaches
(Offline) Partition an RDF data warehouse (graph) into
several fragments that are distributed to sites
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
Partitioning alternatives
Table-based (e.g., [Husain et al., 2011])
Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013])
Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013])
(Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒
query graph is decomposed
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
Examples: GraphPartition [Huang et al., 2011], WARP [Hose
and Schenkel, 2013] , Partout [Galarraga et al., 2014] ,
Vertex-block [Lee and Liu, 2013]
High performance
Great for parallelizing centralized RDF data
May not be possible to re-partition and re-allocate Web data
(i.e., LOD)
Each approach requires a specific partitioning strategy – no
generic partitioning
Query decomposition may not be easy
30ICDCS’17/2017-06-07
Partial Query Evaluation (PQE)
RDF data warehouse is partitioned and distributed as before
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
SPARQL query is not decomposed
Partial query evaluation – Distributed gStore [Peng et al., 2016]
31ICDCS’17/2017-06-07
Partial Query Evaluation (PQE)
RDF data warehouse is partitioned and distributed as before
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
SPARQL query is not decomposed
Partial query evaluation – Distributed gStore [Peng et al., 2016]
f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d)
known inputs unknown inputs
31ICDCS’17/2017-06-07
Partial Query Evaluation (PQE)
RDF data warehouse is partitioned and distributed as before
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
SPARQL query is not decomposed
Partial query evaluation – Distributed gStore [Peng et al., 2016]
f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d)
known inputs unknown inputs
f (f (s), d))
partial results
31ICDCS’17/2017-06-07
Partial Query Evaluation (PQE)
RDF data warehouse is partitioned and distributed as before
RDF data D = {D1, . . . , Dn}
Allocate each Di to a site
SPARQL query is not decomposed
Partial query evaluation – Distributed gStore [Peng et al., 2016]
f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d)
known inputs unknown inputs
f (f (s), d))
partial results
Query is the function and each Di is the known input
31ICDCS’17/2017-06-07
Distributed SPARQL Using PQE [Peng et al., 2016]
Two steps:
1. Evaluate a query at each site to find local matches
These are local partial matches
D1
D2
D3
D4
32ICDCS’17/2017-06-07
Distributed SPARQL Using PQE [Peng et al., 2016]
Two steps:
1. Evaluate a query at each site to find local matches
These are local partial matches
2. Assemble the partial matches to get final result
Crossing match
Centralized assembly
Distributed assembly
D1
D2
D3
D4
Crossing match
32ICDCS’17/2017-06-07
Distributed SPARQL Using PQE [Peng et al., 2016]
Two steps:
1. Evaluate a query at each site to find local matches
These are local partial matches
2. Assemble the partial matches to get final result
Crossing match
Centralized assembly
Distributed assembly
D1
D2
D3
D4
Crossing match
High performance due to parallelization
Do not have to deal with query decomposition
May not be possible to re-partition and re-allocate Web data
(i.e., LOD)
RDF storage sites need to be modified to handle partial query
processing
32ICDCS’17/2017-06-07
Outline
1 RDF Technology [¨Ozsu, 2016]
Data Warehousing Approach
Distributed RDF Processing
2 Federated RDF Systems
SPARQL Endpoint Federation
General RDF Federation
3 LOD – Live Querying Approach [Hartig, 2013a]
Traversal-based approaches
Index-based approaches
Hybrid approaches
4 Conclusions
33ICDCS’17/2017-06-07
Remember the Environment
Distributed environment
SPARQL endpoints can
process SPARQL queries
Non-SPARQL endpoints
require additional
components
34ICDCS’17/2017-06-07
Remember the Environment
Distributed environment
SPARQL endpoints can
process SPARQL queries
Non-SPARQL endpoints
require additional
components
Issues
Query decomposition
Localization (source
selection)
Result composition
34ICDCS’17/2017-06-07
SPARQL Endpoint Federation
No data re-partitioning/re-distribution
Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint
SPARQL query decomposed Q = {Q1, . . . , Qk}
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID
[Acosta et al., 2011]
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Sources
35ICDCS’17/2017-06-07
SPARQL Endpoint Federation
No data re-partitioning/re-distribution
Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint
SPARQL query decomposed Q = {Q1, . . . , Qk}
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID
[Acosta et al., 2011]
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Sources
Control Site
Metadata
35ICDCS’17/2017-06-07
SPARQL Endpoint Federation
No data re-partitioning/re-distribution
Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint
SPARQL query decomposed Q = {Q1, . . . , Qk}
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID
[Acosta et al., 2011]
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Sources
Control Site
Metadata
Data included at the source
Supported access patterns
Statistical information
· · ·
35ICDCS’17/2017-06-07
SPARQL Endpoint Federation
No data re-partitioning/re-distribution
Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint
SPARQL query decomposed Q = {Q1, . . . , Qk}
Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn}
E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID
[Acosta et al., 2011]
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Sources
Control Site
Metadata
Data integration approach
May be the only way to proceed if RDF data is already
distributed with autonomous owners
Not all RDF data storage points are SPARQL endpoints
35ICDCS’17/2017-06-07
Not All RDF Storage Sites are SPARQL Endpoints
Use the mediator-wrapper paradigm
Wrappers provide SPARQL endpoint functionality
Mediators may be introduced if wrappers are thin
E.g.: DARQ [Quilitz and Leser, 2008], FedX [Schwarte et al.,
2011b]
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Storage
A
Wrapper
Wrapper
Mediator
RDF Storage
B
RDF Sources
Control Site
Metadata
36ICDCS’17/2017-06-07
Federated Query Processing
Query
Decomposition &
Source Selection
SPARQL queries
Local Evaluation
Join Partial
Matches
SPARQL matches
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
RDF Sources
Control Site
37ICDCS’17/2017-06-07
Query Decomposition
Each triple pattern has to a set of RDF sources based on the
values of its subject, property, and object.
SELECT ?x ?n
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
?y sameAs ?x .
?y n : topicPage ?n .
}
38ICDCS’17/2017-06-07
Query Decomposition
Each triple pattern has to a set of RDF sources based on the
values of its subject, property, and object.
SELECT ?x ?n
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
?y sameAs ?x .
?y n : topicPage ?n .
}
{GeoNames} {GeoNames} {DBPedia,GeoNames,NYTimes,
SWDogFood,LinkedMDB}
{NYTimes}
38ICDCS’17/2017-06-07
Query Decomposition
Each triple pattern has to a set of RDF sources based on the
values of its subject, property, and object.
SELECT ?x ?n
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
?y sameAs ?x .
?y n : topicPage ?n .
}
{GeoNames} {GeoNames} {DBPedia,GeoNames,NYTimes,
SWDogFood,LinkedMDB}
{NYTimes}
q1
@{GeoNames} q2
@{. . .} q3
@{NYTimes}
SELECT ?x
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?n
WHERE {
?y n : topicPage ?n .
}
38ICDCS’17/2017-06-07
Data Localization
Metadata-based approaches
Use the information in the metadata repository to determine
which sources are relevant
DARQ [Quilitz and Leser, 2008]
QTree [Harth et al., 2010; Prasser et al., 2012]
HiBISCus [Saleem and Ngomo, 2014]
. . .
ASK query-based approach
Asking whether or not a triple pattern has an answer at a
source
FedX [Schwarte et al., 2011a,b]
39ICDCS’17/2017-06-07
Query Processing over Federated RDF Systems
Jamendo
SWDogFood
GeoNames
LinkedMDB
DBPedia NYTimes
Metadata
SELECT ?x ?n
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
?y sameAs ?x .
?y n : topicPage ?n .
}
SELECT ?x
WHERE {
?x g : parentFeature ?k .
?k g : name ”Canada” .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?x
WHERE {
?y sameAs ?x .
}
SELECT ?y ?n
WHERE {
?y n : topicPage ?n .
}
40ICDCS’17/2017-06-07
UniProt Federation – EBI RDF Platform
Curated computational models of biological pro-
cesses
Sample information for reference samples and sam-
ples for which data exist in one of the EBI’s assay
databases
Curated chemical database of bioactive molecules
with drug-like properties
Genome databases for vertebrates and other eukary-
otic species
Gene expression data from the Gene Expression Atlas
Curated and peer-reviewed pathways
41ICDCS’17/2017-06-07
Federated Access to UniProt Collection
Get the Reactome pathways where Q16850 is associated, then get all the
other proteins in that pathway and pull out their expression from the atlas,
along with the GO annotations from UniProt
PREFIX r d f : <http ://www. w3 . org /1999/02/22− rdf −syntax−ns#>
PREFIX r d f s : <http ://www. w3 . org /2000/01/ rdf −schema#>
PREFIX biopax3 : <http ://www. biopax . org / r e l e a s e / biopax−l e v e l 3 . owl#>
PREFIX a t l a s t e r m s : <http :// r d f . ebi . ac . uk/ terms / a t l a s />
PREFIX upc:< http :// p u r l . u n i p r o t . org / core/>
SELECT DISTINCT ?pathwayname ? e x p r e s s i o n V a l u e ? g o l a b e l
WHERE {
# Get the pathways that r e f e r e n c e Q16850
? pathway r d f : type biopax3 : Pathway .
? pathway biopax3 : displayName ?pathwayname .
? pathway biopax3 : pathwayComponent
[? r e l [ biopax3 : e n t i t y R e f e r e n c e ? dbXref ] ] .
? pathway biopax3 : pathwayComponent
[? r e l [ biopax3 : e n t i t y R e f e r e n c e <http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850 >]] .
# Get the e x p r e s s i o n f o r those p r o t e i n s
SERVICE <http ://www. e bi . ac . uk/ r d f / s e r v i c e s / a t l a s / sparql > {
? value r d f s : l a b e l ? e x p r e s s i o n V a l u e .
? value a t l a s t e r m s : pValue ? pvalue .
? value a t l a s t e r m s : isMeasurementOf ? probe .
? probe a t l a s t e r m s : dbXref ? dbXref .
}
# get the GO f u n c t i o n s from Uniprot
SERVICE <http :// u n i p r o t . org / sparql > {
? dbXref a upc : Protein ;
upc : c l a s s i f i e d W i t h ? keyword .
? keyword r d f s : seeAlso ? goid .
? goid r d f s : l a b e l ? g o l a b e l .
}
}
42ICDCS’17/2017-06-07
Outline
1 RDF Technology [¨Ozsu, 2016]
Data Warehousing Approach
Distributed RDF Processing
2 Federated RDF Systems
SPARQL Endpoint Federation
General RDF Federation
3 LOD – Live Querying Approach [Hartig, 2013a]
Traversal-based approaches
Index-based approaches
Hybrid approaches
4 Conclusions
43ICDCS’17/2017-06-07
Live Query Processing
Not all data resides at
SPARQL endpoints
Freshness of access to data
important
Potentially countably infinite
data sources
Live querying
On-line execution
Only rely on linked data
principles
Alternatives
Traversal-based
approaches
Index-based approaches
Hybrid approaches
44ICDCS’17/2017-06-07
Linked Data Model [Hartig, 2012]
Web of Linked Data
Given a finite or countably infinite set D of Linked Documents, a
Web of Linked Data is a tuple W = (D, adoc, data) where:
D ⊆ D,
adoc is a partial mapping from URIs to D, and
data is a total mapping from D to finite sets of RDF triples.
45ICDCS’17/2017-06-07
Linked Data Model [Hartig, 2012]
Web of Linked Data
Given a finite or countably infinite set D of Linked Documents, a
Web of Linked Data is a tuple W = (D, adoc, data) where:
D ⊆ D,
adoc is a partial mapping from URIs to D, and
data is a total mapping from D to finite sets of RDF triples.
Data Links
A Web of Linked Data W = (D, adoc, data)
contains a data link from document d ∈ D to
document d ∈ D if there exists a URI u such
that:
u is mentioned in an RDF triple
t ∈ data(d), and
d = adoc(u).
45ICDCS’17/2017-06-07
SPARQL Query Semantics in Live Querying
Full-web semantics
Scope of evaluating a SPARQL expression is all Linked Data
Query result completeness cannot be guaranteed by any
(terminating) execution
46ICDCS’17/2017-06-07
SPARQL Query Semantics in Live Querying
Full-web semantics
Scope of evaluating a SPARQL expression is all Linked Data
Query result completeness cannot be guaranteed by any
(terminating) execution
Reachability-based query semantics
Query consists of a SPARQL expression, a set of seed URIs S,
and a reachability condition c
Scope: all data along paths of data links that satisfy the
condition
Computationally feasible
46ICDCS’17/2017-06-07
Traversal Approaches
Discover relevant URIs recursively
by traversing (specific) data links
at query execution runtime [Hartig,
2013b; Ladwig and Tran, 2011]
Implements reachability-based
query semantics
Start from a set of seed URIs
Recursively follow and discover
new URIs
Important issue is selection of seed
URIs
Retrieved data serves to discover
new URIs and to construct result
47ICDCS’17/2017-06-07
Traversal Approaches
Discover relevant URIs recursively
by traversing (specific) data links
at query execution runtime [Hartig,
2013b; Ladwig and Tran, 2011]
Implements reachability-based
query semantics
Start from a set of seed URIs
Recursively follow and discover
new URIs
Important issue is selection of seed
URIs
Retrieved data serves to discover
new URIs and to construct result
Advantages
Easy to implement.
No data structure to maintain.
47ICDCS’17/2017-06-07
Traversal Approaches
Discover relevant URIs recursively
by traversing (specific) data links
at query execution runtime [Hartig,
2013b; Ladwig and Tran, 2011]
Implements reachability-based
query semantics
Start from a set of seed URIs
Recursively follow and discover
new URIs
Important issue is selection of seed
URIs
Retrieved data serves to discover
new URIs and to construct result
Advantages
Easy to implement.
No data structure to maintain.
Disadvantages
Possibilities for parallelized data retrieval are limited
Repeated data retrieval introduces significant query latency.
47ICDCS’17/2017-06-07
Index Approaches
Use pre-populated index to determine relevant URIs (and to
avoid as many irrelevant ones as possible)
Different index keys possible; e.g., triple patterns [Umbrich
et al., 2011]
Index entries a set of URIs
Indexed URIs may appear multiple times (i.e., associated with
multiple index keys)
Each URI in such an entry may be paired with a cardinality
(utilized for source ranking)
Key: tp Entry: {uri1, uri2, , urin}
GET urii
48ICDCS’17/2017-06-07
Index Approaches
Use pre-populated index to determine relevant URIs (and to
avoid as many irrelevant ones as possible)
Different index keys possible; e.g., triple patterns [Umbrich
et al., 2011]
Index entries a set of URIs
Indexed URIs may appear multiple times (i.e., associated with
multiple index keys)
Each URI in such an entry may be paired with a cardinality
(utilized for source ranking)
Key: tp Entry: {uri1, uri2, , urin}
GET urii
Advantages
Data retrieval can be fully parallelized
Reduces the impact of data retrieval on query execution time
48ICDCS’17/2017-06-07
Index Approaches
Use pre-populated index to determine relevant URIs (and to
avoid as many irrelevant ones as possible)
Different index keys possible; e.g., triple patterns [Umbrich
et al., 2011]
Index entries a set of URIs
Indexed URIs may appear multiple times (i.e., associated with
multiple index keys)
Each URI in such an entry may be paired with a cardinality
(utilized for source ranking)
Key: tp Entry: {uri1, uri2, , urin}
GET urii
Advantages
Data retrieval can be fully parallelized
Reduces the impact of data retrieval on query execution time
Disadvantages
Querying can only start after index construction
Depends on what has been selected for the index
Freshness may be an issue
Index maintenance
48ICDCS’17/2017-06-07
Hybrid Approach
Perform a traversal-based execution using a prioritized list of
URIs to look up [Ladwig and Tran, 2010]
Initial seed from the pre-populated index
Non-seed URIs are ranked by a function based on information
in the index
New discovered URIs that are not in the index are ranked
according to number of referring documents
49ICDCS’17/2017-06-07
Outline
1 RDF Technology [¨Ozsu, 2016]
Data Warehousing Approach
Distributed RDF Processing
2 Federated RDF Systems
SPARQL Endpoint Federation
General RDF Federation
3 LOD – Live Querying Approach [Hartig, 2013a]
Traversal-based approaches
Index-based approaches
Hybrid approaches
4 Conclusions
50ICDCS’17/2017-06-07
Conclusions
RDF and Linked Object Data seem to have considerable
promise for Web data management
There are prototype systems that provide alternative solutions
There are commercial systems as well
See https://www.w3.org/wiki/SparqlImplementations
for a list
More work needs to be done
Query semantics
Adaptive system design
Optimizations – both in data warehousing and distributed
environments
Live querying requires significant thought to reduce latency
51ICDCS’17/2017-06-07
Conclusions
What I did not talk about:
Not much on general distributed/parallel processing
Not much on SPARQL semantics
Nothing about RDFS – no schema stuff
Nothing about entailment regimes > 0 ⇒ no reasoning
52ICDCS’17/2017-06-07
Thank you!
Research supported by
53ICDCS’17/2017-06-07
References I
Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. (2009). SW-Store: a
vertically partitioned DBMS for semantic web data management. VLDB J.,
18(2):385–406.
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. (2007). Scalable
semantic web data management using vertical partitioning. In Proc. 33rd
Int. Conf. on Very Large Data Bases, pages 411–422.
Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., and Ruckhaus, E. (2011).
ANAPSID: an adaptive query processing engine for SPARQL endpoints. In
Proc. 10th Int. Semantic Web Conf., pages 18–34.
Alu¸c, G., Hartig, O., ¨Ozsu, M. T., and Daudjee, K. (2014a). Diversified stress
testing of RDF data management systems. In Proc. 13th Int. Semantic Web
Conf., pages 197–212.
Alu¸c, G., ¨Ozsu, M. T., and Daudjee, K. (2014b). Workload matters: Why RDF
databases need a new design. Proc. VLDB Endowment, 7(10):837–840.
54ICDCS’17/2017-06-07
References II
Alu¸c, G., ¨Ozsu, M. T., Daudjee, K., and Hartig, O. (2013). chameleon-db: a
workload-aware robust RDF data management system. Technical Report
CS-2013-10, University of Waterloo. Available at
https://cs.uwaterloo.ca/sites/ca.computer-science/files/
uploads/files/CS-2013-10.pdf.
Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P.,
Udrea, O., and Bhattacharjee, B. (2013). Building an efficient RDF store
over a relational database. In Proc. ACM SIGMOD Int. Conf. on
Management of Data, pages 121–132.
Galarraga, L., Hose, K., and Schenkel, R. (2014). Partout: a distributed engine
for efficient RDF processing. In Proc. 23rd Int. World Wide Web Conf.
(Companion Volume), pages 267–268.
G¨orlitz, O. and Staab, S. (2011). SPLENDID: SPARQL endpoint federation
exploiting VOID descriptions. In Proc. 2nd Int. Workshop on Consuming
Linked Data.
55ICDCS’17/2017-06-07
References III
Gurajada, S., Seufert, S., Miliaraki, I., and Theobald, M. (2014). TriAD: A
distributed shared-nothing RDF engine based on asynchronous message
passing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages
289–300.
Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K., and Umbrich, J.
(2010). Data summaries for on-demand queries over linked data. In Proc.
19th Int. World Wide Web Conf., pages 411–420. Available from:
http://doi.acm.org/10.1145/1772690.1772733.
Hartig, O. (2012). SPARQL for a web of linked data: Semantics and
computability. In Proc. 9th Extended Semantic Web Conf., pages 8–23.
Hartig, O. (2013a). An overview on execution strategies for linked data queries.
Datenbank-Spektrum, 13(2):89–99. Available from:
http://dx.doi.org/10.1007/s13222-013-0122-1.
Hartig, O. (2013b). SQUIN: a traversal based query execution system for the
web of linked data. In Proc. ACM SIGMOD Int. Conf. on Management of
Data, pages 1081–1084.
56ICDCS’17/2017-06-07
References IV
Hose, K. and Schenkel, R. (2013). WARP: Workload-aware replication and
partitioning for RDF. In Proc. Workshops of 29th Int. Conf. on Data
Engineering, pages 1–6.
Huang, J., Abadi, D. J., and Ren, K. (2011). Scalable SPARQL querying of
large RDF graphs. Proc. VLDB Endowment, 4(11):1123–1134.
Husain, M. F., McGlothlin, J., Masud, M. M., Khan, L. R., and Thuraisingham,
B. (2011). Heuristics-based query processing for large RDF graphs using
cloud computing. IEEE Trans. Knowl. and Data Eng., 23(9):1312–1327.
Kaoudi, Z. and Manolescu, I. (2015). RDF in the clouds: A survey. VLDB J.,
24:67–91.
Khadilkar, V., Kantarcioglu, M., Thuraisingham, B. M., and Castagna, P.
(2012). Jena-HBase: A distributed, scalable and efficient RDF triple store.
In Proc. International Semantic Web Conference Posters & Demos Track.
Ladwig, G. and Tran, T. (2010). Linked data query processing strategies. In
Proc. 9th Int. Semantic Web Conf., pages 453–469.
Ladwig, G. and Tran, T. (2011). SIHJoin: Querying remote and local linked
data. In Proc. 8th Extended Semantic Web Conf., pages 139–153.
57ICDCS’17/2017-06-07
References V
Lee, K. and Liu, L. (2013). Scaling queries over big RDF graphs with semantic
hash partitioning. Proc. VLDB Endowment, 6(14):1894–1905. Available
from: http://www.vldb.org/pvldb/vol6/p1894-lee.pdf.
Neumann, T. and Weikum, G. (2008). RDF-3X: a RISC-style engine for RDF.
Proc. VLDB Endowment, 1(1):647–659.
Neumann, T. and Weikum, G. (2009). The RDF-3X engine for scalable
management of RDF data. VLDB J., 19(1):91–113.
¨Ozsu, M. T. (2016). A survey of RDF data management systems. Front.
Comput. Sci., 10(3):418–432.
Peng, P., Zou, L., ¨Ozsu, M. T., Chen, L., and Zhao, D. (2016). Processing
SPARQL queries over distributed RDF graphs. VLDB J., 25(2):243–268.
Prasser, F., Kemper, A., and Kuhn, K. A. (2012). Efficient distributed query
processing for autonomous rdf databases. In Proc. 15th Int. Conf. on
Extending Database Technology, pages 372–383.
Quilitz, B. and Leser, U. (2008). Querying distributed RDF data sources with
SPARQL. In Proc. 5th European Semantic Web Conf., pages 524–538.
58ICDCS’17/2017-06-07
References VI
Rohloff, K. and Schantz, R. E. (2010). High-performance, massively scalable
distributed systems using the mapreduce software framework: the shard
triple-store. In Proc. Int. Workshop on Programming Support Innovations
for Emerging Distributed Applications. Article No. 4.
Saleem, M. and Ngomo, A. N. (2014). HiBISCuS: Hypergraph-based source
selection for SPARQL endpoint federation. In Proc. 11th Extended Semantic
Web Conf., pages 176–191. Available from:
http://dx.doi.org/10.1007/978-3-319-07443-6_13.
Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011a).
FedX: A federation layer for distributed query processing on linked open
data. In Proc. 8th Extended Semantic Web Conf., pages 481–486.
Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011b).
Fedx: Optimization techniques for federated query processing on linked
data. In Proc. 10th Int. Semantic Web Conf., pages 601–616. Available
from: https://doi.org/10.1007/978-3-642-25073-6_38.
Umbrich, J., Hose, K., Karnstedt, M., Harth, A., and Polleres, A. (2011).
Comparing data summaries for processing live queries over linked data.
World Wide Web J., 14(5-6):495–544.
59ICDCS’17/2017-06-07
References VII
Weiss, C., Karras, P., and Bernstein, A. (2008). Hexastore: sextuple indexing
for semantic web data management. Proc. VLDB Endowment,
1(1):1008–1019.
Wilkinson, K. (2006). Jena property table implementation. Technical Report
HPL-2006-140, HP Laboratories Palo Alto.
Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., and Liu, L. (2013). TripleBit: a
fast and compact system for large scale RDF data. Proc. VLDB
Endowment, 6(7):517–528. Available from:
http://www.vldb.org/pvldb/vol6/p517-yuan.pdf.
Zhang, X., Chen, L., Tong, Y., and Wang, M. (2013). EAGRE: Towards
scalable I/O efficient SPARQL query evaluation on the cloud. In Proc. 29th
Int. Conf. on Data Engineering, pages 565–576.
Zou, L., Mo, J., Chen, L., ¨Ozsu, M. T., and Zhao, D. (2011). gStore:
answering SPARQL queries via subgraph matching. Proc. VLDB
Endowment, 4(8):482–493.
Zou, L. and ¨Ozsu, M. T. (2017). Graph-based RDF data management. Data
Science and Engineering, 2(1):56–70. Available from:
https://dx.doi.org/10.1007/s41019-016-0029-6.
60ICDCS’17/2017-06-07
References VIII
Zou, L., ¨Ozsu, M. T., Chen, L., Shen, X., Huang, R., and Zhao, D. (2014).
gStore: A graph-based SPARQL query engine. VLDB J., 23(4):565–590.
61ICDCS’17/2017-06-07

Mais conteúdo relacionado

Mais procurados

Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgAI4BD GmbH
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Mathieu d'Aquin
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Linked Data at ISAW: How and Why
Linked Data at ISAW: How and WhyLinked Data at ISAW: How and Why
Linked Data at ISAW: How and Whyparegorios
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebStefan Dietze
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...Alison Hitchens
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Jonathan Yu
 
Open hpi semweb-06-part5
Open hpi semweb-06-part5Open hpi semweb-06-part5
Open hpi semweb-06-part5Nadine Ludwig
 
Open hpi semweb-06-part8
Open hpi semweb-06-part8Open hpi semweb-06-part8
Open hpi semweb-06-part8Nadine Ludwig
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Anja Jentzsch
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madridStefan Gradmann
 
An introduction to Linked Open Data
An introduction to Linked Open DataAn introduction to Linked Open Data
An introduction to Linked Open DataAli Khalili
 

Mais procurados (19)

Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St Petersburg
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & Museums
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Linked Data at ISAW: How and Why
Linked Data at ISAW: How and WhyLinked Data at ISAW: How and Why
Linked Data at ISAW: How and Why
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017Australian Open government and research data pilot survey 2017
Australian Open government and research data pilot survey 2017
 
Open hpi semweb-06-part5
Open hpi semweb-06-part5Open hpi semweb-06-part5
Open hpi semweb-06-part5
 
Open hpi semweb-06-part8
Open hpi semweb-06-part8Open hpi semweb-06-part8
Open hpi semweb-06-part8
 
Clark - Metadata is the Message
Clark - Metadata is the MessageClark - Metadata is the Message
Clark - Metadata is the Message
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
20130711 records2 graphs_madrid
20130711 records2 graphs_madrid20130711 records2 graphs_madrid
20130711 records2 graphs_madrid
 
An introduction to Linked Open Data
An introduction to Linked Open DataAn introduction to Linked Open Data
An introduction to Linked Open Data
 

Semelhante a Web Data Management in the RDF Age

2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data introvafopoulos
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked datavafopoulos
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudDhaval Thakker
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikisSören Auer
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commonsJesse Wang
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open DataIvan Herman
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)robin fay
 
Pragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebPragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebMike Bergman
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Stefan Dietze
 
RDFa From Theory to Practice
RDFa From Theory to PracticeRDFa From Theory to Practice
RDFa From Theory to PracticeAdrian Stevenson
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?Martin Hepp
 

Semelhante a Web Data Management in the RDF Age (20)

2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
The Future of LOD
The Future of LODThe Future of LOD
The Future of LOD
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikis
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Linked Data
Linked DataLinked Data
Linked Data
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
 
Pragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebPragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic Web
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
Going for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked MetadataGoing for GOLD - Adventures in Open Linked Metadata
Going for GOLD - Adventures in Open Linked Metadata
 
RDFa From Theory to Practice
RDFa From Theory to PracticeRDFa From Theory to Practice
RDFa From Theory to Practice
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
Linked data and voyager
Linked data and voyagerLinked data and voyager
Linked data and voyager
 

Último

Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...SUHANI PANDEY
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...SUHANI PANDEY
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...tanu pandey
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...roncy bisnoi
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)Delhi Call girls
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...Escorts Call Girls
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...SUHANI PANDEY
 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...SUHANI PANDEY
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...nirzagarg
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...SUHANI PANDEY
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...SUHANI PANDEY
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceEscorts Call Girls
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
 

Último (20)

Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 

Web Data Management in the RDF Age

  • 1. Web Data Management in RDF Age M. Tamer ¨Ozsu University of Waterloo David R. Cheriton School of Computer Science 1ICDCS’17/2017-06-07
  • 2. Acknowledgements This presentation draws upon collaborative research and discussions with the following colleagues (in alphabetical order) G¨une¸s Alu¸c, University of Waterloo; now at SAP Khuzaima Daudjee, University of Waterloo Olaf Hartig, University of Waterloo; now at Link¨oping Univ. Lei Chen, Hong Kong University of Science & Technology Lei Zou, Peking University 2ICDCS’17/2017-06-07
  • 3. Web Data Management A long term research interest in the DB community 2000 2004 2011 2011 3ICDCS’17/2017-06-07
  • 4. Interest Due to Properties of Web Data Lack of a schema Data is at best “semi-structured” Missing data, additional attributes, similar data but not identical Volatility Changes frequently May conform to one schema now, but not later Scale Does it make sense to talk about a schema for Web? How do you capture “everything”? Querying difficulty What is the user language? What are the primitives? Arent search engines or metasearch engines sufficient? 4ICDCS’17/2017-06-07
  • 5. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization 5ICDCS’17/2017-06-07
  • 6. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization XML Data exchange language Primarily tree based structure <list title="MOVIES"> <film> <title>The Shining</title> <director>Stanley Kubrick</director> <actor>Jack Nicholson</actor> </film> <film> <title>Spartacus</title> <director>Stanley Kubrick</director> </film> <film> <title>The Passenger</title> <actor>Jack Nicholson</actor> </film> ... </list> root film title “The Shining” director “Stanley Kubrick” actor “Jack Nicholson” film ... film title “The Passenger” actor “Jack Nicholson” 5ICDCS’17/2017-06-07
  • 7. More Recent Approaches to Web Querying Fusion Tables Users contribute data in spreadsheet, CVS, KML format Possible joins between multiple data sets Extensive visualization XML Data exchange language Primarily tree based structure Linked Open Data (LOD) W3C work; community effort Maintains autonomy of data sources Low barrier to entry 5ICDCS’17/2017-06-07
  • 8. Traditional Hypertext-based Web Access IMDb World Book Data exposed to the Web via HTML 6ICDCS’17/2017-06-07
  • 9. Linked Data Publishing Principles IMDb World Book (http://...linkedmdb.../Shining,releaseDate, 23 May 1980) (http://...linkedmdb.../Shining, filmLocation, http://cia.../UK) (http://...linkedmdb.../29704,actedIn, http://...linkedmdb.../Shining) ... (http://cia.../UK, hasPopulation, 63230000) ... Shining UK Data model: RDF Global identifier: URI Access mechanism: HTTP Connection: data links 7ICDCS’17/2017-06-07
  • 10. Linked Object Data – Closer Look 8ICDCS’17/2017-06-07
  • 11. LOD Data Volumes . . . . . . are growing – and fast Linked data cloud currently consists of 3000 datasets with >84B triples Size almost doubling every year 9ICDCS’17/2017-06-07
  • 12. LOD Data Volumes . . . . . . are growing – and fast Linked data cloud currently consists of 3000 datasets with >84B triples Size almost doubling every year As of March 2009 LinkedCT Reactome Taxonomy KEGG PubMed GeneID Pfam UniProt OMIM PDB Symbol ChEBI Daily Med Disea- some CAS HGNC Inter Pro Drug Bank UniParc UniRef ProDom PROSITE Gene Ontology Homolo Gene Pub Chem MGI UniSTS GEO Species Jamendo BBC Programm es Music- brainz Magna- tune BBC Later + TOTP Surge Radio MySpace Wrapper Audio- Scrobbler Linked MDB BBC John Peel BBC Playcount Data Gov- Track US Census Data riese Geo- names lingvoj World Fact- book Euro- stat IRIT Toulouse SW Conference Corpus RDF Book Mashup Project Guten- berg DBLP Hannover DBLP Berlin LAAS- CNRS Buda- pest BME IEEE IBM Resex Pisa New- castle RAE 2001 CiteSeer ACM DBLP RKB Explorer eprints LIBRIS Semantic Web.org Eurécom ECS South- ampton RevyuSIOC Sites Doap- space Flickr exporter FOAF profiles flickr wrappr Crunch Base Sem- Web- Central Open- Guides Wiki- company QDOS Pub Guide Open Calais RDF ohloh W3C WordNet Open Cyc UMBEL Yago DBpedia Freebase Virtuoso Sponger March ’09: 89 datasets 9ICDCS’17/2017-06-07 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 13. LOD Data Volumes . . . . . . are growing – and fast Linked data cloud currently consists of 3000 datasets with >84B triples Size almost doubling every year As of September 2010 Music Brainz (zitgist) P20 YAGO World Fact- book (FUB) WordNet (W3C) WordNet (VUA) VIVO UF VIVO Indiana VIVO Cornell VIAF URI Burner Sussex Reading Lists Plymouth Reading Lists UMBEL UK Post- codes legislation .gov.uk Uberblic UB Mann- heim TWC LOGD Twarql transport data.gov .uk totl.net Tele- graphis TCM Gene DIT Taxon Concept The Open Library (Talis) t4gm Surge Radio STW RAMEAU SH statistics data.gov .uk St. Andrews Resource Lists ECS South- ampton EPrints Semantic Crunch Base semantic web.org Semantic XBRL SW Dog Food rdfabout US SEC Wiki UN/ LOCODE Ulm ECS (RKB Explorer) Roma RISKS RESEX RAE2001 Pisa OS OAI NSF New- castle LAAS KISTI JISC IRIT IEEE IBM Eurécom ERA ePrints dotAC DEPLOY DBLP (RKB Explorer) Course- ware CORDIS CiteSeer Budapest ACM riese Revyu research data.gov .uk reference data.gov .uk Recht- spraak. nl RDF ohloh Last.FM (rdfize) RDF Book Mashup PSH Product DB PBAC Poké- pédia Ord- nance Survey Openly Local The Open Library Open Cyc Open Calais OpenEI New York Times NTU Resource Lists NDL subjects MARC Codes List Man- chester Reading Lists Lotico The London Gazette LOIUS lobid Resources lobid Organi- sations Linked MDB Linked LCCN Linked GeoData Linked CT Linked Open Numbers lingvoj LIBRIS Lexvo LCSH DBLP (L3S) Linked Sensor Data (Kno.e.sis) Good- win Family Jamendo iServe NSZL Catalog GovTrack GESIS Geo Species Geo Names Geo Linked Data (es) GTAA STITCH SIDER Project Guten- berg (FUB) Medi Care Euro- stat (FUB) Drug Bank Disea- some DBLP (FU Berlin) Daily Med Freebase flickr wrappr Fishes of Texas FanHubz Event- Media EUTC Produc- tions Eurostat EUNIS ESD stan- dards Popula- tion (En- AKTing) NHS (EnAKTing) Mortality (En- AKTing) Energy (En- AKTing) CO2 (En- AKTing) education data.gov .uk ECS South- ampton Gem. Norm- datei data dcs MySpace (DBTune) Music Brainz (DBTune) Magna- tune John Peel (DB Tune) classical (DB Tune) Audio- scrobbler (DBTune) Last.fm Artists (DBTune) DB Tropes dbpedia lite DBpedia Pokedex Airports NASA (Data Incu- bator) Music Brainz (Data Incubator) Moseley Folk Discogs (Data In- cubator) Climbing Linked Data for Intervals Cornetto Chronic- ling America Chem2 Bio2RDF biz. data. gov.uk UniSTS UniRef Uni Path- way UniParc Taxo- nomy UniProt SGD Reactome PubMed Pub Chem PRO- SITE ProDom Pfam PDB OMIM OBO MGI KEGG Reaction KEGG Pathway KEGG Glycan KEGG Enzyme KEGG Drug KEGG Cpd InterPro Homolo Gene HGNC Gene Ontology GeneID Gen Bank ChEBI CAS Affy- metrix BibBase BBC Wildlife Finder BBC Program mes BBC Music rdfabout US Census Media Geographic Publications Government Cross-domain Life sciences User-generated content September ’10: 203 datasets 9ICDCS’17/2017-06-07 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 14. LOD Data Volumes . . . . . . are growing – and fast Linked data cloud currently consists of 3000 datasets with >84B triples Size almost doubling every year As of September 2011 Music Brainz (zitgist) P20 Turismo de Zaragoza yovisto Yahoo! Geo Planet YAGO World Fact- book El Viajero Tourism WordNet (W3C) WordNet (VUA) VIVO UF VIVO Indiana VIVO Cornell VIAF URI Burner Sussex Reading Lists Plymouth Reading Lists UniRef UniProt UMBEL UK Post- codes legislation data.gov.uk Uberblic UB Mann- heim TWC LOGD Twarql transport data.gov. uk Traffic Scotland theses. fr Thesau- rus W totl.net Tele- graphis TCM Gene DIT Taxon Concept Open Library (Talis) tags2con delicious t4gm info Swedish Open Cultural Heritage Surge Radio Sudoc STW RAMEAU SH statistics data.gov. uk St. Andrews Resource Lists ECS South- ampton EPrints SSW Thesaur us Smart Link Slideshare 2RDF semantic web.org Semantic Tweet Semantic XBRL SW Dog Food Source Code Ecosystem Linked Data US SEC (rdfabout) Sears Scotland Geo- graphy Scotland Pupils & Exams Scholaro- meter WordNet (RKB Explorer) Wiki UN/ LOCODE Ulm ECS (RKB Explorer) Roma RISKS RESEX RAE2001 Pisa OS OAI NSF New- castle LAAS KISTI JISC IRIT IEEE IBM Eurécom ERA ePrints dotAC DEPLOY DBLP (RKB Explorer) Crime Reports UK Course- ware CORDIS (RKB Explorer) CiteSeer Budapest ACM riese Revyu research data.gov. ukRen. Energy Genera- tors reference data.gov. uk Recht- spraak. nl RDF ohloh Last.FM (rdfize) RDF Book Mashup Rådata nå! PSH Product Types Ontology Product DB PBAC Poké- pédia patents data.go v.uk Ox Points Ord- nance Survey Openly Local Open Library Open Cyc Open Corpo- rates Open Calais OpenEI Open Election Data Project Open Data Thesau- rus Ontos News Portal OGOLOD Janus AMP Ocean Drilling Codices New York Times NVD ntnusc NTU Resource Lists Norwe- gian MeSH NDL subjects ndlna my Experi- ment Italian Museums medu- cator MARC Codes List Man- chester Reading Lists Lotico Weather Stations London Gazette LOIUS Linked Open Colors lobid Resources lobid Organi- sations LEM Linked MDB LinkedL CCN Linked GeoData LinkedCT Linked User Feedback LOV Linked Open Numbers LODE Eurostat (Ontology Central) Linked EDGAR (Ontology Central) Linked Crunch- base lingvoj Lichfield Spen- ding LIBRIS Lexvo LCSH DBLP (L3S) Linked Sensor Data (Kno.e.sis) Klapp- stuhl- club Good- win Family National Radio- activity JP Jamendo (DBtune) Italian public schools ISTAT Immi- gration iServe IdRef Sudoc NSZL Catalog Hellenic PD Hellenic FBD Piedmont Accomo- dations GovTrack GovWILD Google Art wrapper gnoss GESIS GeoWord Net Geo Species Geo Names Geo Linked Data GEMET GTAA STITCH SIDER Project Guten- berg Medi Care Euro- stat (FUB) EURES Drug Bank Disea- some DBLP (FU Berlin) Daily Med CORDIS (FUB) Freebase flickr wrappr Fishes of Texas Finnish Munici- palities ChEMBL FanHubz Event Media EUTC Produc- tions Eurostat Europeana EUNIS EU Insti- tutions ESD stan- dards EARTh Enipedia Popula- tion (En- AKTing) NHS (En- AKTing) Mortality (En- AKTing) Energy (En- AKTing) Crime (En- AKTing) CO2 Emission (En- AKTing) EEA SISVU educatio n.data.g ov.uk ECS South- ampton ECCO- TCP GND Didactal ia DDC Deutsche Bio- graphie data dcs Music Brainz (DBTune) Magna- tune John Peel (DBTune) Classical (DB Tune) Audio Scrobbler (DBTune) Last.FM artists (DBTune) DB Tropes Portu- guese DBpedia dbpedia lite Greek DBpedia DBpedia data- open- ac-uk SMC Journals Pokedex Airports NASA (Data Incu- bator) Music Brainz (Data Incubator) Moseley Folk Metoffice Weather Forecasts Discogs (Data Incubator) Climbing data.gov.uk intervals Data Gov.ie data bnf.fr Cornetto reegle Chronic- ling America Chem2 Bio2RDF Calames business data.gov. uk Bricklink Brazilian Poli- ticians BNB UniSTS UniPath way UniParc Taxono my UniProt (Bio2RDF) SGD Reactome PubMed Pub Chem PRO- SITE ProDom Pfam PDB OMIM MGI KEGG Reaction KEGG Pathway KEGG Glycan KEGG Enzyme KEGG Drug KEGG Com- pound InterPro Homolo Gene HGNC Gene Ontology GeneID Affy- metrix bible ontology BibBase FTS BBC Wildlife Finder BBC Program mes BBC Music Alpine Ski Austria LOCAH Amster- dam Museum AGROV OC AEMET US Census (rdfabout) Media Geographic Publications Government Cross-domain Life sciences User-generated content September ’11: 295 datasets, 25B triples 9ICDCS’17/2017-06-07 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 15. LOD Data Volumes . . . . . . are growing – and fast Linked data cloud currently consists of 3000 datasets with >84B triples Size almost doubling every year April ’14: 570 datasets, ??? triples 9ICDCS’17/2017-06-07 Max Schmachtenberg, Christian Bizer, and Heiko Paulheim: Adoption of Linked Data Best Practices in Different Topical Domains. In Proc. ISWC, 2014.
  • 16. Globally Distributed Network of Data 10ICDCS’17/2017-06-07
  • 17. Outline 1 RDF Technology [¨Ozsu, 2016] Data Warehousing Approach Distributed RDF Processing 2 Federated RDF Systems SPARQL Endpoint Federation General RDF Federation 3 LOD – Live Querying Approach [Hartig, 2013a] Traversal-based approaches Index-based approaches Hybrid approaches 4 Conclusions 11ICDCS’17/2017-06-07
  • 18. Outline 1 RDF Technology [¨Ozsu, 2016] Data Warehousing Approach Distributed RDF Processing 2 Federated RDF Systems SPARQL Endpoint Federation General RDF Federation 3 LOD – Live Querying Approach [Hartig, 2013a] Traversal-based approaches Index-based approaches Hybrid approaches 4 Conclusions 12ICDCS’17/2017-06-07
  • 19. RDF Introduction Everything is an uniquely named resource http://data.linkedmdb.org/resource/actor/JN29704 13ICDCS’17/2017-06-07
  • 20. RDF Introduction Everything is an uniquely named resource Prefixes can be used to shorten the names xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 13ICDCS’17/2017-06-07
  • 21. RDF Introduction Everything is an uniquely named resource Prefixes can be used to shorten the names Properties of resources can be defined xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName “Jack Nicholson” y:JN29704:BornOnDate “1937-04-22” 13ICDCS’17/2017-06-07
  • 22. RDF Introduction Everything is an uniquely named resource Prefixes can be used to shorten the names Properties of resources can be defined Relationships with other resources can be defined xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName “Jack Nicholson” y:JN29704:BornOnDate “1937-04-22” y:TS2014:title “The Shining” y:TS2014:releaseDate “1980-05-23” y:TS2014 JN29704:movieActor 13ICDCS’17/2017-06-07
  • 23. RDF Introduction Everything is an uniquely named resource Prefixes can be used to shorten the names Properties of resources can be defined Relationships with other resources can be defined Resource descriptions can be contributed by different people/groups and can be located anywhere in the web Integrated web “database” xmlns:y=http://data.linkedmdb.org/resource/actor/ y:JN29704 y:JN29704:hasName “Jack Nicholson” y:JN29704:BornOnDate “1937-04-22” y:TS2014:title “The Shining” y:TS2014:releaseDate “1980-05-23” y:TS2014 JN29704:movieActor 13ICDCS’17/2017-06-07
  • 24. RDF Data Model Triple: Subject, Predicate (Property), Object (s, p, o) Subject: the entity that is described (URI or blank node) Predicate: a feature of the entity (URI) Object: value of the feature (URI, blank node or literal) (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L) Set of RDF triples is called an RDF graph U Subject Object U B U B L U: set of URIs B: set of blank nodes L: set of literals Predicate Subject Predicate Object http://...imdb.../film/2014 rdfs:label “The Shining” http://...imdb.../film/2014 movie:releaseDate “1980-05-23” http://...imdb.../29704 movie:actor name “Jack Nicholson” . . . . . . . . . 14ICDCS’17/2017-06-07
  • 25. RDF Example Instance Prefixes: mdb=http://data.linkedmdb.org/resource/; geo=http://sws.geonames.org/ bm=http://wifo5-03.informatik.uni-mannheim.de/bookmashup/ lexvo=http://lexvo.org/id/;wp=http://en.wikipedia.org/wiki/ Subject Predicate Object mdb: film/2014 rdfs:label “The Shining” mdb:film/2014 movie:initial release date “1980-05-23”’ mdb:film/2014 movie:director mdb:director/8476 mdb:film/2014 movie:actor mdb:actor/29704 mdb:film/2014 movie:actor mdb: actor/30013 mdb:film/2014 movie:music contributor mdb: music contributor/4110 mdb:film/2014 foaf:based near geo:2635167 mdb:film/2014 movie:relatedBook bm:0743424425 mdb:film/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name “Stanley Kubrick” mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:film/424 movie:director mdb:director/8476 mdb:film/424 rdfs:label “Spartacus” mdb:actor/29704 movie:actor name “Jack Nicholson” mdb:film/1267 movie:actor mdb:actor/29704 mdb:film/1267 rdfs:label “The Last Tycoon” mdb:film/3418 movie:actor mdb:actor/29704 mdb:film/3418 rdfs:label “The Passenger” geo:2635167 gn:name “United Kingdom” geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer lexvo:iso639-3/eng rdfs:label “English” lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn URI Literal URI 15ICDCS’17/2017-06-07
  • 26. RDF Graph mdb:film/2014 “1980-05-23” movie:initial release date “The Shining” refs:label mob:music contributor music contributor lexvo:iso639 3/eng language bm:books/0743424425 4.7 rev:rating bm:persons/StephenKing dc:creator bm:offers/0743424425amazonOffer geo:2635167 “United Kingdom” gn:name 62348447 gn:population wp:UnitedKingdom gn:wikipediaArticle mdb:actor/29704 “Jack Nicholson” movie:actor name mdb:film/3418 “The Passenger” refs:label mdb:film/1267 “The Last Tycoon” refs:label mdb:director/8476 “Stanley Kubrick” movie:director name mdb:film/2685 “A Clockwork Orange” refs:label mdb:film/424 “Spartacus” refs:label mdb:actor/30013 “Shelley Duvall” movie:actor name “English” rdf:label lexvo:iso3166/CA lvont:usedIn lexvo:script/latin lvont:usesScript movie:relatedBook scam:hasOffer foaf:based near movie:actor movie:director movie:actor movie:actor movie:actor movie:director movie:director 16ICDCS’17/2017-06-07
  • 27. UniProt in RDF http://www.uniprot.org UniProt collects data from >150 biological resources Claim: “lack of a common standard to represent and link information makes data integration an expensive business” ⇒ RDF can help 17ICDCS’17/2017-06-07
  • 28. UniProt in RDF – What does the data look like? UniProt accession for the human CYP51 protein – Q16850 Encode it as RDF: 18ICDCS’17/2017-06-07 http://purl.uniprot.org/uniprot/Q16850.rdf
  • 29. UniProt in RDF – What does the data look like? UniProt accession for the human CYP51 protein – Q16850 Encode it as RDF: XML/RDF format <rdf:Description 2-¿rdf:about=”http://purl.uniprot.org/citations/8619637”> <rdf:type 2-¿rdf:resource=”http://purl.uniprot.org/core/Journal Citation”/> <title>The ubiquitously expressed human CYP51 encodes lanosterol 14 alpha-demethylase, a cytochrome P450 whose expression is regulated by oxysterols.</title> <author>Stroemstedt M.</author> <author>Rozman D.</author> <author>Waterman M.R.</author> <skos:exactMatch rdf:resource=”http://purl.uniprot.org/pubmed/8619637”/> <foaf:primaryTopicOf rdf:resource=”https://www.ncbi.nlm.nih.gov/pubmed/8619637”/> <dcterms:identifier>doi:10.1006/abbi.1996.0193</dcterms:identifier> <date rdf:datatype=”http://www.w3.org/2001/XMLSchema#gYear”>1996</date> <name>Arch. Biochem. Biophys.</name> <volume>329</volume> <pages>73-81</pages> </rdf:Description> Subject Predicate Object 18ICDCS’17/2017-06-07 http://purl.uniprot.org/uniprot/Q16850.rdf
  • 30. UniProt in RDF – What does the data look like? UniProt accession for the human CYP51 protein – Q16850 Encode it as RDF: XML/RDF format <rdf:Description 2-¿rdf:about=”http://purl.uniprot.org/citations/8619637”> <rdf:type 2-¿rdf:resource=”http://purl.uniprot.org/core/Journal Citation”/> <title>The ubiquitously expressed human CYP51 encodes lanosterol 14 alpha-demethylase, a cytochrome P450 whose expression is regulated by oxysterols.</title> <author>Stroemstedt M.</author> <author>Rozman D.</author> <author>Waterman M.R.</author> <skos:exactMatch rdf:resource=”http://purl.uniprot.org/pubmed/8619637”/> <foaf:primaryTopicOf rdf:resource=”https://www.ncbi.nlm.nih.gov/pubmed/8619637”/> <dcterms:identifier>doi:10.1006/abbi.1996.0193</dcterms:identifier> <date rdf:datatype=”http://www.w3.org/2001/XMLSchema#gYear”>1996</date> <name>Arch. Biochem. Biophys.</name> <volume>329</volume> <pages>73-81</pages> </rdf:Description> This can be shown as a table <Subject, Predicate, Object > Subject Predicate Object 18ICDCS’17/2017-06-07 http://purl.uniprot.org/uniprot/Q16850.rdf
  • 31. RDF Query Model – SPARQL Query Model - SPARQL Protocol and RDF Query Language Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively: an atomic triple pattern, which is an element of (U ∪ V ) × (U ∪ V ) × (U ∪ V ∪ L) ?x rdfs:label “The Shining” P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate) ?x rev:rating ?p FILTER(?p > 3.0) P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions 19ICDCS’17/2017-06-07
  • 32. RDF Query Model – SPARQL Query Model - SPARQL Protocol and RDF Query Language Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively: an atomic triple pattern, which is an element of (U ∪ V ) × (U ∪ V ) × (U ∪ V ∪ L) ?x rdfs:label “The Shining” P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate) ?x rev:rating ?p FILTER(?p > 3.0) P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions Example: SELECT ?name WHERE { ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : director name ” Stanley Kubrick ” . ?m movie : relatedBook ?b . ?b rev : r a t i n g ? r . FILTER(? r > 4.0) } 19ICDCS’17/2017-06-07
  • 33. SPARQL Queries SELECT ?name WHERE { ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : director name ” Stanley Kubrick ” . ?m movie : relatedBook ?b . ?b rev : r a t i n g ? r . FILTER(? r > 4.0) } ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook “Stanley Kubrick” movie:director name ?r rev:rating FILTER(?r > 4.0) 20ICDCS’17/2017-06-07
  • 34. UniProt in RDF – The Data Can Be Queried RDF encoded UniProt data can be queried using SPARQL: http://sparql.uniprot.org/sparql 21ICDCS’17/2017-06-07
  • 35. UniProt in RDF – The Data Can Be Queried RDF encoded UniProt data can be queried using SPARQL: http://sparql.uniprot.org/sparql Get the GO function for Q16850 (from UniProt SPARQL endpoint) PREFIX upc:< http :// p u r l . u n i p r o t . org / core/> PREFIX r d f : <http ://www. w3 . org /1999/02/22− rdf −syntax−ns#> SELECT ? goid ? g o l a b e l WHERE { <http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850> a upc : Protein ; upc : c l a s s i f i e d W i t h ? keyword . ? keyword r d f s : seeAlso ? goid . ? goid r d f s : l a b e l ? g o l a b e l . } Find the differential expression of probes and the p value that map to Q16850 (from Expression Atlas SPARQL endpoint) PREFIX r d f s : <http ://www. w3 . org /2000/01/ rdf −schema#> PREFIX a t l a s t e r m s : <http :// r d f . ebi . ac . uk/ terms / a t l a s /> SELECT d i s t i n c t ? valueLabel ? pvalue WHERE { ? value r d f s : l a b e l ? valueLabel . ? value a t l a s t e r m s : pValue ? pvalue . ? value a t l a s t e r m s : isMeasurementOf ? probe . ? probe a t l a s t e r m s : dbXref <http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850> . } ORDER BY ASC(? pvalue ) 21ICDCS’17/2017-06-07
  • 36. Na¨ıve Triple Store Design SELECT ?name WHERE { ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : director name ” Stanley Kubrick ” . ?m movie : relatedBook ?b . ?b rev : r a t i n g ? r . FILTER(? r > 4.0) } Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:initial release date “1980-05-23” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2014 movie:actor mdb:actor/29704 mdb:film/2014 movie:actor mdb: actor/30013 mdb:film/2014 movie:music contributor mdb: music contributor/4110 mdb:film/2014 foaf:based near geo:2635167 mdb:film/2014 movie:relatedBook bm:0743424425 mdb:film/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name “Stanley Kubrick” mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:film/424 movie:director mdb:director/8476 mdb:film/424 rdfs:label “Spartacus” mdb:actor/29704 movie:actor name “Jack Nicholson” mdb:film/1267 movie:actor mdb:actor/29704 mdb:film/1267 rdfs:label “The Last Tycoon” mdb:film/3418 movie:actor mdb:actor/29704 mdb:film/3418 rdfs:label “The Passenger” geo:2635167 gn:name “United Kingdom” geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer lexvo:iso639-3/eng rdfs:label “English” lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn 22ICDCS’17/2017-06-07
  • 37. Na¨ıve Triple Store Design SELECT ?name WHERE { ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : director name ” Stanley Kubrick ” . ?m movie : relatedBook ?b . ?b rev : r a t i n g ? r . FILTER(? r > 4.0) } Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:initial release date “1980-05-23” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2014 movie:actor mdb:actor/29704 mdb:film/2014 movie:actor mdb: actor/30013 mdb:film/2014 movie:music contributor mdb: music contributor/4110 mdb:film/2014 foaf:based near geo:2635167 mdb:film/2014 movie:relatedBook bm:0743424425 mdb:film/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name “Stanley Kubrick” mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:film/424 movie:director mdb:director/8476 mdb:film/424 rdfs:label “Spartacus” mdb:actor/29704 movie:actor name “Jack Nicholson” mdb:film/1267 movie:actor mdb:actor/29704 mdb:film/1267 rdfs:label “The Last Tycoon” mdb:film/3418 movie:actor mdb:actor/29704 mdb:film/3418 rdfs:label “The Passenger” geo:2635167 gn:name “United Kingdom” geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer lexvo:iso639-3/eng rdfs:label “English” lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn SELECT T1 . o b j e c t FROM T as T1 , T as T2 , T as T3 , T as T4 , T as T5 WHERE T1 . p=” r d f s : l a b e l ” AND T2 . p=” movie : relatedBook ” AND T3 . p=” movie : d i r e c t o r ” AND T4 . p=” rev : r a t i n g ” AND T5 . p=” movie : d i r e c t o r n a m e ” AND T1 . s=T2 . s AND T1 . s=T3 . s AND T2 . o=T4 . s AND T3 . o=T5 . s AND T4 . o > 4.0 AND T5 . o=” S t a n l e y Kubrick ” 22ICDCS’17/2017-06-07
  • 38. Na¨ıve Triple Store Design SELECT ?name WHERE { ?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d . ?d movie : director name ” Stanley Kubrick ” . ?m movie : relatedBook ?b . ?b rev : r a t i n g ? r . FILTER(? r > 4.0) } Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:initial release date “1980-05-23” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2014 movie:actor mdb:actor/29704 mdb:film/2014 movie:actor mdb: actor/30013 mdb:film/2014 movie:music contributor mdb: music contributor/4110 mdb:film/2014 foaf:based near geo:2635167 mdb:film/2014 movie:relatedBook bm:0743424425 mdb:film/2014 movie:language lexvo:iso639-3/eng mdb:director/8476 movie:director name “Stanley Kubrick” mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:film/424 movie:director mdb:director/8476 mdb:film/424 rdfs:label “Spartacus” mdb:actor/29704 movie:actor name “Jack Nicholson” mdb:film/1267 movie:actor mdb:actor/29704 mdb:film/1267 rdfs:label “The Last Tycoon” mdb:film/3418 movie:actor mdb:actor/29704 mdb:film/3418 rdfs:label “The Passenger” geo:2635167 gn:name “United Kingdom” geo:2635167 gn:population 62348447 geo:2635167 gn:wikipediaArticle wp:United Kingdom bm:books/0743424425 dc:creator bm:persons/Stephen+King bm:books/0743424425 rev:rating 4.7 bm:books/0743424425 scom:hasOffer bm:offers/0743424425amazonOffer lexvo:iso639-3/eng rdfs:label “English” lexvo:iso639-3/eng lvont:usedIn lexvo:iso3166/CA lexvo:iso639-3/eng lvont:usesScript lexvo:script/Latn SELECT T1 . o b j e c t FROM T as T1 , T as T2 , T as T3 , T as T4 , T as T5 WHERE T1 . p=” r d f s : l a b e l ” AND T2 . p=” movie : relatedBook ” AND T3 . p=” movie : d i r e c t o r ” AND T4 . p=” rev : r a t i n g ” AND T5 . p=” movie : d i r e c t o r n a m e ” AND T1 . s=T2 . s AND T1 . s=T3 . s AND T2 . o=T4 . s AND T3 . o=T5 . s AND T4 . o > 4.0 AND T5 . o=” S t a n l e y Kubrick ” Easy to implement but too many self-joins! 22ICDCS’17/2017-06-07
  • 39. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb: film/2014 1 rdfs:label 2 “The Shining” 3 movie:initial release date 4 “1980-05-23” 5 mdb:director/8476 6 movie:director name 7 “Stanley Kubrick” 8 mdb:film/2685 9 movie:director23ICDCS’17/2017-06-07
  • 40. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb: film/2014 1 rdfs:label 2 “The Shining” 3 movie:initial release date 4 “1980-05-23” 5 mdb:director/8476 6 movie:director name 7 “Stanley Kubrick” 8 mdb:film/2685 9 movie:director23ICDCS’17/2017-06-07
  • 41. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb: film/2014 1 rdfs:label 2 “The Shining” 3 movie:initial release date 4 “1980-05-23” 5 mdb:director/8476 6 movie:director name 7 “Stanley Kubrick” 8 mdb:film/2685 9 movie:director Advantages Eliminates some of the joins – they become range queries Merge join is easy and fast 23ICDCS’17/2017-06-07
  • 42. Exhaustive Indexing RDF-3X [Neumann and Weikum, 2008, 2009], Hexastore [Weiss et al., 2008] Strings are mapped to ids using a mapping table Create indexes for permutations of the three columns: SPO, SOP, PSO, POS, OPS, OSP Each triple pattern can be answered by a range query Joins between triple patterns computed using merge join Subject Property Object 0 1 2 0 3 4 5 6 7 8 9 5 ... ... ... ID Value 0 mdb: film/2014 1 rdfs:label 2 “The Shining” 3 movie:initial release date 4 “1980-05-23” 5 mdb:director/8476 6 movie:director name 7 “Stanley Kubrick” 8 mdb:film/2685 9 movie:director Advantages Eliminates some of the joins – they become range queries Merge join is easy and fast Disadvantages Space usage Expensive updates 23ICDCS’17/2017-06-07
  • 43. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:actor/29704 movie:actor name “Jack Nicholson” . . . . . . . . . Subject refs:label movie:director mob:film/2014 “The Shining” mob:director/8476 mob:film/2685 “The Clockwork Orange” mob:director/8476 Subject movie:actor name mdb:actor “Jack Nicholson” 24ICDCS’17/2017-06-07
  • 44. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:actor/29704 movie:actor name “Jack Nicholson” . . . . . . . . . Subject refs:label movie:director mob:film/2014 “The Shining” mob:director/8476 mob:film/2685 “The Clockwork Orange” mob:director/8476 Subject movie:actor name mdb:actor “Jack Nicholson” Advantages Fewer joins If the data is structured, we have a relational system – similar to normalized relations 24ICDCS’17/2017-06-07
  • 45. Property Tables Grouping by entities; Jena [Wilkinson, 2006], DB2-RDF [Bornea et al., 2013] Clustered property table: group together the properties that tend to occur in the same (or similar) subjects Property-class table: cluster the subjects with the same type of property into one property table Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:actor/29704 movie:actor name “Jack Nicholson” . . . . . . . . . Subject refs:label movie:director mob:film/2014 “The Shining” mob:director/8476 mob:film/2685 “The Clockwork Orange” mob:director/8476 Subject movie:actor name mdb:actor “Jack Nicholson” Advantages Fewer joins If the data is structured, we have a relational system – similar to normalized relations Disadvantages Potentially a lot of NULLs Clustering is not trivial Multi-valued properties are complicated 24ICDCS’17/2017-06-07
  • 46. Vertical Partitioning Binary Tables [Abadi et al., 2007, 2009]: Grouping by properties – for each property, build a two-column table, containing both subject and object, ordered by subjects n two column tables (n is the number of unique properties in the data) Subject Property Object mdb:film/2014 rdfs:label “The Shining” mdb:film/2014 movie:director mdb:director/8476 mdb:film/2685 movie:director mdb:director/8476 mdb:film/2685 rdfs:label “A Clockwork Orange” mdb:actor/29704 movie:actor name “Jack Nicholson” . . . . . . . . . Subject Object mdb:film/2014 mdb:director/8476 mdb:film/2685 mdb:director/8476 movie:director Subject Object mob:film/2014 “The Shining” mob:film/2685 “The Clockwork Orange” refs:label Subject Object mdb:actor/29704 “Jack Nicholson” movie:actor name 25ICDCS’17/2017-06-07
  • 47. Vertical Partitioning Binary Tables [Abadi et al., 2007, 2009]: Grouping by properties – for each property, build a two-column table, containing both subject and object, ordered by subjects n two column tables (n is the number of unique properties in the data) Advantages Supports multi-valued properties No NULLs No clustering Read only needed attributes (i.e. less I/O) Good performance for subject-subject joins 25ICDCS’17/2017-06-07
  • 48. Vertical Partitioning Binary Tables [Abadi et al., 2007, 2009]: Grouping by properties – for each property, build a two-column table, containing both subject and object, ordered by subjects n two column tables (n is the number of unique properties in the data) Advantages Supports multi-valued properties No NULLs No clustering Read only needed attributes (i.e. less I/O) Good performance for subject-subject joins Disadvantages Not useful for subject-object joins Expensive inserts 25ICDCS’17/2017-06-07
  • 49. Vertical Partitioning Binary Tables [Abadi et al., 2007, 2009]: Grouping by properties – for each property, build a two-column table, containing both subject and object, ordered by subjects n two column tables (n is the number of unique properties in the data) TripleBit [Yuan et al., 2013]: Create a table with |triple| columns, |objects| + |subjects| rows with “1” if object/subject exists in triple; groups columns by predicate Compress columns (since they are sparse); partition by predicate, then partition into chunks (P,S,O) and (P,O,S) indexes on the chunks 25ICDCS’17/2017-06-07
  • 50. Graph-based Approach [Zou and ¨Ozsu, 2017] Answering SPARQL query ≡ subgraph matching using homomorphism gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013] ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook “Stanley Kubrick” movie:director name ?r rev:rating FILTER(?r > 4.0) mdb:film/2014 “1980-05-23” movie:initial release date “The Shining” refs:label mob:music contributor music contributor lexvo:iso639 3/eng language bm:books/0743424425 4.7 rev:rating bm:persons/StephenKing dc:creator bm:offers/0743424425amazonOffer geo:2635167 “United Kingdom” gn:name 62348447 gn:population wp:UnitedKingdom gn:wikipediaArticle mdb:actor/29704 “Jack Nicholson” movie:actor name mdb:film/3418 “The Passenger” refs:label mdb:film/1267 “The Last Tycoon” refs:label mdb:director/8476 “Stanley Kubrick” movie:director name mdb:film/2685 “A Clockwork Orange” refs:label mdb:film/424 “Spartacus” refs:label mdb:actor/30013 “Shelley Duvall” movie:actor name “English” rdf:label lexvo:iso3166/CA lvont:usedIn lexvo:script/latin lvont:usesScript movie:relatedBook scam:hasOffer foaf:based near movie:actor movie:director movie:actor movie:actor movie:actor movie:director movie:director Subgraph M atching 26ICDCS’17/2017-06-07
  • 51. Graph-based Approach [Zou and ¨Ozsu, 2017] Answering SPARQL query ≡ subgraph matching using homomorphism gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013] ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook “Stanley Kubrick” movie:director name ?r rev:rating FILTER(?r > 4.0) mdb:film/2014 “1980-05-23” movie:initial release date “The Shining” refs:label mob:music contributor music contributor lexvo:iso639 3/eng language bm:books/0743424425 4.7 rev:rating bm:persons/StephenKing dc:creator bm:offers/0743424425amazonOffer geo:2635167 “United Kingdom” gn:name 62348447 gn:population wp:UnitedKingdom gn:wikipediaArticle mdb:actor/29704 “Jack Nicholson” movie:actor name mdb:film/3418 “The Passenger” refs:label mdb:film/1267 “The Last Tycoon” refs:label mdb:director/8476 “Stanley Kubrick” movie:director name mdb:film/2685 “A Clockwork Orange” refs:label mdb:film/424 “Spartacus” refs:label mdb:actor/30013 “Shelley Duvall” movie:actor name “English” rdf:label lexvo:iso3166/CA lvont:usedIn lexvo:script/latin lvont:usesScript movie:relatedBook scam:hasOffer foaf:based near movie:actor movie:director movie:actor movie:actor movie:actor movie:director movie:director Subgraph M atching Advantages Maintains the graph structure Full set of queries can be handled 26ICDCS’17/2017-06-07
  • 52. Graph-based Approach [Zou and ¨Ozsu, 2017] Answering SPARQL query ≡ subgraph matching using homomorphism gStore [Zou et al., 2011, 2014], chameleon-db [Alu¸c et al., 2013] ?m ?d movie:director ?name rdfs:label ?b movie:relatedBook “Stanley Kubrick” movie:director name ?r rev:rating FILTER(?r > 4.0) mdb:film/2014 “1980-05-23” movie:initial release date “The Shining” refs:label mob:music contributor music contributor lexvo:iso639 3/eng language bm:books/0743424425 4.7 rev:rating bm:persons/StephenKing dc:creator bm:offers/0743424425amazonOffer geo:2635167 “United Kingdom” gn:name 62348447 gn:population wp:UnitedKingdom gn:wikipediaArticle mdb:actor/29704 “Jack Nicholson” movie:actor name mdb:film/3418 “The Passenger” refs:label mdb:film/1267 “The Last Tycoon” refs:label mdb:director/8476 “Stanley Kubrick” movie:director name mdb:film/2685 “A Clockwork Orange” refs:label mdb:film/424 “Spartacus” refs:label mdb:actor/30013 “Shelley Duvall” movie:actor name “English” rdf:label lexvo:iso3166/CA lvont:usedIn lexvo:script/latin lvont:usesScript movie:relatedBook scam:hasOffer foaf:based near movie:actor movie:director movie:actor movie:actor movie:actor movie:director movie:director Subgraph M atching Advantages Maintains the graph structure Full set of queries can be handled Disadvantages Graph pattern matching is expensive 26ICDCS’17/2017-06-07
  • 53. Two Systems gStoreSystem Architecture Offline Online Storage Input Input RDF Parser RDF Graph Builder Encoding Module VS*-tree builder RDF data RDF Triples RDF Graph Signature Graph Key-Value Store VS*-tree Store SPARQL Parser SPARQL Query Encoding Module VS*-tree Query Graph Filter Module Join Module Signature Graph Node Candidate Results Fig. 4. System Architecture bitstrings, denoted as vS ig(u). We encode query Q with the same encoding method. Consequently, the match between Q and G can be verified by simply checking the match between corresponding encoded bitstrings. Given a vertex u, we encode each of its adjacent edges e(eLabel, nLabel) into a bitstring, where eLabel is the edge chameleon-db Structural Index ... Vertex Index Spill Index ClusterIndexStorageSystem StorageAdvisor Query Engine Plan Generation Evaluation 27ICDCS’17/2017-06-07
  • 54. Two Systems gStoreSystem Architecture Offline Online Storage Input Input RDF Parser RDF Graph Builder Encoding Module VS*-tree builder RDF data RDF Triples RDF Graph Signature Graph Key-Value Store VS*-tree Store SPARQL Parser SPARQL Query Encoding Module VS*-tree Query Graph Filter Module Join Module Signature Graph Node Candidate Results Fig. 4. System Architecture bitstrings, denoted as vS ig(u). We encode query Q with the same encoding method. Consequently, the match between Q and G can be verified by simply checking the match between corresponding encoded bitstrings. Given a vertex u, we encode each of its adjacent edges e(eLabel, nLabel) into a bitstring, where eLabel is the edge 12,000 lines of C++ code under Linux (plus code for SPARQL parser) Encode each vertex of RDF graph as a bit array capturing the neighbourhood relationship (G∗ ) Build a multilevel summary tree index (VS∗ -tree) to capture “connections” 1111 1111 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 0000 1000 0000 0100 0000 0010 0010 1000 0100 0001 1000 0001 0000 1001 0100 1000 1001 1000 0001 0100 0001 0001 005 004 006 001 002 003 007 011 008 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 G2 G1 11101 10010 10001 01100 10000 00001 01100 00010 10000 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 01000 01000 27ICDCS’17/2017-06-07
  • 55. Two Systems gStoreSystem Architecture Offline Online Storage Input Input RDF Parser RDF Graph Builder Encoding Module VS*-tree builder RDF data RDF Triples RDF Graph Signature Graph Key-Value Store VS*-tree Store SPARQL Parser SPARQL Query Encoding Module VS*-tree Query Graph Filter Module Join Module Signature Graph Node Candidate Results Fig. 4. System Architecture bitstrings, denoted as vS ig(u). We encode query Q with the same encoding method. Consequently, the match between Q and G can be verified by simply checking the match between corresponding encoded bitstrings. Given a vertex u, we encode each of its adjacent edges e(eLabel, nLabel) into a bitstring, where eLabel is the edge Encode the query graph similarly (Q∗ ) Find candidate matching nodes of Q∗ in G∗ using VS*-tree Multiway join of the candidates 12,000 lines of C++ code under Linux (plus code for SPARQL parser) Encode each vertex of RDF graph as a bit array capturing the neighbourhood relationship (G∗ ) Build a multilevel summary tree index (VS∗ -tree) to capture “connections” 1111 1111 0110 1111 1101 1101 0000 1110 0110 1001 1100 1001 1001 1101 0000 1000 0000 0100 0000 0010 0010 1000 0100 0001 1000 0001 0000 1001 0100 1000 1001 1000 0001 0100 0001 0001 005 004 006 001 002 003 007 011 008 009 010 d1 1 d2 1 d2 2 d3 1 d3 2 d3 3 d3 4 G3 G2 G1 11101 10010 10001 01100 10000 00001 01100 00010 10000 01000 01000 10000 10000 10000 10000 00010 00100 01000 01000 01000 01000 27ICDCS’17/2017-06-07
  • 56. Two Systems 35,000 lines of C++ code under Linux (plus code for SPARQL 1.0 parser) Adaptivity to workload due to variability of Web workloads and the variability of composition of SPARQL triple patterns An experiment [Alu¸c et al., 2014a] No single system is a sole winner across all queries No single system is the sole loser across all queries, either 2–5 orders of magnitude difference in the performance between the best and the worst system for a given query The winner in one query may timeout in another Performance difference widens as dataset size increases Group-by-query approach [Alu¸c et al., 2014b] chameleon-db Structural Index ... Vertex Index Spill Index ClusterIndexStorageSystem StorageAdvisor Query Engine Plan Generation Evaluation 27ICDCS’17/2017-06-07
  • 57. Remember the Environment Distributed environment Some (not all) of the data sites can process SPARQL queries – SPARQL endpoints See next section 28ICDCS’17/2017-06-07
  • 58. Remember the Environment Distributed environment Some (not all) of the data sites can process SPARQL queries – SPARQL endpoints See next section Alternatives Cloud-based approaches Data re-distribution + query decomposition Data re-distribution + partial evaluation 28ICDCS’17/2017-06-07
  • 59. Cloud-based Solutions [Kaoudi and Manolescu, 2015] RDF data warehouse D is partitioned ({D1, . . . , Dn}) and placed on cloud platforms (such as HDFS, HBase) 29ICDCS’17/2017-06-07
  • 60. Cloud-based Solutions [Kaoudi and Manolescu, 2015] RDF data warehouse D is partitioned ({D1, . . . , Dn}) and placed on cloud platforms (such as HDFS, HBase) SPARQL query is run through MapReduce jobs Data parallel execution 29ICDCS’17/2017-06-07
  • 61. Cloud-based Solutions [Kaoudi and Manolescu, 2015] RDF data warehouse D is partitioned ({D1, . . . , Dn}) and placed on cloud platforms (such as HDFS, HBase) SPARQL query is run through MapReduce jobs Data parallel execution Examples: HARD [Rohloff and Schantz, 2010] , HadoopRDF [Husain et al., 2011] , EAGRE [Zhang et al., 2013] and JenaHBase [Khadilkar et al., 2012] 29ICDCS’17/2017-06-07
  • 62. Cloud-based Solutions [Kaoudi and Manolescu, 2015] RDF data warehouse D is partitioned ({D1, . . . , Dn}) and placed on cloud platforms (such as HDFS, HBase) SPARQL query is run through MapReduce jobs Data parallel execution Examples: HARD [Rohloff and Schantz, 2010] , HadoopRDF [Husain et al., 2011] , EAGRE [Zhang et al., 2013] and JenaHBase [Khadilkar et al., 2012] High scalability and fault-tolerance Possibly low performance since MapReduce is not suitable for graph processing 29ICDCS’17/2017-06-07
  • 63. Partition-based Approaches (Offline) Partition an RDF data warehouse (graph) into several fragments that are distributed to sites RDF data D = {D1, . . . , Dn} Allocate each Di to a site 30ICDCS’17/2017-06-07
  • 64. Partition-based Approaches (Offline) Partition an RDF data warehouse (graph) into several fragments that are distributed to sites RDF data D = {D1, . . . , Dn} Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) 30ICDCS’17/2017-06-07
  • 65. Partition-based Approaches (Offline) Partition an RDF data warehouse (graph) into several fragments that are distributed to sites RDF data D = {D1, . . . , Dn} Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) (Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒ query graph is decomposed Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} 30ICDCS’17/2017-06-07
  • 66. Partition-based Approaches (Offline) Partition an RDF data warehouse (graph) into several fragments that are distributed to sites RDF data D = {D1, . . . , Dn} Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) (Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒ query graph is decomposed Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} Examples: GraphPartition [Huang et al., 2011], WARP [Hose and Schenkel, 2013] , Partout [Galarraga et al., 2014] , Vertex-block [Lee and Liu, 2013] 30ICDCS’17/2017-06-07
  • 67. Partition-based Approaches (Offline) Partition an RDF data warehouse (graph) into several fragments that are distributed to sites RDF data D = {D1, . . . , Dn} Allocate each Di to a site Partitioning alternatives Table-based (e.g., [Husain et al., 2011]) Graph-based (e.g., [Huang et al., 2011; Zhang et al., 2013]) Unit-based (e.g., [Gurajada et al., 2014; Lee and Liu, 2013]) (Online) SPARQL query decomposed Q = {Q1, . . . , Qk} ⇒ query graph is decomposed Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} Examples: GraphPartition [Huang et al., 2011], WARP [Hose and Schenkel, 2013] , Partout [Galarraga et al., 2014] , Vertex-block [Lee and Liu, 2013] High performance Great for parallelizing centralized RDF data May not be possible to re-partition and re-allocate Web data (i.e., LOD) Each approach requires a specific partitioning strategy – no generic partitioning Query decomposition may not be easy 30ICDCS’17/2017-06-07
  • 68. Partial Query Evaluation (PQE) RDF data warehouse is partitioned and distributed as before RDF data D = {D1, . . . , Dn} Allocate each Di to a site SPARQL query is not decomposed Partial query evaluation – Distributed gStore [Peng et al., 2016] 31ICDCS’17/2017-06-07
  • 69. Partial Query Evaluation (PQE) RDF data warehouse is partitioned and distributed as before RDF data D = {D1, . . . , Dn} Allocate each Di to a site SPARQL query is not decomposed Partial query evaluation – Distributed gStore [Peng et al., 2016] f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d) known inputs unknown inputs 31ICDCS’17/2017-06-07
  • 70. Partial Query Evaluation (PQE) RDF data warehouse is partitioned and distributed as before RDF data D = {D1, . . . , Dn} Allocate each Di to a site SPARQL query is not decomposed Partial query evaluation – Distributed gStore [Peng et al., 2016] f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d) known inputs unknown inputs f (f (s), d)) partial results 31ICDCS’17/2017-06-07
  • 71. Partial Query Evaluation (PQE) RDF data warehouse is partitioned and distributed as before RDF data D = {D1, . . . , Dn} Allocate each Di to a site SPARQL query is not decomposed Partial query evaluation – Distributed gStore [Peng et al., 2016] f (x) ⇒ f (s, d) ⇒ f (f (s), d)) ⇒ Final Answerf (s, d) known inputs unknown inputs f (f (s), d)) partial results Query is the function and each Di is the known input 31ICDCS’17/2017-06-07
  • 72. Distributed SPARQL Using PQE [Peng et al., 2016] Two steps: 1. Evaluate a query at each site to find local matches These are local partial matches D1 D2 D3 D4 32ICDCS’17/2017-06-07
  • 73. Distributed SPARQL Using PQE [Peng et al., 2016] Two steps: 1. Evaluate a query at each site to find local matches These are local partial matches 2. Assemble the partial matches to get final result Crossing match Centralized assembly Distributed assembly D1 D2 D3 D4 Crossing match 32ICDCS’17/2017-06-07
  • 74. Distributed SPARQL Using PQE [Peng et al., 2016] Two steps: 1. Evaluate a query at each site to find local matches These are local partial matches 2. Assemble the partial matches to get final result Crossing match Centralized assembly Distributed assembly D1 D2 D3 D4 Crossing match High performance due to parallelization Do not have to deal with query decomposition May not be possible to re-partition and re-allocate Web data (i.e., LOD) RDF storage sites need to be modified to handle partial query processing 32ICDCS’17/2017-06-07
  • 75. Outline 1 RDF Technology [¨Ozsu, 2016] Data Warehousing Approach Distributed RDF Processing 2 Federated RDF Systems SPARQL Endpoint Federation General RDF Federation 3 LOD – Live Querying Approach [Hartig, 2013a] Traversal-based approaches Index-based approaches Hybrid approaches 4 Conclusions 33ICDCS’17/2017-06-07
  • 76. Remember the Environment Distributed environment SPARQL endpoints can process SPARQL queries Non-SPARQL endpoints require additional components 34ICDCS’17/2017-06-07
  • 77. Remember the Environment Distributed environment SPARQL endpoints can process SPARQL queries Non-SPARQL endpoints require additional components Issues Query decomposition Localization (source selection) Result composition 34ICDCS’17/2017-06-07
  • 78. SPARQL Endpoint Federation No data re-partitioning/re-distribution Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint SPARQL query decomposed Q = {Q1, . . . , Qk} Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Sources 35ICDCS’17/2017-06-07
  • 79. SPARQL Endpoint Federation No data re-partitioning/re-distribution Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint SPARQL query decomposed Q = {Q1, . . . , Qk} Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Sources Control Site Metadata 35ICDCS’17/2017-06-07
  • 80. SPARQL Endpoint Federation No data re-partitioning/re-distribution Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint SPARQL query decomposed Q = {Q1, . . . , Qk} Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Sources Control Site Metadata Data included at the source Supported access patterns Statistical information · · · 35ICDCS’17/2017-06-07
  • 81. SPARQL Endpoint Federation No data re-partitioning/re-distribution Consider D = D1 ∪ D2 ∪ . . . ∪ Dn; Di : SPARQL endpoint SPARQL query decomposed Q = {Q1, . . . , Qk} Distributed execution of {Q1, . . . , Qk} over {D1, . . . , Dn} E.g.: SPLENDID [G¨orlitz and Staab, 2011], ANAPSID [Acosta et al., 2011] Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Sources Control Site Metadata Data integration approach May be the only way to proceed if RDF data is already distributed with autonomous owners Not all RDF data storage points are SPARQL endpoints 35ICDCS’17/2017-06-07
  • 82. Not All RDF Storage Sites are SPARQL Endpoints Use the mediator-wrapper paradigm Wrappers provide SPARQL endpoint functionality Mediators may be introduced if wrappers are thin E.g.: DARQ [Quilitz and Leser, 2008], FedX [Schwarte et al., 2011b] Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Storage A Wrapper Wrapper Mediator RDF Storage B RDF Sources Control Site Metadata 36ICDCS’17/2017-06-07
  • 83. Federated Query Processing Query Decomposition & Source Selection SPARQL queries Local Evaluation Join Partial Matches SPARQL matches Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes RDF Sources Control Site 37ICDCS’17/2017-06-07
  • 84. Query Decomposition Each triple pattern has to a set of RDF sources based on the values of its subject, property, and object. SELECT ?x ?n WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . ?y sameAs ?x . ?y n : topicPage ?n . } 38ICDCS’17/2017-06-07
  • 85. Query Decomposition Each triple pattern has to a set of RDF sources based on the values of its subject, property, and object. SELECT ?x ?n WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . ?y sameAs ?x . ?y n : topicPage ?n . } {GeoNames} {GeoNames} {DBPedia,GeoNames,NYTimes, SWDogFood,LinkedMDB} {NYTimes} 38ICDCS’17/2017-06-07
  • 86. Query Decomposition Each triple pattern has to a set of RDF sources based on the values of its subject, property, and object. SELECT ?x ?n WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . ?y sameAs ?x . ?y n : topicPage ?n . } {GeoNames} {GeoNames} {DBPedia,GeoNames,NYTimes, SWDogFood,LinkedMDB} {NYTimes} q1 @{GeoNames} q2 @{. . .} q3 @{NYTimes} SELECT ?x WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?n WHERE { ?y n : topicPage ?n . } 38ICDCS’17/2017-06-07
  • 87. Data Localization Metadata-based approaches Use the information in the metadata repository to determine which sources are relevant DARQ [Quilitz and Leser, 2008] QTree [Harth et al., 2010; Prasser et al., 2012] HiBISCus [Saleem and Ngomo, 2014] . . . ASK query-based approach Asking whether or not a triple pattern has an answer at a source FedX [Schwarte et al., 2011a,b] 39ICDCS’17/2017-06-07
  • 88. Query Processing over Federated RDF Systems Jamendo SWDogFood GeoNames LinkedMDB DBPedia NYTimes Metadata SELECT ?x ?n WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . ?y sameAs ?x . ?y n : topicPage ?n . } SELECT ?x WHERE { ?x g : parentFeature ?k . ?k g : name ”Canada” . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?x WHERE { ?y sameAs ?x . } SELECT ?y ?n WHERE { ?y n : topicPage ?n . } 40ICDCS’17/2017-06-07
  • 89. UniProt Federation – EBI RDF Platform Curated computational models of biological pro- cesses Sample information for reference samples and sam- ples for which data exist in one of the EBI’s assay databases Curated chemical database of bioactive molecules with drug-like properties Genome databases for vertebrates and other eukary- otic species Gene expression data from the Gene Expression Atlas Curated and peer-reviewed pathways 41ICDCS’17/2017-06-07
  • 90. Federated Access to UniProt Collection Get the Reactome pathways where Q16850 is associated, then get all the other proteins in that pathway and pull out their expression from the atlas, along with the GO annotations from UniProt PREFIX r d f : <http ://www. w3 . org /1999/02/22− rdf −syntax−ns#> PREFIX r d f s : <http ://www. w3 . org /2000/01/ rdf −schema#> PREFIX biopax3 : <http ://www. biopax . org / r e l e a s e / biopax−l e v e l 3 . owl#> PREFIX a t l a s t e r m s : <http :// r d f . ebi . ac . uk/ terms / a t l a s /> PREFIX upc:< http :// p u r l . u n i p r o t . org / core/> SELECT DISTINCT ?pathwayname ? e x p r e s s i o n V a l u e ? g o l a b e l WHERE { # Get the pathways that r e f e r e n c e Q16850 ? pathway r d f : type biopax3 : Pathway . ? pathway biopax3 : displayName ?pathwayname . ? pathway biopax3 : pathwayComponent [? r e l [ biopax3 : e n t i t y R e f e r e n c e ? dbXref ] ] . ? pathway biopax3 : pathwayComponent [? r e l [ biopax3 : e n t i t y R e f e r e n c e <http :// p u r l . u n i p r o t . org / u n i p r o t /Q16850 >]] . # Get the e x p r e s s i o n f o r those p r o t e i n s SERVICE <http ://www. e bi . ac . uk/ r d f / s e r v i c e s / a t l a s / sparql > { ? value r d f s : l a b e l ? e x p r e s s i o n V a l u e . ? value a t l a s t e r m s : pValue ? pvalue . ? value a t l a s t e r m s : isMeasurementOf ? probe . ? probe a t l a s t e r m s : dbXref ? dbXref . } # get the GO f u n c t i o n s from Uniprot SERVICE <http :// u n i p r o t . org / sparql > { ? dbXref a upc : Protein ; upc : c l a s s i f i e d W i t h ? keyword . ? keyword r d f s : seeAlso ? goid . ? goid r d f s : l a b e l ? g o l a b e l . } } 42ICDCS’17/2017-06-07
  • 91. Outline 1 RDF Technology [¨Ozsu, 2016] Data Warehousing Approach Distributed RDF Processing 2 Federated RDF Systems SPARQL Endpoint Federation General RDF Federation 3 LOD – Live Querying Approach [Hartig, 2013a] Traversal-based approaches Index-based approaches Hybrid approaches 4 Conclusions 43ICDCS’17/2017-06-07
  • 92. Live Query Processing Not all data resides at SPARQL endpoints Freshness of access to data important Potentially countably infinite data sources Live querying On-line execution Only rely on linked data principles Alternatives Traversal-based approaches Index-based approaches Hybrid approaches 44ICDCS’17/2017-06-07
  • 93. Linked Data Model [Hartig, 2012] Web of Linked Data Given a finite or countably infinite set D of Linked Documents, a Web of Linked Data is a tuple W = (D, adoc, data) where: D ⊆ D, adoc is a partial mapping from URIs to D, and data is a total mapping from D to finite sets of RDF triples. 45ICDCS’17/2017-06-07
  • 94. Linked Data Model [Hartig, 2012] Web of Linked Data Given a finite or countably infinite set D of Linked Documents, a Web of Linked Data is a tuple W = (D, adoc, data) where: D ⊆ D, adoc is a partial mapping from URIs to D, and data is a total mapping from D to finite sets of RDF triples. Data Links A Web of Linked Data W = (D, adoc, data) contains a data link from document d ∈ D to document d ∈ D if there exists a URI u such that: u is mentioned in an RDF triple t ∈ data(d), and d = adoc(u). 45ICDCS’17/2017-06-07
  • 95. SPARQL Query Semantics in Live Querying Full-web semantics Scope of evaluating a SPARQL expression is all Linked Data Query result completeness cannot be guaranteed by any (terminating) execution 46ICDCS’17/2017-06-07
  • 96. SPARQL Query Semantics in Live Querying Full-web semantics Scope of evaluating a SPARQL expression is all Linked Data Query result completeness cannot be guaranteed by any (terminating) execution Reachability-based query semantics Query consists of a SPARQL expression, a set of seed URIs S, and a reachability condition c Scope: all data along paths of data links that satisfy the condition Computationally feasible 46ICDCS’17/2017-06-07
  • 97. Traversal Approaches Discover relevant URIs recursively by traversing (specific) data links at query execution runtime [Hartig, 2013b; Ladwig and Tran, 2011] Implements reachability-based query semantics Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result 47ICDCS’17/2017-06-07
  • 98. Traversal Approaches Discover relevant URIs recursively by traversing (specific) data links at query execution runtime [Hartig, 2013b; Ladwig and Tran, 2011] Implements reachability-based query semantics Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result Advantages Easy to implement. No data structure to maintain. 47ICDCS’17/2017-06-07
  • 99. Traversal Approaches Discover relevant URIs recursively by traversing (specific) data links at query execution runtime [Hartig, 2013b; Ladwig and Tran, 2011] Implements reachability-based query semantics Start from a set of seed URIs Recursively follow and discover new URIs Important issue is selection of seed URIs Retrieved data serves to discover new URIs and to construct result Advantages Easy to implement. No data structure to maintain. Disadvantages Possibilities for parallelized data retrieval are limited Repeated data retrieval introduces significant query latency. 47ICDCS’17/2017-06-07
  • 100. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Different index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Key: tp Entry: {uri1, uri2, , urin} GET urii 48ICDCS’17/2017-06-07
  • 101. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Different index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Key: tp Entry: {uri1, uri2, , urin} GET urii Advantages Data retrieval can be fully parallelized Reduces the impact of data retrieval on query execution time 48ICDCS’17/2017-06-07
  • 102. Index Approaches Use pre-populated index to determine relevant URIs (and to avoid as many irrelevant ones as possible) Different index keys possible; e.g., triple patterns [Umbrich et al., 2011] Index entries a set of URIs Indexed URIs may appear multiple times (i.e., associated with multiple index keys) Each URI in such an entry may be paired with a cardinality (utilized for source ranking) Key: tp Entry: {uri1, uri2, , urin} GET urii Advantages Data retrieval can be fully parallelized Reduces the impact of data retrieval on query execution time Disadvantages Querying can only start after index construction Depends on what has been selected for the index Freshness may be an issue Index maintenance 48ICDCS’17/2017-06-07
  • 103. Hybrid Approach Perform a traversal-based execution using a prioritized list of URIs to look up [Ladwig and Tran, 2010] Initial seed from the pre-populated index Non-seed URIs are ranked by a function based on information in the index New discovered URIs that are not in the index are ranked according to number of referring documents 49ICDCS’17/2017-06-07
  • 104. Outline 1 RDF Technology [¨Ozsu, 2016] Data Warehousing Approach Distributed RDF Processing 2 Federated RDF Systems SPARQL Endpoint Federation General RDF Federation 3 LOD – Live Querying Approach [Hartig, 2013a] Traversal-based approaches Index-based approaches Hybrid approaches 4 Conclusions 50ICDCS’17/2017-06-07
  • 105. Conclusions RDF and Linked Object Data seem to have considerable promise for Web data management There are prototype systems that provide alternative solutions There are commercial systems as well See https://www.w3.org/wiki/SparqlImplementations for a list More work needs to be done Query semantics Adaptive system design Optimizations – both in data warehousing and distributed environments Live querying requires significant thought to reduce latency 51ICDCS’17/2017-06-07
  • 106. Conclusions What I did not talk about: Not much on general distributed/parallel processing Not much on SPARQL semantics Nothing about RDFS – no schema stuff Nothing about entailment regimes > 0 ⇒ no reasoning 52ICDCS’17/2017-06-07
  • 107. Thank you! Research supported by 53ICDCS’17/2017-06-07
  • 108. References I Abadi, D. J., Marcus, A., Madden, S., and Hollenbach, K. (2009). SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J., 18(2):385–406. Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. (2007). Scalable semantic web data management using vertical partitioning. In Proc. 33rd Int. Conf. on Very Large Data Bases, pages 411–422. Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., and Ruckhaus, E. (2011). ANAPSID: an adaptive query processing engine for SPARQL endpoints. In Proc. 10th Int. Semantic Web Conf., pages 18–34. Alu¸c, G., Hartig, O., ¨Ozsu, M. T., and Daudjee, K. (2014a). Diversified stress testing of RDF data management systems. In Proc. 13th Int. Semantic Web Conf., pages 197–212. Alu¸c, G., ¨Ozsu, M. T., and Daudjee, K. (2014b). Workload matters: Why RDF databases need a new design. Proc. VLDB Endowment, 7(10):837–840. 54ICDCS’17/2017-06-07
  • 109. References II Alu¸c, G., ¨Ozsu, M. T., Daudjee, K., and Hartig, O. (2013). chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10, University of Waterloo. Available at https://cs.uwaterloo.ca/sites/ca.computer-science/files/ uploads/files/CS-2013-10.pdf. Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., and Bhattacharjee, B. (2013). Building an efficient RDF store over a relational database. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 121–132. Galarraga, L., Hose, K., and Schenkel, R. (2014). Partout: a distributed engine for efficient RDF processing. In Proc. 23rd Int. World Wide Web Conf. (Companion Volume), pages 267–268. G¨orlitz, O. and Staab, S. (2011). SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In Proc. 2nd Int. Workshop on Consuming Linked Data. 55ICDCS’17/2017-06-07
  • 110. References III Gurajada, S., Seufert, S., Miliaraki, I., and Theobald, M. (2014). TriAD: A distributed shared-nothing RDF engine based on asynchronous message passing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 289–300. Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K., and Umbrich, J. (2010). Data summaries for on-demand queries over linked data. In Proc. 19th Int. World Wide Web Conf., pages 411–420. Available from: http://doi.acm.org/10.1145/1772690.1772733. Hartig, O. (2012). SPARQL for a web of linked data: Semantics and computability. In Proc. 9th Extended Semantic Web Conf., pages 8–23. Hartig, O. (2013a). An overview on execution strategies for linked data queries. Datenbank-Spektrum, 13(2):89–99. Available from: http://dx.doi.org/10.1007/s13222-013-0122-1. Hartig, O. (2013b). SQUIN: a traversal based query execution system for the web of linked data. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1081–1084. 56ICDCS’17/2017-06-07
  • 111. References IV Hose, K. and Schenkel, R. (2013). WARP: Workload-aware replication and partitioning for RDF. In Proc. Workshops of 29th Int. Conf. on Data Engineering, pages 1–6. Huang, J., Abadi, D. J., and Ren, K. (2011). Scalable SPARQL querying of large RDF graphs. Proc. VLDB Endowment, 4(11):1123–1134. Husain, M. F., McGlothlin, J., Masud, M. M., Khan, L. R., and Thuraisingham, B. (2011). Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Trans. Knowl. and Data Eng., 23(9):1312–1327. Kaoudi, Z. and Manolescu, I. (2015). RDF in the clouds: A survey. VLDB J., 24:67–91. Khadilkar, V., Kantarcioglu, M., Thuraisingham, B. M., and Castagna, P. (2012). Jena-HBase: A distributed, scalable and efficient RDF triple store. In Proc. International Semantic Web Conference Posters & Demos Track. Ladwig, G. and Tran, T. (2010). Linked data query processing strategies. In Proc. 9th Int. Semantic Web Conf., pages 453–469. Ladwig, G. and Tran, T. (2011). SIHJoin: Querying remote and local linked data. In Proc. 8th Extended Semantic Web Conf., pages 139–153. 57ICDCS’17/2017-06-07
  • 112. References V Lee, K. and Liu, L. (2013). Scaling queries over big RDF graphs with semantic hash partitioning. Proc. VLDB Endowment, 6(14):1894–1905. Available from: http://www.vldb.org/pvldb/vol6/p1894-lee.pdf. Neumann, T. and Weikum, G. (2008). RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endowment, 1(1):647–659. Neumann, T. and Weikum, G. (2009). The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91–113. ¨Ozsu, M. T. (2016). A survey of RDF data management systems. Front. Comput. Sci., 10(3):418–432. Peng, P., Zou, L., ¨Ozsu, M. T., Chen, L., and Zhao, D. (2016). Processing SPARQL queries over distributed RDF graphs. VLDB J., 25(2):243–268. Prasser, F., Kemper, A., and Kuhn, K. A. (2012). Efficient distributed query processing for autonomous rdf databases. In Proc. 15th Int. Conf. on Extending Database Technology, pages 372–383. Quilitz, B. and Leser, U. (2008). Querying distributed RDF data sources with SPARQL. In Proc. 5th European Semantic Web Conf., pages 524–538. 58ICDCS’17/2017-06-07
  • 113. References VI Rohloff, K. and Schantz, R. E. (2010). High-performance, massively scalable distributed systems using the mapreduce software framework: the shard triple-store. In Proc. Int. Workshop on Programming Support Innovations for Emerging Distributed Applications. Article No. 4. Saleem, M. and Ngomo, A. N. (2014). HiBISCuS: Hypergraph-based source selection for SPARQL endpoint federation. In Proc. 11th Extended Semantic Web Conf., pages 176–191. Available from: http://dx.doi.org/10.1007/978-3-319-07443-6_13. Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011a). FedX: A federation layer for distributed query processing on linked open data. In Proc. 8th Extended Semantic Web Conf., pages 481–486. Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011b). Fedx: Optimization techniques for federated query processing on linked data. In Proc. 10th Int. Semantic Web Conf., pages 601–616. Available from: https://doi.org/10.1007/978-3-642-25073-6_38. Umbrich, J., Hose, K., Karnstedt, M., Harth, A., and Polleres, A. (2011). Comparing data summaries for processing live queries over linked data. World Wide Web J., 14(5-6):495–544. 59ICDCS’17/2017-06-07
  • 114. References VII Weiss, C., Karras, P., and Bernstein, A. (2008). Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endowment, 1(1):1008–1019. Wilkinson, K. (2006). Jena property table implementation. Technical Report HPL-2006-140, HP Laboratories Palo Alto. Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., and Liu, L. (2013). TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endowment, 6(7):517–528. Available from: http://www.vldb.org/pvldb/vol6/p517-yuan.pdf. Zhang, X., Chen, L., Tong, Y., and Wang, M. (2013). EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud. In Proc. 29th Int. Conf. on Data Engineering, pages 565–576. Zou, L., Mo, J., Chen, L., ¨Ozsu, M. T., and Zhao, D. (2011). gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endowment, 4(8):482–493. Zou, L. and ¨Ozsu, M. T. (2017). Graph-based RDF data management. Data Science and Engineering, 2(1):56–70. Available from: https://dx.doi.org/10.1007/s41019-016-0029-6. 60ICDCS’17/2017-06-07
  • 115. References VIII Zou, L., ¨Ozsu, M. T., Chen, L., Shen, X., Huang, R., and Zhao, D. (2014). gStore: A graph-based SPARQL query engine. VLDB J., 23(4):565–590. 61ICDCS’17/2017-06-07