SlideShare uma empresa Scribd logo
1 de 36
Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

RDF Stores
Challenges and Solutions
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Jarrar © 2013

1
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Jarrar © 2013

2
Lecture Outline
Part I: Querying RDF(S P O) tables using SQL
Part 2: Practical Session (RDF graphs )
Part 3: SQL-based RDF Stores

Keywords: RDF, RDF Stores, querying SPO Table, Graph Databases, Querying Graph, self joins,
RDF3X, Oracle Semantic Technology, C-Store, vertical patronizing, MashQL, Graph
Signature, Semantic Web, Data Web,

Jarrar © 2013

3
Part 1
Querying RDF(S P O) tables using SQL

Jarrar © 2013

4
RDF as a Data Model
RDF (Resource Description Framework) is the Data Model
behind the Semantic/Data Web.
RDF is a graph-based data model.
In RDF, Data is represented as triples: <Subject, Predicate,
Object>, or in short: <S,P,O>.

S

P
Jarrar © 2013

O
5
RDF as a Data Model
RDF’s way of representing data as triples is more
elementary than Databases or XML.
–  This enables easy data integration and interoperability
between systems (as will be demonstrated later).
•  EXAMPLE: the fact that the movie Sicko (2007) is directed
by Michael Moore from the USA can be represented by the
following triples which form a directed labeled graph:
<
<
<
<
<
<

M1,
M1,
M1,
D1,
D1,
C1,

name, Sicko >
year, 2007 >
directedBy, D1 >
name, Michael Moore >
country, C1 >
name, USA>

2007
Sicko

name

M1

directe
dBy

Jarrar © 2013

D1

name

Michael
Moore
6
RDF graphs as an SPO table
Typically, an RDF Graph is stored
as one <S,P,O> Table in a
Relational Database Management
System.
2007
Sicko
M1

directedBy

Michael Moore
Emmy Awards

D1
P1

2009

M2

1995

Capitalism
1995
M3

directedBy
actedIn

Washington DC

D2
P2
Brave Heart

M4

2007
directedBy
actedIn

USA

C1

Oscars

Mel Gibson

1996

Nadine Labaki
D3
Caramel

Lebanon

C2
location
C3

P3

Beirut
name

Sweden

Stockholm

Stockholm Festival

2007

Jarrar © 2013

S
M1
M1
M1
M2
M2
M2
M3
M3
M3
M4
M4
M4
D1
D1
D1
D2
D2
D2
D2
D3
D3
D3
D3
…

P
year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
Year
directedBy
Name
Name
hasWonPrizeIn
Country
Counrty
hasWonPrizeIn
Name
actedIn
Name
Country
hasWonPrizeIn
actedIn
…

O
2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
2007
D3
Caramel
Michael Moore
P1
C1
C1
P2
Mel Gibson
M3
Nadine Labaki
C2
P3
M4
…

7
RDF graphs as an SPO table
How do we query graph-shaped data
stored in one relational table?
How do we traverse a graph stored
in one relational table?
à Self Joins!

Jarrar © 2013

S
M1
M1
M1
M2
M2
M2
M3
M3
M3
M4
M4
M4
D1
D1
D1
D2
D2
D2
D2
D3
D3
D3
D3
…

P
year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
Year
directedBy
Name
Name
hasWonPrizeIn
Country
Counrty
hasWonPrizeIn
Name
actedIn
Name
Country
hasWonPrizeIn
actedIn
…

O
2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
2007
D3
Caramel
Michael Moore
P1
C1
C1
P2
Mel Gibson
M3
Nadine Labaki
C2
P3
M4
…

8
How to Query RDF data stored in SPO table?
S
M1
M1
M1
M2
M2
M2
M3
M3
M3
M4
M4
M4
D1
D1
…

P
year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
Year
directedBy
Name
Name
hasWonPrizeIn
…

O
2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
2007
D3
Caramel
Michael Moore
P1
…

2007
Sicko
M1

directedBy

Michael Moore
Emmy Awards

D1
P1

2009

M2

1995

Capitalism
1995
M3

directedBy
actedIn

Washington DC

D2
P2
Brave Heart

M4

2007
directedBy
actedIn

USA

C1

Oscars

Mel Gibson

1996

Nadine Labaki
D3
Caramel

Exercises:

Lebanon

C2
location
C3

P3

Beirut
name

Sweden

Stockholm Festival

2007

Stockholm
(1) What is the name of director D3?
(2) What is the name of the director of the movie M1?
(3) List all the movies who have directors from the USA and their directors.
(4) List all the names of the directors from Lebanon who have won prizes
and the prizes they have won.

Jarrar © 2013

9
How to Query RDF data stored in SPO table?

Exercise 1: A simple query.
–  What is the name of director D3?
–  Consider the table MoviesTable(s,p,o).
–  SQL:
Select o
From MoviesTable T
Where T.s = ‘D3’ AND T.p = ‘Name’;

Answer: Nadine Labaki
Jarrar © 2013

S
M1
M1
M1
M2
M2
M2
M3
M3
M3
M4
M4
M4
D1
D1
D1
D2
D2
D2
D2
D3
D3
D3
D3
…

P
year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
Year
directedBy
Name
Name
hasWonPrizeIn
Country
Counrty
hasWonPrizeIn
Name
actedIn
Name
Country
hasWonPrizeIn
actedIn
…

O
2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
2007
D3
Caramel
Michael Moore
P1
C1
C1
P2
Mel Gibson
M3
Nadine Labaki
C2
P3
M4
…
10
How to Query RDF data stored in SPO table?

Exercise 2: A path query with 1 join.
–  What is the name of the director of the
movie M1.
–  Consider the table MoviesTable (s,p,o).
–  SQL:
Select T2.o
From MoviesTable T1, MoviesTable T2
Where {T1.s = ‘M1’ AND
T1.p = ‘directedBy’ AND
T1.o = T2.s AND
T2.p = ‘name’};

Answer: Michael Moore

Jarrar © 2013

S
M1
M1
M1
M2
M2
M2
M3
M3
M3
M4
M4
M4
D1
D1
D1
D2
D2
D2
D2
D3
D3
D3
D3
…

P
year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
Year
directedBy
Name
Name
hasWonPrizeIn
Country
Counrty
hasWonPrizeIn
Name
actedIn
Name
Country
hasWonPrizeIn
actedIn
…

O
2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
2007
D3
Caramel
Michael Moore
P1
C1
C1
P2
Mel Gibson
M3
Nadine Labaki
C2
P3
M4
…
11
How to Query RDF data stored in SPO table?
Exercise 3: A path query with 2 joins.
–  List all the movies who have directors from
the USA and their directors.
–  Consider the table MoviesTable(s,p,o).
–  SQL:
Select T1.s, T1.o
From MoviesTable T1, MoviesTable T2,
MoviesTable T3
Where {T1.o = T2.s AND
T2.o = T3.s AND
T1.p = ‘directedBy’ AND
T2.p = ‘country’ AND
T3.p = ‘name’ AND
T3.o = ‘USA’};
Answer: M1 D1; M2 D1; M3 D2

Jarrar © 2013

S
M1
M1
M1
M2
M2
M2
M3
M3
M3
…
D1
D1
D1
D2
D2
D2
D2
…
C1
C1
C2
C2
…

P

O

year
Name
directedBy
directedBy
Year
Name
Year
directedBy
Name
…
Name
hasWonPrizeIn
Country
Counrty
hasWonPrizeIn
Name
actedIn
…
Name
Capital
Name
Capital
…

2007
Sicko
D1
D1
2009
Capitalism
1995
D2
Brave Heart
…
Michael Moore
P1
C1
C1
P2
Mel Gibson
M3
…
USA
Washington DC
Lebanon
Beirut
…
12
How to Query RDF data stored in SPO table?
Exercise 4: Star query.
–  List all the names of the directors from Lebanon who
have won prizes and the prizes they have won.
–  Consider the table MoviesTable (S,P,O). S
P
Select T1.o, T4.o
From MoviesTable T1, MoviesTable T2,
MoviesTable T3, MoviesTable T4
Where {T1.p = ‘name’ AND
T2.s = T1.s AND
T2.p = ‘country’ AND
T2.o = T3.s AND
T3.p = ‘name’ AND
T3.o = ‘Lebanon’ AND
T4.s = T1.s AND
T4.p = ‘hasWonPrizeIn’};
Answer: ‘Nadine Labaki’ , P3

Jarrar © 2013

D1
D1
D1
D2
D2
D2
D2
D3
D3
D3
D3
…
C1
C1
C2
C2
…

O
Name
Michael Moore
hasWonPrizeIn P1
Country
C1
Counrty
C1
hasWonPrizeIn P2
Name
Mel Gibson
actedIn
M3
Name
Nadine Labaki
Country
C2
hasWonPrizeIn P3
actedIn
M4
…
…
Name
USA
Capital
Washington DC
Name
Lebanon
Capital
Beirut
…
…
13
Part 2:
Querying SPO Tables using SQL

Jarrar © 2013

14
Practical Session
Given the RDF graph in the next slide, do the following:
(1) Insert the data graph as <S,P,O> triples in a 3-column table in any
DBMS.
Schema of the table: Books (Subject, Predicate, Object)
(2) Write the following queries in SQL and execute them over the
newly created table:
•  List all the authors born in a country which has the name
Palestine.
•  List the names of all authors with the name of their affiliation
who are born in a country whose capital’s population is 14M.
Note that the author must have an affiliation.
•  List the names of all books whose authors are born in Lebanon
along with the name of the author.

Jarrar © 2013

15
Palestine
Said

BK1

BK2

CN1

AU1
CU
Author

BK3

The	
  P rophet

BK4

Author

Name

Name

Jerusalem

Colombia	
  University
14 .0M

India

CN2
AU3

CA1

Viswanathan

AU2

Wamadat

Author

Capital

7.6K

Capital

CA2

Name

New	
  Delhi

Naima

Lebanon
Gibran

CN3

AU4

2.0M

CA3

Beirut

This data graph is about books. It talks about four books (BK1-BK4).
Information recorded about a book includes data such as; its author,
affiliation, country of birth including its capital and the population of its
capital.
Jarrar © 2013

16
Practical Session
• Each student should work alone.
• The student is encouraged to first answer each query mentally from the
graph before writing and executing the SQL query, and to compare the
results of the query execution with his/her expected results.
• The student is strongly recommended to write two additional queries,
execute them on the data graph, and hand them along with the required
queries.
• The student must highlight problems and challenges of executing queries
on graph-shaped data and propose solutions to the problem.
• Each student must expect to present his/her queries at class and, more
importantly, he/she must be ready to discuss the problems of querying
graph-shaped data and his/her proposed solutions.
• The final delivery should include (i) a snapshot of the built table, (ii) the
SQL queries, (iii) The results of the queries, and (iv) a one-page discussion
of the problems of querying graph-shaped data and the proposed possible
solutions. These must be handed in a report form in PDF Format.
Jarrar © 2013

17
Part 3
SQL-based RDF Stores
Querying Graph-Shaped Data using SQL

Jarrar © 2013

18
Complexity of querying graph-shaped data
Where does the complexity of querying graph-shaped data
stored in a relational table lie?
The complexity of querying a data graph lies in the many selfjoins that are to be performed on the table.
Accessing multiple properties for a resource requires subjectsubject joins (Star-shaped queries).
Path expressions require subject-object joins.
In general, a query with n edges on an SPO table requires
n-1 self-joins of that table.
Jarrar © 2013

19
Solution-1: Subject-Property Matrixes
Clustered Property Table:

àOracle Solution [3]
implemented in 11g

•  A clustered property table, contains clusters of
properties tend to be defined together.
•  Multiple property tables with different clusters of
properties may be created to optimize queries.
•  A key requirement for this type of property table is
that a particular property may only appear in at most
one property table.
•  Advantage: Some queries can be answered directly
from the property table with no joins.
- Ex: retrieve the title of the book authored in 2001.
Jarrar © 2013

20
Solution-2: Vertically Partitioning
Six Vertically Partitioned Tables:

•  Triples table is rewritten into n two column tables
where n is the number of unique properties.
•  In each of these tables, the first column contains the
subjects that define that property and the second
column contains the object values for those subjects.
•  An experimental implementation of this approach
was performed using an open source columnoriented database system (C-Store).

àC-Store Solution [4]

Jarrar © 2013

21
Solution-3: Indexes and Dictionaries
Build Indexes.
–  On each column: S, P, O
–  On Permutations of two columns: SP, SO, PS, PO, …
–  On Permutation of three columns: SPO, SOP, PSO, POS,
OSP, OPS, …
Dictionary Encoding String Data?
–  Replacing all literals by IDs.
–  Triple stores are compressed.
–  Allows for faster comparison and matching operations.

à RDF3X Solution [5]

Jarrar © 2013

22
Discussion
Advantages and Disadvantages of storing RDF in
other schemas?

What do you prefer?

Any other suggestions to optimize queries over
graph-shaped data?

Jarrar © 2013

23
MashQL (Query Formulation & indexing Graphs)
(Research started at the University of Cyprus, then at Birzeit University)

What about summarizing the data graph, and instead of querying the
original data, we query the summary and get the same results?
What about summarizing 15M SPO triples into less then 500,000?
This is called Graph Signature Indexing. It is currently in research at
Birzeit University.
Initial comparison
between GS performance
and Oracle’s Semantic
Technologies

Jarrar © 2013

24
MashQL (Query Formulation & indexing Graphs)
A graphical query formulation
language for the Data Web
A general structured-data
retrieval language (not merely
an interface)

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

Query
Everything
Artist “^Dion”
YearProdYear > 2003
Title

RAlbumTitle

Addresses all of the following challenges:
ü  The user does not know the schema.
ü  There is no offline or inline schemaontology.
ü  The query may involve multiple sources.
ü  The query language is expressive (not a single-purpose interface)
Jarrar © 2013

25
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

Everything
Artist “^Dion”
YearProdYear > 2003
Title

RAlbumTitle

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

§  Interactive Query Formulation.
§  MashQL queries are translated into and executed as SPARQL.
Jarrar © 2013

26
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

Everything

Jarrar © 2013

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

27
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

Everything
Everything
Album
A1
A2
B1
Types Individual
s

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

Background queries
SELECT X WHERE {?X ?P ?O}
Union
SELECT X WHERE {?S ?P ?X}
SELECT X WHERE {?S rdf:Type ?X}

Jarrar © 2013

28
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query
Everything
Artist
Artist
Author
Title
ProdYear
Type
Year

Contains Dion
Equals
Equals
Contains
Contains
OneOf
Not
Between
LessTha
n
MoreTha
n

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

Background query
SELECT P WHERE {?Everything ?P ?O}

Jarrar © 2013

29
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query
Everything
Artist “^Dion
Year ProdYear
Artist
Author
Title
ProdYear
Type
Year

MoreTh 2003
Equals
Equals
Contains
OneOf
Not
Between
LessTha
MoreThan
n
MoreTha
n

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

Background query
SELECT P WHERE {?Everything ?P ?O}

Jarrar © 2013

30
MashQL Example
Celine Dion’s
Albums after
2003?

www.site1.com/rdf
:A1
:A1
:A1
:A1
:A2
:A2

RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

:Title “Taking Chances”
:Artist “Dion C.”
:ProdYear 2007
:Producer “Peer Astrom”
:Title “Monodose”
:Artist “Ziad Rahbani”

www.site2.com/rdf

Query
Everything
Artist “^Dion
YearProdYear > 2003
Title
Artist
Author
Title
Title
ProdYear
Type
Year

AlbumTitle

:B1
:B1
:B1
:B1
:B1
:B2
:B2
:B2
:B2

rdf:Type bibo:Album
:Title “The prayer”
:Artist “Celine Dion”
:Artist “Josh Groban”
:Year
2008
rdf:Type bibo:Album
:Title “Miracle”
:Author “Celine Dion”
:Year
2004

Background query
SELECT P WHERE {?Everything ?P ?O}

Jarrar © 2013

31
MashQL Example
Celine Dion’s
Albums after
2003?
RDF Input
From:
http://www.site1.com/rdf
http://www.site2.com/rdf

Query
Everything
Artist “^Dion”
YearProdYear > 2003
Title

RAlbumTitle

PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site1.com/rdf>
SELECT ?AlbumTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
{?X S1:Artist ?X1} UNION {?X S2:Artist ?X1}
{?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2}
{{?X S1:Title ?AlbumTitle} UNION {?X
S2:Title ?AlbumTitle}}
FILTER regex(?X1, “^Dion”)
FILTER (?X2 > 2003)}

Jarrar © 2013

32
MashQL Query Optimization
•  The major challenge in MashQL is interactivity!
•  User interaction must be within 100ms.
•  However, querying RDF is typically slow.
•  How about dealing with huge datasets!?
•  Implemented Solutions (i.e., Oracle) proved to perform
weakly specially in MashQL background queries.

A query optimization solution was developed.,
called Graph Signature Index.

Jarrar © 2013

33
MashQL Query Optimization: Graph Signature
‘Graph Signature’ optimizes RDF queries by :
Summarizing the RDF dataset
(Algorithms were developed)

Then, executing the queries on the summary
instead of the original dataset
(A new execution plan was developed)

Jarrar © 2013

34
Experiments
•  Experiments performed on 15M SPO triples (rows) of the Yago Dataset.
•  Graph Signature Index of less than 500K was built on the data.
•  Two sets of queries were performed, and the results were as follows:
Extreme-case linear queries of depth 1-5:
Query
GS
Oracle

A1
0.344
110.390

A2
1.953
302.672

A3
5.250
525.844

A4
10.234
702.969

A5
24.672
>20min

Queries that cover all MashQL Background queries some queries that
might occur as the final queries generated in MashQL:
Query
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Q8
Q9
GS
0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469
Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703
Jarrar © 2013

35
References
1.  Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar:
Towards Query Optimization for the Data Web - Two Disk-Based
algorithms: Trace Equivalence and Bisimilarity.
2.  Mustafa Jarrar and Marios D. Dikaiakos:
A Query Formulation Language for the Data Web. IEEE Transactions
on Knowledge and Data Engineering.IEEE Computer Society.
3.  Chong E, Das S, Eadon G, Srinivasan J: An efficient SQL-based RDF
querying scheme. VLDB’05.
4.  Abadi et al, Scalable Semantic Web Data Management Using Vertical
Partitioning. In VLDB 2007.
5.  Neumann T, Weikum G: RDF3X: RISC style engine for RDF.
VLDB’08.

Jarrar © 2013

36

Mais conteúdo relacionado

Destaque

Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsMustafa Jarrar
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsMustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql ProjectMustafa Jarrar
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationMustafa Jarrar
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFMustafa Jarrar
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineMustafa Jarrar
 
Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationMustafa Jarrar
 
Jarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFJarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFMustafa Jarrar
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaMustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionMustafa Jarrar
 
Jarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageJarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageMustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebMustafa Jarrar
 
Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Mustafa Jarrar
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema IntegrationMustafa Jarrar
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesMarin Dimitrov
 

Destaque (16)

Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
 
Jarrar: Linked Data
Jarrar: Linked DataJarrar: Linked Data
Jarrar: Linked Data
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and Constraints
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data Integration
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course Outline
 
Jarrar: Introduction to Data Integration
Jarrar: Introduction to Data IntegrationJarrar: Introduction to Data Integration
Jarrar: Introduction to Data Integration
 
Jarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFJarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDF
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF Schema
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
 
Jarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageJarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology Language
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
 
Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema Integration
 
RDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic RepositoriesRDF, SPARQL and Semantic Repositories
RDF, SPARQL and Semantic Repositories
 

Semelhante a Jarrar: RDF Stores: Challenges and Solutions

Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)weiw_oz
 
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphTigerGraph
 
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRMDH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRMFrederic Kaplan
 
JHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the ScenesJHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the Scenesjtleek
 
Euruko 2009 - Software Craftsmanship
Euruko 2009 - Software CraftsmanshipEuruko 2009 - Software Craftsmanship
Euruko 2009 - Software CraftsmanshipPhillip Oertel
 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataJean Ihm
 
Programming+Lesson+Green+Presentation+Green+variant.pptx
Programming+Lesson+Green+Presentation+Green+variant.pptxProgramming+Lesson+Green+Presentation+Green+variant.pptx
Programming+Lesson+Green+Presentation+Green+variant.pptxmzsenno
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesBasil Ell
 
Yahoo7 Semantic Web Presentation
Yahoo7 Semantic Web PresentationYahoo7 Semantic Web Presentation
Yahoo7 Semantic Web Presentationpaulwilliamwhite
 
Knowledg graphs yosi mass
Knowledg graphs yosi massKnowledg graphs yosi mass
Knowledg graphs yosi massdiannepatricia
 

Semelhante a Jarrar: RDF Stores: Challenges and Solutions (12)

Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)Keyword-based Search and Exploration on Databases (SIGMOD 2011)
Keyword-based Search and Exploration on Databases (SIGMOD 2011)
 
Object oriented databases
Object oriented databasesObject oriented databases
Object oriented databases
 
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
 
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRMDH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
DH101 2013/2014 course 6 - Semantic coding, RDF, CIDOC-CRM
 
DBpedia Mobile Explorer
DBpedia Mobile ExplorerDBpedia Mobile Explorer
DBpedia Mobile Explorer
 
JHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the ScenesJHU Data Science MOOCs - Behind the Scenes
JHU Data Science MOOCs - Behind the Scenes
 
Euruko 2009 - Software Craftsmanship
Euruko 2009 - Software CraftsmanshipEuruko 2009 - Software Craftsmanship
Euruko 2009 - Software Craftsmanship
 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
 
Programming+Lesson+Green+Presentation+Green+variant.pptx
Programming+Lesson+Green+Presentation+Green+variant.pptxProgramming+Lesson+Green+Presentation+Green+variant.pptx
Programming+Lesson+Green+Presentation+Green+variant.pptx
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queries
 
Yahoo7 Semantic Web Presentation
Yahoo7 Semantic Web PresentationYahoo7 Semantic Web Presentation
Yahoo7 Semantic Web Presentation
 
Knowledg graphs yosi mass
Knowledg graphs yosi massKnowledg graphs yosi mass
Knowledg graphs yosi mass
 

Último

Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 

Último (20)

LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 

Jarrar: RDF Stores: Challenges and Solutions

  • 1. Mustafa Jarrar Lecture Notes, Web Data Management (MCOM7348) University of Birzeit, Palestine 1st Semester, 2013 RDF Stores Challenges and Solutions Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info Jarrar © 2013 1
  • 2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html Jarrar © 2013 2
  • 3. Lecture Outline Part I: Querying RDF(S P O) tables using SQL Part 2: Practical Session (RDF graphs ) Part 3: SQL-based RDF Stores Keywords: RDF, RDF Stores, querying SPO Table, Graph Databases, Querying Graph, self joins, RDF3X, Oracle Semantic Technology, C-Store, vertical patronizing, MashQL, Graph Signature, Semantic Web, Data Web, Jarrar © 2013 3
  • 4. Part 1 Querying RDF(S P O) tables using SQL Jarrar © 2013 4
  • 5. RDF as a Data Model RDF (Resource Description Framework) is the Data Model behind the Semantic/Data Web. RDF is a graph-based data model. In RDF, Data is represented as triples: <Subject, Predicate, Object>, or in short: <S,P,O>. S P Jarrar © 2013 O 5
  • 6. RDF as a Data Model RDF’s way of representing data as triples is more elementary than Databases or XML. –  This enables easy data integration and interoperability between systems (as will be demonstrated later). •  EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael Moore from the USA can be represented by the following triples which form a directed labeled graph: < < < < < < M1, M1, M1, D1, D1, C1, name, Sicko > year, 2007 > directedBy, D1 > name, Michael Moore > country, C1 > name, USA> 2007 Sicko name M1 directe dBy Jarrar © 2013 D1 name Michael Moore 6
  • 7. RDF graphs as an SPO table Typically, an RDF Graph is stored as one <S,P,O> Table in a Relational Database Management System. 2007 Sicko M1 directedBy Michael Moore Emmy Awards D1 P1 2009 M2 1995 Capitalism 1995 M3 directedBy actedIn Washington DC D2 P2 Brave Heart M4 2007 directedBy actedIn USA C1 Oscars Mel Gibson 1996 Nadine Labaki D3 Caramel Lebanon C2 location C3 P3 Beirut name Sweden Stockholm Stockholm Festival 2007 Jarrar © 2013 S M1 M1 M1 M2 M2 M2 M3 M3 M3 M4 M4 M4 D1 D1 D1 D2 D2 D2 D2 D3 D3 D3 D3 … P year Name directedBy directedBy Year Name Year directedBy Name Year directedBy Name Name hasWonPrizeIn Country Counrty hasWonPrizeIn Name actedIn Name Country hasWonPrizeIn actedIn … O 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart 2007 D3 Caramel Michael Moore P1 C1 C1 P2 Mel Gibson M3 Nadine Labaki C2 P3 M4 … 7
  • 8. RDF graphs as an SPO table How do we query graph-shaped data stored in one relational table? How do we traverse a graph stored in one relational table? à Self Joins! Jarrar © 2013 S M1 M1 M1 M2 M2 M2 M3 M3 M3 M4 M4 M4 D1 D1 D1 D2 D2 D2 D2 D3 D3 D3 D3 … P year Name directedBy directedBy Year Name Year directedBy Name Year directedBy Name Name hasWonPrizeIn Country Counrty hasWonPrizeIn Name actedIn Name Country hasWonPrizeIn actedIn … O 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart 2007 D3 Caramel Michael Moore P1 C1 C1 P2 Mel Gibson M3 Nadine Labaki C2 P3 M4 … 8
  • 9. How to Query RDF data stored in SPO table? S M1 M1 M1 M2 M2 M2 M3 M3 M3 M4 M4 M4 D1 D1 … P year Name directedBy directedBy Year Name Year directedBy Name Year directedBy Name Name hasWonPrizeIn … O 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart 2007 D3 Caramel Michael Moore P1 … 2007 Sicko M1 directedBy Michael Moore Emmy Awards D1 P1 2009 M2 1995 Capitalism 1995 M3 directedBy actedIn Washington DC D2 P2 Brave Heart M4 2007 directedBy actedIn USA C1 Oscars Mel Gibson 1996 Nadine Labaki D3 Caramel Exercises: Lebanon C2 location C3 P3 Beirut name Sweden Stockholm Festival 2007 Stockholm (1) What is the name of director D3? (2) What is the name of the director of the movie M1? (3) List all the movies who have directors from the USA and their directors. (4) List all the names of the directors from Lebanon who have won prizes and the prizes they have won. Jarrar © 2013 9
  • 10. How to Query RDF data stored in SPO table? Exercise 1: A simple query. –  What is the name of director D3? –  Consider the table MoviesTable(s,p,o). –  SQL: Select o From MoviesTable T Where T.s = ‘D3’ AND T.p = ‘Name’; Answer: Nadine Labaki Jarrar © 2013 S M1 M1 M1 M2 M2 M2 M3 M3 M3 M4 M4 M4 D1 D1 D1 D2 D2 D2 D2 D3 D3 D3 D3 … P year Name directedBy directedBy Year Name Year directedBy Name Year directedBy Name Name hasWonPrizeIn Country Counrty hasWonPrizeIn Name actedIn Name Country hasWonPrizeIn actedIn … O 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart 2007 D3 Caramel Michael Moore P1 C1 C1 P2 Mel Gibson M3 Nadine Labaki C2 P3 M4 … 10
  • 11. How to Query RDF data stored in SPO table? Exercise 2: A path query with 1 join. –  What is the name of the director of the movie M1. –  Consider the table MoviesTable (s,p,o). –  SQL: Select T2.o From MoviesTable T1, MoviesTable T2 Where {T1.s = ‘M1’ AND T1.p = ‘directedBy’ AND T1.o = T2.s AND T2.p = ‘name’}; Answer: Michael Moore Jarrar © 2013 S M1 M1 M1 M2 M2 M2 M3 M3 M3 M4 M4 M4 D1 D1 D1 D2 D2 D2 D2 D3 D3 D3 D3 … P year Name directedBy directedBy Year Name Year directedBy Name Year directedBy Name Name hasWonPrizeIn Country Counrty hasWonPrizeIn Name actedIn Name Country hasWonPrizeIn actedIn … O 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart 2007 D3 Caramel Michael Moore P1 C1 C1 P2 Mel Gibson M3 Nadine Labaki C2 P3 M4 … 11
  • 12. How to Query RDF data stored in SPO table? Exercise 3: A path query with 2 joins. –  List all the movies who have directors from the USA and their directors. –  Consider the table MoviesTable(s,p,o). –  SQL: Select T1.s, T1.o From MoviesTable T1, MoviesTable T2, MoviesTable T3 Where {T1.o = T2.s AND T2.o = T3.s AND T1.p = ‘directedBy’ AND T2.p = ‘country’ AND T3.p = ‘name’ AND T3.o = ‘USA’}; Answer: M1 D1; M2 D1; M3 D2 Jarrar © 2013 S M1 M1 M1 M2 M2 M2 M3 M3 M3 … D1 D1 D1 D2 D2 D2 D2 … C1 C1 C2 C2 … P O year Name directedBy directedBy Year Name Year directedBy Name … Name hasWonPrizeIn Country Counrty hasWonPrizeIn Name actedIn … Name Capital Name Capital … 2007 Sicko D1 D1 2009 Capitalism 1995 D2 Brave Heart … Michael Moore P1 C1 C1 P2 Mel Gibson M3 … USA Washington DC Lebanon Beirut … 12
  • 13. How to Query RDF data stored in SPO table? Exercise 4: Star query. –  List all the names of the directors from Lebanon who have won prizes and the prizes they have won. –  Consider the table MoviesTable (S,P,O). S P Select T1.o, T4.o From MoviesTable T1, MoviesTable T2, MoviesTable T3, MoviesTable T4 Where {T1.p = ‘name’ AND T2.s = T1.s AND T2.p = ‘country’ AND T2.o = T3.s AND T3.p = ‘name’ AND T3.o = ‘Lebanon’ AND T4.s = T1.s AND T4.p = ‘hasWonPrizeIn’}; Answer: ‘Nadine Labaki’ , P3 Jarrar © 2013 D1 D1 D1 D2 D2 D2 D2 D3 D3 D3 D3 … C1 C1 C2 C2 … O Name Michael Moore hasWonPrizeIn P1 Country C1 Counrty C1 hasWonPrizeIn P2 Name Mel Gibson actedIn M3 Name Nadine Labaki Country C2 hasWonPrizeIn P3 actedIn M4 … … Name USA Capital Washington DC Name Lebanon Capital Beirut … … 13
  • 14. Part 2: Querying SPO Tables using SQL Jarrar © 2013 14
  • 15. Practical Session Given the RDF graph in the next slide, do the following: (1) Insert the data graph as <S,P,O> triples in a 3-column table in any DBMS. Schema of the table: Books (Subject, Predicate, Object) (2) Write the following queries in SQL and execute them over the newly created table: •  List all the authors born in a country which has the name Palestine. •  List the names of all authors with the name of their affiliation who are born in a country whose capital’s population is 14M. Note that the author must have an affiliation. •  List the names of all books whose authors are born in Lebanon along with the name of the author. Jarrar © 2013 15
  • 16. Palestine Said BK1 BK2 CN1 AU1 CU Author BK3 The  P rophet BK4 Author Name Name Jerusalem Colombia  University 14 .0M India CN2 AU3 CA1 Viswanathan AU2 Wamadat Author Capital 7.6K Capital CA2 Name New  Delhi Naima Lebanon Gibran CN3 AU4 2.0M CA3 Beirut This data graph is about books. It talks about four books (BK1-BK4). Information recorded about a book includes data such as; its author, affiliation, country of birth including its capital and the population of its capital. Jarrar © 2013 16
  • 17. Practical Session • Each student should work alone. • The student is encouraged to first answer each query mentally from the graph before writing and executing the SQL query, and to compare the results of the query execution with his/her expected results. • The student is strongly recommended to write two additional queries, execute them on the data graph, and hand them along with the required queries. • The student must highlight problems and challenges of executing queries on graph-shaped data and propose solutions to the problem. • Each student must expect to present his/her queries at class and, more importantly, he/she must be ready to discuss the problems of querying graph-shaped data and his/her proposed solutions. • The final delivery should include (i) a snapshot of the built table, (ii) the SQL queries, (iii) The results of the queries, and (iv) a one-page discussion of the problems of querying graph-shaped data and the proposed possible solutions. These must be handed in a report form in PDF Format. Jarrar © 2013 17
  • 18. Part 3 SQL-based RDF Stores Querying Graph-Shaped Data using SQL Jarrar © 2013 18
  • 19. Complexity of querying graph-shaped data Where does the complexity of querying graph-shaped data stored in a relational table lie? The complexity of querying a data graph lies in the many selfjoins that are to be performed on the table. Accessing multiple properties for a resource requires subjectsubject joins (Star-shaped queries). Path expressions require subject-object joins. In general, a query with n edges on an SPO table requires n-1 self-joins of that table. Jarrar © 2013 19
  • 20. Solution-1: Subject-Property Matrixes Clustered Property Table: àOracle Solution [3] implemented in 11g •  A clustered property table, contains clusters of properties tend to be defined together. •  Multiple property tables with different clusters of properties may be created to optimize queries. •  A key requirement for this type of property table is that a particular property may only appear in at most one property table. •  Advantage: Some queries can be answered directly from the property table with no joins. - Ex: retrieve the title of the book authored in 2001. Jarrar © 2013 20
  • 21. Solution-2: Vertically Partitioning Six Vertically Partitioned Tables: •  Triples table is rewritten into n two column tables where n is the number of unique properties. •  In each of these tables, the first column contains the subjects that define that property and the second column contains the object values for those subjects. •  An experimental implementation of this approach was performed using an open source columnoriented database system (C-Store). àC-Store Solution [4] Jarrar © 2013 21
  • 22. Solution-3: Indexes and Dictionaries Build Indexes. –  On each column: S, P, O –  On Permutations of two columns: SP, SO, PS, PO, … –  On Permutation of three columns: SPO, SOP, PSO, POS, OSP, OPS, … Dictionary Encoding String Data? –  Replacing all literals by IDs. –  Triple stores are compressed. –  Allows for faster comparison and matching operations. à RDF3X Solution [5] Jarrar © 2013 22
  • 23. Discussion Advantages and Disadvantages of storing RDF in other schemas? What do you prefer? Any other suggestions to optimize queries over graph-shaped data? Jarrar © 2013 23
  • 24. MashQL (Query Formulation & indexing Graphs) (Research started at the University of Cyprus, then at Birzeit University) What about summarizing the data graph, and instead of querying the original data, we query the summary and get the same results? What about summarizing 15M SPO triples into less then 500,000? This is called Graph Signature Indexing. It is currently in research at Birzeit University. Initial comparison between GS performance and Oracle’s Semantic Technologies Jarrar © 2013 24
  • 25. MashQL (Query Formulation & indexing Graphs) A graphical query formulation language for the Data Web A general structured-data retrieval language (not merely an interface) RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf Query Everything Artist “^Dion” YearProdYear > 2003 Title RAlbumTitle Addresses all of the following challenges: ü  The user does not know the schema. ü  There is no offline or inline schemaontology. ü  The query may involve multiple sources. ü  The query language is expressive (not a single-purpose interface) Jarrar © 2013 25
  • 26. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 Everything Artist “^Dion” YearProdYear > 2003 Title RAlbumTitle rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 §  Interactive Query Formulation. §  MashQL queries are translated into and executed as SPARQL. Jarrar © 2013 26
  • 27. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 Everything Jarrar © 2013 rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 27
  • 28. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 Everything Everything Album A1 A2 B1 Types Individual s rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 Background queries SELECT X WHERE {?X ?P ?O} Union SELECT X WHERE {?S ?P ?X} SELECT X WHERE {?S rdf:Type ?X} Jarrar © 2013 28
  • 29. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query Everything Artist Artist Author Title ProdYear Type Year Contains Dion Equals Equals Contains Contains OneOf Not Between LessTha n MoreTha n :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 Background query SELECT P WHERE {?Everything ?P ?O} Jarrar © 2013 29
  • 30. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query Everything Artist “^Dion Year ProdYear Artist Author Title ProdYear Type Year MoreTh 2003 Equals Equals Contains OneOf Not Between LessTha MoreThan n MoreTha n :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 Background query SELECT P WHERE {?Everything ?P ?O} Jarrar © 2013 30
  • 31. MashQL Example Celine Dion’s Albums after 2003? www.site1.com/rdf :A1 :A1 :A1 :A1 :A2 :A2 RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf :Title “Taking Chances” :Artist “Dion C.” :ProdYear 2007 :Producer “Peer Astrom” :Title “Monodose” :Artist “Ziad Rahbani” www.site2.com/rdf Query Everything Artist “^Dion YearProdYear > 2003 Title Artist Author Title Title ProdYear Type Year AlbumTitle :B1 :B1 :B1 :B1 :B1 :B2 :B2 :B2 :B2 rdf:Type bibo:Album :Title “The prayer” :Artist “Celine Dion” :Artist “Josh Groban” :Year 2008 rdf:Type bibo:Album :Title “Miracle” :Author “Celine Dion” :Year 2004 Background query SELECT P WHERE {?Everything ?P ?O} Jarrar © 2013 31
  • 32. MashQL Example Celine Dion’s Albums after 2003? RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf Query Everything Artist “^Dion” YearProdYear > 2003 Title RAlbumTitle PREFIX S1: <http://site1.com/rdf> PREFIX S2: <http://site1.com/rdf> SELECT ?AlbumTitle FROM <http://site1.com/rdf> FROM <http://site2.com/rdf> WHERE { {?X S1:Artist ?X1} UNION {?X S2:Artist ?X1} {?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2} {{?X S1:Title ?AlbumTitle} UNION {?X S2:Title ?AlbumTitle}} FILTER regex(?X1, “^Dion”) FILTER (?X2 > 2003)} Jarrar © 2013 32
  • 33. MashQL Query Optimization •  The major challenge in MashQL is interactivity! •  User interaction must be within 100ms. •  However, querying RDF is typically slow. •  How about dealing with huge datasets!? •  Implemented Solutions (i.e., Oracle) proved to perform weakly specially in MashQL background queries. A query optimization solution was developed., called Graph Signature Index. Jarrar © 2013 33
  • 34. MashQL Query Optimization: Graph Signature ‘Graph Signature’ optimizes RDF queries by : Summarizing the RDF dataset (Algorithms were developed) Then, executing the queries on the summary instead of the original dataset (A new execution plan was developed) Jarrar © 2013 34
  • 35. Experiments •  Experiments performed on 15M SPO triples (rows) of the Yago Dataset. •  Graph Signature Index of less than 500K was built on the data. •  Two sets of queries were performed, and the results were as follows: Extreme-case linear queries of depth 1-5: Query GS Oracle A1 0.344 110.390 A2 1.953 302.672 A3 5.250 525.844 A4 10.234 702.969 A5 24.672 >20min Queries that cover all MashQL Background queries some queries that might occur as the final queries generated in MashQL: Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 GS 0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469 Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703 Jarrar © 2013 35
  • 36. References 1.  Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query Optimization for the Data Web - Two Disk-Based algorithms: Trace Equivalence and Bisimilarity. 2.  Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation Language for the Data Web. IEEE Transactions on Knowledge and Data Engineering.IEEE Computer Society. 3.  Chong E, Das S, Eadon G, Srinivasan J: An efficient SQL-based RDF querying scheme. VLDB’05. 4.  Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. 5.  Neumann T, Weikum G: RDF3X: RISC style engine for RDF. VLDB’08. Jarrar © 2013 36