‫أكاديمية الحكومة اإللكترونية الفلسطينية‬
              The Palestinian eGovernment Academy
                         www.egovacademy.ps



Tutorial II: Data Integration and Open Information Systems

                         Session 9
       RDF Stores: Challenges and Solutions


                    Dr. Mustafa Jarrar
                       University of Birzeit
                       mjarrar@birzeit.edu
                         www.jarrar.info


                             PalGov © 2011                1
About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:

             Birzeit University, Palestine (Coordinator)
             Palestine Polytechnic University, Palestine
             Palestine Technical University, Palestine
             Ministry of Telecom and IT, Palestine
             Ministry of Interior, Palestine
             Ministry of Local Government, Palestine
             University of Trento, Italy
             Vrije Universiteit Brussel, Belgium
             Université de Savoie, France
             University of Namur, Belgium
             TrueTrust, UK

Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telfax: +972 2 2982935, mjarrar@birzeit.edu
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.


No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.




                 Attribution-NonCommercial-ShareAlike
                              CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.

Tutorial Map

Intended Learning Objectives

A: Knowledge and Understanding
  2a1: Describe tree and graph data models.
  2a2: Understand the notation of XML, RDF, RDFS, and OWL.
  2a3: Demonstrate knowledge about querying techniques for data
  models, such as SPARQL and XPath.
  2a4: Explain the concepts of identity management and Linked Data.
  2a5: Demonstrate knowledge about integration and fusion of
  heterogeneous data.
B: Intellectual Skills
  2b1: Represent data using tree and graph data models (XML &
  RDF).
  2b2: Describe data semantics using RDFS and OWL.
  2b3: Manage and query data represented in RDF, XML, OWL.
  2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
  2c1: Use Oracle Semantic Technology and/or Virtuoso to store
  and query RDF stores.
D: General and Transferable Skills
  2d1: Working in a team.
  2d2: Presenting and defending ideas.
  2d3: Use of creativity and innovation in problem solving.
  2d4: Develop communication skills and logical reasoning abilities.

Topic                                                              h
Session 1: XML Basics and Namespaces                               3
Session 2: XML DTD’s                                               3
Session 3: XML Schemas                                             3
Session 4: Lab-XML Schemas                                         3
Session 5: RDF and RDFs                                            3
Session 6: Lab-RDF and RDFs                                        3
Session 7: OWL (Ontology Web Language)                             3
Session 8: Lab-OWL                                                 3
Session 9: Lab-RDF Stores - Challenges and Solutions               3
Session 10: Lab-SPARQL                                             3
Session 11: Lab-Oracle Semantic Technology                         3
Session 12_1: The problem of Data Integration                      1.5
Session 12_2: Architectural Solutions for the Integration Issues   1.5
Session 13_1: Data Schema Integration                              1
Session 13_2: GAV and LAV Integration                              1
Session 13_3: Data Integration and Fusion using RDF                1
Session 14: Lab-Data Integration and Fusion using RDF              3
Session 15_1: Data Web and Linked Data                             1.5
Session 15_2: RDFa                                                 1.5
Session 16: Lab-RDFa                                               3
Module ILOs


After completing this module students will be
able to:
   - Demonstrate knowledge about querying techniques
   for the graph data model.
   - Manage and query data represented in RDF.




Data Integration and Open Information Systems (Tutorial II)
                                             The Palestinian e-Government Academy
                                                                     January, 2012




Tutorial II: Data Integration and Open Information Systems



          Part 1: Querying SPO Tables
Module Outline

Part 1: Querying SPO Tables


Part 2: Querying SPO Tables using SQL


Part 3: Complexities and Solutions of Querying Graph-Shaped Data
RDF as a Data Model

• RDF (Resource Description Framework) is the Data Model
  behind the Semantic/Data Web.

• RDF is a graph-based data model.

• In RDF, Data is represented as triples: <Subject,
  Predicate, Object>, or in short: <S,P,O>.


[Figure: a subject node S connected to an object node O by a directed edge labeled P.]
RDF as a Data Model

    • RDF’s way of representing Data as triples is more
      elementary than Databases or XML.
       – This enables easy data integration and interoperability between
         systems (as will be demonstrated later).
    • EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael
      Moore from the USA can be represented by the following triples which
      form a directed labeled graph:

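< M1, name, Sicko >
< M1, year, 2007 >
< M1, directedBy, D1 >
< D1, name, Michael Moore >
< D1, country, C1 >
< C1, name, USA >

[Figure: the triples drawn as a directed labeled graph, e.g. node M1 with edges to
"Sicko", "2007", and D1, and node D1 with a name edge to "Michael Moore".]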
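As a minimal sketch of what "triples form a directed labeled graph" means, the six triples of the Sicko example can be held as plain Python tuples and traversed edge by edge. The helper name `follow` is hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# The six triples of the Sicko example, as (subject, predicate, object).
triples = [
    ("M1", "name", "Sicko"),
    ("M1", "year", "2007"),
    ("M1", "directedBy", "D1"),
    ("D1", "name", "Michael Moore"),
    ("D1", "country", "C1"),
    ("C1", "name", "USA"),
]

# Directed labeled graph: subject -> list of (edge label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def follow(node, label):
    """Return the targets reached from `node` over edges labeled `label`."""
    return [o for p, o in graph[node] if p == label]


# Walk the path M1 -directedBy-> D1 -country-> C1, reading names on the way.
director = follow("M1", "directedBy")[0]
country = follow(director, "country")[0]
print(follow(director, "name"), follow(country, "name"))
```

Each hop in this walk corresponds to one edge of the graph; the same walk over a relational table is what the self-joins in the following slides express.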
Persistent Storage of RDF Graphs
• Typically, an RDF Graph is stored as one <S,P,O> Table in a Relational
  Database Management System.

   S    P               O
   M1   year            2007
   M1   name            Sicko
   M1   directedBy      D1
   M2   directedBy      D1
   M2   year            2009
   M2   name            Capitalism
   M3   year            1995
   M3   directedBy      D2
   M3   name            Brave Heart
   M4   year            2007
   M4   directedBy      D3
   M4   name            Caramel
   D1   name            Michael Moore
   D1   hasWonPrizeIn   P1
   D1   country         C1
   D2   country         C1
   D2   hasWonPrizeIn   P2
   D2   name            Mel Gibson
   D2   actedIn         M3
   D3   name            Nadine Labaki
   D3   country         C2
   D3   hasWonPrizeIn   P3
   D3   actedIn         M4
   …    …               …

[Figure: the same data drawn as a directed labeled graph. Recoverable labels: movies
M1 (Sicko, 2007), M2 (Capitalism, 2009), M3 (Brave Heart, 1995), M4 (Caramel, 2007);
directors D1 (Michael Moore), D2 (Mel Gibson), D3 (Nadine Labaki); prizes P1 (Emmy
Awards, 1995), P2 (Oscars, 1996), P3 (Stockholm Festival, 2007); countries C1 (USA,
Washington DC), C2 (Lebanon, Beirut), C3 (Sweden, Stockholm). Edge labels shown
include directedBy, actedIn, location, and name.]
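The storage scheme above can be tried out with any relational DBMS. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `MoviesTable` and the column names `s`, `p`, `o` follow the later exercises; writing all predicate names in lowercase and all values as text are assumptions made here for consistency.

```python
import sqlite3

# Rows of the <S,P,O> table (predicates written in lowercase, an assumption).
ROWS = [
    ("M1", "year", "2007"), ("M1", "name", "Sicko"), ("M1", "directedBy", "D1"),
    ("M2", "directedBy", "D1"), ("M2", "year", "2009"), ("M2", "name", "Capitalism"),
    ("M3", "year", "1995"), ("M3", "directedBy", "D2"), ("M3", "name", "Brave Heart"),
    ("M4", "year", "2007"), ("M4", "directedBy", "D3"), ("M4", "name", "Caramel"),
    ("D1", "name", "Michael Moore"), ("D1", "hasWonPrizeIn", "P1"), ("D1", "country", "C1"),
    ("D2", "country", "C1"), ("D2", "hasWonPrizeIn", "P2"), ("D2", "name", "Mel Gibson"),
    ("D2", "actedIn", "M3"),
    ("D3", "name", "Nadine Labaki"), ("D3", "country", "C2"), ("D3", "hasWonPrizeIn", "P3"),
    ("D3", "actedIn", "M4"),
]

conn = sqlite3.connect(":memory:")
# One three-column table holds the whole graph: one row per edge.
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO MoviesTable VALUES (?, ?, ?)", ROWS)

count = conn.execute("SELECT COUNT(*) FROM MoviesTable").fetchone()[0]
# All outgoing edges of node M1, i.e. everything the table says about M1.
m1 = conn.execute(
    "SELECT p, o FROM MoviesTable WHERE s = 'M1' ORDER BY p"
).fetchall()
print(count, m1)
```

Note that the schema never changes when new kinds of facts arrive: a new predicate is just a new value in column `p`.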
How to Query RDF data stored in one table?
• How do we query graph-shaped data stored in one relational table?

• How do we traverse a graph stored in one relational table?

  Self-joins!
How to Query RDF data stored in one table?

Exercises (over the <S,P,O> table and graph of the movie data):

(1) What is the name of director D3?
(2) What is the name of the director of the movie M1?
(3) List all the movies that have directors from the USA, and their directors.
(4) List all the names of the directors from Lebanon who have won prizes, and the
    prizes they have won.
How to Query RDF data stored in one table?
• Exercise 1: A simple query.
   – What is the name of director D3?
   – Consider the table MoviesTable (s,p,o).
   – SQL:

Select T.o
From MoviesTable T
Where T.s = 'D3' AND T.p = 'name';

Ans: Nadine Labaki
How to Query RDF data stored in one table?
• Exercise 2: A path query with 1 join.
   – What is the name of the director of the movie M1?
   – Consider the table MoviesTable (s,p,o).
   – SQL:

Select T2.o
From MoviesTable T1, MoviesTable T2
Where T1.s = 'M1' AND
      T1.p = 'directedBy' AND
      T1.o = T2.s AND
      T2.p = 'name';

Ans: Michael Moore
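To check the path query of Exercise 2 by running it, here is a small sketch using Python's `sqlite3` module. It loads only the handful of rows the query touches and writes the self-join with explicit `JOIN ... ON` syntax, which is equivalent to listing the table twice in the `From` clause. Lowercase predicate names are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO MoviesTable VALUES (?, ?, ?)",
    [
        ("M1", "name", "Sicko"),
        ("M1", "directedBy", "D1"),
        ("M2", "directedBy", "D1"),
        ("D1", "name", "Michael Moore"),
        ("D2", "name", "Mel Gibson"),
    ],
)

# T1 finds the directedBy edge leaving M1; T2 reads the name of its target.
# The condition T1.o = T2.s chains the two edges into a path.
PATH_QUERY = """
    SELECT T2.o
    FROM MoviesTable T1
    JOIN MoviesTable T2 ON T1.o = T2.s
    WHERE T1.s = 'M1' AND T1.p = 'directedBy' AND T2.p = 'name'
"""
result = [row[0] for row in conn.execute(PATH_QUERY)]
print(result)
```

The object of the first edge becomes the subject of the second, so every additional hop in a path adds one more alias of the same table.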
How to Query RDF data stored in one table?
• Exercise 3: A path query with 2 joins.
   – List all the movies that have directors from the USA, and their directors.
   – Consider the table MoviesTable(s,p,o), which also contains the rows:

        C1   name      USA
        C1   capital   Washington DC
        C2   name      Lebanon
        C2   capital   Beirut

   – SQL:

Select T1.s, T1.o
From MoviesTable T1, MoviesTable T2,
     MoviesTable T3
Where T1.o = T2.s AND
      T2.o = T3.s AND
      T1.p = 'directedBy' AND
      T2.p = 'country' AND
      T3.p = 'name' AND
      T3.o = 'USA';

Ans: M1 D1; M2 D1; M3 D2
How to Query RDF data stored in one table?
• Exercise 4: Star query.
   – List all the names of the directors from Lebanon who have
     won prizes and the prizes they have won.
   – Consider the table MoviesTable (S,P,O).
   – SQL:

Select T1.o, T4.o
From MoviesTable T1, MoviesTable T2,
     MoviesTable T3, MoviesTable T4
Where T1.p = 'name' AND
      T2.s = T1.s AND
      T2.p = 'country' AND
      T2.o = T3.s AND
      T3.p = 'name' AND
      T3.o = 'Lebanon' AND
      T4.s = T1.s AND
      T4.p = 'hasWonPrizeIn';

Ans: 'Nadine Labaki', P3
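The star query of Exercise 4 can be executed in the same way. The sketch below, again using Python's `sqlite3` module with lowercase predicate names as an assumption, loads only the director and country rows needed. Three of the four aliases share the same subject (the director), which is what gives the query its star shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO MoviesTable VALUES (?, ?, ?)",
    [
        ("D1", "name", "Michael Moore"), ("D1", "hasWonPrizeIn", "P1"),
        ("D1", "country", "C1"),
        ("D2", "name", "Mel Gibson"), ("D2", "hasWonPrizeIn", "P2"),
        ("D2", "country", "C1"),
        ("D3", "name", "Nadine Labaki"), ("D3", "hasWonPrizeIn", "P3"),
        ("D3", "country", "C2"),
        ("C1", "name", "USA"), ("C2", "name", "Lebanon"),
    ],
)

# T1, T2 and T4 all describe the same subject (the center of the star);
# T3 hangs off T2 to resolve the country id into the country name.
STAR_QUERY = """
    SELECT T1.o, T4.o
    FROM MoviesTable T1, MoviesTable T2, MoviesTable T3, MoviesTable T4
    WHERE T1.p = 'name'
      AND T2.s = T1.s AND T2.p = 'country'
      AND T2.o = T3.s AND T3.p = 'name' AND T3.o = 'Lebanon'
      AND T4.s = T1.s AND T4.p = 'hasWonPrizeIn'
"""
winners = conn.execute(STAR_QUERY).fetchall()
print(winners)
```

Four aliases of the table mean three self-joins, even though the question itself sounds simple.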




Tutorial II: Data Integration and Open Information Systems


          Part 2: Querying SPO Tables using SQL
Practical Session

• Given the RDF graph in the next slide, do the following:
   (1) Insert the data graph as <S,P,O> triples in a 3-column table in
       any DBMS.
         Schema of the table: Books (Subject, Predicate, Object)
   (2) Write the following queries in SQL and execute them over the
       newly created table:
       • List all the authors born in a country named Palestine.
       • List the names of all authors, together with the names of their affiliations,
         who were born in a country whose capital has a population of 14M. Note that
         the author must have an affiliation.
       • List the names of all books whose authors were born in Lebanon, along
         with the names of their authors.
Practical Session
[Figure: RDF data graph about books. Recoverable labels: books BK1-BK4 (titles
shown: Wamadat, The Prophet); authors AU1-AU4 (names shown: Said, Viswanathan,
Naima, Gibran); affiliation CU (Columbia University); countries CN1-CN3 (Palestine,
India, Lebanon); capitals CA1-CA3 (Jerusalem, 7.6K; New Delhi, 14.0M; Beirut, 2.0M).
Edge labels shown: Author, Name, Capital.]

This data graph is about books. It describes four books (BK1-BK4).
The information recorded about a book includes its author, the author's
affiliation and country of birth, and that country's capital together with the
capital's population.
Practical Session

• Each student should work alone.
• The student is encouraged to first answer each query mentally from the graph
  before writing and executing the SQL query, and to compare the results of the
  query execution with his/her expected results.
• The student is strongly recommended to write two additional queries, execute
  them on the data graph, and hand them in along with the required queries.
• The student must highlight the problems and challenges of executing queries on
  graph-shaped data and propose solutions to them.
• Each student should expect to present his/her queries in class and, more
  importantly, must be ready to discuss the problems of querying graph-shaped
  data and his/her proposed solutions.
• The final delivery should include (i) a snapshot of the built table, (ii) the SQL
  queries, (iii) the results of the queries, and (iv) a one-page discussion of the
  problems of querying graph-shaped data and the proposed solutions.
  These must be handed in as a report in PDF format.




Tutorial II: Data Integration and Open Information Systems

          Part 3: Complexities and Solutions of Querying Graph-Shaped Data
Complexity of querying graph-shaped data

• Where does the complexity of querying graph-shaped
  data stored in a relational table lie?

• The complexity of querying a data graph lies in the
  many self-joins that are to be performed on the table.

• Accessing multiple properties for a resource requires
  subject-subject joins (Star-shaped queries).

• Path expressions require subject-object joins.

• In general, a query with n edges on an SPO table
  requires n-1 self-joins of that table. For example, Exercise 3
  has three edges (directedBy, country, name) and needs two self-joins.
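The "n edges, n-1 self-joins" rule can be made concrete with a small translator from triple patterns to SQL. This is only an illustrative sketch: the function `patterns_to_sql` and the `?x` variable notation are hypothetical names introduced here, not part of any library, and the table layout `MoviesTable(s, p, o)` with lowercase predicates is assumed.

```python
import sqlite3


def patterns_to_sql(patterns, select, table="MoviesTable"):
    """Translate triple patterns into one SQL query over an SPO table.

    Terms starting with '?' are variables; all other terms are constants.
    Every pattern gets its own alias of the table, so n patterns produce
    n aliases and therefore n-1 self-joins.
    """
    first_use = {}  # variable -> column where it first appeared, e.g. "T1.o"
    conditions, params = [], []
    for i, pattern in enumerate(patterns, start=1):
        for column, term in zip("spo", pattern):
            ref = f"T{i}.{column}"
            if not term.startswith("?"):
                conditions.append(f"{ref} = ?")       # constant: bind a parameter
                params.append(term)
            elif term in first_use:
                conditions.append(f"{ref} = {first_use[term]}")  # join condition
            else:
                first_use[term] = ref
    aliases = ", ".join(f"{table} T{i}" for i in range(1, len(patterns) + 1))
    columns = ", ".join(first_use[v] for v in select)
    where = " AND ".join(conditions) or "1 = 1"
    return f"SELECT {columns} FROM {aliases} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO MoviesTable VALUES (?, ?, ?)",
    [
        ("M1", "directedBy", "D1"), ("M2", "directedBy", "D1"),
        ("M3", "directedBy", "D2"), ("M4", "directedBy", "D3"),
        ("D1", "country", "C1"), ("D2", "country", "C1"), ("D3", "country", "C2"),
        ("C1", "name", "USA"), ("C2", "name", "Lebanon"),
    ],
)

# Exercise 3 as three edges: movie -> director -> country -> "USA".
exercise3 = [
    ("?m", "directedBy", "?d"),
    ("?d", "country", "?c"),
    ("?c", "name", "USA"),
]
sql, params = patterns_to_sql(exercise3, select=["?m", "?d"])
answers = sorted(conn.execute(sql, params).fetchall())
print(sql)
print(answers)
```

A variable shared in the subject position of two patterns yields a subject-subject join (star shape), while a variable that is the object of one pattern and the subject of another yields a subject-object join (path shape).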
Solution: Indexes and Dictionaries

• What solutions do you propose to optimize queries on graph-
  shaped data?

• Build indexes.
   – On each column: S, P, O
   – On permutations of two columns: SP, SO, PS, PO, …
   – On permutations of three columns: SPO, SOP, PSO, POS,
     OSP, OPS

• Dictionary-encode string data.
   – Replace all literals by ids.
   – The triple store is compressed.
   – Allows for faster comparison and matching operations.

    RDF-3X Solution
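
A minimal sketch of the two ideas on this slide, using Python's sqlite3 module. It is not RDF-3X's implementation; the table names dict and spo, the index names, and the sample triples are assumptions made for illustration.

```python
import sqlite3

# Made-up triples for illustration.
triples = [
    ("B1", "Title", "Databases"),
    ("B1", "Year", "2001"),
    ("B2", "Title", "Graphs"),
    ("B2", "Year", "2005"),
]

# Dictionary encoding: give every distinct string a small integer id.
dictionary = {}
for triple in triples:
    for term in triple:
        dictionary.setdefault(term, len(dictionary))
encoded = [tuple(dictionary[term] for term in triple) for triple in triples]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, term TEXT UNIQUE)")
conn.execute("CREATE TABLE spo (s INTEGER, p INTEGER, o INTEGER)")
conn.executemany(
    "INSERT INTO dict VALUES (?, ?)",
    [(term_id, term) for term, term_id in dictionary.items()],
)
conn.executemany("INSERT INTO spo VALUES (?, ?, ?)", encoded)

# Indexes on three of the six permutations of (S, P, O).
conn.execute("CREATE INDEX idx_spo ON spo (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON spo (p, o, s)")
conn.execute("CREATE INDEX idx_osp ON spo (o, s, p)")

# Which subjects have Year = 2001? The comparison is done on integer ids;
# strings are only looked up again to present the answer.
rows = conn.execute(
    """
    SELECT d.term
    FROM spo
    JOIN dict AS d ON d.id = spo.s
    WHERE spo.p = ? AND spo.o = ?
    """,
    (dictionary["Year"], dictionary["2001"]),
).fetchall()
print(rows)
```

Whether a given index is actually used is up to the query planner of the database at hand; the sketch only shows how the encoded table and the permutation indexes are laid out.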
Solution: Store RDF in a Different Schema
Property Tables: Clustered Property Tables
             Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.



            Clustered Property Table:




          • A clustered property table contains clusters of
            properties that tend to be defined together.
          • Multiple property tables with different clusters of
            properties may be created to optimize queries.
          • A key requirement for this type of property table is
            that a particular property may only appear in at most
            one property table.
          • Advantage: Some queries can be answered directly
            from the property table with no joins.
              - Ex: retrieve the title of the book authored in 2001.
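
The no-join advantage can be sketched as follows, again with sqlite3. The table book_props, its columns, and the rows are hypothetical; the slide's own figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical clustered property table: one row per subject, one column per
# property in the cluster, NULL where a subject does not define the property.
conn.execute(
    "CREATE TABLE book_props ("
    "subject TEXT PRIMARY KEY, title TEXT, author TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO book_props VALUES (?, ?, ?, ?)",
    [
        ("B1", "Databases", "P1", 2001),
        ("B2", "Graphs", "P2", 2005),
        ("B3", "Logic", None, None),
    ],
)

# "Title of the book authored in 2001" touches a single row: no self-join.
titles_2001 = conn.execute(
    "SELECT title FROM book_props WHERE year = 2001"
).fetchall()
print(titles_2001)
```

On a plain SPO table the same question needs a subject-subject self-join between the Title and Year triples.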

Solution: Store RDF in a Different Schema
Property Tables: Property-Class Tables
            Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.


          Property-Class Tables:




         • A property-class table exploits the Type property of
           subjects to cluster similar sets of subjects together in
           the same table.
         • Unlike the first type of property table, a property may
           exist in multiple property-class tables.
         • This solution is found in Jena2.
         • Oracle adopts a property table-like data structure (they
           call it a “subject-property matrix”). However, it is
           used not for primary storage but as an auxiliary
           data structure.
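
A rough sketch of how subjects could be grouped into property-class tables by their Type. This is an illustration only, not how Jena2 or Oracle implement it; the triples and class names are made up, and every property is assumed to be single-valued.

```python
import sqlite3
from collections import defaultdict

# Made-up triples; "Type" plays the role of the Type property on the slide.
triples = [
    ("B1", "Type", "Book"), ("B1", "Title", "Databases"), ("B1", "Author", "P1"),
    ("C1", "Type", "CD"), ("C1", "Title", "Miracle"), ("C1", "Artist", "P2"),
]

# Class of each subject, taken from its Type triple.
class_of = {s: o for s, p, o in triples if p == "Type"}

# Remaining properties per subject (single-valued by assumption).
rows = defaultdict(dict)
for s, p, o in triples:
    if p != "Type":
        rows[s][p] = o

# Columns of each property-class table: all properties used by its subjects.
columns = defaultdict(set)
for s, props in rows.items():
    columns[class_of[s]].update(props)

conn = sqlite3.connect(":memory:")
for cls, props in columns.items():
    cols = sorted(props)
    col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
    conn.execute(f'CREATE TABLE "{cls}" (subject TEXT PRIMARY KEY, {col_defs})')
    placeholders = ", ".join("?" for _ in range(len(cols) + 1))
    for s, values in rows.items():
        if class_of[s] == cls:
            conn.execute(
                f'INSERT INTO "{cls}" VALUES ({placeholders})',
                [s] + [values.get(c) for c in cols],
            )

# Title appears as a column in both tables.
book_titles = conn.execute('SELECT "Title" FROM "Book"').fetchall()
cd_titles = conn.execute('SELECT "Title" FROM "CD"').fetchall()
print(book_titles, cd_titles)
```

Here the Title property ends up in both the Book and the CD table, which a clustered property table would not allow.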
Solution: Store RDF in a Different Schema
Vertically Partitioned Approach
            Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.



          Six Vertically Partitioned Tables:




         • The triples table is rewritten into n two-column tables,
           where n is the number of unique properties in the data.
         • In each of these tables, the first column contains the
           subjects that define that property and the second
           column contains the object values for those subjects.
         • This method is called Vertical Partitioning.
         • An experimental implementation of this approach was
           performed using an open source column-oriented
           database system (C-Store).
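
The rewriting step can be sketched in a few lines of Python. The sketch uses sqlite3 (a row store) rather than C-Store, so it shows only the schema, not the performance behaviour; the triples are made up.

```python
import sqlite3
from collections import defaultdict

# Made-up triples for illustration.
triples = [
    ("B1", "Title", "Databases"), ("B1", "Year", "2001"), ("B1", "Author", "P1"),
    ("B2", "Title", "Graphs"), ("B2", "Year", "2005"),
]

# One (subject, object) list per unique property.
partitions = defaultdict(list)
for s, p, o in triples:
    partitions[p].append((s, o))

conn = sqlite3.connect(":memory:")
for prop, pairs in partitions.items():
    # One two-column table per property, filled in subject order.
    conn.execute(f'CREATE TABLE "{prop}" (subject TEXT, object TEXT)')
    conn.executemany(f'INSERT INTO "{prop}" VALUES (?, ?)', sorted(pairs))

# Title of the book from 2001: a join between two small tables on subject,
# instead of a self-join on the full triples table.
titles_2001 = conn.execute(
    """
    SELECT t.object
    FROM "Title" AS t
    JOIN "Year" AS y ON t.subject = y.subject
    WHERE y.object = ?
    """,
    ("2001",),
).fetchall()
print(titles_2001)
```

With three unique properties in the sample data, the sketch creates three tables (n = 3).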

Discussion

• Advantages and Disadvantages of storing RDF in other
  schemas?



• What do you prefer?



• Any other suggestions to optimize queries over graph-
  shaped data?




optional
        MashQL
        (under development at Birzeit University, started at the University of Cyprus)



• What about summarizing the data graph so that, instead of querying
  the original data, we query the summary and get the same results?
• What about summarizing 15M SPO triples into fewer than 500,000?
• This is called Graph Signature Indexing. It is currently under research
  at Birzeit University.

[Figure: initial comparison between Graph Signature (GS) performance and Oracle’s Semantic Technologies]
optional
       MashQL
       (under development at Birzeit University, started at the University of Cyprus)


• A graphical query formulation language for the Data Web
• A general structured-data retrieval language (not merely an
  interface)

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          MashQL
            Article
                Title          ArticleTitle
                Author         “^Hacker”
                Year\PubYear   > 2000

Addresses all of the following challenges:
• The user does not know the schema.
• There is no offline or inline schema/ontology.
• The query may involve multiple sources.
• The query language is expressive (not a single-purpose
  interface).
optional
      MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
                Artist          “^Dion”
                Year\ProdYear   > 2003
                Title           AlbumTitle

www.site1.com/rdf
   :A1   :Title      “Taking Chances”
   :A1   :Artist     “Dion C.”
   :A1   :ProdYear   2007
   :A1   :Producer   “Peer Astrom”
   :A2   :Title      “Monodose”
   :A2   :Artist     “Ziad Rahbani”

www.site2.com/rdf
   :B1   rdf:Type   bibo:Album
   :B1   :Title     “The prayer”
   :B1   :Artist    “Celine Dion”
   :B1   :Artist    “Josh Groban”
   :B1   :Year      2008
   :B2   rdf:Type   bibo:Album
   :B2   :Title     “Miracle”
   :B2   :Author    “Celine Dion”
   :B2   :Year      2004

• Interactive Query Formulation.
• MashQL queries are translated into and executed as SPARQL.
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything

            Subject drop-down list (Types | Individuals):
                Everything, Album, A1, A2, B1

          Background queries
             SELECT ?X WHERE { {?X ?P ?O} UNION {?S ?P ?X} }

             SELECT ?X WHERE {?S rdf:Type ?X}
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
                Artist   Contains   Dion

            Property drop-down list: Artist, Author, Title, ProdYear, Type, Year
            Filter drop-down list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan

          Background query
             SELECT ?P WHERE {?Everything ?P ?O}
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
                Artist          “^Dion”
                Year\ProdYear   MoreThan   2003

            Property drop-down list: Artist, Author, Title, ProdYear, Type, Year
            Filter drop-down list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan

          Background query
             SELECT ?P WHERE {?Everything ?P ?O}
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
                Artist          “^Dion”
                Year\ProdYear   > 2003
                Title           AlbumTitle

            Property drop-down list: Artist, Author, Title, ProdYear, Type, Year

          Background query
             SELECT ?P WHERE {?Everything ?P ?O}
optional
       MashQL Example
Celine Dion’s albums after 2003?

          RDF Input
           From:
              http://www.site1.com/rdf
              http://www.site2.com/rdf

          Query
            Everything
                Artist          “^Dion”
                Year\ProdYear   > 2003
                Title           AlbumTitle

          Generated SPARQL
             PREFIX S1: <http://site1.com/rdf>
             PREFIX S2: <http://site2.com/rdf>
             SELECT ?AlbumTitle
             FROM <http://site1.com/rdf>
             FROM <http://site2.com/rdf>
             WHERE {
               {?X S1:Artist ?X1} UNION {?X S2:Artist ?X1}
               {?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2}
               {?X S1:Title ?AlbumTitle} UNION {?X S2:Title ?AlbumTitle}
               FILTER regex(?X1, "^Dion")
               FILTER (?X2 > 2003)
             }
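
To see what this query selects on the example data, the three UNION patterns and the two filters can be evaluated by hand. The sketch below does that in plain Python; it is not a SPARQL engine, it drops the S1/S2 namespaces, and it ignores duplicate solutions. The triples are copied from the two example sources on the MashQL Example slides.

```python
import re

# Triples from www.site1.com/rdf and www.site2.com/rdf (namespaces dropped).
SITE1 = [
    ("A1", "Title", "Taking Chances"),
    ("A1", "Artist", "Dion C."),
    ("A1", "ProdYear", 2007),
    ("A1", "Producer", "Peer Astrom"),
    ("A2", "Title", "Monodose"),
    ("A2", "Artist", "Ziad Rahbani"),
]
SITE2 = [
    ("B1", "rdf:Type", "bibo:Album"),
    ("B1", "Title", "The prayer"),
    ("B1", "Artist", "Celine Dion"),
    ("B1", "Artist", "Josh Groban"),
    ("B1", "Year", 2008),
    ("B2", "rdf:Type", "bibo:Album"),
    ("B2", "Title", "Miracle"),
    ("B2", "Author", "Celine Dion"),
    ("B2", "Year", 2004),
]


def values(triples, subject, properties):
    """Objects of `subject` for any of the given properties (a UNION)."""
    return [o for s, p, o in triples if s == subject and p in properties]


def album_titles(triples):
    """Join the three patterns on ?X and apply both FILTERs."""
    titles = []
    for x in sorted({s for s, _, _ in triples}):
        artists = [a for a in values(triples, x, {"Artist"})
                   if re.search("^Dion", a)]          # FILTER regex(?X1, "^Dion")
        years = [y for y in values(triples, x, {"ProdYear", "Year"})
                 if y > 2003]                         # FILTER (?X2 > 2003)
        if artists and years:
            titles.extend(values(triples, x, {"Title"}))
    return titles


print(album_titles(SITE1 + SITE2))
```

Under this reading only :A1 qualifies: "Celine Dion" does not start with "Dion", and :B2 uses Author rather than Artist.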
optional
       MashQL Query Optimization


•   The major challenge in MashQL is interactivity!
•   User interaction must be within 100ms.
•   However, querying RDF is typically slow.
•   How about dealing with huge datasets?
•   Existing implementations (e.g., Oracle) proved to perform
    poorly, especially on MashQL background queries.


       A query optimization solution was
       developed. It is called Graph Signature
       Indexing.


optional
     MashQL Query Optimization: Graph Signature


‘Graph Signature’ optimizes RDF queries by:

• Summarizing the RDF dataset
  (Algorithms were developed)


• Then, executing the queries on the summary
  instead of the original dataset
  (A new execution plan was developed)




optional
       Experiments

 • Experiments were performed on 15M SPO triples (rows) of the Yago
   dataset.
 • A Graph Signature index of less than 500K was built on the data.
 • Two sets of queries were performed, and the results were as follows:

 Extreme-case linear queries of depth 1-5:

 Query      A1        A2        A3        A4        A5
 GS         0.344     1.953     5.250    10.234    24.672
 Oracle   110.390   302.672   525.844   702.969    >20min

 Queries that cover all MashQL background queries and some queries
 that might occur as the final queries generated in MashQL:

 Query    Q1       Q2       Q3       Q4       Q5       Q6       Q7       Q8       Q9
 GS        0.441    0.578    0.587    0.556    0.290    0.291    0.587    0.785    2.469
 Oracle   25.640   29.875   29.593   29.641   19.922   21.922   62.734   64.813   32.703
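
The gap between the two rows is easier to read as a ratio. The short Python sketch below divides each Oracle time by the corresponding GS time. It assumes both rows use the same time unit (the slide does not state it) and leaves out A5, for which Oracle is only reported as ">20min".

```python
# Times copied from the two tables above; A5 is omitted (Oracle: ">20min").
gs = {
    "A1": 0.344, "A2": 1.953, "A3": 5.250, "A4": 10.234,
    "Q1": 0.441, "Q2": 0.578, "Q3": 0.587, "Q4": 0.556, "Q5": 0.290,
    "Q6": 0.291, "Q7": 0.587, "Q8": 0.785, "Q9": 2.469,
}
oracle = {
    "A1": 110.390, "A2": 302.672, "A3": 525.844, "A4": 702.969,
    "Q1": 25.640, "Q2": 29.875, "Q3": 29.593, "Q4": 29.641, "Q5": 19.922,
    "Q6": 21.922, "Q7": 62.734, "Q8": 64.813, "Q9": 32.703,
}

# Ratio Oracle / GS per query: how many times faster GS was.
speedup = {query: oracle[query] / gs[query] for query in gs}

for query, ratio in speedup.items():
    print(f"{query}: Oracle/GS = {ratio:.1f}")
```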


References

• Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query
  Optimization for the Data Web - Two Disk-Based algorithms: Trace
  Equivalence and Bisimilarity.

• Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation
  Language for the Data Web. IEEE Transactions on Knowledge and
  Data Engineering. IEEE Computer Society.

• Abadi et al.: Scalable Semantic Web Data Management Using Vertical
  Partitioning. In VLDB 2007.




Thank you!




 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesMustafa Jarrar
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORMMustafa Jarrar
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineMustafa Jarrar
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesMustafa Jarrar
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalMustafa Jarrar
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsMustafa Jarrar
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingMustafa Jarrar
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsMustafa Jarrar
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Mustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql ProjectMustafa Jarrar
 

Mais de Mustafa Jarrar (20)

Clustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment AnalysisClustering Arabic Tweets for Sentiment Analysis
Clustering Arabic Tweets for Sentiment Analysis
 
Classifying Processes and Basic Formal Ontology
Classifying Processes  and Basic Formal OntologyClassifying Processes  and Basic Formal Ontology
Classifying Processes and Basic Formal Ontology
 
Discrete Mathematics Course Outline
Discrete Mathematics Course OutlineDiscrete Mathematics Course Outline
Discrete Mathematics Course Outline
 
Business Process Implementation
Business Process ImplementationBusiness Process Implementation
Business Process Implementation
 
Business Process Design and Re-engineering
Business Process Design and Re-engineeringBusiness Process Design and Re-engineering
Business Process Design and Re-engineering
 
BPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical ConstructsBPMN 2.0 Analytical Constructs
BPMN 2.0 Analytical Constructs
 
BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs  BPMN 2.0 Descriptive Constructs
BPMN 2.0 Descriptive Constructs
 
Introduction to Business Process Management
Introduction to Business Process ManagementIntroduction to Business Process Management
Introduction to Business Process Management
 
Customer Complaint Ontology
Customer Complaint Ontology Customer Complaint Ontology
Customer Complaint Ontology
 
Subset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion RulesSubset, Equality, and Exclusion Rules
Subset, Equality, and Exclusion Rules
 
Schema Modularization in ORM
Schema Modularization in ORMSchema Modularization in ORM
Schema Modularization in ORM
 
On Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in PalestineOn Computer Science Trends and Priorities in Palestine
On Computer Science Trends and Priorities in Palestine
 
Lessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online CoursesLessons from Class Recording & Publishing of Eight Online Courses
Lessons from Class Recording & Publishing of Eight Online Courses
 
Presentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-finalPresentation curras paper-emnlp2014-final
Presentation curras paper-emnlp2014-final
 
Jarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 CallsJarrar: Future Internet in Horizon 2020 Calls
Jarrar: Future Internet in Horizon 2020 Calls
 
Habash: Arabic Natural Language Processing
Habash: Arabic Natural Language ProcessingHabash: Arabic Natural Language Processing
Habash: Arabic Natural Language Processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 ProposalsRiestra: How to Design and engineer Competitive Horizon 2020 Proposals
Riestra: How to Design and engineer Competitive Horizon 2020 Proposals
 
Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020Bouquet: SIERA Workshop on The Pillars of Horizon2020
Bouquet: SIERA Workshop on The Pillars of Horizon2020
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 

Último

4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 

Último (20)

4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 

Pal gov.tutorial2.session9.lab rdf-stores

  • 4. Tutorial Map
      Topics (hours):
        – Session 1: XML Basics and Namespaces (3)
        – Session 2: XML DTDs (3)
        – Session 3: XML Schemas (3)
        – Session 4: Lab-XML Schemas (3)
        – Session 5: RDF and RDFS (3)
        – Session 6: Lab-RDF and RDFS (3)
        – Session 7: OWL (Web Ontology Language) (3)
        – Session 8: Lab-OWL (3)
        – Session 9: Lab-RDF Stores: Challenges and Solutions (3)
        – Session 10: Lab-SPARQL (3)
        – Session 11: Lab-Oracle Semantic Technology (3)
        – Session 12_1: The Problem of Data Integration (1.5)
        – Session 12_2: Architectural Solutions for the Integration Issues (1.5)
        – Session 13_1: Data Schema Integration (1)
        – Session 13_2: GAV and LAV Integration (1)
        – Session 13_3: Data Integration and Fusion using RDF (1)
        – Session 14: Lab-Data Integration and Fusion using RDF (3)
        – Session 15_1: Data Web and Linked Data (1.5)
        – Session 15_2: RDFa (1.5)
        – Session 16: Lab-RDFa (3)
      Intended Learning Objectives:
        A: Knowledge and Understanding
          – 2a1: Describe tree and graph data models.
          – 2a2: Understand the notation of XML, RDF, RDFS, and OWL.
          – 2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.
          – 2a4: Explain the concepts of identity management and Linked Data.
          – 2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.
        B: Intellectual Skills
          – 2b1: Represent data using tree and graph data models (XML & RDF).
          – 2b2: Describe data semantics using RDFS and OWL.
          – 2b3: Manage and query data represented in RDF, XML, OWL.
          – 2b4: Integrate and fuse heterogeneous data.
        C: Professional and Practical Skills
          – 2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
        D: General and Transferable Skills
          – 2d1: Working in a team.
          – 2d2: Presenting and defending ideas.
          – 2d3: Use of creativity and innovation in problem solving.
          – 2d4: Develop communication skills and logical reasoning abilities.
  • 5. Module ILOs
      After completing this module, students will be able to:
        – Demonstrate knowledge about querying techniques for the graph data model.
        – Manage and query data represented in RDF.
  • 6. Tutorial II: Data Integration and Open Information Systems (The Palestinian e-Government Academy, January 2012)
      Part 1: Querying SPO Tables
  • 7. Module Outline
      Part 1: Querying SPO Tables
      Part 2: Querying SPO Tables using SQL
      Part 3: Complexities and Solutions of Querying Graph Shaped Data
  • 8. RDF as a Data Model
      • RDF (Resource Description Framework) is the data model behind the Semantic/Data Web.
      • RDF is a graph-based data model.
      • In RDF, data is represented as triples: <Subject, Predicate, Object>, or in short: <S,P,O>.
      [Figure: a triple drawn as a graph with nodes S and O and an edge labeled P]
  • 9. RDF as a Data Model
      • RDF's way of representing data as triples is more elementary than databases or XML.
        – This enables easy data integration and interoperability between systems (as will be demonstrated later).
      • EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael Moore from the USA can be represented by the following triples, which form a directed labeled graph:
          <M1, name, Sicko>
          <M1, year, 2007>
          <M1, directedBy, D1>
          <D1, name, Michael Moore>
          <D1, country, C1>
          <C1, name, USA>
      [Figure: the same triples drawn as a directed labeled graph]
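
The claim that these triples form a directed labeled graph can be made concrete with a few lines of Python. This is only an illustrative sketch: the `graph` dictionary and the `follow` helper are hypothetical names introduced here, not part of the tutorial.

```python
from collections import defaultdict

# The six example triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("M1", "name", "Sicko"),
    ("M1", "year", "2007"),
    ("M1", "directedBy", "D1"),
    ("D1", "name", "Michael Moore"),
    ("D1", "country", "C1"),
    ("C1", "name", "USA"),
]

# Directed labeled graph: each subject maps to its outgoing (edge label, target) pairs.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def follow(node, *predicates):
    """Follow a path of edge labels from a node; return the set of nodes reached."""
    frontier = {node}
    for p in predicates:
        frontier = {o for n in frontier for (q, o) in graph.get(n, []) if q == p}
    return frontier


# Movie M1 -> its director -> the director's country -> the country's name.
print(follow("M1", "directedBy", "country", "name"))  # {'USA'}
```

Each step of `follow` walks one edge, which is the in-memory counterpart of one self-join once the same triples are stored in a relational table.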
  • 10. Persistent Storage of RDF Graphs
      • Typically, an RDF graph is stored as one <S,P,O> table in a Relational Database Management System.

          S    P               O
          M1   Year            2007
          M1   Name            Sicko
          M1   directedBy      D1
          M2   directedBy      D1
          M2   Year            2009
          M2   Name            Capitalism
          M3   Year            1995
          M3   directedBy      D2
          M3   Name            Brave Heart
          M4   Year            2007
          M4   directedBy      D3
          M4   Name            Caramel
          D1   Name            Michael Moore
          D1   hasWonPrizeIn   P1
          D1   Country         C1
          D2   Country         C1
          D2   hasWonPrizeIn   P2
          D2   Name            Mel Gibson
          D2   actedIn         M3
          D3   Name            Nadine Labaki
          D3   Country         C2
          D3   hasWonPrizeIn   P3
          D3   actedIn         M4
          …    …               …

      [Figure: the same data drawn as a graph of movies (M1-M4), directors (D1-D3), prizes (P1-P3), and countries (C1-C3) with their capitals]
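
As a rough illustration of the single-table layout, the sketch below loads a few of the rows into an in-memory SQLite database. The choice of SQLite and the table name `MoviesTable` (borrowed from the later exercise slides) are assumptions made for this example; any relational DBMS would do.

```python
import sqlite3

# A handful of rows from the slide's table; every edge of the graph is one row.
rows = [
    ("M1", "Year", "2007"),
    ("M1", "Name", "Sicko"),
    ("M1", "directedBy", "D1"),
    ("D1", "Name", "Michael Moore"),
    ("D1", "hasWonPrizeIn", "P1"),
    ("D1", "Country", "C1"),
]

conn = sqlite3.connect(":memory:")
# One table with three text columns holds the whole graph.
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO MoviesTable VALUES (?, ?, ?)", rows)

# All outgoing edges of node M1.
m1_edges = conn.execute(
    "SELECT p, o FROM MoviesTable WHERE s = 'M1' ORDER BY p"
).fetchall()
print(m1_edges)  # [('Name', 'Sicko'), ('Year', '2007'), ('directedBy', 'D1')]
```

Note that the schema says nothing about movies or directors; all structure lives in the rows, which is why every traversal has to be expressed as a join of the table with itself.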
  • 11. How to Query RDF data stored in one table?
      • How do we query graph-shaped data stored in one relational table?
      • How do we traverse a graph stored in one relational table?
      → Self-joins!
  • 12. How to Query RDF data stored in one table?
      Exercises (over the <S,P,O> table above):
        (1) What is the name of director D3?
        (2) What is the name of the director of the movie M1?
        (3) List all the movies that have directors from the USA, together with their directors.
        (4) List the names of all directors from Lebanon who have won prizes, together with the prizes they have won.
  • 13. How to Query RDF data stored in one table?
      • Exercise 1: A simple query.
        – What is the name of director D3?
        – Consider the table MoviesTable (s,p,o).
        – SQL:
            Select T.o
            From MoviesTable T
            Where T.s = 'D3' AND T.p = 'Name';
        Ans: Nadine Labaki
  • 14. How to Query RDF data stored in one table?
      • Exercise 2: A path query with 1 join.
        – What is the name of the director of the movie M1?
        – Consider the table MoviesTable (s,p,o).
        – SQL:
            Select T2.o
            From MoviesTable T1, MoviesTable T2
            Where T1.s = 'M1' AND
                  T1.p = 'directedBy' AND
                  T1.o = T2.s AND
                  T2.p = 'Name';
        Ans: Michael Moore
  • 15. How to Query RDF data stored in one table?
      • Exercise 3: A path query with 2 joins.
        – List all the movies that have directors from the USA, together with their directors.
        – Consider the table MoviesTable (s,p,o), which also holds rows describing countries:
            S    P         O
            C1   Name      USA
            C1   Capital   Washington DC
            C2   Name      Lebanon
            C2   Capital   Beirut
            …    …         …
        – SQL:
            Select T1.s, T1.o
            From MoviesTable T1, MoviesTable T2, MoviesTable T3
            Where T1.o = T2.s AND
                  T2.o = T3.s AND
                  T1.p = 'directedBy' AND
                  T2.p = 'Country' AND
                  T3.p = 'Name' AND
                  T3.o = 'USA';
        Ans: (M1, D1); (M2, D1); (M3, D2)
  • 16. How to Query RDF data stored in one table?
      • Exercise 4: Star query.
        – List the names of all directors from Lebanon who have won prizes, together with the prizes they have won.
        – Consider the table MoviesTable (s,p,o).
        – SQL:
            Select T1.o, T4.o
            From MoviesTable T1, MoviesTable T2, MoviesTable T3, MoviesTable T4
            Where T1.p = 'Name' AND
                  T2.s = T1.s AND
                  T2.p = 'Country' AND
                  T2.o = T3.s AND
                  T3.p = 'Name' AND
                  T3.o = 'Lebanon' AND
                  T4.s = T1.s AND
                  T4.p = 'hasWonPrizeIn';
        Ans: ('Nadine Labaki', P3)
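
The four exercises can be checked end to end with a short script. The sketch below is one possible setup, assuming an in-memory SQLite database and predicate strings spelled exactly as in the table (`Name`, `Country`); the `run` helper is a hypothetical convenience function.

```python
import sqlite3

# Rows from the slides' table, including the country rows.
rows = [
    ("M1", "Year", "2007"), ("M1", "Name", "Sicko"), ("M1", "directedBy", "D1"),
    ("M2", "directedBy", "D1"), ("M2", "Year", "2009"), ("M2", "Name", "Capitalism"),
    ("M3", "Year", "1995"), ("M3", "directedBy", "D2"), ("M3", "Name", "Brave Heart"),
    ("M4", "Year", "2007"), ("M4", "directedBy", "D3"), ("M4", "Name", "Caramel"),
    ("D1", "Name", "Michael Moore"), ("D1", "hasWonPrizeIn", "P1"), ("D1", "Country", "C1"),
    ("D2", "Country", "C1"), ("D2", "hasWonPrizeIn", "P2"), ("D2", "Name", "Mel Gibson"),
    ("D2", "actedIn", "M3"),
    ("D3", "Name", "Nadine Labaki"), ("D3", "Country", "C2"),
    ("D3", "hasWonPrizeIn", "P3"), ("D3", "actedIn", "M4"),
    ("C1", "Name", "USA"), ("C1", "Capital", "Washington DC"),
    ("C2", "Name", "Lebanon"), ("C2", "Capital", "Beirut"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO MoviesTable VALUES (?, ?, ?)", rows)


def run(sql):
    """Execute a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


# Exercise 1: no join.
ex1 = run("SELECT T.o FROM MoviesTable T WHERE T.s = 'D3' AND T.p = 'Name'")

# Exercise 2: one subject-object join.
ex2 = run("""
    SELECT T2.o FROM MoviesTable T1, MoviesTable T2
    WHERE T1.s = 'M1' AND T1.p = 'directedBy' AND T1.o = T2.s AND T2.p = 'Name'
""")

# Exercise 3: a path with two joins.
ex3 = run("""
    SELECT T1.s, T1.o FROM MoviesTable T1, MoviesTable T2, MoviesTable T3
    WHERE T1.o = T2.s AND T2.o = T3.s AND T1.p = 'directedBy'
      AND T2.p = 'Country' AND T3.p = 'Name' AND T3.o = 'USA'
""")

# Exercise 4: a star around the director, plus one path step to the country name.
ex4 = run("""
    SELECT T1.o, T4.o
    FROM MoviesTable T1, MoviesTable T2, MoviesTable T3, MoviesTable T4
    WHERE T1.p = 'Name' AND T2.s = T1.s AND T2.p = 'Country' AND T2.o = T3.s
      AND T3.p = 'Name' AND T3.o = 'Lebanon' AND T4.s = T1.s AND T4.p = 'hasWonPrizeIn'
""")

print(ex1)  # [('Nadine Labaki',)]
print(ex2)  # [('Michael Moore',)]
print(ex3)  # [('M1', 'D1'), ('M2', 'D1'), ('M3', 'D2')]
print(ex4)  # [('Nadine Labaki', 'P3')]
```

String comparison in the WHERE clauses is case-sensitive in SQLite by default, so a predicate written as 'name' would not match rows stored as 'Name'.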
  • 17. Tutorial II: Data Integration and Open Information Systems (The Palestinian e-Government Academy, January 2012)
      Part 2: Querying SPO Tables using SQL
  • 19. Practical Session
      • Given the RDF graph in the next slide, do the following:
        (1) Insert the data graph as <S,P,O> triples in a 3-column table in any DBMS.
            Schema of the table: Books (Subject, Predicate, Object)
        (2) Write the following queries in SQL and execute them over the newly created table:
            • List all the authors born in a country which has the name Palestine.
            • List the names of all authors, with the names of their affiliations, who were born in a country whose capital's population is 14M. Note that the author must have an affiliation.
            • List the names of all books whose authors were born in Lebanon, along with the name of the author.
  • 20. Practical Session
      [Figure: RDF data graph. Node identifiers: BK1-BK4, AU1-AU4, CN1-CN3, CA1-CA3, CU. Node labels: Palestine, Jerusalem, 7.6K, Said, Colombia University, Viswanathan, India, New Delhi, 14.0M, Wamadat, Naima, Lebanon, Beirut, 2.0M, The Prophet, Gibran. Edge labels include Author, Name, and Capital.]
      This data graph is about books. It describes four books (BK1-BK4). The information recorded about a book includes its author and the author's affiliation and country of birth, including the country's capital and the population of that capital.
  • 21. Practical Session
      • Each student should work alone.
      • The student is encouraged to first answer each query mentally from the graph before writing and executing the SQL query, and to compare the results of the query execution with his/her expected results.
      • The student is strongly recommended to write two additional queries, execute them on the data graph, and hand them in along with the required queries.
      • The student must highlight the problems and challenges of executing queries on graph-shaped data and propose solutions to them.
      • Each student must expect to present his/her queries in class and, more importantly, must be ready to discuss the problems of querying graph-shaped data and his/her proposed solutions.
      • The final delivery should include (i) a snapshot of the built table, (ii) the SQL queries, (iii) the results of the queries, and (iv) a one-page discussion of the problems of querying graph-shaped data and the proposed possible solutions. These must be handed in as a report in PDF format.
  • 22. Tutorial II: Data Integration and Open Information Systems (The Palestinian e-Government Academy, January 2012)
      Part 3: Complexities and Solutions of Querying Graph Shaped Data
  • 24. Complexity of querying graph-shaped data
      • Where does the complexity of querying graph-shaped data stored in a relational table lie?
      • The complexity of querying a data graph lies in the many self-joins that have to be performed on the table.
      • Accessing multiple properties for a resource requires subject-subject joins (star-shaped queries).
      • Path expressions require subject-object joins.
      • In general, a query with n edges on an SPO table requires n-1 self-joins of that table.
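
The n-1 rule can be seen by generating the SQL mechanically: every edge (triple pattern) of the query becomes one alias of the SPO table, and every shared variable becomes a join condition. The following is a minimal sketch under stated assumptions: `patterns_to_sql` is a hypothetical helper, variables are written with a leading `?`, and constants are quoted naively without escaping, so it is not suitable for untrusted input.

```python
import sqlite3


def patterns_to_sql(patterns, select, table="MoviesTable"):
    """Translate triple patterns into one SQL query over an SPO table.

    Terms starting with '?' are variables; everything else is a constant.
    """
    first_use = {}    # variable -> first column that binds it, e.g. "T1.o"
    conditions = []
    for i, pattern in enumerate(patterns, start=1):
        for column, term in zip("spo", pattern):
            ref = f"T{i}.{column}"
            if not term.startswith("?"):
                conditions.append(f"{ref} = '{term}'")            # constant filter
            elif term in first_use:
                conditions.append(f"{ref} = {first_use[term]}")   # join condition
            else:
                first_use[term] = ref
    aliases = ", ".join(f"{table} T{i}" for i in range(1, len(patterns) + 1))
    columns = ", ".join(first_use[v] for v in select)
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT {columns} FROM {aliases}{where}"


# Exercise 3 as three edges: movie -> director -> country -> name 'USA'.
patterns = [
    ("?m", "directedBy", "?d"),
    ("?d", "Country", "?c"),
    ("?c", "Name", "USA"),
]
sql = patterns_to_sql(patterns, ["?m", "?d"])
print(sql)

# Run it on a tiny table to confirm the generated query is valid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MoviesTable (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO MoviesTable VALUES (?, ?, ?)", [
    ("M1", "directedBy", "D1"), ("M4", "directedBy", "D3"),
    ("D1", "Country", "C1"), ("D3", "Country", "C2"),
    ("C1", "Name", "USA"), ("C2", "Name", "Lebanon"),
])
result = conn.execute(sql).fetchall()
print(result)  # [('M1', 'D1')]
```

Three patterns produce three aliases (T1, T2, T3) and therefore two self-joins; each additional edge in the query adds one more alias of the same table.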
  • 25. Solution: Indexes and Dictionaries
      • What solutions do you propose to optimize queries on graph-shaped data?
      • Build indexes:
        – On each column: S, P, O
        – On permutations of two columns: SP, SO, PS, PO, …
        – On permutations of three columns: SPO, SOP, PSO, POS, OSP, OPS
      • Dictionary-encode string data:
        – Replace all literals by ids.
        – Triple stores are compressed.
        – Allows for faster comparison and matching operations.
      → The RDF-3X solution
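
Both ideas can be tried out on a small scale. The sketch below replaces every string by an integer id and builds three of the six possible three-column indexes in SQLite. It only illustrates the principle and is not how RDF-3X is implemented; the names `ids`, `encode`, `decode`, and `Triples` are assumptions made for this example.

```python
import sqlite3

triples = [
    ("D3", "Name", "Nadine Labaki"),
    ("D3", "Country", "C2"),
    ("D3", "hasWonPrizeIn", "P3"),
    ("C2", "Name", "Lebanon"),
]

# Dictionary encoding: each distinct string is mapped to a small integer id.
ids = {}


def encode(term):
    """Return the id of a term, assigning the next free id on first sight."""
    return ids.setdefault(term, len(ids))


encoded = [(encode(s), encode(p), encode(o)) for s, p, o in triples]
decode = {i: term for term, i in ids.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (s INTEGER, p INTEGER, o INTEGER)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", encoded)

# Three of the six column permutations; each serves a different access pattern.
conn.execute("CREATE INDEX idx_spo ON Triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON Triples (p, o, s)")
conn.execute("CREATE INDEX idx_osp ON Triples (o, s, p)")

# Exercise 1 on encoded data: integer comparisons inside, strings only at the edges.
row = conn.execute(
    "SELECT o FROM Triples WHERE s = ? AND p = ?", (ids["D3"], ids["Name"])
).fetchone()
name = decode[row[0]]
print(name)  # Nadine Labaki
```

The string 'Name' occurs twice in the input but is stored once in the dictionary, which is where the compression comes from; the price is an extra lookup when encoding the query constants and decoding the results.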
  • 26. Solution: Store RDF in a Different Schema
      Property Tables: Clustered Property Tables
      (Abadi et al., Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.)
      • A clustered property table contains clusters of properties that tend to be defined together.
      • Multiple property tables with different clusters of properties may be created to optimize queries.
      • A key requirement for this type of property table is that a particular property may appear in at most one property table.
      • Advantage: some queries can be answered directly from the property table with no joins.
        – Ex: retrieve the title of the book authored in 2001.
      [Figure: clustered property table]
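
A clustered property table can be sketched by pivoting a group of properties into columns. The example below reuses the movie data rather than the book data of the slide's figure, and it assumes each subject has at most one value per clustered property; the table name `MovieProps`, its column names, and the `leftover` list for unclustered triples are hypothetical choices for this sketch.

```python
import sqlite3

triples = [
    ("M1", "Name", "Sicko"), ("M1", "Year", "2007"), ("M1", "directedBy", "D1"),
    ("M2", "Name", "Capitalism"), ("M2", "Year", "2009"), ("M2", "directedBy", "D1"),
    ("D1", "Name", "Michael Moore"), ("D1", "hasWonPrizeIn", "P1"),
]

# Properties that tend to be defined together for the same subjects.
clustered = ("Name", "Year", "directedBy")

# Pivot: one row per subject, one column per clustered property.
table = {}
leftover = []   # triples whose property is outside the cluster
for s, p, o in triples:
    if p in clustered:
        table.setdefault(s, {})[p] = o
    else:
        leftover.append((s, p, o))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MovieProps (subject TEXT PRIMARY KEY, name TEXT, year TEXT, directed_by TEXT)"
)
conn.executemany(
    "INSERT INTO MovieProps VALUES (?, ?, ?, ?)",
    [(s, props.get("Name"), props.get("Year"), props.get("directedBy"))
     for s, props in table.items()],
)

# "Name of the movie from 2009": answered from one row, with no self-join.
titles = [r[0] for r in conn.execute("SELECT name FROM MovieProps WHERE year = '2009'")]
print(titles)  # ['Capitalism']
```

The row for D1 ends up with NULL in the `year` and `directed_by` columns, which hints at one cost of this layout: subjects that define only part of the cluster leave the table sparse.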
  • 27. Solution: Store RDF in a Different Schema
      Property Tables: Property-Class Tables
      (Abadi et al., Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.)
      • A property-class table exploits the Type property of subjects to cluster similar sets of subjects together in the same table.
      • Unlike the first type of property table, a property may exist in multiple property-class tables.
      • This solution is found in Jena2.
      • Oracle adopts a property-table-like data structure (which it calls a "subject-property matrix"). However, it is not used for primary storage, but rather as an auxiliary data structure.
      [Figure: property-class tables]
  • 28. Solution: Store RDF in a Different Schema. Vertically Partitioned Approach (Abadi et al., Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.) (Figure: six vertically partitioned tables.) • The triples table is rewritten into n two-column tables, where n is the number of unique properties in the data. • In each of these tables, the first column contains the subjects that define that property and the second column contains the object values for those subjects. • This method is called vertical partitioning. • An experimental implementation of this approach was built on an open-source column-oriented database system (C-Store).
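As a rough sketch of the idea, the triples table can be split into one two-column table per property and two such tables joined on subject. The sample triples and the function names `vertically_partition` and `merge_join` are hypothetical, and keeping each table sorted by subject so that a merge join applies is a design choice of this sketch, not a description of the C-Store implementation.

```python
from collections import defaultdict

# Hypothetical sample triples.
triples = [
    ("B2", "title", "SQL Notes"),
    ("B1", "title", "RDF Basics"),
    ("B1", "year", 2001),
    ("B2", "year", 2005),
    ("B1", "author", "Alice"),
    ("B1", "author", "Bob"),
]


def vertically_partition(triples):
    """Split triples into one (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    for rows in tables.values():
        rows.sort()  # sorted by subject, so tables can be merge-joined
    return dict(tables)


def merge_join(left, right):
    """Join two subject-sorted two-column tables on subject."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on both sides.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, lo in left[i:i_end]:
                for _, ro in right[j:j_end]:
                    out.append((ls, lo, ro))
            i, j = i_end, j_end
    return out


tables = vertically_partition(triples)
title_year = merge_join(tables["title"], tables["year"])
author_title = merge_join(tables["author"], tables["title"])
print(len(tables), title_year, author_title)
```

A query only reads the tables of the properties it mentions, and multi-valued properties such as `author` above simply occupy several rows.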
  • 29. Discussion • What are the advantages and disadvantages of storing RDF in other schemas? • Which do you prefer? • Any other suggestions to optimize queries over graph-shaped data?
  • 30. (Optional) MashQL (under development at Birzeit University, started at the University of Cyprus) • What about summarizing the data graph so that, instead of querying the original data, we query the summary and get the same results? • What about summarizing 15M SPO triples into fewer than 500,000? • This is called Graph Signature (GS) indexing. It is currently under research at Birzeit University. (Figure: initial comparison between GS performance and Oracle’s Semantic Technologies.)
  • 31. (Optional) MashQL (under development at Birzeit University, started at the University of Cyprus) • A graphical query formulation language for the Data Web. • A general structured-data retrieval language (not merely an interface). (Figure: a MashQL query over the RDF inputs http://www.site1.com/rdf and http://www.site2.com/rdf: Article; Title ArticleTitle; Author “^Hacker”; Year or PubYear > 2000.) • MashQL addresses all of the following challenges: – The user does not know the schema. – There is no offline or inline schema/ontology. – The query may involve multiple sources. – The query language is expressive (not a single-purpose interface).
  • 32. (Optional) MashQL Example: Celine Dion’s albums after 2003?
      RDF input from: http://www.site1.com/rdf, http://www.site2.com/rdf
      Data at www.site1.com/rdf:
        :A1 :Title “Taking Chances”
        :A1 :Artist “Dion C.”
        :A1 :ProdYear 2007
        :A1 :Producer “Peer Astrom”
        :A2 :Title “Monodose”
        :A2 :Artist “Ziad Rahbani”
      Data at www.site2.com/rdf:
        :B1 rdf:Type bibo:Album
        :B1 :Title “The prayer”
        :B1 :Artist “Celine Dion”
        :B1 :Artist “Josh Groban”
        :B1 :Year 2008
        :B2 rdf:Type bibo:Album
        :B2 :Title “Miracle”
        :B2 :Author “Celine Dion”
        :B2 :Year 2004
      MashQL query: Everything; Artist “^Dion”; Year or ProdYear > 2003; Title AlbumTitle
      • Interactive query formulation.
      • MashQL queries are translated into and executed as SPARQL.
  • 33. (Optional) MashQL Example: Celine Dion’s albums after 2003? (Same RDF input and data as slide 32.) Query so far: Everything.
  • 34. (Optional) MashQL Example: Celine Dion’s albums after 2003? (Same RDF input and data as slide 32.) Query so far: Everything. Drop-down list for the query subject: types (Everything, Album) and individuals (A1, A2, B1). Background queries: SELECT ?X WHERE { {?X ?P ?O} UNION {?S ?P ?X} } and SELECT ?X WHERE {?S rdf:Type ?X}.
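What the two background queries on this slide compute can be sketched in plain Python over the sample triples. This is only an emulation for illustration: the triples from both sources are merged into one list, prefixes are dropped from most names, and the function names `all_nodes` and `all_types` are hypothetical. MashQL itself issues SPARQL.

```python
# Sample triples from the slide, merged from both sources.
triples = [
    ("A1", "Title", "Taking Chances"),
    ("A1", "Artist", "Dion C."),
    ("A1", "ProdYear", 2007),
    ("A1", "Producer", "Peer Astrom"),
    ("A2", "Title", "Monodose"),
    ("A2", "Artist", "Ziad Rahbani"),
    ("B1", "rdf:Type", "bibo:Album"),
    ("B1", "Title", "The prayer"),
    ("B1", "Artist", "Celine Dion"),
    ("B1", "Artist", "Josh Groban"),
    ("B1", "Year", 2008),
    ("B2", "rdf:Type", "bibo:Album"),
    ("B2", "Title", "Miracle"),
    ("B2", "Author", "Celine Dion"),
    ("B2", "Year", 2004),
]


def all_nodes(triples):
    """Everything that occurs as a subject or as an object (the UNION query)."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def all_types(triples):
    """Every object of an rdf:Type triple (the type query)."""
    return {o for _, p, o in triples if p == "rdf:Type"}


nodes = all_nodes(triples)
types = all_types(triples)
subjects = sorted({s for s, _, _ in triples})
print(subjects, types)
```

Both functions scan the whole dataset, which hints at why such background queries become a bottleneck for interactive use on large graphs.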
  • 35. (Optional) MashQL Example: Celine Dion’s albums after 2003? (Same RDF input and data as slide 32.) Query so far: Everything; Artist Contains Dion. Drop-down list of properties: Artist, Author, Title, ProdYear, Type, Year. Drop-down list of filter operators: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan. Background query: SELECT ?P WHERE {?Everything ?P ?O}.
  • 36. (Optional) MashQL Example: Celine Dion’s albums after 2003? (Same RDF input and data as slide 32.) Query so far: Everything; Artist “^Dion”; Year or ProdYear MoreThan 2003. Drop-down list of properties: Artist, Author, Title, ProdYear, Type, Year. Drop-down list of filter operators: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan. Background query: SELECT ?P WHERE {?Everything ?P ?O}.
  • 37. (Optional) MashQL Example: Celine Dion’s albums after 2003? (Same RDF input and data as slide 32.) Query so far: Everything; Artist “^Dion”; Year or ProdYear > 2003; Title AlbumTitle. Drop-down list of properties: Artist, Author, Title, ProdYear, Type, Year. Background query: SELECT ?P WHERE {?Everything ?P ?O}.
  • 38. (Optional) MashQL Example: Celine Dion’s albums after 2003?
      RDF input from: http://www.site1.com/rdf, http://www.site2.com/rdf
      MashQL query: Everything; Artist “^Dion”; Year or ProdYear > 2003; Title AlbumTitle
      SPARQL translation:
        PREFIX S1: <http://site1.com/rdf>
        PREFIX S2: <http://site2.com/rdf>
        SELECT ?AlbumTitle
        FROM <http://site1.com/rdf>
        FROM <http://site2.com/rdf>
        WHERE {
          {?X S1:Artist ?X1} UNION {?X S2:Artist ?X1}
          {?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2}
          {?X S1:Title ?AlbumTitle} UNION {?X S2:Title ?AlbumTitle}
          FILTER regex(?X1, "^Dion")
          FILTER (?X2 > 2003)
        }
  • 39. (Optional) MashQL Query Optimization • The major challenge in MashQL is interactivity: each user interaction must complete within 100 ms. • However, querying RDF is typically slow, and the problem grows with huge datasets. • Existing implementations such as Oracle performed poorly, especially on MashQL background queries. • A query optimization solution, called Graph Signature Indexing, was developed.
  • 40. (Optional) MashQL Query Optimization: Graph Signature • ‘Graph Signature’ optimizes RDF queries by: – Summarizing the RDF dataset (algorithms were developed). – Executing the queries on the summary instead of the original dataset (a new execution plan was developed).
  • 41. (Optional) Experiments • Experiments were performed on 15M SPO triples (rows) of the Yago dataset. • A Graph Signature (GS) index of less than 500K was built on the data. • Two sets of queries were run, with the following results:
      Extreme-case linear queries of depth 1-5:
        Query   A1       A2       A3       A4       A5
        GS      0.344    1.953    5.250    10.234   24.672
        Oracle  110.390  302.672  525.844  702.969  >20min
      Queries that cover all MashQL background queries and some queries that might occur as the final queries generated in MashQL:
        Query   Q1      Q2      Q3      Q4      Q5      Q6      Q7      Q8      Q9
        GS      0.441   0.578   0.587   0.556   0.290   0.291   0.587   0.785   2.469
        Oracle  25.640  29.875  29.593  29.641  19.922  21.922  62.734  64.813  32.703
  • 42. References • Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query Optimization for the Data Web - Two Disk-Based Algorithms: Trace Equivalence and Bisimilarity. • Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation Language for the Data Web. IEEE Transactions on Knowledge and Data Engineering. IEEE Computer Society. • Abadi et al.: Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007.
  • 43. Thank you! PalGov © 2011 43