LDOW, WWW’09
                                                                                          April 20, 2009,
                                                                                          Madrid, Spain




                                      MashQL
                 A Data Mashup Language for the
                           Data Web

                                 Dr. Mustafa Jarrar
                                      mjarrar@cs.ucy.ac.cy
                                  HPCLab, University of Cyprus



Published As:
Mustafa Jarrar and Marios D. Dikaiakos: A Data Mashup Language for the Data Web. Proceedings of
LDOW, part of WWW’09. ACM. 2009.
                                      University of Cyprus © 2009
Motivation (Querying the Data Web)

   “I want you to taste the Data Web.”
   “But how? Can I eat it the way I eat Web Feeds?”

 Can we query and mash up the Data Web as simply as filtering and
 piping Web Feeds?


 We fundamentally investigate this problem from a
  Query Formulation viewpoint.

 A “data mashup” is a query.
Outline



 • Challenges

 • Related Work

 • MashQL (A Query Formulation Language)

 • Evaluation

 • Conclusions and Discussion




Challenges
 I don’t know the schema! (Black-box query)

    www.site1.com/rdf
    :B1   :Title “Linked Data”
    :B1   :Author “Lara T.”
    :B1   :PubYear 2008
    :B1   :Publisher “Springer”
    :B2   :Title “Data on the Web”
    :B2   :Author “Abiteboul S.”

 Problem Definition
 We need a way to allow end-users to formulate queries over
 structured data assuming that:
    The user does not know the schema.                  (1)
Challenges
 The data does not adhere to a schema or ontology!

 Problem Definition
 How to allow end-users to formulate queries over structured data
 assuming that:
    The user does not know the schema.                  (1)
    There is no offline or inline schema or ontology.   (2)




Challenges
 Allow me to easily compose what I need!

    www.site1.com/rdf
    :B1   :Title “Linked Data”
    :B1   :Author “Lara T.”
    :B1   :PubYear 2008
    :B1   :Publisher “Springer”
    :B2   :Title “Data on the Web”
    :B2   :Author “Abiteboul S.”

    www.site2.com/rdf
    :A1   rdf:type bibo:Article
    :A1   :Title “Data Web”
    :A1   :Author “Tom Lara”
    :A1   :Author “Bob Hacker”
    :A1   :Year   2007
    :A2   rdf:type bibo:Article
    :A2   :Title “Semantic Web”
    :A2   :Author “Tom Lara”
    :A2   :Year   2005

 Problem Definition
 How to allow end-users to formulate queries over structured data
 assuming that:
    The user does not know the schema.                  (1)
    There is no offline or inline schema or ontology.   (2)
    The query may involve multiple sources.             (3)


Challenges
 General Solution!

 Problem Definition
 How to allow end-users to formulate queries over structured data
 assuming that:
    The user does not know the schema.                  (1)
    There is no offline or inline schema or ontology.   (2)
    The query may involve multiple sources.             (3)
    The query language is sufficiently expressive       (4)
       (not a single-purpose interface).
Related Work

     Query formulation is the art of accessing and consuming structured
     data easily (an interdisciplinary subject).

     Approaches compared:
      • Query by Form
      • Query by Example
      • Conceptual Queries (ConQuer)
      • NL Queries
      • Interactive Queries (Lorel)
      • Visual Scripting (DeriPipes)

     Assumptions each approach is compared against:
      • Don't know the schema
      • Schema-free data
      • Multiple sources
      • Expressive
      • Intuitive
MashQL


              A graphical query formulation language.

 A challenging combination of all of these assumptions:
       The user does not know the schema.
       There is no offline or inline schema or ontology.
       The query may involve multiple sources.
       The query language is expressive
          (not a single-purpose interface).

MashQL

    A general structured-data retrieval solution.
              (not merely an interface)
Without losing this generality, we:

    Focus on RDF as the most primitive data model; other
     data models can be mapped into it.

    Follow Yahoo Pipes’ visualization, in order to illustrate that
     the Data Web can be queried and mashed up like Web Feeds.




Example 1
    “Lara’s articles after 2007?”

 RDF Input
  From:
     http://www.site1.com/rdf
     http://www.site2.com/rdf

    www.site1.com/rdf
    :B1   :Title “Linked Data”
    :B1   :Author “Lara T.”
    :B1   :PubYear 2008
    :B1   :Publisher “Springer”
    :B2   :Title “Data on the Web”
    :B2   :Author “Abiteboul S.”

    www.site2.com/rdf
    :A1   rdf:type bibo:Article
    :A1   :Title “Data Web”
    :A1   :Author “Tom Lara”
    :A1   :Author “Bob Hacker”
    :A1   :Year   2007
    :A2   rdf:type bibo:Article
    :A2   :Title “Semantic Web”
    :A2   :Author “Tom Lara”
    :A2   :Year   2005

 Query
   Everything
       Author “^Lara”
       Year\PubYear > 2007
       Title ArticleTitle

 Interactive Query Formulation.
 MashQL queries are translated into and executed as SPARQL.
Example 1
 Query
   Everything
Example 1
 Query
   Everything
       Drop-down list (tabs: Types, Individuals):
          Everything
          A1
          A2
          B1
          B2

 Background queries
    SELECT ?X WHERE { {?X ?P ?O} UNION {?S ?P ?X} }

    SELECT ?X WHERE {?S rdf:type ?X}
Example 1
 Query
   Everything
       Author   Contains Lara

       Property list: Author, Title, PubYear, Type, Year
       Filter list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan

 Background query
    SELECT ?P WHERE {?Everything ?P ?O}
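
The background queries that fill the drop-down lists can be mimicked over an in-memory list of triples. The following is only a sketch: the tuple representation and the function names are assumptions made for illustration, the triples are copied from the two example sources, and only the subject half of the first background query is covered.

```python
# Triples copied from the two example sources on the slides.
# Storing them as plain (subject, predicate, object) tuples is an assumption of this sketch.
TRIPLES = [
    (":B1", ":Title", "Linked Data"),
    (":B1", ":Author", "Lara T."),
    (":B1", ":PubYear", 2008),
    (":B1", ":Publisher", "Springer"),
    (":B2", ":Title", "Data on the Web"),
    (":B2", ":Author", "Abiteboul S."),
    (":A1", "rdf:type", "bibo:Article"),
    (":A1", ":Title", "Data Web"),
    (":A1", ":Author", "Tom Lara"),
    (":A1", ":Author", "Bob Hacker"),
    (":A1", ":Year", 2007),
    (":A2", "rdf:type", "bibo:Article"),
    (":A2", ":Title", "Semantic Web"),
    (":A2", ":Author", "Tom Lara"),
    (":A2", ":Year", 2005),
]


def subjects(triples):
    """Subject half of the first background query: every ?X in {?X ?P ?O}."""
    return sorted({s for s, _, _ in triples})


def types(triples):
    """Mimics SELECT ?X WHERE {?S rdf:type ?X}."""
    return sorted({o for _, p, o in triples if p == "rdf:type"})


def predicates(triples, subject=None):
    """Mimics SELECT ?P WHERE {?Everything ?P ?O}; None leaves the subject unbound."""
    return sorted({p for s, p, _ in triples if subject is None or s == subject})


print(subjects(TRIPLES))
print(types(TRIPLES))
print(predicates(TRIPLES))
```

Recomputing these lists after each selection is what lets a user build a query without knowing the schema in advance.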
Example 1
 Query
   Everything
       Author “^Lara”
       Year\PubYear   MoreThan 2007

       Property list: Author, Title, PubYear, Type, Year
       Filter list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan

 Background query
    SELECT ?P WHERE {?Everything ?P ?O}
Example 1
 Query
   Everything
       Author “^Lara”
       Year\PubYear > 2007
       Title ArticleTitle

       Property list: Author, Title, PubYear, Type, Year

 Background query
    SELECT ?P WHERE {?Everything ?P ?O}
Example 1
 Query
   Everything
       Author “^Lara”
       Year\PubYear > 2007
       Title ArticleTitle

 Generated SPARQL
    PREFIX S1: <http://site1.com/rdf>
    PREFIX S2: <http://site2.com/rdf>
    SELECT ?ArticleTitle
    FROM <http://site1.com/rdf>
    FROM <http://site2.com/rdf>
    WHERE {
     {?X S1:Author ?X1} UNION {?X S2:Author ?X1}
     {?X S1:PubYear ?X2} UNION {?X S2:Year ?X2}
     {?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}
     FILTER regex(?X1, "^Lara")
     FILTER (?X2 > 2007)}
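
To make the translation step concrete, here is a minimal sketch of how a flat list of restrictions could be turned into SPARQL text for the question on this slide (Lara's articles after 2007). It is not the MashQL implementation: the `Restriction` class, the `to_sparql` function and the filter templates are hypothetical names introduced here, and nested sub-queries are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Restriction:
    predicates: List[str]                  # several predicates are combined with UNION
    variable: str                          # variable bound to the object, e.g. "?X1"
    filter_template: Optional[str] = None  # e.g. 'regex({v}, "^Lara")'; {v} is the variable


def to_sparql(prefixes: Dict[str, str], sources: List[str], select: str,
              restrictions: List[Restriction], subject: str = "?X") -> str:
    """Build a SPARQL query string from a flat list of restrictions on one subject."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT {select}")
    lines += [f"FROM <{src}>" for src in sources]
    lines.append("WHERE {")
    filters = []
    for r in restrictions:
        # One group pattern per predicate; alternatives are joined with UNION.
        patterns = [f"{{{subject} {p} {r.variable}}}" for p in r.predicates]
        lines.append("  " + " UNION ".join(patterns))
        if r.filter_template:
            filters.append("  FILTER " + r.filter_template.format(v=r.variable))
    lines += filters
    lines.append("}")
    return "\n".join(lines)


query = to_sparql(
    prefixes={"S1": "http://site1.com/rdf", "S2": "http://site2.com/rdf"},
    sources=["http://site1.com/rdf", "http://site2.com/rdf"],
    select="?ArticleTitle",
    restrictions=[
        Restriction(["S1:Author", "S2:Author"], "?X1", 'regex({v}, "^Lara")'),
        Restriction(["S1:PubYear", "S2:Year"], "?X2", "({v} > 2007)"),
        Restriction(["S1:Title", "S2:Title"], "?ArticleTitle"),
    ],
)
print(query)
```

Each restriction contributes one line of UNION-joined patterns, and the filters are emitted last, which mirrors the layout of the query shown on the slide.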
Example 2
                 “Recent articles from Cyprus?”




Query

 Article
        Title ArticleTitle
        Author
                Address
                        Country “Cyprus”
        Year > 2008


 Retrieve every Article that has a title and is written by an author who has
  an address whose country is Cyprus, and that is published after 2008.
The Intuition of MashQL



 A query is a tree:
   • The root is called the query subject.
   • Each branch is a restriction.
   • Branches can be expanded (information path).
   • Object value filters.

 Query
   Article
       Title    ?ArticleTitle
       Author   ?X1
           Address   ?X11
               Country   ?X111 = “Cyprus”
       Year     ?X2 > 2008

 Def. A Query Q with a subject S, denoted by Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn.
 Def. A Subject S ∈ (I ∪ V), where I is an identifier and V is a variable.
 Def. A Restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without),
        P is a predicate (P ∈ I ∪ V), and Of is an object filter.
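
The tree structure in these definitions can be sketched as three small record types. This is an illustrative encoding only: the class and field names, the `"SubQuery"` marker and the `depth` helper are assumptions, not part of MashQL.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ObjectFilter:
    obj: str                         # object variable or identifier, e.g. "?X1"
    function: Optional[str] = None   # e.g. "Equals", "MoreThan", "SubQuery"; None = no filter
    argument: Union[str, int, "Query", None] = None


@dataclass
class Restriction:
    predicate: str
    object_filter: ObjectFilter
    prefix: Optional[str] = None     # None (required), "maybe" or "without"


@dataclass
class Query:
    subject: str
    restrictions: List[Restriction] = field(default_factory=list)


def depth(query: Query) -> int:
    """Length of the longest information path, counting the root query as 1."""
    nested = [r.object_filter.argument for r in query.restrictions
              if isinstance(r.object_filter.argument, Query)]
    return 1 + max((depth(q) for q in nested), default=0)


# "Recent articles from Cyprus?" encoded as a tree.
address = Query("?X11", [Restriction(":Country", ObjectFilter("?X111", "Equals", "Cyprus"))])
author = Query("?X1", [Restriction(":Address", ObjectFilter("?X11", "SubQuery", address))])
article = Query("Article", [
    Restriction(":Title", ObjectFilter("?ArticleTitle")),
    Restriction(":Author", ObjectFilter("?X1", "SubQuery", author)),
    Restriction(":Year", ObjectFilter("?X2", "MoreThan", 2008)),
])
print(depth(article))
```

Expanding a branch corresponds to attaching a nested `Query` whose subject is the object of the parent restriction.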
The Intuition of MashQL

 An object filter is one of:
   • Equals
   • Contains
   • MoreThan
   • LessThan
   • Between
   • OneOf
   • Not(f)
   • Information Path (sub-query)

 Query
   Article
       Title ArticleTitle
       Author
           Address
               Country “Cyprus”
       Year > 2008

Def. An object filter Of = <O, f>, where O is an object and f is a filtering function, is one of:
       Of = <O>, where O is an object, O ∈ V ∪ I.
       Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag.
       Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag.
       Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype.
       Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype identifier.
       Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, T is a datatype identifier.
       Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ... , vn}, vi is a variable or constant.
       Of = <O, Not(f)>, where f is one of the functions defined above.
       Of = <O, Qi(O)>, where O is an object (O ∈ V ∪ I), and Qi(O) is a sub-query with O being the query subject.
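
As a rough illustration of the filtering functions listed above, each one can be modelled as a predicate over an object value. This sketch ignores the datatype T and the language tag Lt, treats Between as inclusive (an assumption, since the slide does not say), and uses hypothetical function names.

```python
import re


def equals(x):
    return lambda o: o == x


def contains(pattern):
    # The slide describes X as a regex literal, so the pattern is searched for in the value.
    return lambda o: re.search(pattern, str(o)) is not None


def more_than(x):
    return lambda o: o > x


def less_than(x):
    return lambda o: o < x


def between(x, y):
    # Assumption: both bounds are inclusive.
    return lambda o: x <= o <= y


def one_of(values):
    return lambda o: o in values


def not_(f):
    return lambda o: not f(o)


starts_with_lara = contains("^Lara")
print(starts_with_lara("Lara T."), starts_with_lara("Tom Lara"))
print(more_than(2007)(2008), not_(between(2000, 2006))(2007))
```

Because `not_` takes another filter as its argument, negated filters compose in the same way as Not(f) in the definition.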
More MashQL Constructs


 Restriction Operators {Required, Maybe, or Without}
   All restrictions are required (i.e. AND), unless they are prefixed with
   “maybe” or “without”.

    Query
     Song
            Title SongTitle
            Artist Shakera
            maybe Album AlbumTitle
            without Copyright

    SELECT ?SongTitle ?AlbumTitle
    WHERE {
      ?X :Title ?SongTitle.
      ?X :Artist :Shakera.
      OPTIONAL {?X :Album ?AlbumTitle}
      OPTIONAL {?X :Copyright ?X2}
      FILTER (!bound(?X2))}
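
The effect of the three prefixes can be illustrated by evaluating the song query over a few made-up triples. Everything in this sketch is hypothetical (the sample data, the helper names and the tuple representation); it only shows that required restrictions must match, "maybe" restrictions may leave a value empty, and "without" restrictions exclude a subject.

```python
# Hypothetical sample data, invented for this sketch.
SONGS = [
    (":S1", ":Title", "First"), (":S1", ":Artist", ":Shakera"), (":S1", ":Album", "Album A"),
    (":S2", ":Title", "Second"), (":S2", ":Artist", ":Shakera"), (":S2", ":Copyright", "Label X"),
    (":S3", ":Title", "Third"), (":S3", ":Artist", ":Shakera"),
    (":S4", ":Title", "Fourth"), (":S4", ":Artist", ":Other"),
]


def objects(triples, subject, predicate):
    """All object values of a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def shakera_songs(triples):
    rows = []
    for song in sorted({s for s, _, _ in triples}):
        # Required: the artist must be :Shakera.
        if ":Shakera" not in objects(triples, song, ":Artist"):
            continue
        # Without: any copyright value excludes the song.
        if objects(triples, song, ":Copyright"):
            continue
        # Maybe: the album is reported when present, otherwise left as None.
        albums = objects(triples, song, ":Album") or [None]
        # Required: a song with no title yields no rows.
        for title in objects(triples, song, ":Title"):
            for album in albums:
                rows.append((title, album))
    return rows


print(shakera_songs(SONGS))
```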




More MashQL Constructs

 Union operator (denoted as “\”) between:

   Objects
      SELECT ?Person
      WHERE {
        { ?Person :WorkFor :Google }
        UNION
        { ?Person :WorkFor :Yahoo } }

   Predicates
      SELECT ?FName
      WHERE {
        { ?Person :Surname ?FName }
        UNION
        { ?Person :Firstname ?FName } }

   Subjects
      SELECT ?AgentName ?AgentPhone
      WHERE {
        { ?Person rdf:type :Person .
          ?Person :Name ?AgentName .
          ?Person :Phone ?AgentPhone }
        UNION
        { ?Company rdf:type :Company .
          ?Company :Name ?AgentName .
          ?Company :Phone ?AgentPhone } }

   Queries
      SELECT ?CustName
      WHERE {
        { ?Person :Name ?CustName }
        UNION
        { ?Company :Title ?CustName .
          ?Company :City ?X1 .
          FILTER regex(?X1, “Paris”) } }
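
A minimal sketch of how unions between objects or between predicates could be expanded into SPARQL group patterns. The helper names union_of, object_union, and predicate_union are hypothetical and chosen for this example only.

```python
from typing import Sequence


def union_of(patterns: Sequence[str]) -> str:
    """Join graph patterns with UNION; a single pattern needs no UNION."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    groups = [f"{{ {p} }}" for p in patterns]
    if len(groups) == 1:
        return groups[0]
    return "{ " + " UNION ".join(groups) + " }"


def object_union(subject: str, predicate: str, objects: Sequence[str]) -> str:
    """Union between objects: same subject and predicate, alternative objects."""
    return union_of([f"{subject} {predicate} {o}" for o in objects])


def predicate_union(subject: str, predicates: Sequence[str], obj: str) -> str:
    """Union between predicates: same subject and object, alternative predicates."""
    return union_of([f"{subject} {p} {obj}" for p in predicates])


print(object_union("?Person", ":WorkFor", [":Google", ":Yahoo"]))
print(predicate_union("?Person", [":Surname", ":Firstname"], "?FName"))
```

Each alternative is wrapped in its own braces because SPARQL's UNION combines group graph patterns, not bare triple patterns.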
More MashQL Constructs

And several other constructs, including:
     Types
     Reverse Predicates
     Datatypes and Language Tags
     …
Formal Syntax and Semantics

Def. 1 (Dataset): A dataset D is a set of triples; each triple t is formed as <S, P, O>, where S ∈ I, P ∈ I, and O ∈ I ∪ L.
Def. 2 (Typed Literals): Every object literal must have a datatype D: if O ∈ L then O ∈ D.
Def. 3 (Language Tags): An object literal (O ∈ L) may have a language tag Lt.
Def. 4 (Query): A query Q with a subject S, denoted by Q(S), is a set of restrictions on S. Q(S) = R1 AND … AND Rn.
Def. 5 (Subject): A subject S ∈ (I ∪ V), where I is an identifier and V is a variable.
Def. 6 (Restriction): A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ I ∪ V), and Of is an object filter.
Def. 7 (Object Filter): An object filter Of = <O, f>, where O is an object and f is a filtering function. An object filter can have one of the following nine forms:
   1. Of = <O>, where O is an object, O ∈ V ∪ I. This is the simplest object filter, i.e., it does not add any restriction on the object value of the retrieved triples.
   2. Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag. This filter restricts the retrieved results such that the object value O should be equal to X, with datatype T, and with language Lt.
   3. Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag. This filter restricts the retrieved results such that the object value O should match regex(X), with datatype T, and with language Lt. A regex literal is a literal that contains a regular expression matching pattern.
   4. Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype. This filter restricts the retrieved results such that the object value O should be more than X and with datatype T.
   5. Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier. This filter restricts the retrieved results such that the object value O should be less than X and with datatype T (see rule-9).
   6. Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, and T is a datatype identifier. This filter restricts the retrieved results such that the object value O should be more than or equal to X, less than or equal to Y, and with datatype T.
   7. Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ... , vn}, where each vi is a variable or a constant. This filter restricts the retrieved results such that the object value O should be equal to one of the values in V.
   8. Of = <O, Not(f)>, where f is one of the functions defined above. This filter extends all of the above functions with simple negation. For example, Not applied to Equals is the same as the Equals filter but with negation, i.e., Not Equal.
   9. Of = <O, Qi(O)>, where O is an object (O ∈ V ∪ I), and Qi(O) is a sub-query with O being the query subject. The restrictions defined in the sub-query Qi(O) should be satisfied as well. Notice that this definition is recursive; however, this does not mean the query itself is recursive.
Def. 8 (Types): A subject (S ∈ I) or an object (O ∈ I) can be prefixed with “a” or “an” to mean the instances of this subject/object type, instead of the subject/object itself.
Def. 9 (Union): A union can be declared between objects, predicates, subjects, and/or queries, in the following forms:
   1. On = <O1\O2\ . . . \On>, to indicate unions between objects, where Oi ∈ I.
   2. Pn = <P1\P2\ . . . \Pn>, to indicate unions between predicates, where Pi ∈ I.
   3. Sn = <S1\S2\ . . . \Sn>, to indicate unions between subjects, where Si ∈ I.
   4. Qn = <Q1\Q2\ . . . \Qn>, to indicate unions between queries.
Def. 10 (Reverse): <~P> indicates the reverse of the predicate P. Let R1 be a restriction on S such that <S P O>, and R2 be <O ~P S>; R1 and R2 have the same meaning.
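
One way to read Def. 4 to Def. 7 is as a small tree-shaped data model. The sketch below is an assumption about how such a model might look in Python; the class and field names (ObjectFilter, Restriction, Query, depth) are illustrative and do not come from the MashQL implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectFilter:
    """Def. 7: an object plus an optional filtering function (form 1 has none)."""
    obj: str
    function: Optional[str] = None       # e.g. "Equals", "Contains", "MoreThan"
    args: tuple = ()
    subquery: Optional["Query"] = None   # form 9: a sub-query Qi(O)


@dataclass
class Restriction:
    """Def. 6: R = <Rx, P, Of>."""
    predicate: str
    object_filter: ObjectFilter
    prefix: Optional[str] = None         # None (required), "maybe", or "without"

    def __post_init__(self) -> None:
        if self.prefix not in (None, "maybe", "without"):
            raise ValueError(f"unknown restriction prefix: {self.prefix!r}")


@dataclass
class Query:
    """Def. 4: Q(S) = R1 AND ... AND Rn."""
    subject: str
    restrictions: List[Restriction] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth of sub-queries; a flat query has depth 1."""
        nested = [
            r.object_filter.subquery.depth()
            for r in self.restrictions
            if r.object_filter.subquery is not None
        ]
        return 1 + max(nested, default=0)


# A query with one nested sub-query on the creator (example data is made up).
creator = Query("?Creator", [
    Restriction(":Name", ObjectFilter("?Name", "Contains", ("^Berners-Lee",))),
])
article = Query("?Article", [
    Restriction(":Title", ObjectFilter("?Title")),
    Restriction(":Creator", ObjectFilter("?Creator", subquery=creator)),
    Restriction(":Year", ObjectFilter("?Year", "MoreThan", (2007,)), prefix="maybe"),
])
print(article.depth())
```

The definition is recursive only in its structure: a sub-query is an ordinary Query hanging off an object filter, so the whole query is a finite tree.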


Query Formulation Algorithm

                         Formalization of the background queries

Select a subject S
(1) S ∈ ST : π_O(σ_{P=‘:Type’}(D))
       (1’) O1:{(?S1 <:Type> ?O1)}
(2) S ∈ SI : π_S(D) ∪ π_O(σ_{O∈I}(D))
       (2’) S1:{(?S1 ?P1 ?O1)} UNION O1:{(?S1 ?P1 ?O1). Filter isURI(?O1)}
(3) S ∈ V

Select a property P
(4) (S ∈ ST) → P ∈ π_P2(σ_{P1=‘:Type’ ∧ O1=Subject}(D) ⋈_{S1=S2} (D))
       (4’) P2:{(?S1 <:Type> <S>)(?S1 ?P2 ?O2)}
(5) (S ∈ SI) → P ∈ π_P(σ_{S=Subject}(D))
       (5’) P1:{(<S> ?P1 ?O1)}
(6) (S ∈ V) → P ∈ π_P(D)
       (6’) P1:{(?S1 ?P1 ?O1)}
(7) P ∈ V

Add a filter on P
(8) (S ∈ SI) ∧ (P ∈ V) → O ∈ π_O1(σ_{S1=S ∧ O1∈I}(D))
       (8’) O1:{(<S> ?P1 ?O1) Filter isURI(?O1)}
(9) (S ∈ SI) ∧ (P ∉ V) → O ∈ π_O1(σ_{S1=S ∧ P1=P ∧ O1∈I}(D))
       (9’) O1:{(<S> <P> ?O1) Filter isURI(?O1)}
(10) (S ∈ ST) ∧ (P ∈ V) → O ∈ π_O2(σ_{P1=‘:Type’ ∧ O1=S}(D) ⋈_{S1=S2} (D))
       (10’) O2:{(?S1 <:Type> <S>)(?S1 ?P2 ?O2)}
(11) (S ∈ ST) ∧ (P ∉ V) → O ∈ π_O2(σ_{P1=‘rdf:Type’ ∧ O1=S}(D) ⋈_{S1=S2 ∧ P2=P} (D))
       (11’) O:{(?S <rdf:Type> <S>)(?S <P> ?O)}
(12) (S ∈ V) ∧ (P ∈ V) → O ∈ π_O(D)
       (12’) O1:{(?S1 ?P1 ?O1)}
(13) (S ∈ V) ∧ (P ∉ V) → O ∈ π_O(σ_{P=P}(D))
       (13’) O1:{(?S1 <P> ?O1)}
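
The background queries can be illustrated over a tiny in-memory triple list. This is a sketch under stated assumptions, not the editor's implementation: literals are written with surrounding double quotes so that is_uri can tell them apart, the toy data is made up, and all function names are hypothetical. The rule numbers in the comments indicate which background query each function imitates.

```python
from typing import Collection, Optional, Set, Tuple

Triple = Tuple[str, str, str]
TYPE = ":Type"


def is_uri(term: str) -> bool:
    """Toy convention for this sketch: literals carry surrounding double quotes."""
    return not term.startswith('"')


def subject_types(data: Collection[Triple]) -> Set[str]:
    """Rule (1): all objects of :Type triples."""
    return {o for _, p, o in data if p == TYPE}


def subject_instances(data: Collection[Triple]) -> Set[str]:
    """Rule (2): all subjects, plus all objects that are URIs."""
    return {s for s, _, _ in data} | {o for _, _, o in data if is_uri(o)}


def properties_of_type(data: Collection[Triple], type_: str) -> Set[str]:
    """Rule (4): predicates used by the instances of a given type."""
    members = {s for s, p, o in data if p == TYPE and o == type_}
    return {p for s, p, _ in data if s in members}


def properties_of_instance(data: Collection[Triple], subject: str) -> Set[str]:
    """Rule (5): predicates of one given subject."""
    return {p for s, p, _ in data if s == subject}


def objects_of_instance(data: Collection[Triple], subject: str,
                        predicate: Optional[str] = None) -> Set[str]:
    """Rules (8) and (9): URI objects of a subject, optionally for one predicate."""
    return {
        o for s, p, o in data
        if s == subject and (predicate is None or p == predicate) and is_uri(o)
    }


DATA = [
    (":B1", ":Type", ":Book"),
    (":B1", ":Title", '"Linked Data"'),
    (":B1", ":Author", ":Lara"),
    (":Lara", ":Type", ":Person"),
    (":Lara", ":Name", '"Lara T."'),
]

print(sorted(subject_types(DATA)))
print(sorted(properties_of_type(DATA, ":Book")))
print(sorted(objects_of_instance(DATA, ":B1", ":Author")))
```

Each function returns the list the user would pick from at that step, which is why the schema never has to be known in advance.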



MashQL-SPARQL Mapping Rules

Rule-1: The output marker before a variable means that the variable will be returned in the results, i.e., included in the SELECT part of the SPARQL query. If the output of the query is input to another query, use “CONSTRUCT *”.
Rule-2: In any of the following rules, if a subject, predicate, or object is italicized, it is seen as a SPARQL variable, i.e., prefixed with “?”.
Rule-3: If S is a subject and R = < , P, Of>, the mapping is: {S P O}.
Rule-4: If S is a subject and R = <maybe, P, Of>, the mapping is: {OPTIONAL{S P O}}.
Rule-5: If S is a subject and R = <without, P, Of>, the mapping is: {OPTIONAL{S P O} FILTER (!bound(?O))}.
Rule-6: If Of = <O, Equals(X, T, Lt)>:
     Append the mapping with: FILTER(?O = X)
     If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T)
     If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt)
Rule-7: If Of = <O, Contains(X, T, Lt)>:
     Append the mapping with: FILTER regex(?O, X)
     If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T)
     If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt)
Rule-8: If Of = <O, MoreThan(X, T)>:
     Append the mapping with: FILTER(?O > X)
     If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T)
Rule-9: If Of = <O, LessThan(X, T)>:
     Append the mapping with: FILTER(?O < X)
     If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T)
Rule-10: If Of = <O, Between(X, Y, T)>:
     Append the mapping with: FILTER(?O >= X && ?O <= Y)
     If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T)
Rule-11: If Of = <O, OneOf(V)>:
     Append the mapping with: FILTER(?O = V1 || . . . || ?O = Vn)
     If Vi is a regex-ed literal, the i-th condition above should be replaced with: regex(?O, Vi)
Rule-12: If Of = <O, Not(f)>: The f filter will be generated as above, but with a negation.
Rule-13: If Of = <O, Qi(O)>: Repeat all mapping rules to generate Qi(O).
Rule-14: If a subject S is prefixed with “a” or “an”: Append the mapping with: {?S rdf:type :S}
Rule-15: If an object O is prefixed with “a” or “an”: Append the mapping with: {?O rdf:type :O}
Rule-16: Given On, if n > 1 and Oi ∈ I: The mapping in rules 3-4 will be: {{S P :O1} UNION . . . UNION {S P :On}}
Rule-17: Given Pn, if n > 1 and Pi ∈ I: The mapping in rules 3-4 will be: {{S :P1 O} UNION . . . UNION {S :Pn O}}
Rule-18: Given Sn, if n > 1 and Si ∈ I: Regenerate the query n times, each time with Si as a root, and with a UNION between the queries.
Rule-19: Given Qn, if n > 1: Add UNION between the n queries.
Rule-20: If S is a subject and R = <~P, O>, the mapping is: {O P S}.

Also mapped into Oracle’s SPARQL.
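
The object-filter rules lend themselves to a small dispatch function. The sketch below covers only the Equals, Contains, MoreThan, LessThan, Between, and OneOf filters and makes several assumptions: the function name map_object_filter is hypothetical, constants are passed in as already-rendered SPARQL terms, the language tag is rendered as a quoted string, and the Between and OneOf conditions are combined inside a single FILTER.

```python
from typing import List, Optional


def map_object_filter(var: str, function: str, args: tuple,
                      datatype: Optional[str] = None,
                      lang: Optional[str] = None) -> List[str]:
    """Return the SPARQL FILTER lines for one object filter (illustrative only)."""
    if function == "Equals":
        (x,) = args
        filters = [f"FILTER({var} = {x})"]
    elif function == "Contains":
        (x,) = args
        filters = [f"FILTER regex({var}, {x})"]
    elif function == "MoreThan":
        (x,) = args
        filters = [f"FILTER({var} > {x})"]
    elif function == "LessThan":
        (x,) = args
        filters = [f"FILTER({var} < {x})"]
    elif function == "Between":
        x, y = args
        filters = [f"FILTER({var} >= {x} && {var} <= {y})"]
    elif function == "OneOf":
        alternatives = " || ".join(f"{var} = {v}" for v in args)
        filters = [f"FILTER({alternatives})"]
    else:
        raise ValueError(f"unsupported filter function: {function!r}")

    # Optional datatype and language-tag checks are appended as extra filters.
    if datatype is not None:
        filters.append(f"FILTER(datatype({var}) = {datatype})")
    if lang is not None:
        filters.append(f'FILTER(lang({var}) = "{lang}")')
    return filters


print(map_object_filter("?O", "Between", (2000, 2007), datatype="xsd:integer"))
print(map_object_filter("?O", "Contains", ('"Paris"',), lang="en"))
```

Returning a list of lines rather than one string keeps the function easy to combine with whatever emits the triple pattern for the same restriction.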


MashQL Editor




     Alpha version (will be public soon).
     Web-based, using Ajax.
     Open-source JavaScript libraries (from Yahoo).
     Oracle 11g as RDF store.
     Graph summaries for fast user interaction.
     URI normalization based on heuristics
      (but some URIs are too cryptic).



MashQL Firefox Add-On (Light-mashups @ your browser)




Evaluation (DBLP, Experiment 1)
                         (User Interaction Response Time)
                       How long does it take to generate the next list?

Query
  Any Article
    Title     “^World-Wide Web”
    Creator
       Type   Person
       Name   “^Berners-Lee”
    Year      > 2007

Background queries
  O:(?S Type ?O)
  P:(?S Type Article)(?S ?P ?O1)
  P:(?S Type Article)(?S Creator ?O1)(?O1 ?P ?O2)
  O:(?S Type Article)(?S Creator ?O1)(?O1 Type ?O)

          DBLP (9M triples)     DBLP (4M triples)     DBLP (2M triples)
 Query    GS       Oracle       GS       Oracle       GS       Oracle
   Q1     0.003    0.005        0.003    0.004        0.003    0.003
   Q2     0.001    0.136        0.001    0.148        0.001    0.108
   Q3     0.001    0.187        0.001    0.846        0.001    0.671
   Q4     0.001    1.208        0.001    0.835        0.001    0.650
Evaluation (DBPedia, Experiment 2)


                                                                 P:(?S :Type :Album)
                                                                    (?S :PreviousAlbum ?O1)
                                                                    (?O1 ?P ?O2)

Query   DBPedia (32 M)   DBPedia (16 M)   DBPedia (8 M)   DBPedia (4 M)
 Q1         0.003            0.003            0.003           0.003
 Q2         0.002            0.002            0.002           0.002
 Q3         0.005            0.004            0.003           0.003
 Q4         0.005            0.004            0.004           0.004
 Q5         0.005            0.004            0.004           0.004
 Q6         0.005            0.005            0.005           0.005
 Q7         0.007            0.007            0.007           0.007
 Q8         0.005            0.005            0.005           0.005
 Q9         0.007            0.007            0.007           0.006
Evaluation (DBPedia, Experiment 3)

                                                                  P3:(?S Type ?Album)
                                                                      (?S ?RelatedTo1 ?O1)
                                                                      (?O1 ?RelatedTo2 ?O2)
                                                                      (?O2 ?P3 ?O4)

          (B32) 32 M
Query     GS       Oracle      DBPedia (16 M)   DBPedia (8 M)   DBPedia (4 M)
 Q1       0.001    0.027           0.001            0.001           0.001
 Q2       0.026    17.450          0.010            0.006           0.005
 Q3       0.048    49.656          0.010            0.011           0.010
 Q4       0.087    5348.7          0.022            0.017           0.011
 Q5       0.135    -               0.036            0.032           0.011
 Q6       0.185    -               0.055            0.047           0.016
 Q7       0.234    -               0.076            0.061           0.023
 Q8       0.265    -               0.098            0.077           0.027
 Q9       0.280    -               0.110            0.092           0.034
Conclusions


 “Querying and mashing up the Data Web as simply as filtering
  Web Feeds” is a query formulation problem.


 End users can navigate, query, and mash up unknown graphs
    without knowing the schema. Data is schema-free. Multiple sources.


 MashQL is as expressive as SPARQL.
    Except NAMED GRAPH.


 MashQL is not merely a SPARQL interface, nor is it limited to RDF.
    It has its own path-pattern intuition (can be similarly used for XML and DB).
Future Work


                               salt, spices, … !!!!

 Reasoning, Keyword Search, Aggregation Functions, etc.
 Results Presentation (should be tackled fundamentally).
 Firefox add-on as mashup/query editor.
 RDF summaries for SPARQL optimization.
 ...
Thank You


Dr. Mustafa Jarrar
   mjarrar@cs.ucy.ac.cy
HPCLab, University of Cyprus

Data governance with Unity Catalog PresentationKnoldus Inc.
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

A Data Mashup Language for the Data Web

  • 1. MashQL: A Data Mashup Language for the Data Web. LDOW, WWW’09, April 20, 2009, Madrid, Spain. Dr. Mustafa Jarrar (mjarrar@cs.ucy.ac.cy), HPCLab, University of Cyprus. Published as: Mustafa Jarrar and Marios D. Dikaiakos: A Data Mashup Language for the Data Web. Proceedings of LDOW, part of WWW’09. ACM. 2009. University of Cyprus © 2009
  • 2. Motivation (Querying the Data Web). “I want you to taste the Data Web.” “But how? Can I eat it the way I eat Web Feeds?” Can we query and mash up the Data Web as simply as filtering and piping Web Feeds? We investigate this problem fundamentally, from a query formulation viewpoint. A “data mashup” is a query.
  • 3. Outline • Challenges • Related Work • MashQL (A Query Formulation Language) • Evaluation • Conclusions and Discussion University of Cyprus © 2009
  • 4. Challenges. Source www.site1.com/rdf: :B1 :Title “Linked Data”; :B1 :Author “Lara T.”; :B1 :PubYear 2008; :B1 :Publisher “Springer”; :B2 :Title “Data on the Web”; :B2 :Author “Abiteboul S.”. “I don’t know the schema!” (black-box query). Problem definition: we need a way to allow end users to formulate queries over structured data, assuming that: (1) the user does not know the schema.
  • 5. Challenges. The same source (www.site1.com/rdf) does not adhere to a schema or ontology. Problem definition: how to allow end users to formulate queries over structured data, assuming that: (1) the user does not know the schema; (2) there is no offline or inline schema or ontology.
  • 6. Challenges. “Allow me to easily compose what I need!” A second source, www.site2.com/rdf: :A1 rdf:Type bibo:Article; :A1 :Title “Data Web”; :A1 :Author “Tom Lara”; :A1 :Author “Bob Hacker”; :A1 :Year 2007; :A2 rdf:Type bibo:Article; :A2 :Title “Semantic Web”; :A2 :Author “Tom Lara”; :A2 :Year 2005. Problem definition: how to allow end users to formulate queries over structured data, assuming that: (1) the user does not know the schema; (2) there is no offline or inline schema or ontology; (3) the query may involve multiple sources.
  • 7. Challenges. A general solution is needed over both sources (www.site1.com/rdf and www.site2.com/rdf). Problem definition: how to allow end users to formulate queries over structured data, assuming that: (1) the user does not know the schema; (2) there is no offline or inline schema or ontology; (3) the query may involve multiple sources; (4) the query language is sufficiently expressive (not a single-purpose interface).
  • 8. Related Work. Query formulation is the art of accessing and consuming structured data easily (an interdisciplinary subject). Comparison table; approaches (columns): Query by Form, Query by Example, Conceptual Queries (ConQuer), NL Queries, Interactive Queries (Lorel), Visual Scripting (DeriPipes). Assumptions (rows): don't know the schema; schema-free data; multiple sources; expressive; intuitive. (Cell marks not recoverable.)
  • 9. MashQL. A graphical query formulation language that holds under all of these assumptions (a challenging combination): the user does not know the schema; there is no offline or inline schema or ontology; the query may involve multiple sources; the query language is expressive (not a single-purpose interface).
  • 10. MashQL. A general structured-data retrieval solution (not merely an interface). Without losing this generality, we: focus on RDF as the most primitive data model, since other data models can be mapped into it; and follow the visualization of Yahoo Pipes, in order to illustrate that the Data Web can be queried and mashed up like Web Feeds.
  • 11. Example 1: “Lara’s articles after 2007?” RDF input, from http://www.site1.com/rdf (:B1 :Title “Linked Data”; :B1 :Author “Lara T.”; :B1 :PubYear 2008; :B1 :Publisher “Springer”; :B2 :Title “Data on the Web”; :B2 :Author “Abiteboul S.”) and http://www.site2.com/rdf (:A1 rdf:Type bibo:Article; :A1 :Title “Data Web”; :A1 :Author “Tom Lara”; :A1 :Author “Bob Hacker”; :A1 :Year 2007; :A2 rdf:Type bibo:Article; :A2 :Title “Semantic Web”; :A2 :Author “Tom Lara”; :A2 :Year 2005). Query: Everything; Author “^Lara”; Year\PubYear > 2007; Title (returned as ArticleTitle). Query formulation is interactive. MashQL queries are translated into and executed as SPARQL.
  • 12. Example 1 (same RDF input). The query starts from the subject Everything.
  • 13. Example 1 (same RDF input). The subject drop-down list offers Everything, the individuals A1, A2, B1, B2, and the types. Background queries: SELECT ?X WHERE {?X ?P ?O} UNION SELECT ?X WHERE {?S ?P ?X}; SELECT ?X WHERE {?S rdf:Type ?X}
  • 14. Example 1 (same RDF input). The property drop-down list offers Author, Title, PubYear, Type, Year; the filter list offers Equals, Contains, OneOf, Not, Between, LessThan, MoreThan. The user selects Author, Contains, Lara. Background query: SELECT ?P WHERE {?Everything ?P ?O}
  • 15. Example 1 (same RDF input). Query so far: Everything; Author “^Lara”. The user adds Year\PubYear with the filter MoreThan 2007. Background query: SELECT ?P WHERE {?Everything ?P ?O}
  • 16. Example 1 (same RDF input). Query so far: Everything; Author “^Lara”; Year\PubYear > 2007. The user selects Title from the property list (Author, Title, PubYear, Type, Year) and returns it as ArticleTitle. Background query: SELECT ?P WHERE {?Everything ?P ?O}
  • 17. Example 1 (same RDF input). The completed query (Everything; Author “^Lara”; Year\PubYear > 2007; Title returned as ArticleTitle) is translated into SPARQL: PREFIX S1: <http://site1.com/rdf> PREFIX S2: <http://site2.com/rdf> SELECT ?ArticleTitle FROM <http://site1.com/rdf> FROM <http://site2.com/rdf> WHERE { {{?X S1:Author ?X1} UNION {?X S2:Author ?X1}} {{?X S1:PubYear ?X2} UNION {?X S2:Year ?X2}} {{?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}} FILTER regex(?X1, "^Lara") FILTER (?X2 > 2007) }
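The interaction in Example 1 can be imitated over the two sample sources with plain Python. This is only a minimal sketch under stated assumptions: the triples are held in an in-memory list, the function names are hypothetical, and only subjects are offered as individuals (a simplification of the union query shown on slide 13). It is not the MashQL editor's implementation.

```python
import re

# Sample triples from the slides (site1 and site2 merged into one list).
TRIPLES = [
    ("B1", "Title", "Linked Data"),
    ("B1", "Author", "Lara T."),
    ("B1", "PubYear", 2008),
    ("B1", "Publisher", "Springer"),
    ("B2", "Title", "Data on the Web"),
    ("B2", "Author", "Abiteboul S."),
    ("A1", "rdf:Type", "bibo:Article"),
    ("A1", "Title", "Data Web"),
    ("A1", "Author", "Tom Lara"),
    ("A1", "Author", "Bob Hacker"),
    ("A1", "Year", 2007),
    ("A2", "rdf:Type", "bibo:Article"),
    ("A2", "Title", "Semantic Web"),
    ("A2", "Author", "Tom Lara"),
    ("A2", "Year", 2005),
]


def subject_candidates(triples):
    """Background query for the subject list: individuals and types."""
    individuals = sorted({s for s, _, _ in triples})
    types = sorted({o for _, p, o in triples if p == "rdf:Type"})
    return individuals, types


def property_candidates(triples):
    """Background query for the property list when the subject is Everything."""
    return sorted({p for _, p, _ in triples})


def values(triples, subject, predicates):
    """Object values of `subject` for any predicate in `predicates` (a union)."""
    return [o for s, p, o in triples if s == subject and p in predicates]


def lara_articles_after_2007(triples):
    """Everything; Author "^Lara"; Year\\PubYear > 2007; return Title."""
    titles = []
    for s in sorted({s for s, _, _ in triples}):
        by_lara = any(re.search(r"^Lara", a) for a in values(triples, s, {"Author"}))
        recent = any(y > 2007 for y in values(triples, s, {"Year", "PubYear"}))
        if by_lara and recent:
            titles.extend(values(triples, s, {"Title"}))
    return titles
```

Because the pattern "^Lara" is anchored at the start of the string, it matches “Lara T.” but not “Tom Lara”, so only resources whose author value begins with “Lara” qualify.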
  • 18. Example 2: “Recent articles from Cyprus?” Query: Article; Title (returned as ArticleTitle); Author, expanded to Address, expanded to Country “Cyprus”; Year > 2008. That is, retrieve every article that has a title and is written by an author who has an address whose country is called Cyprus, and that was published after 2008.
  • 19. The Intuition of MashQL. A query is a tree: the root is called the query subject; each branch is a restriction; branches can be expanded (information paths); object values can be filtered. In the example, the subject Article has the branches Title ?ArticleTitle, Author ?X1 (expanded to Address ?X11, then Country ?X111 = “Cyprus”), and Year ?X2 > 2008. Def. A query Q with a subject S, denoted by Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn. Def. A subject S ∈ (I ∪ V), where I is an identifier and V is a variable. Def. A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ I ∪ V), and Of is an object filter.
  • 20. The Intuition of MashQL. An object filter is one of: Equals, Contains, MoreThan, LessThan, Between, OneOf, Not(f), or an information path (sub-query). Def. An object filter Of = <O, f>, where O is an object and f is a filtering function, takes one of the following forms:
      Of = <O>, where O is an object, O ∈ V ∪ I.
      Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag.
      Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag.
      Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype.
      Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier.
      Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, and T is a datatype identifier.
      Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ..., vn}, each vi being a variable or a constant.
      Of = <O, Not(f)>, where f is one of the functions defined above.
      Of = <O, Qi(O)>, where O is an object (O ∈ V ∪ I), and Qi(O) is a sub-query with O being the query subject.
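One way to read the filter forms above is as predicates over a single object value. The following is a minimal sketch under stated assumptions: filters are encoded as hypothetical Python tuples such as ("MoreThan", 2007), and datatypes, language tags, and sub-queries are left out.

```python
import re


def matches(value, object_filter):
    """Return True if `value` satisfies the filter tuple `object_filter`."""
    kind, *args = object_filter
    if kind == "Equals":
        return value == args[0]
    if kind == "Contains":
        # The argument is a regex literal, as in the Contains definition.
        return re.search(args[0], str(value)) is not None
    if kind == "MoreThan":
        return value > args[0]
    if kind == "LessThan":
        return value < args[0]
    if kind == "Between":
        # Inclusive on both ends.
        return args[0] <= value <= args[1]
    if kind == "OneOf":
        return value in args[0]
    if kind == "Not":
        # Negation wraps any of the filters above.
        return not matches(value, args[0])
    raise ValueError(f"unknown filter: {kind}")
```

Encoding Not as a wrapper around another filter mirrors the recursive shape of the definition, so a negated Between or OneOf needs no special case.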
  • 21. More MashQL Constructs. Restriction operators {Required, Maybe, Without}: all restrictions are required (i.e., AND) unless they are prefixed with “maybe” or “without”. Query: Song; Title (returned as SongTitle); Artist Shakera; maybe Album (returned as AlbumTitle); without Copyright. SPARQL: SELECT ?SongTitle ?AlbumTitle WHERE { ?X :Title ?SongTitle. ?X :Artist :Shakera. OPTIONAL {?X :Album ?AlbumTitle} OPTIONAL {?X :Copyright ?X2} FILTER (!bound(?X2)) }
  • 22. More MashQL Constructs. Union operator (denoted as “\”), which can be declared between:
      Objects: SELECT ?Person WHERE { {?Person :WorkFor :Google} UNION {?Person :WorkFor :Yahoo} }
      Predicates: SELECT ?FName WHERE { {?Person :Surname ?FName} UNION {?Person :Firstname ?FName} }
      Subjects: SELECT ?AgentName ?AgentPhone WHERE { {?Person rdf:type :Person. ?Person :Name ?AgentName. ?Person :Phone ?AgentPhone} UNION {?Company rdf:type :Company. ?Company :Name ?AgentName. ?Company :Phone ?AgentPhone} }
      Queries: SELECT ?CustName WHERE { {?Person :Name ?CustName} UNION {?Company :Title ?CustName. ?Company :City ?X1. FILTER regex(?X1, "Paris")} }
  • 23. More MashQL Constructs And several other constructs, including:  Types  and Reverse Predicates  Datatypes and Language Tags  …. University of Cyprus © 2009
  • 24. Formal Syntax and Semantics.
      Def. 1 (Dataset): A dataset D is a set of triples; each triple t is formed as <S, P, O>, where S ∈ I, P ∈ I, and O ∈ I ∪ L.
      Def. 2 (Typed Literals): Every object literal (O ∈ L) must have a datatype.
      Def. 3 (Language Tags): An object literal (O ∈ L) may have a language tag Lt.
      Def. 4 (Query): A query Q with a subject S, denoted by Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn.
      Def. 5 (Subject): A subject S ∈ (I ∪ V), where I is an identifier and V is a variable.
      Def. 6 (Restriction): A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ I ∪ V), and Of is an object filter.
      Def. 7 (Object Filter): An object filter Of = <O, f>, where O is an object and f is a filtering function. An object filter can have one of the following nine forms:
        1. Of = <O>, where O is an object, O ∈ V ∪ I. This is the simplest object filter; it does not add any restriction on the object value of the retrieved triples.
        2. Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag. The object value O should be equal to X, with datatype T and language Lt.
        3. Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag. The object value O should match regex(X), with datatype T and language Lt. A regex literal is a literal that contains a regular expression matching pattern.
        4. Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype. The object value O should be more than X, with datatype T.
        5. Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier. The object value O should be less than X, with datatype T (see rule-9).
        6. Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, and T is a datatype identifier. The object value O should be more than or equal to X and less than or equal to Y, with datatype T.
        7. Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ..., vn}, each vi being a variable or a constant. The object value O should be equal to one of the values in V.
        8. Of = <O, Not(f)>, where f is one of the functions defined above. This filter extends all of the above functions with simple negation; for example, Not applied to the Equals filter means Not Equal.
        9. Of = <O, Qi(O)>, where O is an object (O ∈ V ∪ I), and Qi(O) is a sub-query with O being the query subject. The restrictions defined in the sub-query Qi(O) should be satisfied as well. Notice that this definition is recursive; however, this does not mean the query itself is recursive.
      Def. 8 (Types): A subject (S ∈ I) or an object (O ∈ I) can be prefixed with “a” or “an” to mean the instances of this subject/object type, instead of the subject/object itself.
      Def. 9 (Union): A union can be declared between objects, predicates, subjects and/or queries, in the following forms: 1. On = <O1\O2\...\On>, to indicate unions between objects, where Oi ∈ I. 2. Pn = <P1\P2\...\Pn>, to indicate unions between predicates, where Pi ∈ I. 3. Sn = <S1\S2\...\Sn>, to indicate unions between subjects, where Si ∈ I. 4. Qn = <Q1\Q2\...\Qn>, to indicate unions between queries.
      Def. 10 (Reverse): <~P> indicates the reverse of the predicate P. Let R1 be a restriction on S such that <S P O>, and R2 be <O ~P S>; then R1 and R2 have the same meaning.
  • 25. Query Formulation Algorithm Formalization of the background queries Select Subject S (1) S  ST :  O ( P=‘:Type’ (D)) (1’) O1 {(?S1 < T ( O1:{(?S1 <:Type> ?O1)} > (2) S  SI :  S (D)   O (O (D)) (2’) S1:{(?S1 ?P1 ?O1)} UNION O1:{(?S1 ?P1? O1). Filter isURI(?O1)} (3) S  V Select a property P  (D)) (S  ST)  P  P2 ( P1=‘:Type’  O1=Subject (D) (4’) P2:{(?S1 <:Type> <S>)(?S2 ?P2 ?O2)} (4) S1=S2 (S  SI)  P   P (S=Subject (D)) (5’) P1:{(<S> ?P1 ?O1)} (5) (S  V)  P   P ( (D)) (6’) P1:{(?S1 ?P1 ?O1)} (6) PV (7) Add a filter on P (8) (S  SI)  (P  V)  O   O1 (S1=S  O1 (D)) (8’) O1:{(<S> ?P1 ?O1) Filter isURI(?O1)} (9) (S  SI)  (P  V)  O   O1 (S1 S  P1 P  O1 (D)) (9 ) (9’) O1:{(<S> <P> ?O1) Filter isURI(?O1)} S1=S P1=P O1 (10) (S  ST)  (P  V)  O   O2 (P1=‘:Type’  O1=S (D)  (D)) (10’) O1:{(?S1 <:Type> <S>)(?S1 ?P2 ?O2)} S1=S2 (11) (S  ST)  (P  V)  O   O2 (P1=‘rdf:Type’  O1=S (D) S1=S2 P2=P (11’) O:{(?S <rdf:Type> <S>)(?S <P> ?O)} (D)) (12) (S  V)  (P  V)  O   O ( (D)) () (12’) O1:{(?S1 ?P1 ?O1)} ( )( )( ) ( ( )) (13) (S  V)  (P  V)  O   O (P=P (D)) (13’) O1:{(?S1 <P> ?O1)} University of Cyprus © 2009
  • 26. MashQL-SPARQL Mapping Rules (also mapped into Oracle's SPARQL)
    Rule 1. The symbol placed before a variable means that it will be returned in the results, i.e., included in the SELECT part of the SPARQL query. If the output of the query is input to another, use "CONSTRUCT *".
    Rule 2. In any of the following rules, if a subject, predicate, or object is italicized, it is seen as a SPARQL variable, i.e., prefixed with "?".
    Rule 3. If S is a subject and R = < , P, Of>, the mapping is: {S P O}.
    Rule 4. If S is a subject and R = <maybe, P, Of>, the mapping is: {OPTIONAL{S P O}}.
    Rule 5. If S is a subject and R = <without, P, Of>, the mapping is: {OPTIONAL{S P O} FILTER (!bound(?O))}.
    Rule 6. If Of = <O, Equals(X, T, Lt)>: append to the mapping: FILTER(?O = X). If T ≠ Null, append: FILTER(datatype(?O) = T). If Lt ≠ Null, append: FILTER(lang(?O) = Lt).
    Rule 7. If Of = <O, Contains(X, T, Lt)>: append to the mapping: FILTER regex(?O, X). If T ≠ Null, append: FILTER(datatype(?O) = T). If Lt ≠ Null, append: FILTER(lang(?O) = Lt).
    Rule 8. If Of = <O, MoreThan(X, T)>: append to the mapping: FILTER(?O > X). If T ≠ Null, append: FILTER(datatype(?O) = T).
    Rule 9. If Of = <O, LessThan(X, T)>: append to the mapping: FILTER(?O < X). If T ≠ Null, append: FILTER(datatype(?O) = T).
    Rule 10. If Of = <O, Between(X, Y, T)>: append to the mapping: FILTER(?O >= X && ?O <= Y). If T ≠ Null, append: FILTER(datatype(?O) = T).
    Rule 11. If Of = <O, OneOf(V)>: append to the mapping: FILTER(?O = V1 || . . . || ?O = Vn). If Vi is a regex-ed literal, the i-th comparison is replaced with: regex(?O, Vi).
    Rule 12. If Of = <O, Not(f)>: the filter f is generated as above, but negated.
    Rule 13. If Of = <O, Qi(O)>: repeat all mapping rules to generate Qi(O).
    Rule 14. If a subject S is prefixed with "a" or "an": append to the mapping: {?S rdf:type :S}.
    Rule 15. If an object O is prefixed with "a" or "an": append to the mapping: {?O rdf:type :O}.
    Rule 16. Given On, if n > 1 and Oi ∈ I: the mapping in rules 3-4 will be: {{S P :O1} UNION . . . UNION {S P :On}}.
    Rule 17. Given Pn, if n > 1 and Pi ∈ I: the mapping in rules 3-4 will be: {{S :P1 O} UNION . . . UNION {S :Pn O}}.
    Rule 18. Given Sn, if n > 1 and Si ∈ I: regenerate the query n times, each time with Si as a root, and with a UNION between the queries.
    Rule 19. Given Qn, if n > 1: add UNION between the n queries.
    Rule 20. If S is a subject and R = <~P, O>, the mapping is: {O P S}.
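    A few of the mapping rules on this slide (the modifiers of rules 3 to 5 and the filters of rules 6 to 10) can be sketched as string generation. The Python below is a hedged illustration only: the function names and the tuple encoding of filters are hypothetical, values are assumed to arrive as already-formatted SPARQL terms, and the datatype and language parts of the filters are omitted. The "without" case is rendered with the OPTIONAL plus !bound idiom, and Between as a single FILTER with &&, which are the forms standard SPARQL accepts.

```python
def render_filter(var, flt):
    """Render an object filter, given as a tuple such as ("MoreThan", "2007")."""
    kind = flt[0]
    if kind == "Equals":      # Rule 6
        return f"FILTER ({var} = {flt[1]})"
    if kind == "Contains":    # Rule 7
        return f"FILTER regex({var}, {flt[1]})"
    if kind == "MoreThan":    # Rule 8
        return f"FILTER ({var} > {flt[1]})"
    if kind == "LessThan":    # Rule 9
        return f"FILTER ({var} < {flt[1]})"
    if kind == "Between":     # Rule 10
        return f"FILTER ({var} >= {flt[1]} && {var} <= {flt[2]})"
    raise ValueError(f"unsupported filter: {kind}")


def map_restriction(subject, modifier, predicate, obj, flt=None):
    """Render one restriction <modifier, P, Of> on a subject as a SPARQL pattern."""
    parts = [f"{subject} {predicate} {obj} ."]
    if flt is not None:
        parts.append(render_filter(obj, flt))
    body = " ".join(parts)
    if modifier == "":         # Rule 3: plain triple pattern
        return body
    if modifier == "maybe":    # Rule 4: optional pattern
        return f"OPTIONAL {{ {body} }}"
    if modifier == "without":  # Rule 5: the pattern must not match
        return f"OPTIONAL {{ {body} }} FILTER (!bound({obj}))"
    raise ValueError(f"unsupported modifier: {modifier}")


def build_select(variables, patterns):
    """Rule 1: variables marked for output go into the SELECT clause."""
    return "SELECT " + " ".join(variables) + " WHERE { " + " ".join(patterns) + " }"
```

    For instance, the restriction "Year > 2007" from the evaluation query could be produced with `map_restriction("?S", "", ":Year", "?Year", ("MoreThan", "2007"))` and wrapped with `build_select(["?S"], [...])`.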
  • 27. MashQL Editor
    - Alpha version (will be public soon).
    - Web-based, using Ajax.
    - Open-source JavaScript libraries (from Yahoo).
    - Oracle 11g as RDF store.
    - Graph summaries for fast user interaction.
    - URI normalization based on heuristics (but some URIs are too cryptic).
  • 28. MashQL Firefox Add-On (Light-mashups @ your browser)
  • 29. Evaluation (DBLP, Experiment 1): User interaction response time. How long does it take to generate the next list?
    Query:
      Any Article
      Title "^World-Wide Web"
      Creator
      Type Person
      Name "^Berners-Lee"
      Year > 2007
    Background queries:
      O:(?S Type ?O)
      P:(?S Type Article)(?S ?P ?O1)
      P:(?S Type Article)(?S Creator ?O1)(?O1 ?P ?O2)
      O:(?S Type Article)(?S Creator ?O1)(?O1 Type ?O)

            DBLP (9M triples)    DBLP (4M triples)    DBLP (2M triples)
    Query   GS       Oracle      GS       Oracle      GS       Oracle
    Q1      0.003    0.005       0.003    0.004       0.003    0.003
    Q2      0.001    0.136       0.001    0.148       0.001    0.108
    Q3      0.001    0.187       0.001    0.846       0.001    0.671
    Q4      0.001    1.208       0.001    0.835       0.001    0.650
  • 30. Evaluation (DBPedia, Experiment 2)
    Background query: P:(?S :Type :Album) (?S :PreviousAlbum ?O1) (?O1 ?P ?O2)

    Query   DBPedia (32 M)   DBPedia (16 M)   DBPedia (8 M)   DBPedia (4 M)
    Q1      0.003            0.003            0.003           0.003
    Q2      0.002            0.002            0.002           0.002
    Q3      0.005            0.004            0.003           0.003
    Q4      0.005            0.004            0.004           0.004
    Q5      0.005            0.004            0.004           0.004
    Q6      0.005            0.005            0.005           0.005
    Q7      0.007            0.007            0.007           0.007
    Q8      0.005            0.005            0.005           0.005
    Q9      0.007            0.007            0.007           0.006
  • 31. Evaluation (DBPedia, Experiment 3)
    Background query: P3:(?S Type ?Album) (?S ?RelatedTo1 ?O1) (?O1 ?RelatedTo2 ?O2) (?O2 ?P3 ?O4)
    Datasets: DBPedia (32 M), DBPedia (16 M), DBPedia (8 M), DBPedia (4 M); systems: GS, Oracle
    Q1  0.001  0.001  0.001  0.027  0.001
    Q2  0.005  0.010  0.026  17.450  0.006
    Q3  0.010  0.010  0.048  49.656  0.011
    Q4  0.022  0.087  5348.7  0.017  0.011
    Q5  0.011  0.036  0.135  -  0.032
    Q6  0.016  0.055  0.185  -  0.047
    Q7  0.023  0.076  0.234  -  0.061
    Q8  0.027  0.098  0.265  -  0.077
    Q9  0.034  0.280  -  0.110  0.092
  • 32. Conclusions
    - "Query and mash up the Data Web as simply as filtering and piping Web Feeds" is a query formulation problem.
    - End-users can navigate, query, and mash up unknown graphs without knowing the schema. Data is schema-free and may come from multiple sources.
    - MashQL is as expressive as SPARQL, except for NAMED GRAPH.
    - MashQL is not merely a SPARQL interface, nor is it limited to RDF. It has its own path-pattern intuition (it can be used similarly for XML and DB).
  • 33. Future Work (salt, spices, …!)
    - Reasoning, Keyword Search, Aggregation Functions, etc.
    - Results presentation (should be tackled fundamentally).
    - Firefox add-on as mashup/query editor.
    - RDF summaries for SPARQL optimization.
    - ..
  • 34. Thank You. Dr. Mustafa Jarrar, mjarrar@cs.ucy.ac.cy, HPCLab, University of Cyprus